On this page:
17.3.1 Implementing with State
17.3.2 Optimizations
17.3.3 Analysis

17.3 Union-Find

    17.3.1 Implementing with State

    17.3.2 Optimizations

    17.3.3 Analysis

We have previously [Checking Component Connectedness] seen how to check connectedness of components, but found that solution unsatisfactory. Recall that it comes down to two set operations: we want to construct the unions of sets, and then determine whether two elements are in the same set.

We will now see how to do this using state. We will try to keep things as similar to the previous version as possible, to enhance comparison.

17.3.1 Implementing with State

First, we have to update the definition of an element, making the parent field be mutable:

data Element:
  | elt(val, ref parent :: Option<Element>)
end

To determine whether two elements are in the same set, we will still rely on fynd. However, as we will soon see, fynd no longer needs to be given the entire set of elements. Because the only reason is-in-same-set consumed that set was to pass it on to fynd, we can remove it from here. Nothing else changes:

fun is-in-same-set(e1 :: Element, e2 :: Element) -> Boolean:
  s1 = fynd(e1)
  s2 = fynd(e2)
  identical(s1, s2)
end

Updating is now the crucial difference: we use mutation to change the value of the parent:

fun update-set-with(child :: Element, parent :: Element):
  child!{parent: some(parent)}
end

In parent: some(parent), the first parent is the name of the field, while the second one is the parameter name. In addition, we must use some to satisfy the option type. Naturally, it is not none because the entire point of this mutation is to change the parent to be the other element, irrespective of what was there before.

Given this definition, union also stays largely unchanged, other than the change to the return type. Previously, it needed to return the updated set of elements; now, because the update is performed by mutation, there is no longer any need to return anything:

fun union(e1 :: Element, e2 :: Element):
  s1 = fynd(e1)
  s2 = fynd(e2)
  if identical(s1, s2):
    s1
  else:
    update-set-with(s1, s2)
  end
end

Finally, fynd. Its implementation is now remarkably simple. There is no longer any need to search through the set. Previously, we had to search because after union operations have occurred, the parent reference might have no longer been valid. Now, any such changes are automatically reflected by mutation. Hence:

fun fynd(e :: Element) -> Element:
  cases (Option) e!parent:
    | none => e
    | some(p) => fynd(p)
  end
end

17.3.2 Optimizations

Look again at fynd. In the some case, the element bound to e is not the set name; that is obtained by recursively traversing parent references. As this value returns, however, we don’t do anything to reflect this new knowledge! Instead, the next time we try to find the parent of this element, we’re going to perform this same recursive traversal all over again.

Using mutation helps address this problem. The idea is as simple as can be: compute the value of the parent, and update it.

fun fynd(e :: Element) -> Element:
  cases (Option) e!parent block:
    | none => e
    | some(p) =>
      new-parent = fynd(p)
      e!{parent: some(new-parent)}
      new-parent
  end
end

Note that this update will apply to every element in the recursive chain to find the set name. Therefore, applying fynd to any of those elements the next time around will benefit from this update. This idea is called path compression.

There is one more interesting idea we can apply. This is to maintain a rank of each element, which is roughly the depth of the tree of elements for which that element is their set name. When we union two elements, we then make the one with larger rank the parent of the one with the smaller rank. This has the effect of avoiding growing very tall paths to set name elements, instead tending towards “bushy” trees. This too reduces the number of parents that must be traversed to find the representative.

17.3.3 Analysis

This optimized union-find data structure has a remarkble analysis. In the worst case, of course, we must traverse the entire chain of parents to find the name element, which takes time proportional to the number of elements in the set. However, once we apply the above optimizations, we never need to traverse that same chain again! In particular, if we conduct an amortized analysis over a sequence of set equality tests after a collection of union operations, we find that the cost for subsequent checks is very small—indeed, about as small a function can get without being constant. The actual analysis is quite sophisticated; it is also one of the most remarkable algorithm analyses in all of computer science.