17.3 Union-Find
We have previously [Checking Component Connectedness] seen how to check connectedness of components, but found that solution unsatisfactory. Recall that it comes down to two set operations: we want to construct the unions of sets, and then determine whether two elements are in the same set.
We will now see how to do this using state. We will try to keep things as similar to the previous version as possible, to enhance comparison.
17.3.1 Implementing with State
parent field be mutable:
data Element:
  | elt(val, ref parent :: Option<Element>)
endfynd. However, as we will soon see, fynd no
longer needs to be given the entire set of elements. Because the only
reason is-in-same-set consumed that set was to pass it on to
fynd, we can remove it from here. Nothing else changes:
fun is-in-same-set(e1 :: Element, e2 :: Element) -> Boolean:
  s1 = fynd(e1)
  s2 = fynd(e2)
  identical(s1, s2)
endfun update-set-with(child :: Element, parent :: Element):
  child!{parent: some(parent)}
endparent: some(parent), the first parent is the name of
the field, while the second one is the parameter name. In addition, we
must use some to satisfy the option type. Naturally, it is not
none because the entire point of this mutation is to change the
parent to be the other element, irrespective of what was there before.union also stays largely unchanged,
other than the change to the return type. Previously, it needed to
return the updated set of elements; now, because the update is
performed by mutation, there is no longer any need to return anything:
fun union(e1 :: Element, e2 :: Element):
  s1 = fynd(e1)
  s2 = fynd(e2)
  if identical(s1, s2):
    s1
  else:
    update-set-with(s1, s2)
  end
endfynd. Its implementation is now remarkably
simple. There is no longer any need to search through the
set. Previously, we had to search because after union operations have
occurred, the parent reference might have no longer been valid. Now,
any such changes are automatically reflected by mutation. Hence:
fun fynd(e :: Element) -> Element:
  cases (Option) e!parent:
    | none => e
    | some(p) => fynd(p)
  end
end17.3.2 Optimizations
Look again at fynd. In the some case, the element bound
to e is not the set name; that is obtained by recursively
traversing parent references. As this value returns, however,
we don’t do anything to reflect this new knowledge! Instead, the next
time we try to find the parent of this element, we’re going to perform
this same recursive traversal all over again.
fun fynd(e :: Element) -> Element:
  cases (Option) e!parent block:
    | none => e
    | some(p) =>
      new-parent = fynd(p)
      e!{parent: some(new-parent)}
      new-parent
  end
endfynd to
any of those elements the next time around will benefit from
this update. This idea is called path compression.There is one more interesting idea we can apply. This is to maintain a rank of each element, which is roughly the depth of the tree of elements for which that element is their set name. When we union two elements, we then make the one with larger rank the parent of the one with the smaller rank. This has the effect of avoiding growing very tall paths to set name elements, instead tending towards “bushy” trees. This too reduces the number of parents that must be traversed to find the representative.
17.3.3 Analysis
This optimized union-find data structure has a remarkble analysis. In
the worst case, of course, we must traverse the entire chain of
parents to find the name element, which takes time proportional to the
number of elements in the set. However, once we apply the above
optimizations, we never need to traverse that same chain again! In
particular, if we conduct an amortized analysis over a sequence
of set equality tests after a collection of union operations, we find
that the cost for subsequent checks is very small—