Dynamic Connectivity

Goal: Given a series of pairwise connectedness declarations, determine if two items are connected.

Two operations:

  • connect(p, q): connect items p and q
  • isConnected(p, q): return if p and q are connected

Note: There’s no way to add items, or unlink items.

The Disjoint Sets ADT

public interface DisjointSets {
  /** Connects two items P and Q. */
  void connect(int p, int q);

  /** Checks to see if two items are connected. */
   boolean isConnected(int p, int q);

Goal: Design an efficient DisjointSets implementation.

  • Number of elements N can be huge.
  • Number of method calls M can be huge.
  • Calls to methods may be interspersed.

The Naive Approach

  • Connect two things: Record every single connecting line in some data structure.
  • Cheking connectedness: do some sort of iteration over the lines to see if one thing can be reached from another.

It turns out this approach is more complex than it needs to me.

Better Approach – Connected Components

Rather an writing down every single line, we can probably just record down which “club” each item belongs to. We don’t have to explicitly state that 1, 3 are connected, we can have a way to denote that they belong to the same set.

For example, {0, 1, 3, 5}, {2, 4}, {6}, {7}, {8}

Downside: We can’t disconnect, but that’s not something we’re trying to do here.

Quick Find

Question: What data structure should we pick to support the tracking of sets?

Solution: Using an array, each index represent an item, and the value of the index represent the group in which it is in.

  • Use int[] id where the $i^{\text{th}}$ entry is the set number of the item i
  • connect(p, q): Change entries that equal id[p] to id[q]

Downside: Connecting two items takes linear time, which is too slow.

Quick Union

Idea: Assign each node a parent (instead of an id).

Before connect

Suppose we want to connect(5, 2), what is the array going to end up being?

We could change the boss of 5 to 2. However, this makes the tree more spindly.

After connect

So, we should change the boss of 5 to the boss of 2 instead, like this:

Better way to connect

And this is the Quick Union approach.


  • Finding the boss is expensive, especially for more spindly trees.
  • For N items, the worst case runtime for connect(p, q) is \(\Theta(N)\).
  • The worst case runtime for isConnected(p, q) is \(\Theta(N)\).

Is there any way we can avoid getting tall, spindly trees? Yes there is!

Weighted Quick Union


  • Track tree size (number of elements)
  • When connecting, link root of smaller tree to larger tree.

One question to ponder: You might be thinking, why not link the root of the shorter tree to the taller tree (Heighted Quick Union)? It turns out, both works, and both give the same performance, but weighing by size is easier to implement.

Implementing Weighted Quick Union

Very few changes need to be made based on Quick Union. We only need another int[] array called size, and change connect(p, q) to update size[] every time we connect.

public void connect(int p, int q) {
	int i = find(p);
	int j = find(q);
	if (i == j) return;
	if (size[i] < size[j]) { parent[i] = j; size[j] += size[i]; }
	else { parent[j] = i; size[i] += size[j]; }

This new approach still requires time that is proportional to the depth of the tree.

However, what’s amazing is that with this approach, the maximum depth of any item is always \(\log N\).

Implementation Constructor connect isConnected
QuickFindDS \(\Theta(N)\) \(\Theta(N)\) \(\Theta(1)\)
QuickUnionDS \(\Theta(N)\) \(\Theta(N)\) \(\Theta(N)\)
WeightedQuickUnionDS \(\Theta(N)\) \(\Theta(\log N)\) \(\Theta(\log N)\)

Performing M operations on a DisojointSet object with N elements:

  • Runtime ranges from \(O(MN)\) to \(O(N + M log N)\).

Path Compression: A Clever Idea

Idea: Whenever we do isConnected(p, q), as we go, we tie all the nodes we’ve seen to the root.

In other words, when doing any kind of query involving a node, set that node’s parent to the root.

This results in a union/connected operation that is very very close to amortized constant time.

  • M operations on N nodes is \(O(N + M \log^* N)\) (\(\log^*\) is the Iterative Logarithm)
  • \(\log^*\) is less than 5 for any realistic input.
  • A tighter bound: \(O(N + M \alpha(N))\)

The end result of our iterative design process is the standard

Hey, kudos for making it this far! If you've liked this, you might also like Sorting.