Dynamic Connectivity
Goal: Given a series of pairwise connectedness declarations, determine if two items are connected.
Two operations:
connect(p, q)
: connect itemsp
andq
isConnected(p, q)
: return ifp
andq
are connected
Note: There’s no way to add items, or unlink items.
The Disjoint Sets ADT
public interface DisjointSets {
/** Connects two items P and Q. */
void connect(int p, int q);
/** Checks to see if two items are connected. */
boolean isConnected(int p, int q);
}
Goal: Design an efficient DisjointSets implementation.
- Number of elements
N
can be huge. - Number of method calls
M
can be huge. - Calls to methods may be interspersed.
The Naive Approach
- Connect two things: Record every single connecting line in some data structure.
- Cheking connectedness: do some sort of iteration over the lines to see if one thing can be reached from another.
It turns out this approach is more complex than it needs to me.
Better Approach – Connected Components
Rather an writing down every single line, we can probably just record down which “club” each item belongs to. We don’t have to explicitly state that 1
, 3
are connected, we can have a way to denote that they belong to the same set.
For example, {0, 1, 3, 5}, {2, 4}, {6}, {7}, {8}
Downside: We can’t disconnect, but that’s not something we’re trying to do here.
Quick Find
Question: What data structure should we pick to support the tracking of sets?
Solution: Using an array, each index represent an item, and the value of the index represent the group in which it is in.
- Use
int[] id
where the $i^{\text{th}}$ entry is the set number of the itemi
connect(p, q)
: Change entries that equalid[p]
toid[q]
Downside: Connecting two items takes linear time, which is too slow.
Quick Union
Idea: Assign each node a parent (instead of an id).
Suppose we want to connect(5, 2)
, what is the array going to end up being?
We could change the boss of 5 to 2. However, this makes the tree more spindly.
So, we should change the boss of 5 to the boss of 2 instead, like this:
And this is the Quick Union approach.
Downsides:
- Finding the boss is expensive, especially for more spindly trees.
- For N items, the worst case runtime for
connect(p, q)
is \(\Theta(N)\). - The worst case runtime for
isConnected(p, q)
is \(\Theta(N)\).
Is there any way we can avoid getting tall, spindly trees? Yes there is!
Weighted Quick Union
Idea:
- Track tree size (number of elements)
- When connecting, link root of smaller tree to larger tree.
One question to ponder: You might be thinking, why not link the root of the shorter tree to the taller tree (Heighted Quick Union)? It turns out, both works, and both give the same performance, but weighing by size is easier to implement.
Implementing Weighted Quick Union
Very few changes need to be made based on Quick Union. We only need another int[]
array called size
, and change connect(p, q)
to update size[]
every time we connect.
public void connect(int p, int q) {
int i = find(p);
int j = find(q);
if (i == j) return;
if (size[i] < size[j]) { parent[i] = j; size[j] += size[i]; }
else { parent[j] = i; size[i] += size[j]; }
}
This new approach still requires time that is proportional to the depth of the tree.
However, what’s amazing is that with this approach, the maximum depth of any item is always \(\log N\).
Implementation | Constructor | connect |
isConnected |
---|---|---|---|
QuickFindDS | \(\Theta(N)\) | \(\Theta(N)\) | \(\Theta(1)\) |
QuickUnionDS | \(\Theta(N)\) | \(\Theta(N)\) | \(\Theta(N)\) |
WeightedQuickUnionDS | \(\Theta(N)\) | \(\Theta(\log N)\) | \(\Theta(\log N)\) |
Performing M operations on a DisojointSet object with N elements:
- Runtime ranges from \(O(MN)\) to \(O(N + M log N)\).
Path Compression: A Clever Idea
Idea: Whenever we do isConnected(p, q)
, as we go, we tie all the nodes we’ve seen to the root.
In other words, when doing any kind of query involving a node, set that node’s parent to the root.
This results in a union/connected operation that is very very close to amortized constant time.
- M operations on N nodes is \(O(N + M \log^* N)\) (\(\log^*\) is the Iterative Logarithm)
- \(\log^*\) is less than 5 for any realistic input.
- A tighter bound: \(O(N + M \alpha(N))\)
The end result of our iterative design process is the standard
Hey, kudos for making it this far! If you've liked this, you might also like Sorting.