## Dynamic Connectivity

**Goal**: Given a series of **pairwise** connectedness declarations, determine if two items are connected.

Two operations:

`connect(p, q)`

: connect items`p`

and`q`

`isConnected(p, q)`

: return if`p`

and`q`

are connected

Note: There’s no way to add items, or unlink items.

## The Disjoint Sets ADT

```
public interface DisjointSets {
/** Connects two items P and Q. */
void connect(int p, int q);
/** Checks to see if two items are connected. */
boolean isConnected(int p, int q);
}
```

**Goal**: Design an efficient DisjointSets implementation.

- Number of elements
`N`

can be huge. - Number of method calls
`M`

can be huge. - Calls to methods may be interspersed.

### The Naive Approach

- Connect two things: Record every single connecting line in some data structure.
- Cheking connectedness: do some sort of iteration over the lines to see if one thing can be reached from another.

It turns out this approach is more complex than it needs to me.

### Better Approach – Connected Components

Rather an writing down every single line, we can probably just record down which “club” each item belongs to. We don’t have to explicitly state that `1`

, `3`

are connected, we can have a way to denote that they belong to the same set.

For example, `{0, 1, 3, 5}, {2, 4}, {6}, {7}, {8}`

Downside: We can’t disconnect, but that’s not something we’re trying to do here.

### Quick Find

**Question**: What data structure should we pick to support the tracking of sets?

**Solution**: Using an array, **each index represent an item**, and the **value of the index represent the group** in which it is in.

- Use
`int[] id`

where the $i^{\text{th}}$ entry is the set number of the item`i`

`connect(p, q)`

: Change entries that equal`id[p]`

to`id[q]`

**Downside**: Connecting two items takes linear time, which is too slow.

### Quick Union

**Idea**: Assign each node a parent (instead of an id).

Suppose we want to `connect(5, 2)`

, what is the array going to end up being?

We could change the **boss of 5** to **2**. However, this makes the tree more spindly.

So, we should change the **boss of 5** to the **boss of 2** instead, like this:

And this is the Quick Union approach.

**Downsides**:

- Finding the boss is expensive, especially for more spindly trees.
- For N items, the worst case runtime for
`connect(p, q)`

is \(\Theta(N)\). - The worst case runtime for
`isConnected(p, q)`

is \(\Theta(N)\).

Is there any way we can avoid getting tall, spindly trees? Yes there is!

#### Weighted Quick Union

**Idea**:

- Track tree size (number of elements)
- When connecting, link root of
**smaller**tree to**larger**tree.

One question to ponder: You might be thinking, why not link the root of the **shorter** tree to the **taller** tree (*Heighted Quick Union*)? It turns out, both works, and both give the same performance, but weighing by size is easier to implement.

##### Implementing Weighted Quick Union

Very few changes need to be made based on Quick Union. We only need another `int[]`

array called `size`

, and change `connect(p, q)`

to update `size[]`

every time we connect.

```
public void connect(int p, int q) {
int i = find(p);
int j = find(q);
if (i == j) return;
if (size[i] < size[j]) { parent[i] = j; size[j] += size[i]; }
else { parent[j] = i; size[i] += size[j]; }
}
```

This new approach still requires time that is proportional to the depth of the tree.

However, what’s amazing is that with this approach, the **maximum depth of any item is always \(\log N\)**.

Implementation | Constructor | `connect` |
`isConnected` |
---|---|---|---|

QuickFindDS | \(\Theta(N)\) | \(\Theta(N)\) | \(\Theta(1)\) |

QuickUnionDS | \(\Theta(N)\) | \(\Theta(N)\) | \(\Theta(N)\) |

WeightedQuickUnionDS | \(\Theta(N)\) | \(\Theta(\log N)\) | \(\Theta(\log N)\) |

Performing M operations on a DisojointSet object with N elements:

- Runtime ranges from \(O(MN)\) to \(O(N + M log N)\).

## Path Compression: A Clever Idea

**Idea**: Whenever we do `isConnected(p, q)`

, as we go, we tie all the nodes we’ve seen to the root.

In other words, **when doing any kind of query involving a node, set that node’s parent to the root**.

This results in a union/connected operation that is **very very close to amortized constant time**.

- M operations on N nodes is \(O(N + M \log^* N)\) (\(\log^*\) is the
**Iterative Logarithm**) - \(\log^*\) is less than 5 for any realistic input.
- A tighter bound: \(O(N + M \alpha(N))\)

The end result of our iterative design process is the standard

Hey, kudos for making it this far! If you've liked this, you might also like Sorting.