Union-Find
Union-Find (also called Disjoint Set Union or DSU) is a data structure that tracks elements partitioned into non-overlapping sets. It supports two operations efficiently: find which set an element belongs to, and union to merge two sets. It's the backbone of Kruskal's MST algorithm and the fastest way to solve connectivity problems.
What Is Union-Find?#
Imagine a social network where people form groups. Union-Find answers two questions in near-constant time:
- Are these two people in the same group? (Find)
- Merge these two groups into one. (Union)
Real-World Analogies#
- Country Alliances: Countries form alliances (union); checking if two countries are allies (find) should be instant
- Network Connectivity: Routers connect to form networks; check if two devices can communicate
- Image Processing: Group adjacent pixels of similar color into regions
- Equivalence Classes: Group variables that must be equal based on a set of equations
Key Characteristics#
| Property | Description |
|---|---|
| Find(x) | Return the representative (root) of the set containing x |
| Union(x, y) | Merge the sets containing x and y into one |
| Near-constant time | Both operations run in O(α(n)) amortized — practically O(1) |
| Path Compression | Flatten the tree during Find so future lookups are faster |
| Union by Rank | Attach smaller trees under larger ones to keep trees shallow |
Basic Implementation#
The simplest Union-Find uses a parent array where parent[i] points to the parent of element i. The root of a set points to itself.
Basic Union-Find (No Optimizations)
Simple parent-pointer implementation to illustrate the concept
The Problem with Basic Union-Find#
Without optimizations, the tree can degenerate into a long chain — making find take O(n) per call. Union the elements 0→1→2→3→...→n and every find(0) walks the entire chain. Two optimizations fix this.
Optimized Union-Find#
Path Compression#
During find, make every visited node point directly to the root. This flattens the tree so future lookups on any of those nodes are O(1).
Union by Rank#
Always attach the shorter tree under the taller tree's root. This prevents the tree from growing unnecessarily tall. The rank is an upper bound on tree height.
Full Optimized Implementation#
With both optimizations, every operation runs in O(α(n)) amortized time, where α is the inverse Ackermann function — a value that is at most 4 for any conceivable practical input size. For all practical purposes, this is constant time.
Optimized Union-Find
Union-Find with path compression and union by rank
Number of Connected Components#
Problem: Given n nodes and a list of edges, find the number of connected components in the graph.
This is the canonical Union-Find application — each union merges two components, and the final count gives the answer.
Number of Connected Components
Count connected components using Union-Find
Detecting Cycles in Undirected Graphs#
Problem: Given an undirected graph, determine if it contains a cycle.
If union(u, v) finds that u and v are already in the same set, the edge (u, v) creates a cycle — both endpoints are already reachable from each other.
Cycle Detection (Undirected Graph)
Detect cycles by checking if union connects already-connected nodes
Kruskal's Minimum Spanning Tree#
Problem: Find the minimum spanning tree (MST) of a weighted, connected graph — the subset of edges that connects all vertices with minimum total weight.
Kruskal's algorithm sorts edges by weight and greedily adds the lightest edge that doesn't create a cycle. Union-Find makes the cycle check nearly instantaneous.
Kruskal's MST
Find minimum spanning tree using sorted edges and Union-Find
Number of Islands (Grid Problem)#
Problem: Given a 2D grid of '1's (land) and '0's (water), count the number of islands. An island is surrounded by water and formed by connecting adjacent land cells horizontally or vertically.
While DFS/BFS is the more common approach, Union-Find handles this elegantly and extends naturally to dynamic versions (where cells flip between land and water).
Number of Islands with Union-Find
Count islands by unioning adjacent land cells in a grid
Union-Find vs Other Approaches#
| Task | Union-Find | DFS/BFS | When to Prefer Union-Find |
|---|---|---|---|
| Connected components | O(E × α(V)) | O(V + E) | Edges arrive one at a time (streaming) |
| Cycle detection (undirected) | O(E × α(V)) | O(V + E) | Only need yes/no, not the cycle path |
| MST (Kruskal's) | O(E log E) | N/A (use Prim's) | Sparse graphs (E ≈ V) |
| Dynamic connectivity | O(α(n)) per query | O(V + E) per query | Many connectivity queries over time |
| Grid problems | O(R × C × α(RC)) | O(R × C) | Online/dynamic changes to the grid |
When to Choose Union-Find#
- Online queries: Edges/connections arrive one at a time and you need to answer "connected?" after each
- Kruskal's MST: Union-Find is the standard building block
- No path needed: You only care about connectivity, not the actual path between nodes
- Multiple queries: Many "are X and Y connected?" queries on a static graph — build once, query O(1)
When DFS/BFS Is Better#
- Finding actual paths: Union-Find tells you if nodes are connected, not how
- Shortest path: Union-Find doesn't track distances
- Directed graphs: Standard Union-Find is for undirected connectivity only
- One-time traversal: If you only need one traversal, DFS/BFS is simpler
Reviewing AI-Generated Code#
What to Verify#
| Issue | What to Check | Common AI Mistake |
|---|---|---|
| Missing path compression | find() should update parent pointers | Iterative find without flattening |
| Missing union by rank | union() should compare ranks | Always attaching to one side |
| Connected check before union | Avoid redundant unions | Not checking connected() in cycle detection |
| Off-by-one in grid mapping | 2D-to-1D index: row * cols + col | Using row * rows + col |
| Not tracking component count | Decrement count on successful union | Forgetting to update count |
Questions to Ask AI#
When AI generates Union-Find code, verify:
- "Does find use path compression?"
- "Does union use rank (or size) to keep trees balanced?"
- "Is the 2D-to-1D index mapping correct for grid problems?"
- "Is the component count maintained correctly?"
Summary#
| Application | Time | Key Insight |
|---|---|---|
| Connected Components | O(E × α(V)) | Each union merges two components; final count is the answer |
| Cycle Detection | O(E × α(V)) | If union finds endpoints already connected, edge creates a cycle |
| Kruskal's MST | O(E log E) | Sort edges by weight; greedily add if no cycle |
| Grid Islands | O(R×C × α(RC)) | Flatten 2D to 1D; union adjacent land cells |
Key Takeaways#
- Two optimizations are essential — path compression and union by rank together give O(α(n)) ≈ O(1) per operation; without them, worst-case is O(n)
- Union-Find answers "connected?" not "how?" — it tracks set membership, not paths. If you need the actual path between nodes, use BFS/DFS
- Cycle detection in one line: if
connected(u, v)is true beforeunion(u, v), you found a cycle - Kruskal's = sort + Union-Find — sort edges by weight, add each edge if it doesn't create a cycle, stop at n-1 edges
- Grid problems flatten 2D to 1D — map
(row, col)torow * cols + coland subtract water cells from the total component count