Union-Find

Union-Find (also called Disjoint Set Union or DSU) is a data structure that tracks elements partitioned into non-overlapping sets. It supports two operations efficiently: find which set an element belongs to, and union to merge two sets. It's the backbone of Kruskal's MST algorithm and the fastest way to solve connectivity problems.

What Is Union-Find?#

Imagine a social network where people form groups. Union-Find answers two questions in near-constant time:

Are these two people in the same group? (Find)
Merge these two groups into one. (Union)

Rendering diagram...

Real-World Analogies#

Country Alliances: Countries form alliances (union); checking if two countries are allies (find) should be instant
Network Connectivity: Routers connect to form networks; check if two devices can communicate
Image Processing: Group adjacent pixels of similar color into regions
Equivalence Classes: Group variables that must be equal based on a set of equations

Key Characteristics#

Property	Description
Find(x)	Return the representative (root) of the set containing x
Union(x, y)	Merge the sets containing x and y into one
Near-constant time	Both operations run in O(α(n)) amortized — practically O(1)
Path Compression	Flatten the tree during Find so future lookups are faster
Union by Rank	Attach smaller trees under larger ones to keep trees shallow

Basic Implementation#

The simplest Union-Find uses a parent array where parent[i] points to the parent of element i. The root of a set points to itself.

Basic Union-Find (No Optimizations)

Simple parent-pointer implementation to illustrate the concept

Basic Union-Find without optimizations

The Problem with Basic Union-Find#

Without optimizations, the tree can degenerate into a long chain — making find take O(n) per call. Union the elements 0→1→2→3→...→n and every find(0) walks the entire chain. Two optimizations fix this.

Optimized Union-Find#

Path Compression#

During find, make every visited node point directly to the root. This flattens the tree so future lookups on any of those nodes are O(1).

Rendering diagram...

Union by Rank#

Always attach the shorter tree under the taller tree's root. This prevents the tree from growing unnecessarily tall. The rank is an upper bound on tree height.

Rendering diagram...

Full Optimized Implementation#

With both optimizations, every operation runs in O(α(n)) amortized time, where α is the inverse Ackermann function — a value that is at most 4 for any conceivable practical input size. For all practical purposes, this is constant time.

Optimized Union-Find

Union-Find with path compression and union by rank

Time:O(α(n)) ≈ O(1) amortized per operationSpace:O(n)Best:O(1)Worst:O(α(n)) amortized

Number of Connected Components#

Problem: Given n nodes and a list of edges, find the number of connected components in the graph.

This is the canonical Union-Find application — each union merges two components, and the final count gives the answer.

Rendering diagram...

Number of Connected Components

Count connected components using Union-Find

Count components by processing edges

Detecting Cycles in Undirected Graphs#

Problem: Given an undirected graph, determine if it contains a cycle.

If union(u, v) finds that u and v are already in the same set, the edge (u, v) creates a cycle — both endpoints are already reachable from each other.

Rendering diagram...

Cycle Detection (Undirected Graph)

Detect cycles by checking if union connects already-connected nodes

Time:O(E × α(V))Space:O(V)Best:O(1)Worst:O(E × α(V))

Kruskal's Minimum Spanning Tree#

Problem: Find the minimum spanning tree (MST) of a weighted, connected graph — the subset of edges that connects all vertices with minimum total weight.

Kruskal's algorithm sorts edges by weight and greedily adds the lightest edge that doesn't create a cycle. Union-Find makes the cycle check nearly instantaneous.

Rendering diagram...

Kruskal's MST

Find minimum spanning tree using sorted edges and Union-Find

Time:O(E log E)Space:O(V)Best:O(E log E)Worst:O(E log E)

Number of Islands (Grid Problem)#

Problem: Given a 2D grid of '1's (land) and '0's (water), count the number of islands. An island is surrounded by water and formed by connecting adjacent land cells horizontally or vertically.

While DFS/BFS is the more common approach, Union-Find handles this elegantly and extends naturally to dynamic versions (where cells flip between land and water).

Number of Islands with Union-Find

Count islands by unioning adjacent land cells in a grid

Grid-based Union-Find: flatten 2D index to 1D

Union-Find vs Other Approaches#

Task	Union-Find	DFS/BFS	When to Prefer Union-Find
Connected components	O(E × α(V))	O(V + E)	Edges arrive one at a time (streaming)
Cycle detection (undirected)	O(E × α(V))	O(V + E)	Only need yes/no, not the cycle path
MST (Kruskal's)	O(E log E)	N/A (use Prim's)	Sparse graphs (E ≈ V)
Dynamic connectivity	O(α(n)) per query	O(V + E) per query	Many connectivity queries over time
Grid problems	O(R × C × α(RC))	O(R × C)	Online/dynamic changes to the grid

When to Choose Union-Find#

Online queries: Edges/connections arrive one at a time and you need to answer "connected?" after each
Kruskal's MST: Union-Find is the standard building block
No path needed: You only care about connectivity, not the actual path between nodes
Multiple queries: Many "are X and Y connected?" queries on a static graph — build once, query O(1)

When DFS/BFS Is Better#

Finding actual paths: Union-Find tells you if nodes are connected, not how
Shortest path: Union-Find doesn't track distances
Directed graphs: Standard Union-Find is for undirected connectivity only
One-time traversal: If you only need one traversal, DFS/BFS is simpler

Reviewing AI-Generated Code#

What to Verify#

Issue	What to Check	Common AI Mistake
Missing path compression	find() should update parent pointers	Iterative find without flattening
Missing union by rank	union() should compare ranks	Always attaching to one side
Connected check before union	Avoid redundant unions	Not checking connected() in cycle detection
Off-by-one in grid mapping	2D-to-1D index: row * cols + col	Using row * rows + col
Not tracking component count	Decrement count on successful union	Forgetting to update count

Questions to Ask AI#

When AI generates Union-Find code, verify:

"Does find use path compression?"
"Does union use rank (or size) to keep trees balanced?"
"Is the 2D-to-1D index mapping correct for grid problems?"
"Is the component count maintained correctly?"

Summary#

Application	Time	Key Insight
Connected Components	O(E × α(V))	Each union merges two components; final count is the answer
Cycle Detection	O(E × α(V))	If union finds endpoints already connected, edge creates a cycle
Kruskal's MST	O(E log E)	Sort edges by weight; greedily add if no cycle
Grid Islands	O(R×C × α(RC))	Flatten 2D to 1D; union adjacent land cells

Key Takeaways#

Two optimizations are essential — path compression and union by rank together give O(α(n)) ≈ O(1) per operation; without them, worst-case is O(n)
Union-Find answers "connected?" not "how?" — it tracks set membership, not paths. If you need the actual path between nodes, use BFS/DFS
Cycle detection in one line: if connected(u, v) is true before union(u, v), you found a cycle
Kruskal's = sort + Union-Find — sort edges by weight, add each edge if it doesn't create a cycle, stop at n-1 edges
Grid problems flatten 2D to 1D — map (row, col) to row * cols + col and subtract water cells from the total component count

PreviousTwo Pointers

NextGreedy Algorithms