Every connected undirected graph has a spanning tree. For example, we can generate one by doing DFS and keeping the “tree edges”.
But let’s consider:
- A connected, undirected graph with weighted edges.
- For example, a road network connecting cities, where each weight is the length of a road segment in miles.
- It has a spanning tree, because it is connected and undirected.
- Because there are only finitely many spanning trees, at least one of them has minimum total weight.
Why find a minimum spanning tree? Maybe we want to run a rail network next to the road network and connect all cities but run the minimum amount of rail.
How do we find an MST?
First, if we want a unique MST we need unique edge weights. One way to get them: if we have E edges with integer weights, number the edges i = 0..(E-1) and add i/E to the weight of edge i. Every offset is less than 1, so edges with different original weights keep their relative order, and ties are broken by edge number.
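The tie-breaking trick can be sketched in a few lines. This is only an illustration: the function name `make_weights_unique` and the `(u, v, weight)` tuple layout are assumptions, not something the notes prescribe. It uses exact fractions so that the offsets i/E never collide through rounding.

```python
from fractions import Fraction

def make_weights_unique(edges):
    """Return a copy of edges where edge number i has i/E added to its weight.

    edges: list of (u, v, integer_weight) tuples (an assumed layout).
    """
    count = len(edges)
    return [(u, v, w + Fraction(i, count)) for i, (u, v, w) in enumerate(edges)]

# Two edges share weight 2; after the adjustment every weight is distinct.
edges = [("a", "b", 2), ("b", "c", 2), ("a", "c", 3)]
unique = make_weights_unique(edges)
```

Because each offset is strictly less than 1, an edge that was lighter than another before the adjustment is still lighter afterwards.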
Generic MST algorithm #
- Maintain an acyclic subgraph F of the input graph G, the “intermediate spanning forest”.
- Initially, F contains every vertex but no edges, so each vertex is a one-vertex tree. All of those vertices must appear in the MST.
- We connect trees by adding edges.
- When the algorithm halts, we’ve added enough edges to F that it’s the MST.
- We don’t want to add any edges that won’t be in the final MST.
We classify edges relative to the current F:
- An edge whose two endpoints are already in the same component of F is useless.
- An edge with exactly one endpoint in some component of F is safe if it’s the minimum-weight edge with that property.
- Edges that are neither safe nor useless are undecided.
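The three categories above can be made concrete with a small sketch. The function name `classify_edges`, the `component` dictionary, and the example graph are all assumptions made for illustration; the edges passed in are candidates that are not already in F.

```python
def classify_edges(edges, component):
    """Classify each candidate edge relative to the forest F.

    edges: list of (u, v, weight) tuples with unique weights.
    component: dict mapping each vertex to the label of its component in F.
    """
    lightest = {}  # component label -> lightest edge leaving that component
    for edge in edges:
        u, v, w = edge
        if component[u] == component[v]:
            continue
        for label in (component[u], component[v]):
            if label not in lightest or w < lightest[label][2]:
                lightest[label] = edge
    safe = set(lightest.values())
    kinds = {}
    for edge in edges:
        u, v, _ = edge
        if component[u] == component[v]:
            kinds[edge] = "useless"
        elif edge in safe:
            kinds[edge] = "safe"
        else:
            kinds[edge] = "undecided"
    return kinds

# F currently holds a-b and b-e, so its components are {a, b, e}, {c}, {d}.
component = {"a": 0, "b": 0, "e": 0, "c": 1, "d": 2}
candidates = [("a", "e", 6), ("a", "c", 2), ("b", "c", 3), ("c", "d", 4), ("b", "d", 5)]
kinds = classify_edges(candidates, component)
```

Here a-e is useless, a-c and c-d are safe, and b-c and b-d are undecided.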
Lemma (Prim): The minimum spanning tree of G contains every safe edge #
- We’re dealing with a connected, undirected graph with unique edge weights.
- Let S be an arbitrary subset of the vertices of G.
- Let e be the lightest edge with exactly one endpoint in S.
- Let T be an arbitrary spanning tree that does not contain e.
- To prove: T is not the MST of G.
- Because T is connected, it contains a path from one endpoint of e to the other.
- Because the path starts at a vertex in S and ends at a vertex not in S (by definition of e), it must contain at least one edge with exactly one endpoint in S; let e' be any such edge.
- Removing e' from T splits it into two trees, and e reconnects them, so swapping e' for e gives another spanning tree. Because e is the lightest edge with exactly one endpoint in S and weights are unique, e is strictly lighter than e', so the new tree weighs less than T. Therefore T is not the MST.
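The exchange step can be written as a one-line calculation. The names T' (the tree after the swap) and w (total weight) are introduced here only for this illustration.

```latex
T' = (T \setminus \{e'\}) \cup \{e\}, \qquad
w(T') = w(T) - w(e') + w(e) < w(T) \quad \text{because } w(e) < w(e').
```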
Lemma. The MST contains no useless edge #
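Proof: A useless edge has both endpoints in the same component of F, so F already contains a path between them. Every edge of F is in the MST, so adding a useless edge would introduce a cycle.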
Generic algorithm plan #
- Identify a safe edge.
- Add it to F.
- Repeat until F is connected.
Boruvka’s Algorithm #
- Count the components of F, and label each vertex with its component.
- For each component, start with a null safe edge.
- For each edge uv, if u and v are in different components:
  - If uv is lighter than the current safe edge of u’s component, set it as the safe edge for that component. Do the same for v’s component.
- We’ve now found a safe edge for each component. Add them all to F and repeat until F has a single component.
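The steps above can be sketched as follows. This is a minimal illustration rather than a tuned implementation: the helper `label_components`, the `(u, v, weight)` tuple layout, and the example graph are assumptions, and the sketch relies on the edge weights being unique.

```python
def label_components(vertices, forest_edges):
    """Count the components of F and label every vertex, using an iterative DFS."""
    adjacent = {v: [] for v in vertices}
    for u, v, _ in forest_edges:
        adjacent[u].append(v)
        adjacent[v].append(u)
    label = {}
    count = 0
    for start in vertices:
        if start in label:
            continue
        label[start] = count
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adjacent[x]:
                if y not in label:
                    label[y] = count
                    stack.append(y)
        count += 1
    return count, label

def boruvka(vertices, edges):
    """Return the set of MST edges; edges are (u, v, weight) with unique weights."""
    forest = set()
    count, label = label_components(vertices, forest)
    while count > 1:
        safe = [None] * count  # null safe edge for each component
        for edge in edges:
            u, v, w = edge
            if label[u] == label[v]:
                continue  # useless edge: both endpoints in one component
            for c in (label[u], label[v]):
                if safe[c] is None or w < safe[c][2]:
                    safe[c] = edge
        if all(e is None for e in safe):
            raise ValueError("graph is not connected")
        forest.update(e for e in safe if e is not None)
        count, label = label_components(vertices, forest)
    return forest

vertices = ["a", "b", "c", "d"]
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4), ("b", "d", 5)]
mst = boruvka(vertices, edges)
```

On this small graph a single round is enough: the four components pick a-b, a-b, b-c, and c-d as their safe edges.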
Running time:
- We can count and label components in O(|V|) time.
- Each pass of the loop scans all the edges, which takes O(|E|) time.
- The graph is connected, so |V| is in O(|E|).
- Each iteration at least halves the number of components, so there are O(log |V|) iterations.
- So this takes O(|E| log |V|) time.
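The bound can be spelled out as a short calculation. The symbol c_k, the number of components of F after k rounds, is introduced here only for this illustration.

```latex
c_0 = |V|, \qquad c_k \le \frac{c_{k-1}}{2} \le \frac{|V|}{2^k}
\;\Longrightarrow\;
\text{at most } \lceil \log_2 |V| \rceil \text{ rounds}, \qquad
\text{total time} = O(|E|) \cdot O(\log |V|) = O(|E| \log |V|).
```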
Jarnik’s (Prim’s) Algorithm #
- Initialize an empty priority queue Q, keyed by edge weight.
- Initialize an empty subtree T.
- Take an arbitrary vertex and add it to T.
- Add its edges to Q.
- Repeatedly remove the lightest edge from Q, and if it has exactly one endpoint in T:
  - Add the edge to T.
  - Add the edges of the new vertex to Q.
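A minimal sketch of these steps, using Python's standard `heapq` module as the priority queue. The function name `jarnik`, the adjacency-list layout, and the example graph are assumptions made for illustration.

```python
import heapq

def jarnik(adjacency, start):
    """Return the tree edges as (u, v, weight), in the order they are added.

    adjacency: dict mapping each vertex to a list of (weight, neighbor) pairs.
    """
    in_tree = {start}
    tree_edges = []
    # Every queued edge (weight, u, v) has u already in the tree.
    queue = [(w, start, v) for w, v in adjacency[start]]
    heapq.heapify(queue)
    while queue:
        w, u, v = heapq.heappop(queue)
        if v in in_tree:
            continue  # both endpoints already in T, so the edge is useless
        in_tree.add(v)
        tree_edges.append((u, v, w))
        for w2, x in adjacency[v]:
            if x not in in_tree:
                heapq.heappush(queue, (w2, v, x))
    return tree_edges

adjacency = {
    "a": [(1, "b"), (3, "c")],
    "b": [(1, "a"), (2, "c"), (5, "d")],
    "c": [(2, "b"), (3, "a"), (4, "d")],
    "d": [(4, "c"), (5, "b")],
}
tree = jarnik(adjacency, "a")
```

Edges whose far endpoint has since joined the tree are left in the queue and simply skipped when they are popped, which keeps the sketch short at the cost of a larger queue.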
Kruskal’s Algorithm #
- Sort E by increasing weight.
- Put each vertex in its own set.
- For each edge uv in E, in sorted order:
  - If u and v are in different sets, add uv to F and merge (union) the two sets.
Because we’re looking at edges in increasing weight order, we’ll never add a heavier edge before a lighter conflicting edge.
Sorting the edges takes O(|E| log |E|) time. The loop runs only |E| times, so sorting dominates as long as each iteration is cheap.
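The loop can be sketched as below. To keep the example self-contained it tracks the sets naively, with a dictionary from each vertex to a shared Python `set` object; that bookkeeping is a stand-in chosen for illustration, not an efficient structure. The function name `kruskal` and the `(u, v, weight)` layout are likewise assumptions.

```python
def kruskal(vertices, edges):
    """Return the MST edges in the order they are added; edges are (u, v, weight)."""
    set_of = {v: {v} for v in vertices}  # each vertex starts in its own set
    forest = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if set_of[u] is set_of[v]:
            continue  # same set: the edge would close a cycle
        forest.append((u, v, w))
        # Naive merge: fold the smaller set into the larger one and
        # repoint every member of the smaller set.
        small, large = sorted((set_of[u], set_of[v]), key=len)
        large |= small
        for x in small:
            set_of[x] = large
    return forest

vertices = ["a", "b", "c", "d"]
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4), ("b", "d", 5)]
mst = kruskal(vertices, edges)
```

The edges a-c and b-d are skipped because their endpoints are already in the same set when they come up.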
The problems are:
- Determining if u and v are in the same set.
- Merging together two sets and maintaining that collection of easily-merged sets.
We need a specific data structure, called a disjoint set structure or a union-find structure. It needs to store a set of disjoint sets of vertices, and support three operations efficiently:
- Making a new set.
- Finding the set containing an element.
- Merging (unioning) two sets.
How do we do this?
- We store a forest.
- Each tree’s pointers go in the leaf -> root direction. Each node stores its ID and a parent pointer.
- To get fast lookup, we store a hash table mapping each node ID to its tree node.
- To make a new set, we check whether the element is already in the hash table. If not, we allocate a new node and put it in the hash table.
- Find is a hash table lookup, after which we follow parent pointers to the root of the tree. The root’s ID is the ID of that set.
- To merge, we set the parent of one root to be the other root.
- To speed up later traversals, every find reparents each node it visits directly to the root of its tree.
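A minimal sketch of the structure described above. As a simplification it folds the tree nodes into the hash table itself: a single dictionary maps each ID to its parent's ID, and a root is its own parent. The class and method names are assumptions made for illustration.

```python
class DisjointSets:
    def __init__(self):
        # Hash table: element ID -> parent ID. A root points to itself.
        self.parent = {}

    def make_set(self, x):
        """Create a one-element set for x unless x is already known."""
        if x not in self.parent:
            self.parent[x] = x

    def find(self, x):
        """Return the root ID of the set containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Reparent every node on the path directly to the root.
        while x != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, x, y):
        """Merge the sets containing x and y."""
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            self.parent[root_x] = root_y

sets = DisjointSets()
for name in ["a", "b", "c", "d"]:
    sets.make_set(name)
sets.union("a", "b")
sets.union("c", "d")
same_ab = sets.find("a") == sets.find("b")   # a and b were merged
same_ac = sets.find("a") == sets.find("c")   # a and c were not
```

In Kruskal’s loop, the check for whether u and v are in different sets becomes a comparison of `find(u)` and `find(v)`, and the merge becomes `union(u, v)`.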
How fast is this? It’s probably obvious that it’s faster than the sorting, but…
Not sure, but let’s introduce some functions.
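from math import log2

def logstar(n):
    # Iterated logarithm: how many times log must be applied before n <= 1.
    # Base 2 is used here, which is the usual convention.
    if n <= 1:
        return 0
    else:
        return 1 + logstar(log2(n))

def A(m, n):
    # Ackermann function.
    if m == 0:
        return n + 1
    if n == 0:
        return A(m-1, 1)
    return A(m-1, A(m, n-1))

def AA(n):
    # Diagonal of the Ackermann function.
    return A(n, n)

def IA(n):
    # Inverse Ackermann: the smallest x with AA(x) >= n.
    # Only practical for small n, since AA(4) is far too large to compute.
    x = 0
    while AA(x) < n:
        x += 1
    return x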