Kruskal’s Minimum Spanning Tree Algorithm

Shreyans Murkute
4 min readApr 10, 2022

Introduction:

Kruskal’s Algorithm is named after its creator Joseph Kruskal and was first theorized in the American Mathematical Society journal, in 1956. The Algorithm aimed to address an important problem back in the day. Given a complex graph system, how to connect all the components or all the vertices in not only using the minimum number of edges but also while minimizing the cost of each edge. The cost of each edge can be arbitrary or even signify a real-world implication. One of the most common applications of Kruskal’s Algorithm to find an MST (Minimum Spanning Tree), is in the telecommunication or cabling industry. Suppose the Cabling company needs to lay down cables to connect all their customers. There can be actual cost related with each “edge” or each connection between 2 customers. Maybe in some place the underground cable has to be buried deeper meaning the cost of laying it down will be higher. Kruskal’s Algorithm achieves the minimization of this Total Cost for the Resultant Tree in an efficient manner.

A Minimum Spanning Tree (MST). All the vertices are included all while minimizing the Total Edge Costs

The Disjoint Set Union:

Kruskal’s Algorithm is a specific algorithm with an extremely specific goal in mind. Find the Minimum Spanning Tree and that’s it. Therefore, it makes sense to actually use a unique Data Structure itself for simplification purposes. It’s known as the Disjoint Set Union or the Union-Find Data Structure. As the name suggests, its centered around the concept of Disjoint Sets. There are 3 main operations, and their basic function is as follows:

1) make_set(v): Is the first and most basic operation. Initializes the Disjoint Sets. The Disjoint sets only holds a singular item, which is itself, initially.

2) union(a, b): Merges two sets, if and only if the merging of the sets holding a and b doesn’t form a cycle.

3) find_set(v) : Keeping Disjoint Sets in memory individually named and labelled is not logical. Instead, a leader is kept in each Set which acts a label or a Parent for the Set. find_set operation aims to find this Leader. This helps us in telling different Disjoint Sets apart and helps us in the union method as well.

The Disjoint Set Union also known as the Union Find Data Structure in some places, helps us in keeping track of the growth of our Disjoint Sets.

Implementation of Kruskal:

First the edges are sorted keeping in mind the edge weight in an ascending order. After doing this, all the edges of the already made graph are forgotten. Then we will pick the edge with the lowest weight. As this is a type of Greedy Algorithm, we will repeat this step over and over again. Initially each vertex of the graph will be its own separate tree. However, when selecting an edge and adding it to the Minimum Spanning Tree, one extra verification step is necessary. We have to check each time, whether adding the edge to the MST will result in formation of a cycle in the Tree. If an edge indeed forms a cycle, then we will reject that edge and straight away move on to the next edge in the sorted order.

A MST Generated using Kruskal’s Algorithm

Let us understand the Kruskal’s Algorithm using the pseudocode of the Disjoint Set Union operations:

KRUSKAL(graph G)

MST = {}

for each vertex
v belonging G.V:
MAKE-SET(v)

for each (u, v) in G.E ordered by weight(u, v), increasing:
if FIND-SET(u) != FIND-SET(v):
add {(u, v)} to set MST
UNION(u, v)

return MST

Analysis of Kruskal’s Algorithm:

The main operation in Kruskal’s Algorithm is the sorting of the edges from a Time Complexity Standpoint. If E arethe number of edges and V are the number of vertices, then sorting the Edges will take O(ELogE) time. The make_set operation will take O(V) time, while the find_set and unite/union operations take at worst O(LogV) time. As the Find and Union operations are repeated for each edge, the overall complexity is O(ELogV). The Overall time Complexity for the whole Kruskal’s Algorithm, taking into consideration the Sorting as well as the Union, and Find Operations if O(ELogV+ ELogE). From the properties of the Graph, we know n(E)<=n(V²), and hence ELogV and ELogE are similar from a time complexity standpoint. Therefore, we can say that the Overall, Final Time Complexity is either O(ELogE) or O(ELogV).

--

--

Shreyans Murkute
0 Followers

Undergrad Student Interested in all things Data and AI