Since launching CronDose I have created multiple posts on Binary Search Trees, including:
In this guide I’m going to help you to answer the question of: why do binary search trees have to be balanced?
Let’s first walk through a quick review of what a binary search tree is if you’re a little rusty on the topic. A binary search tree is a data structure that allows a program to quickly search through a data set to find a value.
So it looks something like this. And it has a three keys rules associated with it. Including:
- Each node has the ability to have, at most, two child nodes. Which is where it gets the name binary, meaning two.
- The child node on the left has to be less than the parent node.
- The child node on the right has to be greater than or equal to the parent node.
This configuration enables the great performance that BSTs have. Because as you traverse the tree you’re able to break the data collection into smaller and smaller pieces. Allowing a program to ignore irrelevant data.
What is a Balanced Binary Search Tree?
So with that review under our belts, let’s discuss what it means for a BST to be balanced.
In this image we have a small, but balanced, binary search tree. This tree is considered balanced because the difference between heights of the left subtree and right subtree is not more than 1. If that’s a little fuzzy simply look at the right and left hand side of the tree. Notice how the left hand side is only one leaf taller than the right? That means that the tree is balanced.
What is an Unbalanced Binary Search Tree?
Now let’s take a look at what an unbalanced binary search tree looks like. Notice how this tree only has a right hand side. This doesn’t even really look like a tree. However, technically this tree’s right hand side is 6 leaves taller than the left hand side.
Consequently, this results in the tree being unbalanced.
Why do Binary Search Trees Have to be Balanced?
So why do binary search trees have to be balanced? I think the best way to understand the importance is to walk through a base case. And remember that the key reason why a BST offers such great performance is because it allows us to ignore irrelevant values. Thus decreasing the number of comparisons a program has to perform to find a data element.
Let’s look for the value
20 in our unbalanced tree. You’ll notice that this would take
7 comparisons to find the value.
Finally, let’s search for the value
20 in our balanced tree. If you count the comparisons you’ll see that it would only take
3 hops. This means that our search performance increased by over 50% by having a balanced tree compared with a unbalanced structure.
This may not seem like a huge deal for such a small data set. However what if you had millions of elements that you had to search for? The performance gains of a BST are quite significant, which is one of the reasons why binary search trees are considered such a vital tool for computer scientists.
Fixing Unbalanced Trees
With your understanding of why binary search trees need to be balanced, what happens when you have a tree that’s not balanced? There are a number of algorithms that you can use to remedy this issue. Some of the most popular algorithms are:
- Red black trees
- AVL trees
Both of these data structures force a tree to remain balanced, and therefore can guarantee search performance. There have been complete textbook chapters dedicated to these algorithms, so it would be ab it out of scope to cover them in this guide. However I’ve placed links in the show notes to tutorials that can help you walk through them if you want to deepen your knowledge on the subject.
I hope that this has been a helpful guide for answering the question: why do binary search trees have to be balanced? And good luck with the coding interview!