Skip to main content

cs2381 Notes: 17 Balancing a BST

··3 mins

Balancing a Binary Search Tree

Review:

  • A tree of n items is balanced if its height is O(log n)
  • Alt: For some constant k, the longest path from root to leaf will always be no more than k times as long as the shortest path.
  • We can easily get an unbalanced tree by inserting items in order. Thats bad because operations on the tree will then be O(n).

AVL Tree:

Red-Black Tree:

Scapegoat Tree

Plan:

  • We start with a simple unbalanced BST structure.
  • At the root of the tree, we store two extra values: the size of the tree and the maximum size this tree has been since the last full rebalance.
  • On insert and delete, we detect the tree becoming too unbalanced, and rebalance a portion of the tree to fix it.

How to detect unbalanced tree?

  • For a tree of size n, it’s perfectly balanced if no leaf has a depth greater than log2(n).
  • Let’s say it’s close enough if no leaf has a depth greater than 2*log(n).
  • The only time leaves get deeper is on insertion.
  • So when we insert, we keep track of how deep the new node is, and if it’s deeper than 2*log(n) we say it’s too deep and fix it.
double log2(xx) {
    return Math.log(xx) / Math.log(2.0);
}

int maxDepth(nn) {
    return (int)Math.ceil(2.0*log2(nn));
}

If a node is too deep, how do we fix it?

  • We rebalance a portion of the tree containing that node.
  • We can calculate how unbalanced the subtree rooted at a node is by comparing the size of its left and right subtrees.
  • If size(child) / size(node) > 0.7 * maxSize or so then that’s a good candidate for a subtree to rebalance. We call that node the scapegoat.

How do we rebalance?

  • We rebuild the tree rooted at the scapegoat balanced optimally.
  • First we get all the nodes in order.
  • Then we take the middle one, that’s the new root.
  • The left and right subtrees are rooted at the middle of the left and right ranges.
  • Recursively.

Let’s do an example where we insert numbers in order.

  • Max depth is ceil(2*log(n)), so a rebalance isn’t triggered until n = 7 (maxDepth = 6).
  • The scapegoat is the node with three descendents on one side (3/4 > 0.7)
  • The rebalanced tree is rotated by 1.
  • The next rebalance is triggered at n = 9 (maxdepth = 7).
  • Again, we rotate by 1.
  • Then at n = 9, this time we rebuild back one further.
1
 \
  2
   \
    3
     \
      5
     / \
    4   7
       / \
      6   8
           \
            9
             \
              10

becomes

1
 \
  2
   \
    3
     \
       7
     /   \
    5      9
   / \    / \
  4   7  8  10