Notes

Source: 📖 Problem Solving with Algorithms and Data Structures using Python 7.13


Implementing a binary search tree using python

A binary search tree is implemented through the use of two separate classes – a BinarySearchTree class and a TreeNode class. It is implemented this way because we need to be able to handle cases where the binary search tree is empty (a regular binary tree always needs a root object passed to the constructor).

TreeNode class


class TreeNode:

	def __init__(self, key, value, left_child=None, right_child=None, parent=None):
	self.key = key
	self.value = value
	self.left_child = left_child
	self.right_child = right_child
	self.parent = parent

The TreeNode class represents a node within the binary search tree. The key property is used for making comparisons, while the value property is the data we are interested in storing. For simple data types, like int, these two can be the same thing. However, for more complex datatypes like custom class instances, we can provide a specific key to use for comparisons.

As an example, if we have a Person class with last_name, first_name and date_of_birth properties, we can choose any of these properties to make comparisons by providing it to the key parameter during the initiation of the TreeNode object. We can do something like TreeNode(key=person_obj.last_name, value=person_obj) which will create a node out our person object and our binary search tree will use the last_name property to make comparisons to other objects within the tree for sorting purposes.

the child and parent properties will point to other instances of TreeNode in our binary search tree. They can also point to None, when the node in question is a leaf node or root node respectively.


class TreeNode:

	[...]

	def is_left_child(self):
		return self.parent and self.parent.left_child == self

	def is_right_child(self):
		return self.parent and self.parent.right_child == self

The is_left_child method checks to see if the node has a parent, and if the parent's left child is self, the node making the call. Returns True if both conditions are met, indicating that the node in question exists as the left child of another node. The is_right_child performs the same operations but checks if the node is the right child of another node.


class TreeNode:

	[...]

	def is_root(self):
		return not self.parent
	
	def is_leaf(self):
		return not (self.left_child or self.right_child)
	
	def has_any_children(self):
		return self.left_child or self.right_child
	
	def has_both_children(self):
		return self.left_child and self.right_child

These methods are somewhat similar in operation to the is_left_child method described above – they simply check the condition described in the method name by looking at surrounding nodes (if any).


class TreeNode:

	[...]

	def update_left_child(self, new_node):
		self.left_child = new_node
		if new_node:
			new_node.parent = self
	
	def update_right_child(self, new_node):
		self.right_child = new_node
		if new_node:
			new_node.parent = self

update_left_child first overwrites whatever node is in the calling node's left_child property. Then, if the newly added node is not None (indicating that it is an instance of TreeNode), it sets the parent property of the replacement node to the TreeNode object calling the method. update_right_child does the same thing but for updating the right child.


class TreeNode:

	[...]

	def replace_node_data(self, key, value, left_child, right_child):
		self.key = key
		self.value = value
		self.left_child = left_child
		self.right_child = right_child
		if self.has_left_child():
			self.left_child.parent = self
		if self.has_right_child():
			self.right_child.parent = self

replace_node_data replaces all the calling object's data with the provided values.


class TreeNode:
	
	[...]

	def __iter__(self):
		if self:
			if self.has_left_child():
				for node in self.left_child:
					yield node
			yield self
			if self.has_right_child():
				for node in self.right_child:
					yield node

The __iter__ method allows for Inorder tree traversal through the binary search tree. It is particularly interesting because it is deceptively recursive, as the __iter__ dunder method is implicitly called whenever the for keyword is used.

The method begins by checking that self is not None. It then proceeds to check for the existence of a left child – if a left child exists, a for loop is called on the left child. Here is where the recursion happens, because python will implicitly call the left child's __iter__ method. This recursive step will continue down the left branch until the left-most node is reached.

When the left-most node is reached its left child will also be checked, which will be None. At this point, the left child check fails and the method proceeds to yield self, yielding the left-most node in the tree. Next, the same recursive process is followed on the right child.


class TreeNode:

	[...]

	def splice_out(self):
		if self.is_leaf():
			if self.is_left_child():
				self.parent.left_child = None
			else:
				self.parent.right_child = None
		elif self.has_any_children():
			if self.has_left_child():
				if self.is_left_child(): # has left child and is left child
					self.parent.left_child = self.left_child
				else: # has left child and is right child
					self.parent.right_child= self.left_child
				self.left_child.parent = self.parent
			else: # has right child
				if self.is_left_child(): # has right child and is left child
					self.parent.left_child = self.right_child
				else: # has right child and is right child
					self.parent.right_child = self.right_child
				self.right_child.parent = self.parent

splice_out orphans the caller node and points any child/parent nodes to where they belong once it's absent from the tree. This method is used internally to remove a successor

node, so it only handles the removal of a node with zero or one child. ![[Implementing a binary search tree using python#^a2bc10]]

The first step is to check if the caller node is a leaf node. If it's a leaf node, orphaning it is quite easy as there are no children that need to be pointed to a new parent – the only thing that needs to be done is check if the caller node is a left child or a right child, and point the parent's appropriate child property to None to successfully remove the caller node from the tree.

If the caller node has a child, we check which side the child is on. In order for the child node to take the caller node's place in the tree, we need to know which side of the parent the caller node connects to, which we check by making use of the is_left_child and is_right_child methods created earlier.

There are four possible scenarios if the caller node has a child – it has a left child and is (itself) the left child of its parent, it has a left child and is the right child of its parent, it has a right child and is the left child of its parent, or it has a right child and is the right child of its parent. Knowing both of the caller node's connections the the level above and the level below in the tree allows us to connect its child node to the correct side of its parent node, allowing us to safely orphan the caller node without orphaning its child.


class TreeNode:

	[...]

	def find_successor(self):
		successor = None
		if self.has_right_child():
			successor = self.right_child.find_min()
		else:
			if self.parent:
				if self.is_left_child():
					successor = self.parent
				else: # is right child
					self.parent.right_child = None
					successor = self.parent.find_successor()
					self.parent.right_child = self
		return successor
	  
	def find_min(self):
		current_node = self
		while current_node.has_left_child():
			current_node = current_node.left_child
		return current_node

The final two methods of TreeNode are find_successor and find_min. The latter is a helper method called by the former. find_min simply fetches the smallest member of a subtree given a starting node, by repeatedly following the left branch of every node it encounters until it reaches a None value, thereby finding the left-most node in the binary search sub-tree which is always the smallest member.

The successor of a node is another node that can take its place without violating the binary search tree properties. ^a2bc10

We can find the successor of a node by locating the node with the next-largest key in the tree, which is what find_successor does. First we check if the caller node has a right child. If it does, every member of the right child sub-tree is greater than the calling node, so we use find_min to find the smallest node within this tree, thereby giving us the successor of the caller node which will be the left-most node of its right sub-tree.

If the caller node doesn't have a right child, the successor must be found elsewhere, higher up in the tree. If the caller node is a left child, then the successor is its parent (which is of course greater than the caller node, being the parent's left child). To illustrate this point with some examples – any right child of the parent would be bigger than the parent, therefore not the next-largest node; if the parent itself is a left child of another node then the parent's parent is larger than the parent, therefore not the next-largest node; if the parent is a right child then the parent's parent is smaller than the caller node, as the caller node exists in the parent's parent's right sub-tree. We can leverage the properties of a binary search tree to eliminate all nodes other than the direct parent.

If the caller node is a right child, it's a little more complex. First we detach the caller node (along with its sub-tree) from the tree in order to remove these nodes from consideration during the next step, where we find the parent node's successor using a recursive call. Since the parent node now (temporarily) doesn't have a right child, the first section of the control flow (if self.has_right_child()) will not be applicable so the algorithm aims to find a successor higher up in the tree – it will check if the parent is a left child, returning the parent's parent as the successor if that's the case, or it will continue making recursive calls until either a successor is found or the root node is reached. The root node is reached when the if self.parent check fails, meaning that no valid successor exists as there is no node present in the tree that is greater than the original caller node – in this case None is returned up the call stack instead of a successor node.

When a successor is found (or when the root is reached), each call in the call stack puts back the right child sub-tree that it removed during prior recursive calls, reverting the tree to its original state.

BinarySearchTree Class


class BinarySearchTree:

	def __init__(self):
		self.root = None
		self.size = 0
	
	def __len__(self):
		return self.size

The BinarySearchTree class represents a network of TreeNode objects. self.root points to the root node, acting as an entry point into the tree structure where all other nodes can be accessed from.


class BinarySearchTree:

	[...]

	def put(self, key, value):
		if self.root:
			self._put(key, value, self.root)
		else:
			self.root = TreeNode(key, value)
			self.size += 1
	
	def _put(self, key, value, current_node):
		if key < current_node.key: # left branch
			if current_node.left_child:
				self._put(key, value, current_node.left_child)
			else:
				current_node.left_child = TreeNode(key, value, parent=current_node)
		else: # right branch
			if current_node.right_child:
				self._put(key, value, current_node.right_child)
			else:
				current_node.right_child = TreeNode(key, value, parent=current_node)

	def __setitem__(self, key, value):
		self.put(key, value)

There are three methods involved in inserting a new node into the tree. Of these, the _put internal method does all the heavy lifting. It takes three parameters – a key, a value, and a current node. When _put is first called, it is passed the root node to its current_node parameter. It then begins by comparing the provided key argument to the current node's key to identify which branch of the tree the new item belongs in.

If the new node belongs in the left branch, it checks the current node's left child to see if it is already populated. If it is, a recursive call is made, this time proving the left child as the current_node argument. If the left child is not occupied, then the new value can be inserted as the current node's left child by creating a TreeNode instance with the key and value arguments, and setting its parent to the current node. The same process occurs if the new node belongs in the right branch of the tree.

The _put method will recursively call itself, following different branches of the tree with each comparison, until an appropriate place for the new node is found.

The put method exists as a layer of abstraction above the _put method – first, it checks for the existence of a root node in the tree. If a root node exists it calls _put, providing the root as the current_node argument. If no root node exists, a root node is created using the key and value arguments provided.

Finally, the __get__ dunder method allows the user to use dictionary-like syntax to insert new nodes. It will implicitly call put with the provided values when the syntax bin_tree['key'] = 'value' is used, where bin_tree is an instance of BinarySearchTree.


class BinarySearchTree:

	[...]

	def get(self, key):
		if self.root:
			result = self._get(key, self.root)
			if result:
				return result.value
			else:
				return None
		else:
			return None
	
	def _get(self, key, current_node):
		if not current_node:
			return None
		elif current_node.key == key:
			return current_node
		elif key < current_node.key:
			return self._get(key, current_node.left_child)
		else:
			return self._get(key, current_node.right_child)

	def __getitem__(self, key):
		return self.get(key)

	def __contains__(self, key):
		if self._get(key, self.root):
			return True
		else:
			return False

Retrieving an item follows a similar procedure to inserting a new item. There are once again three 'getter' methods, and an extra __contains__ method to check for the existence of a node.

Like _put, the heavy lifting in retrieving a node is done recursively by _get. It is provided two arguments – a key to search for and a node to begin operating on. The first step is to check that the current node is not None, as this would mean that the key has not been found in the tree. Next we check if the current node's key is the key being searched for. If it is, the current node is returned. If it isn't we proceed to make a comparison to the current node's key in order to decide which branch of the tree to continue the search on.

If the key being searched for is less than the current node's key, _get calls itself recursively on the current node's left child. If it's greater than the current node's key, the process follows on the current node's right child. The _get method will continue to make recursive calls until it either finds a matching key, in which case it will return the node where the matching key was found, or until it is passed None as the current node, meaning that a leaf node was reached and still no match was found.

The _get internal method is called by get, which first checks for the existence of a root node. If a root node exists, it begins the recursive _get process from the root and returns the result found (or None).

The two dunder methods allow the user to use python dictionary-like syntax for retrieving nodes and checking for the existence of keys within the tree. __getitem__ allows for syntax like found_node = bin_tree['key'], while __contains__ allows for the use of the in python keyword, like 'key' in bin_tree which evaluates to True or False.


class BinarySearchTree:

	[...]

	def __iter__(self):
		return self.root.__iter__()

The __iter__ dunder method allows for Inorder tree traversal. It calls the root node's __iter__ method, which is described above.


class BinarySearchTree:

	[...]
	
	def delete(self, key):
		if self.size > 1:
			node_to_remove = self._get(key, self.root)
			if node_to_remove:
				self.remove(node_to_remove)
				self.size -= 1
			else:
				raise KeyError('Error, key not in tree')
		elif self.size == 1 and self.root.key == key:
			self.root = None
			self.size -= 1
		else:
			raise KeyError('Error, key not in tree')
	
	def __delitem__(self, key):
		self.delete(key)

The most complex methods of the BinarySearchTree class handle the deletion of a node. It's a complex procedure because we need to make sure that no parts of the tree are orphaned in the deletion process, and any changes we make to the tree need to maintain the binary search tree properties.

The delete method itself is fairly simple. The complexity is handled by the remove method, which is called within.

There are two discrete cases that need to be handled – where the tree has more than one node, and when the tree consists of only a root node. This is because in a tree with more than one node, a node is removed by changing the surrounding node's children and parent attributes, whereas when we are only working with a root node we only need to set self.root to None to delete it.

The delete method first checks if the size of the tree is greater than 1. If so, it tries to find the node node related to the provided key argument by using the previously defined _get internal method. If a matching node is found, it calls remove on this node which removes the node and rearranges the tree around it (this process is described in detail below).

If the tree size is not greater than one, that means we are dealing with just a root node. We check to make sure that the root node matches the key we wish to delete, and set the root node to None if that is the case.

The __delitem__ dunder method allows for the use of the del python keyword like, and it calls the delete function.


class BinarySearchTree:

	[...]
	
	def remove(self, current_node):
		if current_node.is_leaf(): # current_node has no children
			if current_node == current_node.parent.left_child:
				current_node.parent.left_child = None
			else:
				current_node.parent.right_child = None
		elif current_node.has_both_children():
			successor = current_node.find_successor()
			successor.splice_out()
			current_node.key = successor.key
			current_node.value = successor.value
		else: # current_node has one child
			if current_node.has_left_child(): # current_node only has left child
				if current_node.is_left_child():
					current_node.left_child.parent = current_node.parent
					current_node.parent.left_child = current_node.left_child
				elif current_node.is_right_child():
					current_node.right_child.parent = current_node.parent
					current_node.parent.right_child = current_node.right_child
				else: # current_node is root
					current_node.replace_node_data(
						key=current_node.left_child.key,
						value=current_node.left_child.value,
						left_child=current_node.left_child.left_child,
						right_child=current_node.left_child.right_child
					)
			else: # current_node only has right child
				if current_node.is_left_child():
					current_node.right_child.parent = current_node.parent
					current_node.parent.left_child = current_node.right_child
				elif current_node.is_right_child():
					current_node.right_child.parent = current_node.parent
					current_node.parent.right_child = current_node.right_child
				else: # current_node is root
					current_node.replace_node_data(
						key=current_node.right_child.key,
						value=current_node.right_child.value,
						left_child=current_node.right_child.left_child,
						right_child=current_node.right_child.right_child
					)

Finally, the remove method is by far the most complex algorithm in the BinarySearchTree class, as it incorporates many nested clauses to successfully maintain the binary search tree properties.

There are four separate cases to handle:

1. The node being removed is a leaf node (no children)

2. The node being removed has both children

3. The node being removed only has a left child

4. The node being removed only has a right child

The first case is the simplest – we begin by checking if the current node (passed to it as an argument by the delete method) is a leaf node. If so, we check whether it's a left or right child, and update the corresponding parent's left or right child property to None to successfully remove the node.

The second case, where two children are present, requires us to find a successor to replace the node we are removing. When a successor is found, it is spliced out of the tree and the current node is overwritten with the successor's key and value.

The third and the fourth case are handled the same way. Here we will describe the removal a node with a right child (case number 4). In order to proceed, we need to know if the current node itself is a left child, a right child, or neither (indicating that it's the root node).

If it's a left child, we must point its parent's left child attribute (which presently points to the current node) to the current node's right child, and we must also point the right child's parent attribute to the current node's parent. This allows us to successfully 'bypass' the current node, thus orphaning it from the rest of the tree.

If the current node is a right child, a similar procedure is followed except we bypass it by pointing the parent's right child attribute to the current node's right child.

Finally, if the current node is neither a left child or a right child, that means that it's the root node. In this case we simply overwrite the node using replace_node_data as there are no other dependent nodes to take care of.