I’m writing a simple binary tree in Rust as part of my current endeavor to learn this language.

I might eventually extend this into a rewrite of some parts of my Orderbook series, but who knows. The day is young.

Starting off

Make a new cargo package and name it something dumb.

$> cargo new beetree

Navigate into the sweet new repo that cargo setup for you and get into the main.rs file. That’s where all the action is.

Then we’re going to setup some basic types to work with. We want our b-tree to be a set of nodes that can have a left or right node, but must themselves have a value for using during lookups. Traversing our binary tree should reference that value and act accordingly.

So we declare our Node<T> type with a generic. This will let our tree be much more extensible. val will be straightforward and is just going to be of the generic type T. A Node will have a value, a left, and a right field.

Let’s define what left and right are. To be clear about intended use, let’s make this a new type and call it ChildNode<T> and it will also extend our generic use.

// A basic layout of the types as I start that compiles without erros (except dead code linter error)
// 
// Node holds a generic value and can have optional left or right children.
struct Node<T>{
    val: T, // what does &'a mean? It's called a "shared reference" and the 'a part means specifically "the lifetime of a"
    left: ChildNode<T>,
    right: ChildNode<T> 
}
// ChildNode makes it a little cleaner to deal with Node
type ChildNode<T> = Option<Box<Node<T>>>;

The gist behind our binary tree is that we’re going to write our node type to be generic, but we will first implement it for i32.This is mostly because I wanted a simple solution first, and it felt more straightforward to deal with an integer first.

// Node of i32 type for our first implementation.
impl<'a > Node<i32> {
    pub fn insert(&mut self, new_val: i32) {
        // don't duplicate values in tree
        if self.val == new_val {
            return
        }

        // detect if we have left or right children
        let target_node= if new_val < self.val {
            &mut self.left
        } else {
            &mut self.right
        };

Above is where we define our impl for the Node<i32> type. We will implement a generic here with Node<T> at a later time.

// match on the case of the target node and figure out if we're at a leaf node
match target_node {
    // if that node exists, recursively call insert on it
    // subnode needs to be ref mut because it's behind a pointer
    Some(ref mut subnode) => subnode.insert(new_val),
    // if it doesn't exist, then we assign it to what we expect it to be
    None => {
        let new_node = Node { val: new_val, left: None, right: None };
        let boxed_node = Some(Box::new(new_node));
        *target_node = boxed_node
    }
}

And here’s the pattern matching that I really like in Rust. This is one of Rust’s better points. I haven’t fully formed my thoughts about the borrow checker yet but the Option type and pattern matching it enables is really nice to work with and promotes, in my opinion, some really readable code.

Problems

As always, I’m going to pick apart my design because that’s how we improve.

First, this isn’t memory safe, as far as I can tell, because we are sharing references. This is fine for a bit but we will likely want to ensure memory safety so that we can scale up 10,000x performance on inserts to handle all the theoretical traffic my hot crypto startup could see.

Secondly, without a generic implementation this is mostly useless. Right now it’s only useful if you want specifically i32 type binary trees, which is definitely not going to raise a lot of hands in a room.

Thirdly, I want to better understand the pointers that this setup does use, for my own benefit, and I think that cleaning up and implementing the generic case will help with that, too.


Notes

Some notes on problems I encountered and what I learned during this exercise

  • https://stackoverflow.com/questions/47640550/what-is-a-in-rust-language pretty succinctly describes what I almost immediately ran into and had to eventually modify my impl Node<T> declaration to accommodate. My first draft was a violation of the borrow checker. 'a means the lifetime of a and you can reference this value (with &). impl<'a > Node<i32> I had to add the <'a> here to account for that. The lesson here is that when you add an impl you add a lifetime for it implicitly.
  • I had to wrap things in Box to get them to work, because the type of left or right may or may not exist.
  • you can sometimes omit lifetime declarations, which is a little confusing as a beginner, and other times one must be specific about it.
  • You will probably want to get comfortable with the various ref mut syntax. I’m still learning
  • as_xxx type helper methods are nice, I’m looking forward to learning proper usage of those.
  • &str is used because Rust doesn’t know the size of the string, but it does know the size of the & so it uses that. Otherwise you must use the String type which is a pointer with data on the heap.
  • I didn’t actually use the &str type in this example, but I came across it enough when I was first trying to get pointers to work that I incidentally learned about them.
  • Box is like a pointer but can do more things. You can use a Box to create structs with the same struct inside, which is pretty much what we did. Which is how you beat the type recursion that Rust doesn’t allow otherwise.
  • https://dhghomon.github.io/easy_rust/Chapter_53.html is a pretty good run down of Box
  • You can declare values with let, const , and static. Rust won’t infer the data type in these declarations, you need to write it out.
  • Rust programmers typically prefer const if you need a value like this, in CAPITAL_CASE
  • Rust’s Option type and match keyword is really convenient and enables really strong pattern matching.