Orderbook Part II: Concurrency First


After publishing the first part of this series, I wanted to try a different approach to order filling, an effort to solve the same problem in fundamentally different ways to understand the tradeoffs at a deeper level. After considering my first design, I knew what improvements I wanted to make, but also wanted to force myself to experiment with other options.

First, I wanted to separate the order trees into buy-side and sell-side trees, so that matching became a more trivial operation. Second, I wanted a concurrency-first approach that would fit the asynchronous lifecycle of an order in the market. Third, I wanted to fix parts of the Order interface to make it more intuitive.

This post addresses the separation of buy- and sell-side orders as well as the concurrency-first approach, since they deal with the same part of the engine. The Order interface will be covered in a future part.

The corresponding code for this post is at dylanlott/orderbook.

Share memory by communicating

Go preaches sharing memory by communicating rather than communicating by sharing memory. The principle has value, but applied prematurely it can produce unnecessarily complex code, so it's best employed judiciously. Channels are usually a key component of performant code that follows this pattern, and they come with their own minimum cognitive load. The payoff, however, can be code that eliminates the need for locks. Worker patterns are a frequently seen case: only one worker has access to the shared data structure at a time, but multiple sources can read and write as necessary, with control flow managed by channels, which lets complicated setups stay relatively simple.
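
As a concrete sketch of the pattern (with made-up names, not code from the orderbook repo): a single goroutine owns the map, and every read and write is serialized through a request channel, so no mutex is needed.

```go
package main

import "fmt"

// request asks the owning goroutine to read or write a key.
type request struct {
	key   string
	value int
	write bool
	resp  chan int
}

// owner is the only goroutine that ever touches the map, so access
// is serialized by the channel rather than guarded by a lock.
func owner(reqs <-chan request) {
	data := make(map[string]int)
	for r := range reqs {
		if r.write {
			data[r.key] = r.value
		}
		r.resp <- data[r.key]
	}
}

func main() {
	reqs := make(chan request)
	go owner(reqs)

	resp := make(chan int)
	reqs <- request{key: "price", value: 42, write: true, resp: resp}
	<-resp
	reqs <- request{key: "price", resp: resp}
	fmt.Println(<-resp) // prints 42
}
```

Callers never see the map at all; they only see channels, which is exactly the property we want for the books.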

In the case of an exchange like Orderbook, going fully lockless may or may not be possible, given the read-heavy nature of order matching and the intense competition for fast fills. But this approach does enable better handling of the Order lifecycle, which is inherently asynchronous: an order can occupy the books for an entirely unknown amount of time, so concurrency should be a first-class design principle when handling orders.

This is partly why I want to revisit how the Orderbook design defines the Order interface, but that is outside the scope of this post.

Concurrency First Approach

At its simplest, we need a data structure that holds a set of sell-side and buy-side orders and allows an atomic update to occur across both of those fields with strong guarantees.
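
A minimal sketch of such a structure might look like the following; the field and type names here are illustrative stand-ins, not the repo's actual definitions. A single mutex lets updates that span both sides stay atomic.

```go
package main

import (
	"fmt"
	"sync"
)

// Order is a minimal stand-in for the post's Order interface.
type Order struct {
	ID    string
	Price uint64
	Side  string // "BUY" or "SELL"
}

// Orderbook holds separate buy- and sell-side orders keyed by price.
// The mutex makes updates that touch both sides atomic.
type Orderbook struct {
	mu   sync.Mutex
	buy  map[uint64][]Order
	sell map[uint64][]Order
}

func NewOrderbook() *Orderbook {
	return &Orderbook{
		buy:  make(map[uint64][]Order),
		sell: make(map[uint64][]Order),
	}
}

// Push inserts an order into the correct side under the lock.
func (b *Orderbook) Push(o Order) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if o.Side == "BUY" {
		b.buy[o.Price] = append(b.buy[o.Price], o)
	} else {
		b.sell[o.Price] = append(b.sell[o.Price], o)
	}
}

func main() {
	book := NewOrderbook()
	book.Push(Order{ID: "1", Price: 100, Side: "BUY"})
	book.Push(Order{ID: "2", Price: 100, Side: "SELL"})
	fmt.Println(len(book.buy[100]), len(book.sell[100])) // prints 1 1
}
```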

Taking inspiration from the sharemem example above, we first need a state monitor. Ours ends up being pretty similar.

// StateMonitor returns a channel that outputs OrderStates as it receives them.
func StateMonitor() chan OrderState {
	updates := make(chan OrderState)
	orderStatus := make(map[string]Order)
	ticker := time.NewTicker(1 * time.Second)

	go func() {
		for {
			select {
			case s := <-updates:
				log.Printf("received state update: %+v", s)
				orderStatus[s.Order.ID()] = s.Order
			case <-ticker.C:
				logState(orderStatus)
			}
		}
	}()

	return updates
}

This allows us to essentially middle-man any order's activity. Next, though, we need to actually handle the orders. Our example doesn't do anything interesting with the logState function, but it is nice to see the order tree piped out during testing to feel like you're doing something cool.
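
For completeness, here is a hypothetical sketch of what logState might do; the real repo's version may differ, and the limitOrder type exists only to make the sketch self-contained.

```go
package main

import "log"

// Order is a stand-in for the post's Order interface.
type Order interface{ ID() string }

// logState is a hypothetical sketch of the periodic dump that the
// StateMonitor's ticker triggers: it prints every tracked order so
// the state of the books is visible while testing.
func logState(orders map[string]Order) {
	log.Printf("tracking %d orders:", len(orders))
	for id, o := range orders {
		log.Printf("  %s => %+v", id, o)
	}
}

// limitOrder is a toy implementation of Order for the example.
type limitOrder struct{ id string }

func (l limitOrder) ID() string { return l.id }

func main() {
	orders := map[string]Order{
		"abc": limitOrder{id: "abc"},
		"def": limitOrder{id: "def"},
	}
	logState(orders)
}
```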

Another benefit of this approach is that, being asynchronous-first, it becomes trivially easy to insert a pub/sub system later, while the system can stay more traditionally pull-based for now, without deeply modifying any of the core pieces. That flexibility is a sign of strong design in my opinion. The caveat is that it's slightly more complex than a more get/set-focused system.

This worker pattern at its core transacts on the Orderbook struct itself. For this reason, the Worker should only ever interact with atomic methods exposed by the Orderbook. The Orderbook is our protected resource, with the StateMonitor wrapping it in a system for feeding orders in and listening to updates from the Workers. The attemptFill function ends up being the single atomic function that handles the interaction with the books after validating an order for readiness and capacity.
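
To make that concrete, here is a hedged sketch of what such a single atomic fill function might look like; the types, fields, and the minQty helper are stand-ins for illustration, not the repo's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// Minimal stand-ins for the post's types.
type Order struct {
	Side  string // "BUY" or "SELL"
	Price uint64
	Open  uint64 // unfilled quantity
}

type Orderbook struct {
	mu   sync.Mutex
	buy  map[uint64][]*Order
	sell map[uint64][]*Order
}

func minQty(a, b uint64) uint64 {
	if a < b {
		return a
	}
	return b
}

// attemptFill sketches the single atomic entry point: take the lock,
// validate the order still has open quantity, match it against the
// opposite side at its price, and update both orders together.
func (b *Orderbook) attemptFill(order *Order) (uint64, error) {
	b.mu.Lock()
	defer b.mu.Unlock()

	if order.Open == 0 {
		return 0, fmt.Errorf("order has no open quantity")
	}
	opposing := b.sell // a BUY matches against resting sells
	if order.Side == "SELL" {
		opposing = b.buy
	}
	for _, match := range opposing[order.Price] {
		if match.Open == 0 {
			continue
		}
		qty := minQty(order.Open, match.Open)
		order.Open -= qty
		match.Open -= qty
		return qty, nil
	}
	return 0, fmt.Errorf("no match at price %d", order.Price)
}

func main() {
	book := &Orderbook{
		buy:  map[uint64][]*Order{},
		sell: map[uint64][]*Order{100: {{Side: "SELL", Price: 100, Open: 5}}},
	}
	bid := &Order{Side: "BUY", Price: 100, Open: 3}
	filled, err := book.attemptFill(bid)
	fmt.Println(filled, bid.Open, err) // prints 3 0 <nil>
}
```

Because the lock is taken once per attempt, both sides of a match are updated as a unit, which is the atomicity the Worker relies on.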

Here's the actual Worker function. It takes an input channel, an output channel, a pipe to the StateMonitor, and a pointer to the Orderbook. Any time it receives an Order, it kicks off a goroutine that pushes it into the books and starts listening for its completion. We can easily tweak the number of Workers that we spawn. One could argue this is a premature optimization, and I'd probably agree, but it's an interesting feature of this design nonetheless.


// Worker pushes an Order into the books and then starts to listen 
// for that Order's completion.
func Worker(in <-chan Order, out chan<- Order, status chan<- OrderState, orderbook *Orderbook) {
	for o := range in {
		// attempt to fill the order
		go func(order Order) {
			log.Printf("received order %+v", order)

			if err := orderbook.Push(order); err != nil {
				log.Printf("failed to push order into books: %v", err)
			}

			// start attempting to fill the order with our Filler
			Filler(out, order, status, orderbook)
		}(o)
	}
}
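
To make the "tweak the number of Workers" point concrete, here is a hedged sketch of the fan-out wiring with a trivial stand-in worker: every worker pulls from the same input channel, so scaling intake is a matter of changing one constant.

```go
package main

import "fmt"

type Order struct{ ID string }

// worker is a stand-in for the post's Worker: it drains the shared
// input channel and forwards each order downstream.
func worker(in <-chan Order, out chan<- Order) {
	for o := range in {
		out <- o
	}
}

func main() {
	in := make(chan Order)
	out := make(chan Order)

	// Fan out: all workers receive from the same channel, so the Go
	// runtime load-balances orders across them for us.
	const numWorkers = 4
	for i := 0; i < numWorkers; i++ {
		go worker(in, out)
	}

	go func() {
		for i := 0; i < 8; i++ {
			in <- Order{ID: fmt.Sprint(i)}
		}
		close(in)
	}()

	for i := 0; i < 8; i++ {
		<-out
	}
	fmt.Println("all orders handled")
}
```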

The Filler function here is also worth discussing. Its concurrency can be tweaked by how it handles listening: it could kick off periodic fill checks, it could use an exponential backoff, or, on the other side of the coin, it could know nothing about the filling process at all. We'll dig into that in a little bit, though. For now, it's where we stop our abstraction and start the concrete filling logic, since all the pipelining is done.

// Filler will attempt to fill the Order until it's filled and then
// reports it to the output channel. It's a blocking function as it's
// meant to return only when the Order is filled.
func Filler(completed chan<- Order, order Order, status chan<- OrderState, book *Orderbook) {
	for order.Open() > 0 {
		_, err := book.attemptFill(order)
		if err != nil {
			log.Printf("attemptFill failed: %+v", err)
			// notify state that we failed to fill this order.
			status <- OrderState{
				Err:    err,
				Order:  order,
				Status: StatusUnfilled,
			}
			return
		}
	}

	// the order is filled here, so let's return
	completed <- order
}

Filler just trawls the books, constantly calling attemptFill on the order until it is filled or errors, and it updates the state monitor with our Order's latest status. Crucially, it separates order completion from order status updates: an order only lands on the completed channel when it has no open quantity left to fill.
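
As one example of the listening tweaks mentioned earlier, here is a hedged sketch of a Filler variant with exponential backoff. The Order and Orderbook types here are toy stand-ins so the sketch is self-contained; the repo's Filler does not do this.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Minimal stand-ins so the sketch compiles on its own.
type Order struct{ Open uint64 }

type Orderbook struct{ attempts int }

// attemptFill fails twice before filling, purely to exercise the
// backoff path in this example.
func (b *Orderbook) attemptFill(o *Order) (uint64, error) {
	b.attempts++
	if b.attempts < 3 {
		return 0, errors.New("no match yet")
	}
	o.Open = 0
	return 1, nil
}

// fillerWithBackoff doubles the wait after each failed fill attempt,
// capped at maxDelay, instead of spinning in a tight loop.
func fillerWithBackoff(book *Orderbook, order *Order) {
	delay := 10 * time.Millisecond
	const maxDelay = 500 * time.Millisecond
	for order.Open > 0 {
		if _, err := book.attemptFill(order); err != nil {
			time.Sleep(delay)
			if delay *= 2; delay > maxDelay {
				delay = maxDelay
			}
		}
	}
}

func main() {
	book := &Orderbook{}
	order := &Order{Open: 10}
	fillerWithBackoff(book, order)
	fmt.Println("filled after", book.attempts, "attempts") // prints: filled after 3 attempts
}
```

The tradeoff is latency: backoff eases pressure on the books at the cost of slower fills, which matters a lot in a competitive exchange.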

Data structures

The key is to see that filling orders, which updates the actual Order object itself, is a distinct task from updating the Order's position in the books. Both are important and must happen at the right time, but they are concerned with different domains. Filling is concerned with each Order's owner and related account, matching orders correctly, and atomically updating the state of those orders. The books, on the other hand, are only concerned with where an Order sits in the priority queue and whether it should be pulled due to a cancellation, an error, or a fill.

The wisdom of separating your interface from your underlying data structure is particularly salient here. Our Orderbook interface boils down to three functions. Our underlying book implementation can (and according to SOLID design principles it must be able to) completely change without requiring any change to the interface or dependencies.

This design accomplishes that by focusing on the two core jobs of the orderbook: pushing and pulling. Filling is the job of the worker and must reconcile the state of the accounts, but the actual organization of the orders in priority is strictly the data structure's domain.

My first design used a btree for price traversal, and that actually worked pretty well: jumps between prices were always ordered and always landed on the next price point the books had actually seen. The downside was that it was likely a premature optimization, but using a simpler data structure had surprising downsides of its own. A map keying prices to lists of Orders turned out not to be as clean as one would think, because filling requires checking for higher or lower prices. I had to reach for recursive traversal again, but was able to solve it with minimal fuss. The hash map implementation might be more memory efficient and thus faster, but the searching wasn't as simple to implement as one might guess from the initial spike.
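
For illustration, here is one simple (iterative rather than recursive) version of the search the map forces on you; the function name and types are made up for this sketch. Because map keys are unordered, every price level has to be visited, which is exactly the work the btree's ordered traversal avoided.

```go
package main

import "fmt"

// bestSellAtOrBelow scans a price-keyed map for the lowest sell price
// that does not exceed the bid. With a map there is no ordering to
// exploit, so the whole key set is visited on every search.
func bestSellAtOrBelow(sells map[uint64][]string, bid uint64) (uint64, bool) {
	var best uint64
	found := false
	for price, orders := range sells {
		if len(orders) == 0 || price > bid {
			continue
		}
		if !found || price < best {
			best = price
			found = true
		}
	}
	return best, found
}

func main() {
	sells := map[uint64][]string{
		99:  {"a"},
		101: {"b"},
		103: {"c"},
	}
	price, ok := bestSellAtOrBelow(sells, 102)
	fmt.Println(price, ok) // prints 99 true
}
```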

type Orderbook interface {
	// Push inserts the order into our books and immediately starts attempting to fill it.
	Push(o Order) error
	// Pull removes the order with the given ID from the books and returns it or an error.
	Pull(o Order) (Order, error)
	// Fill is called as a goroutine and sends on out when the order is completed.
	// It sends OrderUpdates on status whenever updates to the order occur.
	// For example, Fill might instead be a single-argument function, with the out
	// and status channels treated as implementation details that we ignore here.
	Fill(o Order, out chan<- Order, status chan<- OrderUpdate)
}

One possible improvement here would be to remove the out and status channel arguments from Fill and slim it down to take just an order at call time. One could argue that the channels for status and output are implementation details, and I think they'd be right, but for now I'm leaving it as is. It at least signals to the user that the function is explicitly channel-driven and asynchronous in nature.
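
For illustration, here is a hedged sketch of what that slimmed-down interface could look like, with a toy implementation; the accessor names (Completed, Updates) are assumptions, not anything from the repo.

```go
package main

import "fmt"

type Order struct{ ID string }
type OrderUpdate struct{ Status string }

// Orderbook sketches the slimmed-down alternative: Fill takes only
// the order, and the completion/status channels become implementation
// details exposed through accessor methods.
type Orderbook interface {
	Push(o Order) error
	Pull(o Order) (Order, error)
	Fill(o Order)
	Completed() <-chan Order
	Updates() <-chan OrderUpdate
}

// book is a toy implementation showing the shape of the contract.
type book struct {
	done    chan Order
	updates chan OrderUpdate
}

func newBook() *book {
	return &book{
		done:    make(chan Order, 1),
		updates: make(chan OrderUpdate, 1),
	}
}

func (b *book) Push(o Order) error          { return nil }
func (b *book) Pull(o Order) (Order, error) { return o, nil }
func (b *book) Fill(o Order)                { b.done <- o }
func (b *book) Completed() <-chan Order     { return b.done }
func (b *book) Updates() <-chan OrderUpdate { return b.updates }

func main() {
	var ob Orderbook = newBook()
	ob.Fill(Order{ID: "1"})
	fmt.Println((<-ob.Completed()).ID) // prints 1
}
```

The receive-only return types (`<-chan`) keep callers from writing to channels the book owns, preserving the guarding property discussed below.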

Channels in Go give you a unique way to guard resources. With this setup, the Worker only needs exposure to the three functions of the Orderbook: Push, Pull, and Fill. We're able to keep the Order's information entirely private except for its ID, which means we can guarantee that other properties of the order, such as filled and open quantities, can't be updated except by the Filler. If we build higher levels of abstraction on top of this engine, we get a nice little black box of functionality. You feed an order into a channel and that's it. You don't have to call any additional functions, and you don't have to worry about errors. You get updates on the state of the order from the exchange, but even that is just a tiny little screen (the StateMonitor in this analogy) that updates in near-realtime.

This is the type of abstraction for components that you want in systems programming - the complete hiding of work and logic, securely guarding key resources, defining away errors, and building asynchronous handling in from the first step.

Summary

I've written three major revisions of the exchange at this point and would dare say I feel comfortable discussing the finer points of efficient order matching in Go. I started this as a simple project for practicing asynchronous system design and to encourage novel, first-principles thinking in a problem domain I was entirely unfamiliar with, and I think that overall it turned out well! I have a functional exchange engine with an Account interface that could easily be wired up to any cryptocurrency or even fiat assets. With proper licensing, we could make this a minimum viable product if we wanted. Whether our software could ever compete, of course, is a completely different question.