Implementing a B-Tree

I am writing a database. At the moment the database can take more than 5 seconds to respond to a query on a table with one million rows.

Each table is stored in its own file and rows are stored in the file one after the next. There is no indexing. If we want to find a particular record in the table we may have to look at every record in the file before we find it.

Search time increases linearly with the number of rows. In a table with 100,000 rows, a query might take 0.5 seconds to complete. In a table with 1,000,000 rows, maybe 5,000 seconds!

One of the things I can do to make searching the database faster is add a non-clustered index. I can use a B-Tree to index the table¹.

I decided to write my own B-Tree in an effort to understand them better. Over the past three days, I have written over 400 lines of code for my B-Tree and I am not sure it is correct.

A conversation with another recurser made me realize searching a B-Tree is a lot like searching a skip list. I got to wondering whether indexing the tables with a skip list might be easier. Evidently, I am not the first to have that thought.

According to William Pugh, the inventor of skip lists,

Skip lists are a probabilistic data structure that seem likely to supplant balanced trees as the implementation method of choice for many applications. Skip list algorithms have the same asymptotic expected time bounds as balanced trees and are simpler, faster and use less space.

Footnotes

There is a good description of B-Trees in Donald Knuth's The Art of Computer Programming