How to Write a Database - Part 1

Table of Contents

Introduction

In this series of blog posts we are going to write a database. Specifically, we are going to write a document-oriented database with ACID transactions. Let's define those words.

Definitions

Database

When I write "database", I mean "database management system". People use database management systems to store, retrieve, update, and delete data. In particular people expect databases to…

  • Store lots of data.
  • Provide a query language for searching the data.
  • Quickly return results.
  • Remember the data. Even if the system crashes.

Document Oriented

Document-oriented databases store data as documents. A document is a group of key-value pairs. Documents are similar to objects in JavaScript, hashes in Ruby, and dicts in Python. Each of these data structures is a group of key-value pairs. You can lookup any value in the data structure by its key.

Related documents are stored together in a "collection".

Transaction

A transaction is a set of work performed on the database. A set of work can include a combination of reads, writes, updates, and deletes.

ACID

ACID is an acronym where each letter stands for a certain property of a transaction.

A
Atomic
C
Consistent
I
Isolated
D
Durable

Atomic

A transaction is atomic if it is treated as an inseparable unit. Let's say we have a transaction that inserts a document and then inserts another. Suppose the computer crashes while the transaction is running. If the transaction is atomic, the database will be in one of two states when we turn the computer on. Either neither document will have been inserted. Or both documents will have been inserted.

Without atomicity it is possible that one document will be inserted but not the other.

Consistent

If you start with a database in a valid state, apply a transaction, and the database is still valid, that transaction is called consistent. For this series we will say a transaction is consistent if it does not leave any malformed documents. As far as I can tell consistency is more important in relational databases where you have things like constraints.

Isolated

A transaction is isolated if it does not interfere with other transactions. Let's say we have two transactions. One transaction is going to read every document from the database and the other is going to delete every document from the database. If we run the transactions at the same time, the first transaction should read every document despite the other transaction deleting the documents simultaneously.

Durable

A transaction is durable if its affects are permanent once it has been committed. Any changes made by a committed, durable transaction are still there if we restart the database or if the database crashes.

Conclusion

That's what a document-oriented database with ACID transactions is. The rest of the series will focus on how to build one.

Author: Ryan Riddle

Created: 2018-02-07 Wed 21:54

Emacs 24.5.1 (Org mode 8.2.10)

Validate