Today I decided to write a Huffman Encoding program to help me learn Haskell. Structure and Interpretation of Computer Programs has a good section that explains Huffman Encoding Trees.

My Huffman Encoding program has three main parts.

- A function `makeTree` that turns a list of nodes into a Huffman Encoding Tree.
- A function `decode` that takes a Huffman Encoding Tree and a bit string and returns the unencoded text.
- A function `encode` that takes a Huffman Encoding Tree and a piece of text and returns a bit string.

Here is the program.

Let's look at each part.

A Huffman Encoding Tree is represented by the Node data type. A Node is either a Leaf that has a Symbol and a Weight or an Interior Node that has a left Node, a right Node, a list of Symbols in the tree, and a cumulative Weight.
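That description can be sketched as a Haskell data type. The constructor names and field types below are assumptions based on the prose, not the author's exact definition:

```haskell
-- A sketch of the Node type described above (names and field types assumed).
-- A Leaf carries a symbol and its weight; an Interior node carries its left
-- and right subtrees, the list of symbols beneath it, and a cumulative weight.
data Node = Leaf Char Int
          | Interior Node Node [Char] Int
          deriving (Show, Eq)
```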

You can construct a Huffman Encoding Tree by passing `makeTree` a list of Leaf Nodes sorted by the Weight of each Leaf, like this.

`let tree = makeTree [(Leaf 'A' 1), (Leaf 'B' 1), (Leaf 'C' 2)]`

`makeTree` works by taking the two Nodes with the smallest weights from the list, combining them into a new Node, and inserting the new Node into the list in the correct position.
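A minimal sketch of that algorithm, assuming the Node representation described earlier (the `weight` and `symbols` helpers are hypothetical names introduced here):

```haskell
-- Assumed representation (sketch, not the author's exact code).
data Node = Leaf Char Int
          | Interior Node Node [Char] Int
          deriving (Show, Eq)

-- Hypothetical helpers to read a Node's weight and symbol list.
weight :: Node -> Int
weight (Leaf _ w)         = w
weight (Interior _ _ _ w) = w

symbols :: Node -> [Char]
symbols (Leaf s _)          = [s]
symbols (Interior _ _ ss _) = ss

-- Combine the two lightest Nodes, insert the combined Node back into the
-- weight-sorted list, and recurse until a single tree remains.
makeTree :: [Node] -> Node
makeTree [tree] = tree
makeTree (first:second:rest) = makeTree (insertNode combined rest)
  where
    combined = Interior first second
                        (symbols first ++ symbols second)
                        (weight first + weight second)
    insertNode node [] = [node]
    insertNode node (x:xs)
      | weight node <= weight x = node : x : xs
      | otherwise               = x : insertNode node xs
```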

`encode` works by taking a symbol from the text string and searching for that symbol in the tree. As `encode` moves down the tree toward the leaves, it appends a bit to the resulting bit string. When `encode` reaches a Leaf, it returns to the root of the Huffman Encoding Tree and takes the next symbol from the text string. When `encode` is out of symbols, it returns.
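One way to sketch that traversal, assuming the Node representation above and using the symbol list stored in each Interior Node to decide which subtree contains the symbol:

```haskell
-- Assumed representation (sketch, not the author's exact code).
data Node = Leaf Char Int
          | Interior Node Node [Char] Int

symbols :: Node -> [Char]
symbols (Leaf s _)          = [s]
symbols (Interior _ _ ss _) = ss

-- Encode each symbol by walking from the root toward its Leaf, emitting
-- a 0 when descending left and a 1 when descending right.
encode :: Node -> String -> [Int]
encode _ [] = []
encode codeTree (sym:syms) =
    encodeSymbol codeTree ++ encode codeTree syms
  where
    encodeSymbol (Leaf _ _) = []
    encodeSymbol (Interior l r _ _)
      | sym `elem` symbols l = 0 : encodeSymbol l
      | otherwise            = 1 : encodeSymbol r
```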

`decode` works in a similar way. `decode` takes a bit from the bit string and moves to the left subtree if the bit is 0 and moves to the right subtree otherwise. When `decode` reaches a Leaf, it pushes the leaf's symbol into the result string, returns to the root of the Huffman Encoding Tree, and takes the next bit from the bit string.
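The same idea as a sketch, again assuming the Node representation above:

```haskell
-- Assumed representation (sketch, not the author's exact code).
data Node = Leaf Char Int
          | Interior Node Node [Char] Int

-- Follow bits down the tree: 0 goes left, 1 goes right. On reaching a
-- Leaf, emit its symbol and restart from the root with the remaining bits.
decode :: Node -> [Int] -> String
decode codeTree = go codeTree
  where
    go (Leaf s _) bits
      | null bits = [s]
      | otherwise = s : go codeTree bits
    go (Interior l r _ _) (bit:bits)
      | bit == 0  = go l bits
      | otherwise = go r bits
    go _ [] = []
```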

Pattern-matching helps keep the definition of `makeTree` small. The first pattern the definition matches is a list containing only one element. This is the base case. The second pattern the definition matches is a list containing at least two elements, `first` and `second`, and the `rest` of the list. The `x:xs` syntax is called a cons and represents a list composed of an element `x` and the rest of the elements in the list, `xs`. Pattern-matching `first`, `second`, and `rest` saves us from having to write code to extract `first`, `second`, and `rest` from the list.

There is no `if` in this program. It branches using pattern-matching instead. For example, instead of writing `if isLeaf tree`, Haskell can tell when `encode` is called on a Leaf if we write `encode codeTree (Leaf s w) (sym:symbols)`. And instead of having to write code to extract the left and right subtrees from a Node, Haskell will bind the left and right subtrees to `l` and `r` respectively if we write `encode codeTree (Interior l r _ _) (sym:symbols)`.

Pattern-matching helps keep this code dense. Even though I am not used to reading dense code, I like it because almost everything you need to know about what the program is doing is represented locally, instead of being scattered around the program in different functions.