02.01.07

Tries, the Perfect Data Structure

Posted in boggle, programming at 12:49 am by danvk

So far, we’ve been using hash tables to store our word and prefix lists. These are O(1), constant time, so there’s no room for improvement, right? Not quite. That O(1) for hash tables has a fairly large, unpredictable constant. It also assumes that hashing a string is a constant time operation, which can’t possibly be true. Sure enough, Ruby’s string.hash method does a loop over the characters in the string, so it’s really O(m), where m is the length of the string. There is room for improvement after all!

What exactly do we need our data structure to do? When we look at a character on the board, we need to know two things: whether it starts a word (so we’ll need to recur) and whether it completes a word (so we’ll need to score it). As it turns out, Tries are a perfect solution for this problem. Here’s a picture of a small Trie dictionary:

bubbles-lines.png

To find the words, read down the lines: K-I-N-D, K-N-I-F-E. Nodes with double circles represent complete words. Suppose the Boggle board looks like this, and we’re looking at the “K”:

graph.png

We need to know which of the surrounding letters will complete a word. We can read this off the trie immediately: none of them do. We also need to know which letters could result in a word if we kept searching. This is also immediate: “N” and “O” are the only children of “K” in the trie that are next to it on the board. So those are the two avenues we need to explore. Let’s take a look at the “N” path. We go down to the “N” node in the trie and look at its children. None are complete words, but “I” continues a word. And so on until we get to “E”:

graph-knife.png   bubbles-knife.png

In effect, we’re doing a parallel search through two trees, using each to prune the other. The trie prunes the board search when we get to a sequence of letters that doesn’t start any words. The board search prunes the trie search when we get stuck in a corner or aren’t adjacent to a particular letter.

The real kicker is that tries let us reduce all the operations in our depth-first search to simple memory accesses in the trie. Have we completed a word? The answer’s right where we’re looking in the trie. Does this character start a word? We check if the current trie node has that character as a child. Just a single memory reference. Compared to hash lookups, memory accesses are blazing fast.

Next time we’ll write a simple Trie structure and take it for a spin. Unfortunately, Ruby doesn’t let us take full advantage of the Trie’s Boggle solving powers. I’ve only been able to realize a 2x speed boost from Tries in Ruby. We’ve reached the end of its road. The gains if we go to C++ are far greater, so that’s where we’ll be heading.

1 Comment

  1. danvk.org » Breaking 3×3 Boggle said,

    August 8, 2009 at 1:09 pm

    [...] get the list of words, we use the same depth-first search strategy as we did to find words on a single board. The wrinkle is that, when we encounter a cell with [...]