01.30.07

Boggle #3: Succeed by not Being Stupid

Posted in boggle, programming at 11:02 pm by danvk

Last time, we wrote a program that found all the words on a Boggle board in just over four minutes per board. But what was it spending all that time doing? Let’s add some debugging output to find out:

def solve(x, y, sofar)
puts "#{x},#{y}: ‘#{sofar}’"

Here’s what happens:

$ ruby fast.rb a b c d e f g h i j k l m n o p       
0,0: ''
0,1: 'a'
0,2: 'ab'
0,3: 'abc'
1,2: 'abcd'
1,1: 'abcdg'
1,0: 'abcdgf'
2,0: 'abcdgfe'
2,1: 'abcdgfei'
2,2: 'abcdgfeij'
1,3: 'abcdgfeijk'
2,3: 'abcdgfeijkh'
3,2: 'abcdgfeijkhl'
3,1: 'abcdgfeijkhlo'
3,0: 'abcdgfeijkhlon'
3,3: 'abcdgfeijkhlo'
3,3: 'abcdgfeijkhl'
3,2: 'abcdgfeijkhlp'
3,1: 'abcdgfeijkhlpo'
3,0: 'abcdgfeijkhlpon'
...

Quick: how many words start with “abcd”? I can’t think of any either. But our program dutifully searches over the entire board for words we know it won’t find. It’s painful to think about all the time our search is wasting exploring paths that can’t possibly start real words. We need to cut off paths like “abcd” at the head. We can do this by creating a list of word stems: sequences of letters that can start legal Boggle words. “a” is on that list. So is “ab”. “abc” probably isn’t, though, and “abcd” certainly isn’t.

We may as well store the list of stems in another hash table. Here’s how:

Words = Hash.new
Stems = Hash.new
IO.foreach("words") do |line|
  line =~ /^([a-z]{3,17})\b/ or next
  wd = $1
  Words[wd] = true
  (1..wd.length-1).each do |len|
    Stems[wd[0,len]] = true
  end
end

Now we modify the solve routine to only make a recursive call when it could result in new words. This is a trivial change:

    solve(x+dx,y+dy,wd) if Stems[wd] # keep going if it'll help!

Now the moment of truth. How much of a win did these four new lines of code buy us?

$ time ruby fast.rb a b c d e f g h i j k l m n o p 
18

real    0m7.831s
user    0m7.651s
sys     0m0.096s

$ time ruby fast.rb c a t d l i n e m a r o p e t s
2338

real    0m7.990s
user    0m7.818s
sys     0m0.094s

WOW! That’s a factor of 30 improvement! If we dig a little deeper, it gets even more interesting:

$ ruby -r profile fast.rb c a t d l i n e m a r o p e t s
2338
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 40.19   213.80    213.80   172235     1.24     2.62  Range#each
 11.24   273.58     59.78        1 59780.00 401400.00  IO#foreach
 11.05   332.36     58.78  1554484     0.04     0.04  Hash#[]=
 10.96   390.64     58.28     8892     6.55    84.27  Array#each
 10.43   446.13     55.49  1381345     0.04     0.04  String#[]
  4.25   468.76     22.63   130010     0.17     0.30  Range#===
  2.99   484.64     15.88   247385     0.06     0.06  Fixnum#<=>
  2.41   497.44     12.80   244423     0.05     0.05  Fixnum#+
  1.40   504.87      7.43   132259     0.06     0.06  Array#[]
  1.37   512.18      7.31     8891     0.82    85.58  Object#solve

The first three methods are called only in dictionary-loading code we just wrote. The vast majority of time is being spent loading the dictionary! There’s not much we can do to improve this, but we can make it less important by changing the problem we’re trying to solve.

Right now we’re just trying to score one board. But if we knew in advance that we were going to score 10 boards, we could load the dictionary once and then score all ten boards using that dictionary. This would mitigate the slow dictionary load. Here’s one way to do it. Input is now taken from STDIN:

# Take whitespace-separated inputs on each line
ARGF.each do |line|
  Found.clear
  letters = line.chomp.split(' ')
  (0..15).each { |i| Bd[i/4][i%4] = letters[i] }
  (0..15).each { |i| solve(i/4, i%4, "") }
  puts Found.keys.inject(0) { |sum,w| sum + Scores[w.length] }
end

There are a few Rubyisms we haven’t seen yet in those lines. ARGF is a great crutch for Perl programmers who miss typing while(<>) {...}. It opens each input file left in ARGV and yields each line. If there’s no input files left, it reads STDIN and yields each line it gets there. Many, many programs do their work in an ARGF loop.

The chomp method in line.chomp removes a trailing newline from line, if it’s there. The split(' ') is a special case of the split method that splits the line on whitespace into an array. These are also Perlisms. Come to think of it, most of the strange, idiosyncratic features of Ruby are ultimately inspired by Perl.

I generated a file with ten random boards in it:

c a t d l i n e m a r o p e t s
a b c d e f g h i j k l m n o p
s i e e u e o o c t r k x o n n 
c n s r e e h m i o r t o i k y 
t v b i t n p u e e o t n t r e 
y f h a s e g i o n m o d t a e 
e e n o l a o s t i a s v m e l 
s t n t e e n t a e e o c p o b 
a s y r p h u v i e r e e u p o 
s qu n g t f y a t b e w r e t e

And now the performance:

$ time ruby multi.rb boards 
2338
18
138
121
190
212
272
211
138
87

real    0m8.454s
user    0m8.264s
sys     0m0.098s

It took 7.75 seconds to process one board earlier in this post, so that means that the solver itself is running at a rate of (8.264-7.75)/9 = 0.057 secs/board. This is fast enough that we need to start inverting our units. We’re getting 17.5 boards/sec, and we’ve improved by a factor of nearly 5000 since our initial attempt!

3 Comments

  1. Steven said,

    April 18, 2008 at 6:09 am

    Sorry your code doesn’t seem to work anymore

  2. Steven said,

    April 18, 2008 at 6:21 am

    OK it does, but why do I need to put spaces in between the letters?

  3. danvk.org » Breaking 3×3 Boggle said,

    August 8, 2009 at 1:13 pm

    [...] from trying each letter on each square. But we’re saved by the same lesson we learned in boggle post #3: the dictionary is exceptionally effective at pruning thorny search trees. If we prune search trees [...]