danvk.org » boggle http://www.danvk.org/wp Keepin' static like wool fabric since 2006 Thu, 09 Oct 2014 15:59:51 +0000 en-US hourly 1 http://wordpress.org/?v=3.9.2 What up with Boggle? http://www.danvk.org/wp/2014-01-25/what-up-with-boggle/ http://www.danvk.org/wp/2014-01-25/what-up-with-boggle/#comments Sat, 25 Jan 2014 20:47:11 +0000 http://www.danvk.org/wp/?p=1128 A Boggle-obsessed friend recently asked how he could read all the posts about that game on my blog. I used to include categories on every post, but removed them when I updated the design of danvk.org. The categories are still there, though! Here’s a link to the Boggle Category. It’s better to read the posts in chronological order, starting from the first.

Here they are for easy clicking:

Stepping back a little bit, what’s the status of my Boggle efforts?

The overall goal is to find the highest-scoring Boggle board and prove that there are no better boards out there.

I’ve solved this problem for 3×3 Boggle boards. Here’s the best board:


Its longest words include REPLATED and REPASTED. You can see all the words here. This board has more points on it than any other board (using the Enable2K word list) and I’ve proved it.

The 4×4 problem is much harder, though, because there are eight billion times more boards to consider! I believe that this board is the best one:


You can see a list of all the words on it here. The longest word is REPLASTERING. I haven’t been able to prove it, though, and I haven’t put any serious time into the problem since 2009.

You can find all the Boggle-related code I’ve written on github and learn all about the algorithms and techniques by browsing posts in the Boggle Category on this blog.

http://www.danvk.org/wp/2014-01-25/what-up-with-boggle/feed/ 2
Updated Online Boggle Solver http://www.danvk.org/wp/2012-07-03/updated-online-boggle-solver/ http://www.danvk.org/wp/2012-07-03/updated-online-boggle-solver/#comments Tue, 03 Jul 2012 17:42:45 +0000 http://www.danvk.org/wp/?p=924 See that Online Boggle Solver link on the right? Bet you’ve never clicked it!

I built the online solver back in 2006, mostly just for fun. In the six years since, I’ve wound up using it almost exclusively on my phone during real-life Boggle games. The iPhone didn’t exist when I built that page, and it is a device for which its fancy design (with hover effects and fixed position elements) is very ill-suited. Minimal designs tend to work better on Mobile, and I’ve just pushed out a new version for phones which is very minimal indeed.

The code behind this boggle solver differs from my other boggle code in that it’s designed for low latency rather than high throughput. This is precisely what you want from a web solver. You’re only finding the words on a single board, so time spent loading the word list dominates time spent finding words on the board. The current version uses a custom dictionary which can be MMAPped. This winds up being about 3x faster than reading a dictionary text file and building a Trie on every invocation.

I haven’t worked on Boggle in the past three years, so the state of the art in finding the highest-scoring boards is still where it was when I made my last blog post about breaking 3×3 Boggle. The interesting development then was that you could get something like a 1000x speedup by upper-bounding whole classes of boards rather than scoring each board individually. This is a big enough speed boost to make 3×3 Boggle “solvable” in a day on a desktop computer, but not enough to make 3×4 or 4×4 solvable, even on a big cluster of computers. I was optimistic that memoization could give some more big wins, but when those failed to materialize I got frustrated and stopped working on the project.

http://www.danvk.org/wp/2012-07-03/updated-online-boggle-solver/feed/ 0
A Few More Boggle Examples http://www.danvk.org/wp/2009-08-11/a-few-more-boggle-examples/ http://www.danvk.org/wp/2009-08-11/a-few-more-boggle-examples/#comments Wed, 12 Aug 2009 04:01:30 +0000 http://www.danvk.org/wp/?p=570 Hopefully the last boggle post for a while! I figured I should throw up a few more examples while I still have time left on my OmniGraffle trial.

There are two main shortcomings to the max/no mark upper bound:

  1. It can’t keep track of which words it’s already found.
  2. It can’t keep track of which choices it’s made on each cell.

These can both be fixed, but only by incurring great computational complexity. Remember the tradeoff between the “tightness” of a bound and the difficulty of computing it?

Our first example:

t i .
{a,e} . .
r . .

Let’s look at the search tree starting with the “t” cell. There are three possible words: “tar”, “tie” and “tier”. Here’s the search tree that max/no mark generates:


(I’m drawing the trees slightly differently than I did in the last post, to make the branching more explicit. Branches inside dotted lines come from multiple possibilities on a cell. Other branches result from the ordinary process of looking in different directions from a cell.)

In this example, max/no mark chooses the “a” when it gets to the “{a,e}” cell directly from the “t”. This makes sense, since it results in one point to the e’s zero. However, when it gets to the “{a,e}” cell through the “ti”, it chooses the “e”. This also makes sense, since it gets two points from the “e” (“tie” and “tier”) vs. zero from the “tia” prefix. By the time it makes this choice, it has no memory of choosing the “a” the last time it encountered this cell. If it did, since the “tie” prefix results in two points to the “ta” prefix’s one, it would make sense to go back and change that choice. But remembering this would require lots of expensive bookkeeping. It’s faster (but yields a looser bound) if we just accept the possibility that we’ll make different choices.

Example number 2 of max/no mark’s failings:

t h .
h e .
. . .


The problem in this case is that max/no mark doesn’t remember which words it’s already found. At first glance, it seems like this problem would be easy to remedy. And it certainly would on this board, in which each cell only has a single possible letter. But if you think about a board with lots of choices, you’ll start to see why it’s best to just give up and accept a looser bound.

http://www.danvk.org/wp/2009-08-11/a-few-more-boggle-examples/feed/ 0
Some max/no mark examples http://www.danvk.org/wp/2009-08-11/some-maxno-mark-examples/ http://www.danvk.org/wp/2009-08-11/some-maxno-mark-examples/#comments Tue, 11 Aug 2009 20:41:57 +0000 http://www.danvk.org/wp/?p=552 After the last post, several people mentioned that they were confused about how the “max/no mark” upper bound on the highest score in a class of Boggle boards worked. With some help from OmniGraffle, I’ve created some instructive examples.

Here’s a class of boards:

f {a,u} r
. . e
. . .

The dots mean “no letters here”. The class contains two different boards:

f a r
. . e
. . .


f u r
. . e
. . .


It contains two boards. The one with an “a” has two points worth of words on it, while the one with a “u” only has one. (We’re only looking at words starting with ‘f’ here.)

The diagrams show that the solver starts with the ‘f’ on each board and explores adjacent cells. When it finds a word, it scores it and passes the total score back up the depth-first search tree.

Here’s how the max/no mark bound sees that board class:


When it gets to the “{a,u}” cell, it tries both possible letters. The “a” tree brings back 2 points, whereas the “u” tree brings back 1 point. So it chooses “a” and counts two points. As it so happens, this is the score of the highest-scoring board. The sum/union bound would have added the 1 and the 2, resulting in an upper bound of 3. The max/no mark bound takes advantage of the fact that this cell can only be one of two possibilities, not both.

Now what if we throw a few more letters on:

f {a,u} r
z . e
z y .

With the new letters, there are more points coming from the u:


The two points going through the ‘a’ are dropped on the floor. sum/union would have resulting in a bound of 4+2. When there are lots of letter choices on every cell, you can see why max/no mark is a much tighter bound.

It’s important to note that there are two sources of branching in these search trees: (1) being able to go in multiple directions from a cell (i.e. f->u->r or z) and (2) having multiple choices on a cell (i.e. f->a or u). The max/no mark bound sums the scores resulting from choices in case (1) and takes the max of choices in case (2). The sum/union bound takes the sum in both cases.

http://www.danvk.org/wp/2009-08-11/some-maxno-mark-examples/feed/ 0
Breaking 3×3 Boggle http://www.danvk.org/wp/2009-08-08/breaking-3x3-boggle/ http://www.danvk.org/wp/2009-08-08/breaking-3x3-boggle/#comments Sat, 08 Aug 2009 17:35:04 +0000 http://www.danvk.org/wp/?p=516 Why is finding the highest-scoring Boggle board so difficult? It’s because there are so many boards to consider: 2^72 for the 4×4 case and 2^40 for the 3×3 case. At 10,000 boards/second the former corresponds to about 2 billion years of compute time, and the latter just two years. Just enumerating all 2^72 boards would take over 100,000 years.

So we have to come up with a technique that doesn’t involve looking at every single board. And I’ve come up with just such a method! This is the “exciting news” I alluded to in the last post.

Here’s the general technique:

  1. Find a very high-scoring board (maybe this way)
  2. Consider a large class of boards
  3. Come up with an upper bound on the highest score achieved by any board in the class.
  4. If it’s lower than the score in step #1, we can eliminate all the boards in the class. If it’s not, subdivide the class and repeat step #2 with each subclass.

Classes of Boards
By “class of boards”, I mean something like this:

{a,e,i,o,u} {a,e,i,o,u} r
{b,c,d,f,g,h} a t
d e {r,s,t,v}

The squares that contain a set of letters can take on any of those letters. So this board is part of that class:

a i r
d a t
d e s
189 points

and so is this:

o u r
f a t
d e t
114 points

All told, there are 5 * 5 * 6 * 4 = 600 boards that are part of this class, each with its own score. Other fun classes of boards include “boards with only vowels” (1,953,125 members) and “boards with only consonants” (794,280,046,581 members).

Follow me past the fold for more…

Upper Bounds
Now on to step #3 of the general technique: calculating an upper bound. This is going to be easier if we introduce some mathematical notation:

b = A boggle board
Score(b) = sum of the scores of all the words contained on b
B = a class of boards, i.e. b \in B
Score(B) = max(\{Score(b) | b \in B\})

An upper bound is a function f(B) such that f(B) \geq Score(B), i.e. f(B) \geq Score(b) \forall b \in B.

There’s one really easy upper bound: Score(B)! This just enumerates all the boards in the class B, scores each and takes the maximum score. It’s very expensive to compute for a large class of boards and hence not very practical. You and I both know that no board in containing only consonants has any points on it. We don’t need to enumerate through all 794 billion such boards to determine this.

With upper bounds, there’s a trade-off between how hard they are to compute and how “tight” they are, i.e. how closely they approximate Score(B). Score(B) is very tight but is hard to compute. At the other end of the spectrum, we know that all the words on a board are in the dictionary. So we could just sum up the scores of all the words in the dictionary and get a number, say 1,000,000. Then f(B) = 1,000,000 is an upper bound. It is very easy to compute, but is not very tight.

The trick is to hit some sort of sweet spot that strikes a good balance between “tightness” and ease of computation. Over the rest of this blog post, I’ll present two upper bounds that do this. Upper bounds have the nice property that if f(B) and g(B) are two upper bounds, then h(B) = min(f(B), g(B)) is also an upper bound. So by finding two bounds, we’ll get a third that’s better than either one alone.

The idea of this bound is to find all the words that can possibly occur in a class of boards. Since each word can only be found once, we can add the scores of all these words to get an upper bound.

To get the list of words, we use the same depth-first search strategy as we did to find words on a single board. The wrinkle is that, when we encounter a cell with multiple possible letters, we have to do a separate depth-first search for each.

At first glance, it doesn’t seem like this would be tractable for a board class like this one (alternating vowels and consonants):

{a,e,i,o,u} {b-d,f-h,j-n,p-t,v-z} {a,e,i,o,u}
{b-d,f-h,j-n,p-t,v-z} {a,e,i,o,u} {b-d,f-h,j-n,p-t,v-z}
{a,e,i,o,u} {b-d,f-h,j-n,p-t,v-z} {a,e,i,o,u}

In addition to the branching from going different directions on each square, there’s also a huge amount of branching from trying each letter on each square. But we’re saved by the same lesson we learned in boggle post #3: the dictionary is exceptionally effective at pruning thorny search trees. If we prune search trees like ‘bqu’ that don’t begin words, then there doesn’t wind up being that much work to do.

We can find all possible words on the above board in just under 1 second. This is about 10,000 times slower than it takes to score a conventional board, but it’s certainly tractable. The resulting score is 195,944. Given that no board scores higher than 545 points, this is a wild overestimate. But at least it’s a better bound than a million!

This technique does especially well on boards like this one, which contains all consonants:

{b-d,f-h,j-n,p-t,v-z} {b-d,f-h,j-n,p-t,v-z} {b-d,f-h,j-n,p-t,v-z}
{b-d,f-h,j-n,p-t,v-z} {b-d,f-h,j-n,p-t,v-z} {b-d,f-h,j-n,p-t,v-z}
{b-d,f-h,j-n,p-t,v-z} {b-d,f-h,j-n,p-t,v-z} {b-d,f-h,j-n,p-t,v-z}

This takes 0.23 seconds to score and results in a bound of 208 points (it contains words like ‘crypt’ and ‘gypsy’). We’ve already found a single board that has 545 points on it. So we can eliminate this entire class of 794 billion boards. That’s a speed of over 3 trillion boards/second! Of course, this board class is not typical.

It’s also worth pointing out why this upper bound isn’t tight. Consider this class of boards:

{a,i} r z
f z z
z z z

You can find both “fir” and “far” on boards in this class, but there aren’t any boards that contain both. So while each “fir and “far” contribute a point to the upper bound, they should only really contribute a single point. The sum/union bound doesn’t take into account the relationships between various letter choices. It’s the best trade-off between computability and “tightness” we’ve seen so far, but it’s not good enough to make the problem tractable.

max/no mark
In the sum/union upper bound, we dealt with multiple possible letters on the same square by trying each and adding the resulting scores (taking care not to count any word twice). But why take the sum of all choices when we know that any given board can only take on one of the possibilities? It would result in a much better bound if we took the max of the scores resulting from each possible choice, rather than the sum. This is the idea behind the “max/no mark” bound.

This is a huge win over sum/union, especially when there are many squares containing many possible letters. It does have one major drawback, though. The sum/union bound took advantage of the fact that each word could only be found once. With the max/no mark bound, the bookkeeping for this becomes completely intractable. The words we find by making a choice on one square may affect the results of a choice somewhere else. We can’t make the choices independently. The optimal set of choices becomes an optimization problem in its own right.

Rather than deal with this, max/no mark just throws up its hands. This is what the “no mark” refers to. In the past, we’ve recorded the words we find by marking the Trie. By not marking the Trie with found words, we accept that we’ll double-count words sometimes. But it still winds up being an excellent upper bound.

Lets try some of our previous examples:

{a,e,i,o,u} {a,e,i,o,u} r
{b,c,d,f,g,h} a t
d e {r,s,t,v}
sum/union: 2880
max/no mark: 1307

Alternating vowels and consonants:

{a,e,i,o,u} {b-d,f-h,j-n,p-t,v-z} {a,e,i,o,u}
{b-d,f-h,j-n,p-t,v-z} {a,e,i,o,u} {b-d,f-h,j-n,p-t,v-z}
{a,e,i,o,u} {b-d,f-h,j-n,p-t,v-z} {a,e,i,o,u}
sum/union: 195944
max/no mark: 15692

A class that can be entirely eliminated:

{b,d,f,g,j,k,m,p,v,w,x,z} a {s,y}
{i,o,u} y a
{s,y} {c,h,l,n,r,t} {c,h,l,n,r,t}
sum/union: 2497
max/no mark: 447

max/no mark isn’t always better than sum/union:

{b,d} a {b,d}
a {b,d} a
{b,d} a {b,d}
sum/union: 9
max/no mark: 132

This is something of a worst-case because, while there are relatively few distinct words, there are many different ways to find them.

Putting it all together
Our two bounds do well in different situations. max/no mark works best when there are lots of choices to be made on particular cells and there are relatively few ways to make any particular word. sum/union works best when there are lots of possibilities but relatively few distinct words. Putting them together results in a bound that’s good enough to find the best 3×3 boggle board using the technique described at the beginning of this post.

Given an initial class of boards, we wind up with what I call a “breaking tree”. If the initial class has an upper bound less than 545 points, then we’re done. Otherwise, we pick a cell to split and try each possibility.

Here’s a relatively small breaking tree that results from running this program:

$ ./3x3/ibucket_breaker --best_score 520 --break_class "bdfgjkmpvwxz a sy iou xyz aeiou sy chlnrt chlnrt"
(     0%) (0;1/1) bdfgjkmpvwxz a sy iou xyz aeiou sy chlnrt chlnrt (820, 77760 reps)
                            split cell 4 (xyz) Will evaluate 3 more boards...
(     0%)  (1;1/3) bdfgjkmpvwxz a sy iou x aeiou sy chlnrt chlnrt (475, 25920 reps)
(33.333%)  (1;2/3) bdfgjkmpvwxz a sy iou y aeiou sy chlnrt chlnrt (703, 25920 reps)
                            split cell 5 (aeiou) Will evaluate 5 more boards...
(33.333%)   (2;1/5) bdfgjkmpvwxz a sy iou y a sy chlnrt chlnrt (447, 5184 reps)
(    40%)   (2;2/5) bdfgjkmpvwxz a sy iou y e sy chlnrt chlnrt (524, 5184 reps)
                            split cell (iou) 3 Will evaluate 3 more boards...
(    40%)    (3;1/3) bdfgjkmpvwxz a sy i y e sy chlnrt chlnrt (346, 1728 reps)
(42.222%)    (3;2/3) bdfgjkmpvwxz a sy o y e sy chlnrt chlnrt (431, 1728 reps)
(44.444%)    (3;3/3) bdfgjkmpvwxz a sy u y e sy chlnrt chlnrt (339, 1728 reps)
(46.667%)   (2;3/5) bdfgjkmpvwxz a sy iou y i sy chlnrt chlnrt (378, 5184 reps)
(53.333%)   (2;4/5) bdfgjkmpvwxz a sy iou y o sy chlnrt chlnrt (423, 5184 reps)
(    60%)   (2;5/5) bdfgjkmpvwxz a sy iou y u sy chlnrt chlnrt (318, 5184 reps)
(66.667%)  (1;3/3) bdfgjkmpvwxz a sy iou z aeiou sy chlnrt chlnrt (509, 25920 reps)

The numbers in parentheses are the upper bounds. When they get below 520 (the parameter I set on the command line), a sub-class is fully broken.

Using this technique and the following partition of the 26 letters:

  • bdfgjvwxz
  • aeiou
  • lnrsy
  • chkmpt

I was able to go through all 262,144 (=4^9) board classes in about six hours on a single machine. This resulted in the boards I listed in the last post. Six hours is a big improvement over two years!

If that same factor (two years to six hours) held for the 4×4 case, then we’d be down to 380 years of compute time to find the best 4×4 boggle board. Or, equivalently, 138 days on 1000 machines. That’s still a lot. We’re not quite there yet, but we’re getting closer!

Code for the program that went through all possible board classes can be found here. While many people have found high-scoring boards, I haven’t found any previous work on this upper bounding approach. So if you have any ideas/suggestions on how to improve the bound, they’re probably novel and useful!

http://www.danvk.org/wp/2009-08-08/breaking-3x3-boggle/feed/ 6
Solving Boggle by Taking Option Three http://www.danvk.org/wp/2009-08-04/solving-boggle-by-taking-option-three/ http://www.danvk.org/wp/2009-08-04/solving-boggle-by-taking-option-three/#comments Wed, 05 Aug 2009 01:18:03 +0000 http://www.danvk.org/wp/?p=503 There is exciting news to report in the land of Boggle!

Last we spoke about Boggle, we used Simulated Annealing to get at the question “what is the highest-scoring Boggle board?”

We found a board with 3625 points on it this way. It would have been nice to say that it was the best of all possible boards, but that would have been too rash. While it is a very good board and I have never seen a higher-scoring one, that doesn’t mean there isn’t one out there. Maybe I’ve just been looking in the wrong places.

To prove that the 3625-pointer is the best board, we’d need to show that every other board scores fewer points. In a previous post, I estimated that there were 2^69 possible boggle boards. At 10,000 boards/second, this would take 1.9 billion years of compute time!

When you run up against a computationally intractable problem, there are a few standard ways to deal with it:

  1. Give up.
  2. Come up with a brilliant new algorithm to solve the problem more quickly.
  3. Solve a simpler problem.

Option 1 is the easiest. Option 2 is the hardest. And option 3 is the compromise we’ll be taking for the next few danvk.org boggle posts. Our simpler problem today: 3×3 boggle.

If we drop seven letters (4×4 – 3×3 = 7), we’re left with only 26^9 / 8 ≅ 2^39 boards to consider. If we can achieve 10,000 boards/sec, this would be just over two years’ worth of computation. That’s still a lot, but it’s much more reasonable than 1.9 billion years!

I believe I have solved the 3×3 boggle problem using an approach that is only slightly more clever than this. I used simulated annealing to find very high-scoring 3×3 boards. This was the best one I found, along with its 4×4 best board buddy:

p e r
l a t
d e s
545 points


p e r s
l a t g
s i n e
t e r s
3625 points

It’s worth noting that the optimal 3×3 board has a 2×3 region in common with the 3625 point 4×4 board.

In the next post we’ll talk about how I showed that this board was higher-scoring than all 2^39 others in significantly less than two years (it took one day). As a teaser, I’ve included all the 3×3 boards with more than 500 points worth of words below the fold.

I’ve made the boards fit on a single line by reading them like a book: left to right, then top to bottom.

Board Score
perlatdes 545
pelsartes 542
delratpes 540
lepsartes 536
tepsarles 528
selratpes 528
legsartes 527
berlatdes 526
relsatpes 524
perlatces 523
derlatpes 522
telsarpes 520
parletdes 520
lersatpes 520
saltipres 518
tepsarled 514
tegsarles 514
tedsalrep 514
repsalted 514
ledsartep 514
pelsatres 513
tegsarlep 511
lepsarteg 511
tapselres 510
serlitpas 508
lecsartep 508
tedsarleg 507
persatles 507
legsarted 507
tepsarleg 505
selratdes 505
legsartep 505
canretdes 505
canretdes 505
pinlatres 504
nirgatres 504
lecsartes 504
serlatpes 503
letrasdep 503
derlitpas 503
derlatbes 502
cerlatpes 502
cerlatpes 502
retgasnil 501
nilgasret 501
metparles 501
merpatles 501
tedsarlep 500
tapselred 500
redseltap 500
pecsartes 500
lepsarted 500

As always, this using the Enable2K word list. Code can be found here

http://www.danvk.org/wp/2009-08-04/solving-boggle-by-taking-option-three/feed/ 1
Sky-High Boggle Scores with Simulated Annealing http://www.danvk.org/wp/2009-02-19/sky-high-boggle-scores-with-simulated-annealing/ http://www.danvk.org/wp/2009-02-19/sky-high-boggle-scores-with-simulated-annealing/#comments Thu, 19 Feb 2009 08:09:45 +0000 http://www.danvk.org/wp/?p=412 Don’t let the sixteen month hiatus fool you. There’s just no end to Boggle posts on danvk.org!

In case you’d forgot, we’ve developed a blazing fast boggle solver capable of scoring 10,000 Boggle boards a second. What to do with this? Other than some interesting analyses, the most interesting question is:

What is the highest-scoring Boggle Board?

In this post, we’ll try to answer that question using Simulated Annealing. Here’s a sneak peak at one of the exceptionally word-rich boards we’ll find:

p e r s
l a t g
s i n e
t e r s
3625 points

Follow me past the fold for more…

Here’s the basic idea of Simulated Annealing:

  1. Generate a random board.
  2. Score it.
  3. Change the board somehow.
  4. Score it.
  5. Based on the scores, decide whether to keep the old board or the new one.
  6. Repeat from step 3.

If we make good decisions in step #5, the boards we keep will be progressively higher scoring.

The big decisions come in steps #3 and #5. What does it mean to change a board?

I decided to use two basic permutations to change the board: re-rolling a die:

a b c d
e f g h
i j k l
m n o p
18 points
a b c d
e f g h
i r k l
m n o p
76 points

and swapping two dice:

a b c d
e f g h
i r k l
m n o p
76 points
a b c r
e f g h
i d k l
m n o p
55 points

Clearly one of these changes was beneficial and the other was not.

Generally speaking, we should accept the changes that increase the score in step #5 and reject the changes that don’t. This is a perfectly reasonable strategy. It’s called Gradient Descent and is widely-used.

With Simulated Annealing, we sometimes go with the lower-scoring board. The idea is to not get stuck in a rut. Maybe our board is pretty good. It’s scores higher than all the boards around it. But there are lots of boards a little more distant that are even higher-scoring. With Simulated Annealing, we can take a short-term loss for a long-term gain.

The exact details aren’t too interesting. If they interest you, take a look at my code.

Now for the fun part! Here are the results of a sample run. I’ve concatenated the letters on each row of the board so that they’ll fit on a single line.

Gen. Score Reign Board
0 19 1 kqiqwhyyjthdsibk
1 4 1 iqiqjhyywthdskbk
2 5 1 ibiqjhyywthdskqk
3 9 1 ibiqjhcywthdskqk
4 10 1 ibiqjychsthdwkqk
5 8 1 ibicjyqhsthdwkqk
6 9 1 ibicjyqdsthywkqk
7 23 1 ibicjygdhtsywkqk
8 16 1 ibncjygdhtsywkqa
9 15 1 ibncyygdhtsymkqa
10 15 1 ibncyygwhtsytkda
11 38 1 ibncyygahtsyhkdw
12 146 2 ibscyogahtnyhkdw
14 140 4 ibscyogahtnbhkdw
18 165 2 ibbcyogahtnshkdw
20 159 1 ibbcdogahynshktw
21 169 1 ibbcdogahynshmtw
22 151 4 ibbcdogahynshbtw
26 150 3 hbbcdogaiynshbtw
29 142 2 hbbcdogagynshbtw
31 148 1 hbbcdogaygnshbtw
32 151 1 hbbqdogaygnshbtw
33 174 7 hbtmdogaygnshbtw
40 175 2 hdtmbogaygnshbtw
42 172 3 hdtmbogaygnstbtw
45 165 1 hdtmbogaygnetbtw
46 274 1 hdtmbogaygnetbts
47 298 5 hdtmbogabgnetyts
52 296 8 hdtmbogabgnswyte
60 387 1 hdtmboeabgnswytg
61 403 7 hdtgboeabmnswytg
68 409 3 hdtpboeabmnswytg
71 413 6 hdtpboeabmnsgytg
77 419 3 hptdboeabmnsgytg
80 416 8 hptdbaeobmnsgytg
88 421 9 phtdbaeobmnsgytg
97 449 40 phtdbaeoymnsgytg
137 454 2 phtsbaeoymnsgytg
139 463 11 phtseaeoymnsgytg
150 581 3 phtsmaeoyensgstg
153 662 26 yhtsmaeopensgstg
179 690 1 yhtsmaespenogstg
180 734 2 lhtsmaespenogstg
182 754 13 lctsoaespenogstg
195 849 1 lctsoasepenogstg
196 912 39 lctseasepenogstg
235 974 10 lctseasepenoistg
245 978 10 lctseasedenoistg
255 1046 25 lctseasedsnoietg
280 1281 53 lctseasedsrointg
333 1511 2 lctseasedsronitg
335 1511 8 lctseasedsronitg
343 1515 17 lctseaiedsronitg
360 1537 2 lctseaiedsronitp
362 1643 32 lctseaiedtronisp
394 1708 21 lstseaiedtronisp
415 1727 46 lstseiaedtronisp
461 1744 40 lstseiaedtrinosp
501 1873 36 lstseiaentrinosp
537 2053 9 lstseiaentrisosp
546 2162 13 lstseiaentrigosp
559 2445 42 sltseiaentrigosp
601 2445 21 sltseiaentrigosp
622 2524 58 sltseiaentrigesp
680 2597 37 stlseiaentrigesp
717 2614 232 stlseiaentragesp
949 3090 276 stlseiaentrpgesa
1225 3151 26 stlseiaentrpdesa
1251 3182 n/a stlseiaentrpdeso

Some fun words on that last board: “desperate”, “entreaties”, “operated” and “serenest”. No shortage of 11-pointers on this board!

That’s all for now. Next time we’ll look at some of the highest-scoring boards that simulated annealing finds.

As usual, you can find all the source code for this project on the performance-boggle project at Google Code.

http://www.danvk.org/wp/2009-02-19/sky-high-boggle-scores-with-simulated-annealing/feed/ 9
A Java Surprise http://www.danvk.org/wp/2007-10-09/a-java-surprise/ http://www.danvk.org/wp/2007-10-09/a-java-surprise/#comments Wed, 10 Oct 2007 06:44:25 +0000 http://www.danvk.org/wp/?p=224 java.png
I’ve always been a Java and Eclipse naysayer, but I’m afraid new experiences are forcing me to reevaluate my skepticism. The last time I used Java was JDK 1.3 on a Sparc workstation back in early 2004. Eclipse was hella slow on that hardware, and somehow my workspace wound up in a temporary directory. This was a very bad thing, because as soon as I logged out, my project was gone forever. So I had good reason to swear off Eclipse.

More generally, Java left off a mighty stink back in 2004. Any GUI that I ran on the Mac would look out of place and felt clunky. Performance was poor. But in retrospect, I suspect much of the rank Java smell was really coming from the design patterns gibberish I was being force-fed at the same time. Why use a simple array when you could use an AbstractListFactory that does the same thing with 10x code bloat?

Regular readers only get one guess what program I wrote to get in the swing of things.

The C++ code translated almost line-for-line to Java. Most of the changes were either syntax (“->” to “.”) or swapping String for char*. It took a while to remember that everything is a pointer in Java. The Java statement Trie t = new Trie(); is the equivalent of C++’s Trie* t = new Trie; and not plain old Trie t;, which uses the stack. Another major pain was the lack of unsigned types, which I used as hash codes in some tests.

Here’s the Java code, for those interested. I expected Java code to be an order of magnitude slower than the equivalent C++, so this initial benchmark surprised me:

C++:  Evaluated 21632 boards in 0.694 seconds = 31,162 bds/sec
Java: Evaluated 21632 boards in 1.108 seconds = 19,523 bds/sec

That’s only a 37% performance penalty, far less than the 90% or more I was expecting. Since my code doesn’t create/destroy any objects during the critical section, I’m really testing the effectiveness of the JIT. Since this code is a direct C conversion, it makes sense that it does well.

I don’t understand the internals of Java well enough to know how valid my benchmark is. Profiling C++ code is relatively straightforward, since nothing is going on behind your back. Not the case with Java. For example, does that 1.108 seconds include the time it took the JIT to compile my code? That might explain the whole perf difference. Also, am I fully optimizing? I hear there’s a difference between the client and server JITs. Which one am I using? How can I tell? Lazyweb?

A few more thoughts on the whole experience:

  • Eclipse is just great, especially for someone whose knowledge of the JDK is rusty. Having “string.” bring up a list of methods was great. This was less useful for more obscure Java-isms, like System.getProperty("user.dir") to get the current working directory.
  • When Eclipse spots an error in your code, it suggests a solution, which you can double-click to perform. This was just perfect for a rusty coder. It added lines like import java.io.File; to my code, which would have taken me a while to figure out on my own. That’s always a nuisance in C++.
  • I can’t figure out what command line Eclipse is using to run my code. I have no idea how it’s running my unit tests. Is there any way of finding this out?
  • jUnit was very easy to use inside Eclipse. I particularly liked the ability to write unit tests inside the class they were testing. Just tag a method with @Test and it becomes a unit test. Very cool.
  • I found the whole StringBuilder business pretty clunky. Writing "a" + "b" is special kludge for new StringBuilder().Append("a").Append("b").toString(). I also missed printf when writing to stdout. Is there no natural way to mix numbers and strings?
  • Java GUIs on the Mac still aren’t quite there. When I moused over the icons in Eclipse, there was noticeable flicker.

A tool like Eclipse is a great boon in understanding a large, foreign code base. One thing I’ve discovered in my past year at Google is that, while I’ve become a fairly good coder through experience, I have almost no external experience working with other people’s code. Make no mistake, navigating other people’s code is a skill. And to a surprising extent, it’s a skill disjoint from coding itself. We have a few tools to help with this at Google, but nothing quite so low-latency as what Eclipse pops up when you press type a period. The possibility of tools like Eclipse is a strong argument for languages with a simple syntax. When anyone can parse a language, they can build amazing tools like this one.

http://www.danvk.org/wp/2007-10-09/a-java-surprise/feed/ 2
How many Boggle boards are there? http://www.danvk.org/wp/2007-08-02/how-many-boggle-boards-are-there/ http://www.danvk.org/wp/2007-08-02/how-many-boggle-boards-are-there/#comments Fri, 03 Aug 2007 03:45:32 +0000 http://www.danvk.org/wp/?p=195 I’ve taken a several months break from my Boggle series, mostly because I think everyone got tired of it. I’m going to come back to it, but hopefully at a slower pace this time around.

Last time, we completed a board solver that could score 10,000 boards per second. So what to do with all that power? One option was to compute statistics on random boards. Another is to hunt for the holy grail of Boggle: the highest possible scoring Boggle board. The next few posts will be devoted to the search for this board.

Before undertaking any search, you need to get a feel for your search space. In our case, that’s the set of all 4×4 Boggle boards. How many are there? We can do a few back-of-the-envelope calculations.

To create a board, you roll 16 dice. Each has six possible letters on it, which gives 6^16 possibilities. These dice land in some permutation on the board, which gives another factor of 16!. Finally, a 4×4 board has eight-fold symmetry, which takes us down to 6^16 * 16! / 8 = 7.3e24 ≈ 2 ^ 83.

That’s one upper bound. But it assumed that all 6*16 = 96 symbols on the dice were unique. Obviously they’re not. After a roll, each of the 16 squares will have one of 26 letters on it. Divide by the symmetries, and you get 26 ^ 16 / 8 = 5e21 ≈ 2^72. Much better!

I haven’t been able to come up with any better upper bounds than these. The main flaw in the second approximation is that not all boards can be rolled with the sixteen dice that Hasbro provides. A board of all z’s or qu’s simply can’t occur. If we knew the probability that any sixteen character sequence could be rolled, this would give a true approximation of the number of distinct boards.

The easiest way to do this is with a computer program. It picks a random sequence of sixteen characters, checks whether this board can be rolled, and repeats several thousand times. I believe that checking whether a given board can be rolled is NP-Complete, but in this case the greedy approximation works quite well. I wrote a program (leave a comment if you want to see it) to do this, and processed one million 16-character sequences. Only 84,492 could be represented with the usual dice, or 8.4%. This gives a total of

(26 ^ 16 / 8) * (84,492 / 1,000,000) = 4.6e20 ≈ 2^69.

If you like confidence intervals, my sample size of one million boards gives a 95% confidence interval of [4.545e24, 4.666e24] for the total number of boards. Pretty good.

So, assuming we could enumerate all these boards quickly, how long would it take for our faster solver to find the best board? At 10k boards/sec, we’re looking at 4.5e16 seconds = 1.9 billion years! Clearly we need to do better.

http://www.danvk.org/wp/2007-08-02/how-many-boggle-boards-are-there/feed/ 8
JRuby Performance, Yowch! http://www.danvk.org/wp/2007-05-02/jruby-performance-yowch/ http://www.danvk.org/wp/2007-05-02/jruby-performance-yowch/#comments Thu, 03 May 2007 05:58:13 +0000 http://www.danvk.org/wp/?p=141 I took JRuby 0.9.9 for a spin with the exceptionally-inefficient Boggle program from a few months back. Here are the numbers:

$ time ruby short.rb c a t d l i n e m a r o p e t s
ruby short.rb c a t d l i n e m a r o p e t s  241.95s user 1.20s system 97% cpu 4:08.35 total
$ time jruby short.rb c a t d l i n e m a r o p e t s
jruby short.rb c a t d l i n e m a r o p e t s  1178.86s user 40.84s system 108% cpu 18:44.44 total

I’d heard JRuby was slow, but this is spectacular. Four times slower than the already-slow Ruby?

I’d always thought that the point of JRuby was to run Ruby programs on the JVM, and hence get the benefits of the JVM’s JIT. I guess not. With that kind of performance, the only possible justification is the ability to use Java libraries.

http://www.danvk.org/wp/2007-05-02/jruby-performance-yowch/feed/ 7