10.09.07

A Java Surprise

Posted in boggle, programming at 11:44 pm by danvk

java.png
I’ve always been a Java and Eclipse naysayer, but I’m afraid new experiences are forcing me to reevaluate my skepticism. The last time I used Java was JDK 1.3 on a Sparc workstation back in early 2004. Eclipse was hella slow on that hardware, and somehow my workspace wound up in a temporary directory. This was a very bad thing, because as soon as I logged out, my project was gone forever. So I had good reason to swear off Eclipse.

More generally, Java left off a mighty stink back in 2004. Any GUI that I ran on the Mac would look out of place and felt clunky. Performance was poor. But in retrospect, I suspect much of the rank Java smell was really coming from the design patterns gibberish I was being force-fed at the same time. Why use a simple array when you could use an AbstractListFactory that does the same thing with 10x code bloat?

Regular readers only get one guess what program I wrote to get in the swing of things.
Read the rest of this entry »

08.02.07

How many Boggle boards are there?

Posted in boggle, math at 8:45 pm by danvk

I’ve taken a several months break from my Boggle series, mostly because I think everyone got tired of it. I’m going to come back to it, but hopefully at a slower pace this time around.

Last time, we completed a board solver that could score 10,000 boards per second. So what to do with all that power? One option was to compute statistics on random boards. Another is to hunt for the holy grail of Boggle: the highest possible scoring Boggle board. The next few posts will be devoted to the search for this board.

Before undertaking any search, you need to get a feel for your search space. In our case, that’s the set of all 4×4 Boggle boards. How many are there? We can do a few back-of-the-envelope calculations.

To create a board, you roll 16 dice. Each has six possible letters on it, which gives 6^16 possibilities. These dice land in some permutation on the board, which gives another factor of 16!. Finally, a 4×4 board has eight-fold symmetry, which takes us down to 6^16 * 16! / 8 = 7.3e24 ≈ 2 ^ 83.

That’s one upper bound. But it assumed that all 6*16 = 96 symbols on the dice were unique. Obviously they’re not. After a roll, each of the 16 squares will have one of 26 letters on it. Divide by the symmetries, and you get 26 ^ 16 / 8 = 5e21 ≈ 2^72. Much better!

I haven’t been able to come up with any better upper bounds than these. The main flaw in the second approximation is that not all boards can be rolled with the sixteen dice that Hasbro provides. A board of all z’s or qu’s simply can’t occur. If we knew the probability that any sixteen character sequence could be rolled, this would give a true approximation of the number of distinct boards.

The easiest way to do this is with a computer program. It picks a random sequence of sixteen characters, checks whether this board can be rolled, and repeats several thousand times. I believe that checking whether a given board can be rolled is NP-Complete, but in this case the greedy approximation works quite well. I wrote a program (leave a comment if you want to see it) to do this, and processed one million 16-character sequences. Only 84,492 could be represented with the usual dice, or 8.4%. This gives a total of

(26 ^ 16 / 8) * (84,492 / 1,000,000) = 4.6e20 ≈ 2^69.

If you like confidence intervals, my sample size of one million boards gives a 95% confidence interval of [4.545e24, 4.666e24] for the total number of boards. Pretty good.

So, assuming we could enumerate all these boards quickly, how long would it take for our faster solver to find the best board? At 10k boards/sec, we’re looking at 4.5e16 seconds = 1.9 billion years! Clearly we need to do better.

05.02.07

JRuby Performance, Yowch!

Posted in boggle, programming at 10:58 pm by danvk

I took JRuby 0.9.9 for a spin with the exceptionally-inefficient Boggle program from a few months back. Here are the numbers:

$ time ruby short.rb c a t d l i n e m a r o p e t s
2338
ruby short.rb c a t d l i n e m a r o p e t s  241.95s user 1.20s system 97% cpu 4:08.35 total
$ time jruby short.rb c a t d l i n e m a r o p e t s
2338
jruby short.rb c a t d l i n e m a r o p e t s  1178.86s user 40.84s system 108% cpu 18:44.44 total

I’d heard JRuby was slow, but this is spectacular. Four times slower than the already-slow Ruby?

I’d always thought that the point of JRuby was to run Ruby programs on the JVM, and hence get the benefits of the JVM’s JIT. I guess not. With that kind of performance, the only possible justification is the ability to use Java libraries.

02.16.07

Initial Impressions of D

Posted in boggle, programming at 12:07 am by danvk

I got curious about the programming language D after noticing it outperform C++ in the somewhat-silly Computer Language Shootout and reading Steve Yegge’s thoughts on it. I’ve played around it for the past few days. Here’s some thoughts.

First of all, D is what you might call a “low PageRank language.” If you search for ‘D’, there’s nothing related to the language until result #15. The next result isn’t until #160, and it’s pretty random. Contrast that with C. This makes searching for help on D almost impossible. Try “D slow file I/O” for a taste. This isn’t necessarily Google’s fault, either. There’s just not that many good D resources online yet.
Read the rest of this entry »

02.11.07

Some Boggle Statistics

Posted in boggle at 10:46 pm by danvk

With a fast boggle solver in hand, it’s time for some fun statistics. These are all based on boggle boards rolled with real boggle dice. I’m going sans-code this time, but if you’re interested in seeing it, feel free to holla.

Most common words:

3 letters   4 letters   5 letters
Word Freq (%)   Word Freq (%)   Word Freq (%)
toe 19.258   teen 6.718   eaten 2.034
tee 19.074   tees 6.564   enate 2
ten 17.944   tent 6.02   sente 1.954
net 17.944   note 5.976   setae 1.944
tea 17.65   tone 5.838   tense 1.86
set 17.51   teat 5.804   tease 1.856
eta 17.176   toes 5.664   teeth 1.788
ate 17.176   toea 5.548   eater 1.788
tae 16.518   nets 5.432   teens 1.712
eat 16.518   test 5.344   seton 1.702
tie 16.432   rete 5.208   notes 1.702
het 15.684   nett 5.204   tents 1.646
ret 15.108   nest 5.174   retie 1.632
eth 14.938   tens 5.172   steno 1.624
oes 14.698   sent 5.156   sheet 1.618
the 14.542   neat 5.146   ester 1.618
eon 14.474   etna 5.144   oaten 1.61
one 14.366   ante 5.144   teats 1.608
ose 13.82   thee 5.064   tones 1.606
see 13.78   tote 5.052   enter 1.596

I looked these words up and they all check out. See the Scrabble dictionary if you’re not convinced.

How many words can we expect to find on each board?

words.png

That looks like a log-normal distribution. The mean is 98.53 words. How many points?

scores.png

That’s also a log-normal distribution with the characteristically long tail. The mean is 140.97 points per board.

How many words of each length can we expect to find on a board? Here’s a histogram of the number of words of each length on a board:

lens.png

Those also look like log-normals, with four letter words being most common.

Put another way, what’s the likelihood of finding a word of a given length on a board?

Len. Likelihood
3 99.97994%
4 99.901%
5 98.62%
6 87.56%
7 56.21%
8 21.36%
9 3.94%
10 0.442%
11 0.0362%
12 0.00228%
13 0.0001%

For context, the longest word I’ve ever found in a game was “thrashers” at nine letters.

The most common words were based on a 50,000 board sample. The graphs are based on a 5,000,000 board sample. Feel free to contact me if you’d like source or the Excel spreadsheet.

« Previous Page« Previous entries Next entries »Next Page »