10.09.07
Posted in boggle, programming at 11:44 pm by danvk

I’ve always been a Java and Eclipse naysayer, but I’m afraid new experiences are forcing me to reevaluate my skepticism. The last time I used Java was JDK 1.3 on a Sparc workstation back in early 2004. Eclipse was hella slow on that hardware, and somehow my workspace wound up in a temporary directory. This was a very bad thing, because as soon as I logged out, my project was gone forever. So I had good reason to swear off Eclipse.
More generally, Java left off a mighty stink back in 2004. Any GUI that I ran on the Mac would look out of place and felt clunky. Performance was poor. But in retrospect, I suspect much of the rank Java smell was really coming from the design patterns gibberish I was being force-fed at the same time. Why use a simple array when you could use an AbstractListFactory that does the same thing with 10x code bloat?
Regular readers only get one guess what program I wrote to get in the swing of things.
Read the rest of this entry »
Permalink
08.02.07
Posted in boggle, math at 8:45 pm by danvk
I’ve taken a several months break from my Boggle series, mostly because I think everyone got tired of it. I’m going to come back to it, but hopefully at a slower pace this time around.
Last time, we completed a board solver that could score 10,000 boards per second. So what to do with all that power? One option was to compute statistics on random boards. Another is to hunt for the holy grail of Boggle: the highest possible scoring Boggle board. The next few posts will be devoted to the search for this board.
Before undertaking any search, you need to get a feel for your search space. In our case, that’s the set of all 4×4 Boggle boards. How many are there? We can do a few back-of-the-envelope calculations.
To create a board, you roll 16 dice. Each has six possible letters on it, which gives 6^16 possibilities. These dice land in some permutation on the board, which gives another factor of 16!. Finally, a 4×4 board has eight-fold symmetry, which takes us down to 6^16 * 16! / 8 = 7.3e24 ≈ 2 ^ 83.
That’s one upper bound. But it assumed that all 6*16 = 96 symbols on the dice were unique. Obviously they’re not. After a roll, each of the 16 squares will have one of 26 letters on it. Divide by the symmetries, and you get 26 ^ 16 / 8 = 5e21 ≈ 2^72. Much better!
I haven’t been able to come up with any better upper bounds than these. The main flaw in the second approximation is that not all boards can be rolled with the sixteen dice that Hasbro provides. A board of all z’s or qu’s simply can’t occur. If we knew the probability that any sixteen character sequence could be rolled, this would give a true approximation of the number of distinct boards.
The easiest way to do this is with a computer program. It picks a random sequence of sixteen characters, checks whether this board can be rolled, and repeats several thousand times. I believe that checking whether a given board can be rolled is NP-Complete, but in this case the greedy approximation works quite well. I wrote a program (leave a comment if you want to see it) to do this, and processed one million 16-character sequences. Only 84,492 could be represented with the usual dice, or 8.4%. This gives a total of
(26 ^ 16 / 8) * (84,492 / 1,000,000) = 4.6e20 ≈ 2^69.
If you like confidence intervals, my sample size of one million boards gives a 95% confidence interval of [4.545e24, 4.666e24] for the total number of boards. Pretty good.
So, assuming we could enumerate all these boards quickly, how long would it take for our faster solver to find the best board? At 10k boards/sec, we’re looking at 4.5e16 seconds = 1.9 billion years! Clearly we need to do better.
Permalink
05.02.07
Posted in boggle, programming at 10:58 pm by danvk
I took JRuby 0.9.9 for a spin with the exceptionally-inefficient Boggle program from a few months back. Here are the numbers:
$ time ruby short.rb c a t d l i n e m a r o p e t s
2338
ruby short.rb c a t d l i n e m a r o p e t s 241.95s user 1.20s system 97% cpu 4:08.35 total
$ time jruby short.rb c a t d l i n e m a r o p e t s
2338
jruby short.rb c a t d l i n e m a r o p e t s 1178.86s user 40.84s system 108% cpu 18:44.44 total
I’d heard JRuby was slow, but this is spectacular. Four times slower than the already-slow Ruby?
I’d always thought that the point of JRuby was to run Ruby programs on the JVM, and hence get the benefits of the JVM’s JIT. I guess not. With that kind of performance, the only possible justification is the ability to use Java libraries.
Permalink
02.16.07
Posted in boggle, programming at 12:07 am by danvk
I got curious about the programming language D after noticing it outperform C++ in the somewhat-silly Computer Language Shootout and reading Steve Yegge’s thoughts on it. I’ve played around it for the past few days. Here’s some thoughts.
First of all, D is what you might call a “low PageRank language.” If you search for ‘D’, there’s nothing related to the language until result #15. The next result isn’t until #160, and it’s pretty random. Contrast that with C. This makes searching for help on D almost impossible. Try “D slow file I/O” for a taste. This isn’t necessarily Google’s fault, either. There’s just not that many good D resources online yet.
Read the rest of this entry »
Permalink
02.11.07
Posted in boggle at 10:46 pm by danvk
With a fast boggle solver in hand, it’s time for some fun statistics. These are all based on boggle boards rolled with real boggle dice. I’m going sans-code this time, but if you’re interested in seeing it, feel free to holla.
Most common words:
| 3 letters |
|
4 letters |
|
5 letters |
| Word |
Freq (%) |
|
Word |
Freq (%) |
|
Word |
Freq (%) |
| toe |
19.258 |
|
teen |
6.718 |
|
eaten |
2.034 |
| tee |
19.074 |
|
tees |
6.564 |
|
enate |
2 |
| ten |
17.944 |
|
tent |
6.02 |
|
sente |
1.954 |
| net |
17.944 |
|
note |
5.976 |
|
setae |
1.944 |
| tea |
17.65 |
|
tone |
5.838 |
|
tense |
1.86 |
| set |
17.51 |
|
teat |
5.804 |
|
tease |
1.856 |
| eta |
17.176 |
|
toes |
5.664 |
|
teeth |
1.788 |
| ate |
17.176 |
|
toea |
5.548 |
|
eater |
1.788 |
| tae |
16.518 |
|
nets |
5.432 |
|
teens |
1.712 |
| eat |
16.518 |
|
test |
5.344 |
|
seton |
1.702 |
| tie |
16.432 |
|
rete |
5.208 |
|
notes |
1.702 |
| het |
15.684 |
|
nett |
5.204 |
|
tents |
1.646 |
| ret |
15.108 |
|
nest |
5.174 |
|
retie |
1.632 |
| eth |
14.938 |
|
tens |
5.172 |
|
steno |
1.624 |
| oes |
14.698 |
|
sent |
5.156 |
|
sheet |
1.618 |
| the |
14.542 |
|
neat |
5.146 |
|
ester |
1.618 |
| eon |
14.474 |
|
etna |
5.144 |
|
oaten |
1.61 |
| one |
14.366 |
|
ante |
5.144 |
|
teats |
1.608 |
| ose |
13.82 |
|
thee |
5.064 |
|
tones |
1.606 |
| see |
13.78 |
|
tote |
5.052 |
|
enter |
1.596 |
I looked these words up and they all check out. See the Scrabble dictionary if you’re not convinced.
How many words can we expect to find on each board?
That looks like a log-normal distribution. The mean is 98.53 words. How many points?
That’s also a log-normal distribution with the characteristically long tail. The mean is 140.97 points per board.
How many words of each length can we expect to find on a board? Here’s a histogram of the number of words of each length on a board:
Those also look like log-normals, with four letter words being most common.
Put another way, what’s the likelihood of finding a word of a given length on a board?
| Len. |
Likelihood |
| 3 |
99.97994% |
| 4 |
99.901% |
| 5 |
98.62% |
| 6 |
87.56% |
| 7 |
56.21% |
| 8 |
21.36% |
| 9 |
3.94% |
| 10 |
0.442% |
| 11 |
0.0362% |
| 12 |
0.00228% |
| 13 |
0.0001% |
For context, the longest word I’ve ever found in a game was “thrashers” at nine letters.
The most common words were based on a 50,000 board sample. The graphs are based on a 5,000,000 board sample. Feel free to contact me if you’d like source or the Excel spreadsheet.
Permalink
« Previous Page — « Previous entries
Next entries » — Next Page »