# 12.31.08

## A Minus Times a Minus

Posted in math at 7:32 pm by danvk

Back in grade school, I never got a satisfactory answer to the question “why is a minus times a minus equal to a plus?”

I didn’t think about it again for years, until my mom asked for an explanation to give when she taught her students. This time around, I immediately realized how you’d go about proving it. Here’s how it goes:

 1 + (-1) = 0 (definition of “-1″) (-1) * 1 + (-1) * (-1) = 0 (multiply both sides by -1 and distribute) -1 + (-1)(-1) = 0 (anything times 1 is itself) (-1)(-1) = 1 (add 1 to both sides)

Easy as could be, right? Unfortunately, mom preferred my brother’s explanation:

A minus times a minus is equal to a plus.
The reasons for this, we need not discuss.

# 09.25.08

Posted in math, politics at 3:45 pm by danvk

On September 15, SurveyUSA released this poll of likely voters in Virginia:

McCain (R) 46%
Obama (D)  50%
Margin of Error: +/-3.7%

Tables like this appear on TV and in newspapers all the time. But they’re never accompanied by any explanation of how to interpret the margin of error. Commentators usually interpret it in one of two ways:

1. Obama is ahead by more than the margin of error, hence his lead is statistically significant.
2. That “+/-” means either number could be off by that amount. If you added 3.7% to McCain’s 46% and subtracted 3.7% from Obama’s 50%, McCain would actually be ahead. So Obama’s lead is not statistically significant; it is less than twice the margin of error.

In either case, they are wrong.

So what’s the right way to interpret the margin of error? A lead is significant if it is 1.6 times the margin of error or greater. That’s 5.9% for our poll, so Obama’s lead is not significant.

This is a strange, non-intuitive rule, which explains why commentators don’t use it. The derivation is more revealing than the rule itself.
Read the rest of this entry »

# 10.15.07

## Living with probability 1-ln(2)

Posted in math at 8:35 pm by danvk

I learned a great brainteaser while hiking in Yosemite with many a Googler this weekend. It’s from a collection called “Seven Puzzles You Think You Must Not Have Heard Correctly” (PDF), and that title is particularly apt for this one.

Here’s the question:

The names of 100 prisoners are placed in 100 wooden boxes, one name to a box, and the boxes are lined up on a table in a room. One by one, the prisoners are led into the room; each may look in at most 50 boxes, but must leave the room exactly as he found it and is permitted no further communication with the others.

The prisoners have a chance to plot their strategy in advance, and they are going to need it, because unless every single prisoner finds his own name all will subsequently be executed.

Find a strategy for them which which has probability of success exceeding 30%.

Since each prisoner has a 50/50 shot at finding his own number, you’d expect their chances of survival to be (1/2)^100, which is tiny! But that’s not quite true. Here’s the reason it’s not impossible: the number of prisoners you expect to find their own number will always be 50 no matter what your strategy. But your strategy can skew the distribution. For example, if everyone looks at the same fifty lockers, you’ll always have exactly 50 prisoners find their own numbers. The problem is to find a strategy that skews the distribution towards the extremes: either it succeeds spectacularly or fails miserably.

There’s a solution in that link above, and I’ll post my own in a few days. Give it a shot!

# 08.02.07

## How many Boggle boards are there?

Posted in boggle, math at 8:45 pm by danvk

I’ve taken a several months break from my Boggle series, mostly because I think everyone got tired of it. I’m going to come back to it, but hopefully at a slower pace this time around.

Last time, we completed a board solver that could score 10,000 boards per second. So what to do with all that power? One option was to compute statistics on random boards. Another is to hunt for the holy grail of Boggle: the highest possible scoring Boggle board. The next few posts will be devoted to the search for this board.

Before undertaking any search, you need to get a feel for your search space. In our case, that’s the set of all 4×4 Boggle boards. How many are there? We can do a few back-of-the-envelope calculations.

To create a board, you roll 16 dice. Each has six possible letters on it, which gives 6^16 possibilities. These dice land in some permutation on the board, which gives another factor of 16!. Finally, a 4×4 board has eight-fold symmetry, which takes us down to 6^16 * 16! / 8 = 7.3e24 ≈ 2 ^ 83.

That’s one upper bound. But it assumed that all 6*16 = 96 symbols on the dice were unique. Obviously they’re not. After a roll, each of the 16 squares will have one of 26 letters on it. Divide by the symmetries, and you get 26 ^ 16 / 8 = 5e21 ≈ 2^72. Much better!

I haven’t been able to come up with any better upper bounds than these. The main flaw in the second approximation is that not all boards can be rolled with the sixteen dice that Hasbro provides. A board of all z’s or qu’s simply can’t occur. If we knew the probability that any sixteen character sequence could be rolled, this would give a true approximation of the number of distinct boards.

The easiest way to do this is with a computer program. It picks a random sequence of sixteen characters, checks whether this board can be rolled, and repeats several thousand times. I believe that checking whether a given board can be rolled is NP-Complete, but in this case the greedy approximation works quite well. I wrote a program (leave a comment if you want to see it) to do this, and processed one million 16-character sequences. Only 84,492 could be represented with the usual dice, or 8.4%. This gives a total of

(26 ^ 16 / 8) * (84,492 / 1,000,000) = 4.6e20 ≈ 2^69.

If you like confidence intervals, my sample size of one million boards gives a 95% confidence interval of [4.545e24, 4.666e24] for the total number of boards. Pretty good.

So, assuming we could enumerate all these boards quickly, how long would it take for our faster solver to find the best board? At 10k boards/sec, we’re looking at 4.5e16 seconds = 1.9 billion years! Clearly we need to do better.

# 07.15.07

Posted in math at 11:07 pm by danvk

Inspired by the recent Laffer curve fuss, I’ll be doing my MarkCC imitation tonight, calling out the scourge of bad math wherever it rears its ugly head.

Case number 1: I’m on something of an Alex Haley kick, so I’ve been reading The Autobiography of Malcolm X. Here’s a passage that jumped out at me:

Here is an example: a British anthropologist named Dr. Louis S. B. Leakey is displaying some fossil bones—a foot, part of a hand, some jaws, and skull fragments. On the basis of these, Dr. Leakey has said it’s time to rewrite completely the history of man’s origin.
This species of man lived 1,818,036 years before Christ. And these bones were found in Tanganyika. In the Black Continent.1

How good of Dr. Leakey to be so precise! And here I thought radio-carbon dating was only accurate to about one part in a hundred at best. What year was this written in again? 1964? Well, 1,818,036 + 1,964 = 1,820,000. Something makes me think Alex never learned about significant figures..

Case number 2: This one’s an exercise for the reader, since my tongue is tied. According to The Economist’s Technology Quarterly,

The average installed storage capacity of a Fortune 1000 company grew from 198 terabytes in early 2005 to 680 terabytes in October 2006, according to figures from Deloitte, a consultancy.2

What was different about the Fortune 1000 in early 2005 and October 2006? Why might this claim by incredibly misleading?