A problem came up at work yesterday: I was creating a web page that received 64-bit hex numbers from one API but needed to pass them along to a second API that expected decimal numbers. For small numbers, JavaScript's built-in functions handle the conversion nicely:
parseInt("1234abcd", 16) = 305441741
(305441741).toString(16) = "1234abcd"
Unfortunately, for larger numbers, there’s a big problem lurking:
parseInt("123456789abcdef", 16) = 81985529216486900
(81985529216486900).toString(16) = "123456789abcdf0"
The last two digits are wrong. Why did these functions stop being inverses of one another? Because JavaScript stores every number, integers included, as a 64-bit IEEE 754 float. Those have a 53-bit mantissa, so integers larger than 2^53 can't all be represented exactly. Watch what happens at that boundary:
(Math.pow(2, 53) + 1) - 1 = 9007199254740991
With exact arithmetic, the result would be 2^53 = 9007199254740992, a power of two. Instead it ends in a 1: the +1 was silently rounded away, leaving the answer off by one.
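That cutoff is easy to check directly. This snippet is my own illustration, not from the original code:

```javascript
// Every number in JavaScript is an IEEE 754 double with a 53-bit
// mantissa, so 2^53 is where consecutive integers stop being
// exactly representable.
const limit = Math.pow(2, 53); // 9007199254740992

console.log(limit + 1 === limit);     // true: the +1 is rounded away
console.log(limit + 2 === limit);     // false: 2^53 + 2 is representable
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991, i.e. 2^53 - 1
```

Above 2^53, only every other integer exists as a double; above 2^54, only every fourth, and so on — which is exactly why the 17-digit hex conversion drifted in its last digits.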
To solve this problem, I wrote some very simple hex <-> decimal conversion functions which use arbitrary precision arithmetic. In particular, these will work for 64-bit numbers or 128-bit numbers. The code is only about 65 lines, so it’s much more lightweight than a full-fledged library for arbitrary precision arithmetic.
The algorithm is pretty cool. You can see a demo, read an explanation, and get the code here.
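To give the flavor of the approach, here is a simplified sketch of my own (not the exact code from the post): keep the number as an array of decimal digits and fold in one hex digit at a time, so no intermediate value is ever stored as a native double.

```javascript
// Simplified arbitrary-precision hex -> decimal conversion
// (assumes a well-formed hex string). The running value is an array
// of base-10 digits, least significant first, so it never overflows.
function hexToDec(hex) {
  let digits = [0];
  for (const ch of hex.toLowerCase()) {
    // value = value * 16 + (this hex digit), carried out digit by digit.
    let carry = parseInt(ch, 16);
    for (let i = 0; i < digits.length; i++) {
      const v = digits[i] * 16 + carry;
      digits[i] = v % 10;
      carry = Math.floor(v / 10);
    }
    while (carry > 0) {
      digits.push(carry % 10);
      carry = Math.floor(carry / 10);
    }
  }
  return digits.reverse().join('');
}

console.log(hexToDec('123456789abcdef')); // "81985529216486895" -- exact
```

The decimal → hex direction is the same loop with the bases swapped: multiply by 10, add each decimal digit, and carry in base 16.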
I recently built a version of the CDC’s Vital Statistics database for Google’s BigQuery service. You can read more in my post on the Google Research Blog.
The Natality data set is one of the most fascinating I’ve ever worked with. It is an electronic record which goes back to 1969. Every single one of the 68 million rows in it represents a live human birth. I can’t imagine any other data set which was more… laborious… to create. :)
But beyond the data itself, the processes surrounding it also tell a fascinating story. The yearly user guides offer a tour of how publishing has changed over the last forty years. The early manuals were clearly written on typewriters: to make a table, you spaced things out just right, then drew in the lines with a ruler and a pen. Desktop publishing is so effortless now that it's easy to forget how much production standards have improved in the last few decades.
They’ve had to balance the statistical benefits of gathering a uniform data set year after year with a need to track a society which has evolved considerably. In 1969, your race was either “Black”, “White” or “Other”. There was a question about whether the child was “legitimate”. There were no questions about alcohol, smoking or drug use. And there was no attempt to protect privacy — most of these early records contain enough information to uniquely identify individuals (though doing so is a federal crime).
I included four example analyses on the BigQuery site. I’ll include one more here: it’s a chart of the twin rate over thirty years as a function of age.
A few takeaways from this chart:
- The twin rate is clearly a function of age.
- It used to be that older women were less likely to have twins.
- Starting around 1994, this pattern reversed itself (likely due to IVF).
- The y-axis is on a log scale, so this effect is truly dramatic.
- There has been an overall increase in the twin rate in the last thirty years.
- This increase spans all ages.
The increase in twin rate is often attributed to IVF, but the last two points indicate that this isn't the whole story. IVF clearly has a huge effect on the twin rate for older (40+) women, but it can't explain the increase for younger women. A 21-year-old mother was 40% more likely to have twins in 2002 than she was in 1971.
My guess is that the ultimate cause is improved neonatal care. Twin pregnancies are more likely to have complications, and those complications are less likely to end in miscarriage than they were in the past. If this interpretation is correct, just as many 21-year-olds were pregnant with twins forty years ago; it's just that fewer of those pregnancies led to live births.
Chart credits: dygraphs and jQuery UI Slider.