danvk.org
Keepin' static like wool fabric since 2006

Moving to GitHub Pages (Thu, 09 Oct 2014)
http://www.danvk.org/wp/2014-10-09/moving-to-github-pages/

TL;DR: new site, new blog, new feed

After eight years of blogging with WordPress, it’s time to ditch this 11-year-old technology in favor of the merely 6-year-old GitHub Pages.

Visiting danvk.org will take you to the new site, from which you can find the new danvk blog. For RSS subscribers, you can find the new feed here. Head on over and leave a comment!

As part of the move, I read through all eight years of blog posts. This was a real trip down memory lane for me. The blog started as something of a personal journal when I moved to California, but turned more tech-focused and sparse as I started to make more friends in my new home. I pulled out some highlights from the last eight years here.

Why do this? Blogging with WordPress feels heavy-weight and inflexible compared to using Markdown and my existing git workflows. And I’m very excited about the idea of not hosting my own site—danvk.org has always felt quite slow to load, but I’ve never quite been sure why.

Reading OSM data in C++ (Sun, 17 Aug 2014)
http://www.danvk.org/wp/2014-08-17/reading-osm-data-in-c/

I’m interested in using OpenStreetMap data to add lots more shapes to Comparea. There are far too many polygons in OSM to include everything, so you have to filter to “interesting” ones. That’s a hard concept to make precise! One idea is to say that any feature with an associated Wikipedia article is interesting.

To make a list of such features, I started with the planet.osm file, which you can download as a Torrent. This file is in “PBF” format, an OSM-specific format based on Google’s Protocol Buffers. I tried to filter down to just the features with Wikipedia tags using GDAL’s ogr2ogr tool (which supports PBF format), but had no luck.

Instead, I wrote my own filter using C++. This was much easier than you might expect and, since the planet.osm.pbf file is 25GB and growing, probably worth the effort.

I used libosmpbfreader, which depends on libosmpbf, which in turn depends on protoc. On Mac OS X, this was what my install sequence looked like:

brew install protobuf
git clone https://github.com/scrosby/OSM-binary.git
cd OSM-binary
make -C src
make -C src install
cd ..
git clone https://github.com/CanalTP/libosmpbfreader.git
cd libosmpbfreader
make
./example_counter planet.osm.pbf

Running the example_counter binary over planet.osm.pbf, I was able to read something like 2GB/minute of data, so 12–13 minutes for the full file. The Wikipedia filtering code ran in ~30 minutes. Here's my full code if you're interested. There were 117,211 ways and 156,064 relations with Wikipedia tags.
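The actual filter is C++, but the core predicate is simple enough to sketch in Python: an OSM element counts as “interesting” if any of its tags is a Wikipedia tag, either the bare "wikipedia" key or a language-qualified "wikipedia:xx" key. (The function name and sample tag dicts below are illustrative, not from the real code.)

```python
def has_wikipedia_tag(tags):
    """tags: dict of OSM key -> value for one way or relation."""
    return any(k == "wikipedia" or k.startswith("wikipedia:") for k in tags)

# Hypothetical tag sets for three OSM ways:
ways = [
    {"name": "Golden Gate Park", "leisure": "park", "wikipedia": "en:Golden Gate Park"},
    {"highway": "residential"},  # no Wikipedia article -> filtered out
    {"name": "Zentralfriedhof", "wikipedia:de": "Zentralfriedhof"},
]
interesting = [w for w in ways if has_wikipedia_tag(w)]  # keeps the 1st and 3rd
```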

Introducing Comparea (Wed, 13 Aug 2014)
http://www.danvk.org/wp/2014-08-13/introducing-comparea/

Comparea is a tool that lets you Comparea Areas. It lets you answer questions like “how big is Greenland, really?” or “how large would Alaska be if it were in the contiguous US?”

Comparea: Alaska vs. USA

Comparea projects the two geographic features using equal area projections with the same scale but different centers. This results in a valid comparison of their areas with minimal distortion for each shape. You can read more on Comparea’s about page or study this nifty diagram:
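To make the idea concrete, here is a minimal sketch of an equal-area projection — the Lambert azimuthal equal-area formula on a spherical Earth, centered independently on each shape but sharing one scale. (Comparea itself uses D3’s projection machinery; this standalone version, including the function name and the spherical-Earth assumption, is just for illustration.)

```python
from math import radians, sin, cos, sqrt

R = 6371.0  # mean Earth radius in km (spherical model)

def laea(lon, lat, lon0, lat0):
    """Lambert azimuthal equal-area projection centered at (lon0, lat0).

    Returns planar (x, y) in km. Areas on the plane equal areas on the
    sphere, so two shapes projected about their own centers at the same
    scale can be compared fairly.
    """
    lam, phi = radians(lon), radians(lat)
    lam0, phi0 = radians(lon0), radians(lat0)
    k = sqrt(2.0 / (1.0 + sin(phi0) * sin(phi)
                    + cos(phi0) * cos(phi) * cos(lam - lam0)))
    x = R * k * cos(phi) * sin(lam - lam0)
    y = R * k * (cos(phi0) * sin(phi)
                 - sin(phi0) * cos(phi) * cos(lam - lam0))
    return x, y
```

Projecting each feature about its own centroid keeps its local distortion minimal, which is exactly the property the comparison needs.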

It works great on both desktop and touch devices, where you can drag the two shapes with your fingers or pinch to zoom.

Like most of my side projects, this one was developed on and off over a long period of time. I first worked on the project in spring 2012, discovered Natural Earth Data and quickly made a purely client-side demo. The response was generally positive, but I didn’t release it because of general UI jankiness (the transitions never quite worked), the lack of a backend and the feeling that I didn’t have enough polygons to make it fun.

About a year later (in early 2013) I started playing with OpenStreetMap data. There are tons of great polygons in their data dumps, including smaller features like Golden Gate Park or Central Park. I spent a few weeks playing around with this data set, but ultimately couldn’t come up with a good way of deciding which shapes were notable enough to include (there are millions, and not every one-block park is notable enough for inclusion).

I’d shown the Comparea demo to enough people that I was convinced it was worth publishing. (One notable demo session was with my six-year-old nephew, who rattled off country comparisons to me for nearly an hour!) When I got some time off between jobs, I decided to make releasing Comparea an explicit goal.

In addition to the finished product, the real value of any side project is what you learn from doing it. Here were some highlights:

  • Flask: this was my third Flask project (after gitcritic and webdiff) and I finally knew enough to organize it correctly.
  • The whole hosting stack was new to me. This was my first Heroku project and my first time using CloudFlare for distribution. I’m a big fan of both. Developing using Flask and Heroku is far more lightweight and flexible than using AppEngine, which was my tool of choice in the past.
  • I rewrote the UI using D3. Its projection and behavior tools were godsends. These both made lots of gross SVG manipulation code from my initial demo melt away.
  • I debugged Natural Earth Data shapes using IPython Notebooks. Nothing beats a visualization for spotting outlying islands, which could dramatically affect a feature’s bounding box.

One of the main bits of feedback I got on the initial Comparea demo was that it would be helpful to show the area of each feature somewhere in the UI, so that you could compare numerically in addition to visually. I expanded this request to include populations and descriptions for each place as well. Sourcing and verifying all this data wound up being one of the most challenging pieces of the project. For example: no one can quite agree on what’s Morocco and what’s Western Sahara. The areas and populations you compute will vary depending on where you draw the line, but sources don’t always say what their line is! To make sure I wasn’t doing anything too unreasonable, I calculated the areas of my polygons and then compared this value to the stated area from an official source. I iterated until they were all within about 10% of one another.
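The validation step above can be sketched in a few lines: compute each polygon’s area (via the shoelace formula, after projecting to an equal-area plane) and flag any feature whose result strays more than 10% from the officially stated figure. The helper names and the sample numbers are hypothetical, not from the real pipeline.

```python
def shoelace_area(points):
    """Planar polygon area via the shoelace formula.

    points: list of (x, y) vertices in km, assumed already projected
    with an equal-area projection so planar area matches true area.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def within_tolerance(computed, official, tol=0.10):
    """True if the computed area is within tol (10%) of the official figure."""
    return abs(computed - official) / official <= tol
```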

What’s next? I’d like to add more fine-grained shapes to allow comparisons like San Francisco vs. NYC or Golden Gate Park vs. Central Park. And I’d like to add a mode where you can move one shape around a complete map of another area. This would be helpful for comparing San Francisco to the entire NYC area, for example.

But for now, I’m happy to finally get this project launched. Enjoy!

Introducing git webdiff (Thu, 03 Jul 2014)
http://www.danvk.org/wp/2014-07-03/introducing-git-webdiff/

After leaving Google and working in the open source ecosystem for the past few months, it’s become increasingly clear to me which pieces of Google’s infrastructure are ahead of the curve and which are not. One piece that’s clearly ahead is Google’s code review tool.

Google’s original code review tool was Guido van Rossum’s Mondrian, which he eventually open sourced as Rietveld, a project which was in turn forked into Gerrit. Mondrian has since been replaced by a newer system at Google but, to my knowledge, this new system has never been publicly discussed.

These code review tools all excel at showing side-by-side diffs. The difference between inline and side-by-side (two-column) diffs is so dramatic that I refuse to waste mental energy trying to grok the inline diffs that GitHub insists on showing.

There are a few ways to get two column diffs out of git (“git difftool”), but they all have their problems:

  • Many diff tools (e.g. tkdiff) are X Windows programs that are clearly out of place in Mac OS X. They often don’t work well with the app switcher and don’t render high-resolution (“retina”) type.
  • Most diff tools want to operate on pairs of files. tkdiff and p4merge show you one file at a time in isolation. Once you advance past a file, you can’t go back. I like to flip back and forth through files when viewing diffs.
  • They typically do not support syntax highlighting.

There are certainly diff tools for Mac OS X that do all these things well, but they tend to be commercial.

Enter “git webdiff”, my newly released side project, which aims to improve this situation.

Any time you’d run “git diff”, you can run “git webdiff” instead to get a nice, two-column diff in your browser, complete with syntax highlighting. Here’s what it looks like:

webdiff 0.4

When you run “git webdiff”, your browser will pop open a tab with this UI. When you’re done, you close the tab and the diff command will terminate. It works quite well any time you have an even remotely complex diff or merge to work through.

You can install it with:

pip install webdiff

Please give it a try and let me know what you think!

Google’s New Finance Onebox (Sat, 14 Jun 2014)
http://www.danvk.org/wp/2014-06-14/googles-new-finance-onebox/

The last project I worked on at Google recently launched: a new and improved Finance Onebox.

You trigger the feature by searching for a stock ticker, e.g. AAPL, GOOG or .INX:

New finance onebox

For comparison, here’s what it used to look like:


The main new features are:

  1. A larger chart
  2. A cleaner, more modern design
  3. More relevant attributes (P/E ratio, Dividend Yield)
  4. After-hours trading on the stock chart
  5. Interactivity

The interactivity comes in a few forms. First, you can click the tabs on the top to zoom out to 5 day, 1 month, 3 month, 1 year, 5 year or Max views.

Next, you can mouse over the chart to see the price at any point. On touch devices, tap/swipe with one finger:

Hovering over a point on the new finance onebox

But wait, there’s more! What if you see a change on the chart and want to know how large it was? You can click and drag across that time range to see a delta. On touch devices, you trigger this by putting two fingers on the chart:

Range selection

This is particularly useful for longer-range charts, where it lets you easily answer questions like “how much did the S&P 500 drop from 2008–2009?”

A deep dive into the Krubera Cave (Sat, 29 Mar 2014)
http://www.danvk.org/wp/2014-03-29/a-deep-dive-into-the-krubera-cave/

After seeing this image posted on reddit last week, I took a deep dive into the strange world of extreme caving.

Voronya Visualization

This image is big! Click through to see the whole thing.

Location of Krubera Cave

The Krubera Cave is the deepest in the world, descending 2,197 meters from its inconspicuous entrance to its deepest explored areas. It’s located in Abkhazia, a breakaway territory in the Republic of Georgia. In some ways, caving is an even more extreme activity than high altitude climbing. Descending to the bottom of Krubera takes a team of dozens of people over a month, a month during which they’ll never see the sun.

One question I had was “why does an expedition end?” I got some answers from this amazing documentary about a 2003 expedition. Their goal was to explore this siphon at 1440m below the surface:

Siphon at 1440m

Previous expeditions had explored in other directions, because they didn’t have the necessary scuba equipment to explore past the siphon (carrying heavy equipment down 1400+ meters is challenging). The hope was that, if they dove through the water, there would be a dry “continuation” on the other side.

And was there ever! You can see the start of the continuation they discovered in green in the image. It continues all the way to 2,197m, the lowest known point in the cave:

The continuation

A siphon is one thing that can stop an expedition. In the case of the 2003 expedition, they continued exploring until they literally ran out of ropes and carabiners. These seem to be the two limiting factors for exploratory spelunking: water and equipment. Why not just bring more equipment? Because rope is heavy, especially when it’s waterlogged (as it’s sure to be in the humid environment of caves).

Here are a few photos from the cave:

A shaft in the Krubera cave

Tight squeeze

I’d highly, highly recommend watching the Russian documentary about the Krubera cave if you’re at all interested in this sort of thing. I don’t think caving is a hobby for me, but I’ll read news about caves with much greater interest now that I’ve watched this docu.

Fact Comparisons (Mon, 24 Feb 2014)
http://www.danvk.org/wp/2014-02-24/fact-comparisons/

This past fall, my group launched a new feature on Google Search that we call “Fact Comparisons”. It triggers for many numeric fact queries, for example distance from the sun to mars:

Distance from the sun to Mars

The idea is that, by showing you answers to related questions (“how far is jupiter from the sun?”), we can help you contextualize the answer to your original question. The number “141,600,000 miles” is hard to fathom, but it makes more sense when you see that it’s between Earth (92.96M mi) and Jupiter (483.8M mi).

If you click on one of the related images, you’ll launch into a Carousel filled with related facts:

Related facts

Most numeric facts will trigger this feature. The most popular numeric facts are people’s ages and heights. A strip of famous people’s ages is pretty interesting:

Justin Bieber's age

A few other fun ones:

Vernor Vinge on Video Calls (Mon, 10 Feb 2014)
http://www.danvk.org/wp/2014-02-10/vernor-vinge-on-video-calls/

I’ve referenced an anecdote from Vernor Vinge’s A Fire upon the Deep several times during video calls in the last few weeks and thought I’d share it here.

The novel is a classic space opera. Two ships with infinitely powerful computers are in communication over a very narrow channel. Rather than send pixelated images to one another, the computers go to extremes to make the best use of their limited bandwidth.

Fleet Central refused the full video link coming from the Out of Band … Kjet had to settle for a combat link: The screen showed a color image with high resolution. Looking at it carefully, one realized the thing was a poor evocation…. Kjet recognized Owner Limmende and Jan Skrits, her chief of staff, but they looked several years out of style: old video matched with the transmitted animation cues. The actual communication channel was less than four thousand bits per second

Precious bits aren’t wasted on low-level features like pixels. Rather, they’re used to transmit information about the participants and “animation cues”. The computer on the receiving ship creates the most realistic video it can using those cues and the imagery that it has on file.

The picture was crisp and clear, but when the figures moved it was with cartoonlike awkwardness. And some of the faces belonged to people Kjet knew had been transferred before the fall of Sjandra Kei. The processors here on the Ølvira were taking the narrowband signal from Fleet Central, fleshing it out with detailed (and out of date) background and evoking the image shown. No more evocations after this, Svensndot promised himself, at least while we’re down here.

Vernor Vinge calls it an evocation but we’d probably call it an avatar. Even with all its power, their computer has trouble reproducing motion and is constrained by its out-of-date database.

“Strange,” interrupted Pham. “The pictures were strange.” His tone was drifty.

“You mean our relay from Fleet Central?” Svensndot explained about the narrow bandwidth and the crummy performance of his ship’s processors…

“And so their picture of us must have been equally bad… I wonder what they thought I was?”

“Unh…” Good question. … “… wait a minute. That’s not how evocations work. I’m sure they got a pretty clear view of you. See, a few high-resolution pics would get sent at the beginning of the session. Then those would be used as the base for the animation.”

This is more or less how digital video is encoded: there are a few high-resolution key frames followed by information about how the pixels change in each successive frame. Vinge’s computers do something similar, but instead of encoding how the pixels change, they encode at a higher level of abstraction. Presumably they’re recording that a hand moved or that a particular word was said, and the computer at the other end does its best to update the keyframe to reflect this.
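The pixel-level half of that idea can be sketched in a few lines: send the first frame whole, then send only per-pixel deltas for each later frame. This toy encoder works on 1-D “frames” of integer pixel values; real codecs add motion compensation and lossy compression on top, and Vinge’s evocations do the analogous thing one level up (key images plus semantic animation cues).

```python
def encode(frames):
    """frames: list of equal-length lists of pixel values.
    First frame is sent whole (keyframe); later frames as per-pixel deltas."""
    stream = [("key", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        stream.append(("delta", [c - p for p, c in zip(prev, cur)]))
    return stream

def decode(stream):
    """Rebuild frames by applying each delta to the previously decoded frame."""
    frames = []
    for kind, payload in stream:
        if kind == "key":
            frames.append(list(payload))
        else:
            frames.append([p + d for p, d in zip(frames[-1], payload)])
    return frames
```

When most pixels are unchanged, the deltas are mostly zeros and compress far better than resending every frame — the same bandwidth economy the novel pushes to its logical extreme.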

In a world with limited bandwidth but infinite computing power, this is how things should work. Next time you’re in a video call and the image drops out or becomes grainy, think about how much better it would be if your counterpart turned into a cartoon instead!

Statistics Knowledge Panel (Sun, 18 Aug 2013)
http://www.danvk.org/wp/2013-08-18/statistics-knowledge-panel/

A few months ago my group at work launched the Statistics Knowledge Panel, which shows you an interactive chart when you search for a public data statistic on Google:

gdp per capita china

The feature uses query refinements to anticipate other statistics you might be interested in (e.g. “population india” → “population china”). These put the original statistic into context. We got a shout out from Amit Singhal at Google I/O when we launched!

Amit Singhal at Google I/O

Here are a few fun queries you can try:

The charts work great on mobile and tablet, too!

dygraphs 1.0.0 (Thu, 15 Aug 2013)
http://www.danvk.org/wp/2013-08-15/dygraphs-1-0-0/

Six years ago I created dygraphs, an interactive JavaScript charting library. Four years ago, I open sourced it. Yesterday, we officially released version 1.0.0.

The project continues to grow, sometimes seemingly in spite of itself, gaining new users and contributors. Robert’s blog post describes the reasons for doing versioned releases. Personally, I’m quite excited about having the freedom to change core behaviors without worrying about upsetting users!

The key features of dygraphs are, and remain, the ease of quickly creating a chart and the ability it gives you to explore large data sets. The canonical chart is a comparison of temperatures for three years in SF and NYC. You can zoom in, see individual data points and change the rolling average:
