danvk.org » web

Finding Pictures in Pictures

danvk — Sun, 10 Feb 2013 06:11:42 +0000

Over the past month, I’ve been working with imagery from the NYPL’s Milstein Collection. Astute readers may have some guesses why. The images look something like this one:

There are two photos in this picture! They’re on cards set against a brown background. Other pictures in the Milstein gallery have one or three photos, with or without a white border:

To make something akin to OldSF, I’d need to write a program to find and extract each of the photos embedded in these pictures. It’s incredibly easy for our eyes to pick out the embedded photos, but this is deceptive. We’re really good at this sort of thing! Teaching a computer to do makes you realize how non-trivial the problem is.

I started by converting the images to grayscale and running edge detection:

The white lines indicate places where there was an “edge” in the original image. It’s an impressive effect—almost like you hired someone to sketch the image. The details on the stoops are particularly cool:

The interesting bit for us isn’t the lines inside the photo so much as the white box around it. Running an edge detection algorithm brings it into stark relief. There are a number of image processing algorithms to detect lines, for example the Hough Transform or scipy’s probabilistic_hough. I’ve never been able to get these to work, however, and this ultimately proved to be a dead end.

A simple algorithm often works much better than high-powered computer vision algorithms like edge detection and the Hough Transform. In this case, I realized that there was, in fact, a much simpler way to do things.

The images are always on brown paper. So why not find the brown paper and call everything else the photos? To do this, I found the median color in each image, blurred it and called everything within an RMSE of 20 “brown”. I colored the brown pixels black and the non-brown pixels white. This left me with an image like this:

Now this is progress! The rectangles stand out clearly. Now it’s a matter of teaching the computer to find them.

To do this, I used the following algorithm:

Pick a random white pixel, (x, y) (statistically, this is likely to be in a photo)
Call this a 1×1 rectangle.
Extend the rectangle out in all directions, so long as you keep adding new white pixels.
If this rectangle is larger than 100×100, record it as a photo.
Color the rectangle black.
If <90% of the image is black, go back to step 1.

Eventually this should find all the photos. Here are the results on the original photo from the top of the post:

The red rectangles are those found by the algorithm. This has a few nice properties:

It naturally generalizes to images with 1, 2, 3, 4, etc. photos.
It still works well when the photos are slightly rotated.
It works for any background color (lighting conditions vary for each image).

There’s still some tweaking to do, but I’m really happy with how this algorithm has performed! You can find the source code here.

OldSF Buzz!

danvk — Mon, 28 Jan 2013 04:21:10 +0000

After its recent update, OldSF got an unexpected surge of traffic after appearing on Hacker News and reddit San Francisco.

Traffic peaked at 14,000 visitors the day it hit Hacker News, then trailed off:

I particularly enjoyed the new “Live Analytics” feature on Google Analytics, which shows you who’s on your site right now:

In the end, 40% of our traffic came from Facebook, 39% from Hacker News, 10% from reddit, 7.5% from Twitter and 3% from Google+. In other words, things started on Hacker News but wound up spreading through other social media.

The traffic spike was exciting, but also a bit sad. OldSF is fundamentally a read-only site, which makes it hard to keep people coming back. Raven and I did some brainstorming and decided to start tweeting “pictures of the day” on @Old_SF. Please follow us!

Developing the OldSF Slideshow

danvk — Tue, 22 Jan 2013 02:21:10 +0000

If you head over to oldsf.org, you’ll find a sleek new UI and a brand new slideshow feature. Here’s the before/after:

Locations like the Sutro Baths can have hundreds of photos. The slideshow lets you flip through them quickly.

As so often happens, what looked simple at first became more and more complex as I implemented it. Here’s how that process went for the OldSF update.

It started with Raven’s mock of the feature:

I started by looking for a JavaScript library that could do most of the heavy lifting for me. Raven’s mock shows a single big image in the center with bits of the previous and next images visible on either side. After finding lots of “slideshow” libraries that weren’t quite right, I realized that what I really wanted was called a “carousel”, not a “slideshow”. After making this conceptual breakthrough, I quickly settled on jCarousel.

The slideshow/carousel was barely in place before I ran into a new problem: the images kept shifting out of place! The issue is that jCarousel lays out all the images in your slideshow in a big long line, like so:

Most of the images are off-screen. To change the “active” image, jCarousel slides the whole strip to the left or right. To save bandwidth, I try not to load images that you’ll never see. An image only gets loaded when it appears on the screen. Before it loaded, I had no idea what its width was. The long strip of images really looked like this:

If an image turned out to be wider than expected, then the browser would push all the later images farther to the right, like so:

This wreaked havoc on the carousel’s layout. It could budge the center image all the way off the screen!

At first, I told jCarousel to redo its layout whenever a new image loaded. This mostly prevented the budging, but it had a nasty side effect. Images typically get loaded when you scroll through the slideshow. This scrolling is animated. But if jCarousel redid the layout, then the animation would suddenly stop and the motion would look very janky. My first thought was to prevent the relayout during animations, and this is what I did for our initial “launch”. But a few days later, Raven told me that she wasn’t consistently seeing the correct image when she copy/pasted links to the slideshow. The layout issues weren’t gone!

The source of all this complexity was the images in the slideshow with unknown widths. So the cleanest solution was to make them known! I added image width and height to the database and propagated them through to the client. This meant that the layout never had to change when an image loaded. Using the same schematic as above, the carousel looked like this:

With the layout fixed, all the hacks I’d written melted away and the slideshow links turned rock solid.

Another surprisingly tricky part involved moving the Google Maps navigation controls. In our new UI, the map uses the full window. The logo, date range selector and right-hand panel “float” above the map. When I first implemented it, I saw the mess you see to the left.

Uh-oh! I assumed this would be easy to fix—just shove the Google Maps navigation down a bit. But Google Maps doesn’t expose any CSS classes for its controls, so this turns out to be quite tricky. With some help from the API docs and this StackOverflow question I learned that the only way to do this is to create a small, invisible custom maps control which shoves all the other controls out of the way. Sheesh. The invisible control is outlined in this image:

Another fun issue came up with Street View. Having Street View on OldSF is quite useful, since it lets you do “now and then” comparisons. But when we went to the full-screen map layout, we ran into this annoyance:

That “x” button in the top right corner is the only way to get out of street view, and it’s covered by the right-hand panel. You’re stuck! The solution here was to find the events corresponding to entering and leaving Street View. When you go into Street View, we hide our UI elements. When you leave, we show them again. Raven has suggested that it’s nice to still see the images, so in the future I may just shove the right-hand panel down a bit or provide my own exit button.

Those were three of the most interesting issues I ran into while creating this new feature. There were many, many more. Nothing is ever so simple as it seems!

Updated Online Boggle Solver

danvk — Tue, 03 Jul 2012 17:42:45 +0000

See that Online Boggle Solver link on the right? Bet you’ve never clicked it!

I built the online solver back in 2006, mostly just for fun. In the six years since, I’ve wound up using it almost exclusively on my phone during real-life Boggle games. The iPhone didn’t exist when I built that page, and it is a device for which its fancy design (with hover effects and fixed position elements) is very ill-suited. Minimal designs tend to work better on Mobile, and I’ve just pushed out a new version for phones which is very minimal indeed.

The code behind this boggle solver differs from my other boggle code in that it’s designed for low latency rather than high throughput. This is precisely what you want from a web solver. You’re only finding the words on a single board, so time spent loading the word list dominates time spent finding words on the board. The current version uses a custom dictionary which can be MMAPped. This winds up being about 3x faster than reading a dictionary text file and building a Trie on every invocation.

I haven’t worked on Boggle in the past three years, so the state of the art in finding the highest-scoring boards is still where it was when I made my last blog post about breaking 3×3 Boggle. The interesting development then was that you could get something like a 1000x speedup by upper-bounding whole classes of boards rather than scoring each board individually. This is a big enough speed boost to make 3×3 Boggle “solvable” in a day on a desktop computer, but not enough to make 3×4 or 4×4 solvable, even on a big cluster of computers. I was optimistic that memoization could give some more big wins, but when those failed to materialize I got frustrated and stopped working on the project.

puzzle+: Crosswords for Google+

danvk — Thu, 17 May 2012 15:48:12 +0000

To solve a crossword with your friends in Google+, click this giant hangout button:

You’ll see something like this:

Click “Hang out” to invite everyone in your circles to help you with the puzzle. If you want to collaborate with just one or two people, click the “x” on “Your Circles” and then click your friend’s names on the right.

You’ll be prompted to either upload a .puz file or play one of the built-in Onion puzzles. You can get a free puzzle from the New York Times by clicking “Play in Across Lite” on this page.

With the puzzle downloaded, drag it into the drop area:

And now you’re off to the races! The big win of doing this in a Google+ hangout is that you get to video chat with your collaborators while you’re solving the puzzle, just like you would in person!

Astute readers will note that puzzle+ is a revival of lmnopuz for Google Shared Spaces, which was a revival of lmnowave (Crosswords for Google Wave), which was in turn a revival of Evan Martin and Dan Erat‘s standalone lmnopuz. Hopefully the Google+ Hangouts API will be more long-lived than its predecessors.

Horizontal and Vertical Centering with CSS

danvk — Mon, 14 May 2012 17:04:47 +0000

I recently wanted to center some content both vertically and horizontally on a web page. I did not know in advance how large the content was, and I wanted it to work for any size browser window.

These two articles have everything you need to know about horizontal centering and vertical centering.

The two articles don’t actually combine the techniques, so I’ll do that here.

In the bad old days before CSS, you might accomplish this with tables:


  
    
      Content goes here

Simple enough! In the wonderful world of HTML5, you do the same thing by turning divs into tables using CSS. You need no fewer than three divs to pull this off:


  
    
      Content goes here

And here’s the CSS:

.container {
  display: table;
  width: 100%;
  height: 100%;
}
.middle {
  display: table-cell;
  vertical-align: middle;
}
.inner {
  display: table;
  margin: 0 auto;
}

A few comments on why this works:

You can only apply vertical-align: middle to an element with display: table-cell. (Hence .middle)
You can only apply display: table-cell to an element inside of another element with display: table. (Hence .container)
Elements with display: block have 100% width by default. Setting display: table has the side effect of shrinking the div to fit its content, while still keeping it a block-level element. This, in turn, enables the margin: 0 auto trick. (Hence .inner)

I believe all three of these divs are genuinely necessary. For the common case that you want to center elements on the entire screen, you can make .container the body tag to get rid of one div.

In the future, this will get slightly easier with display: flexbox, a box model which makes infinitely more sense for layout than the existing CSS model. You can read about how do to horizontal and vertical centering using flexbox here.

Crosscountry Crosswords

danvk — Sun, 27 Mar 2011 21:07:05 +0000

It’s been almost a year since I introduced lmnowave, the collaborative crossword puzzle gadget for Google Wave. A lot has happened in that past year, not least the cancelation of Wave.

First, to clear up some confusion. It’s not “I’m no wave”, it’s “L-M-N-O-Wave”, which is a play on “L-M-N-O-Puz”, aka lmnopuz, the software on which my collaborative crossword system is based. Only a few dozen people ever saw lmnopuz, so no one got the joke. And I realized after releasing it that, by changing ‘puz’ -> ‘wave’, I’d taken away any hint of what my wave gadget actually did. A bad name. Oh well.

In August, Google announced that Wave was canceled. This seemed to be the end of lmnowave. Sure, Wave was still usable. But the life had been sucked out of the project. This was quite disappointing to me, since I’d spent a fair bit of my own time developing the crossword gadget.

Then, in mid-December, Douwe Osinga introduced the oddly-named Google Shared Spaces. It’s an attempt to salvage the Wave gadget code, to let it live outside of Wave.

For lmnopuz, it’s perfect. Here’s the lmnowave shared space. You can use it to collaborate on crosswords with your friends, just like you could with lmnowave. In some ways, it’s even better, since the Wave UI is stripped away and you can focus on your puzzle. To do crosscountry crosswords, my friend and I open up a shared space and call each other on Skype. The combination works really well.

What does the future hold for lmnowave? It’s a bit unclear. I may turn it into a Facebook game, or perhaps use it to learn how to write applications for the Mac App store.

Enjoy!

Commacopy

danvk — Wed, 09 Mar 2011 15:16:08 +0000

At work, I often see web pages that display large numbers like so:

num-bytes	1,234,567,890
num-entries	123,456,789

Including the commas in the display makes the numbers easier to read. But it does have a downside. Say you want to calculate the average number of bytes per entry. If you copy/paste the numbers above, the commas will prevent most programming languages (e.g. python or bc) from interpreting them correctly.

My coworker Dan came up with a great solution to this conundrum using CSS. Try copy/pasting these numbers over into the text box:

1234 or 2345
-12345.67
-123456789

The commas don’t copy! Best of both worlds!

You can view source to see how it works, but let’s jump straight to the goodies:

Bookmarklet: commacopy

Unobtrusive JavaScript: commacopy.js

To use the bookmarklet, drag it to your browser’s bookmarks toolbar. If you click it, it will silently convert all numbers containing commas on the current page to the fancy copy/pasteable commas. This should really be a Chrome extension that runs on every page, but I’ll leave that as an exercise for the reader.

To use the unobtrusive JS, make a copy of commacopy.js and include it in your page via: