04.20.13
Posted in Uncategorized at 11:26 am by danvk
As you may remember from a previous post, I’ve been doing some work with a collection of old images. The first problem was to write a program to find the individual photos in images like these:
This is easy for humans, but hard for a machine!
A key part of developing any heuristic algorithm like this one is to get some training data. You find the correct answer by hand for some fraction of the data, then judge your program by seeing how its results compare to the “golden” data.
For my photo detection project, the golden data might look like this:
image_url,rects
images/700005f.jpg,[{x1:959,x2:3137,y1:533,y2:1939}]
images/700049f.jpg,[{x1:863,x2:3033,y1:571,y2:1983}]
images/700079f.jpg,[{x1:829,x2:2987,y1:457,y2:1852}]
images/700256f.jpg,[{x1:837,x2:3002,y1:536,y2:1927}]
images/700284f.jpg,[{x1:919,x2:2303,y1:845,y2:2956}]
images/700288f.jpg,[{x1:1140,x2:3290,y1:545,y2:1923},{x1:1157,x2:3313,y1:2286,y2:3659}]
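With golden rows like these, "seeing how its results compare" can be as simple as counting how many golden rectangles the detector reproduces. Here is a minimal sketch of one way to do that, using intersection-over-union (IoU) to decide whether two rectangles match. The `iou` and `compareToGolden` helpers, the 0.9 threshold and the detector output are all assumptions made up for illustration, not part of the original project.

```javascript
// Intersection-over-union of two rectangles shaped like the golden rows:
// {x1, x2, y1, y2} with x1 < x2 and y1 < y2.
function iou(a, b) {
  const w = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const h = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const intersection = w * h;
  const union =
    (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - intersection;
  return union > 0 ? intersection / union : 0;
}

// Greedily pair each golden rectangle with its best unused detection.
// A pair counts as a match when its IoU reaches `threshold` (an assumed cutoff).
function compareToGolden(golden, detected, threshold = 0.9) {
  const used = new Set();
  let matched = 0;
  for (const g of golden) {
    let bestIndex = -1;
    let bestIou = threshold;
    detected.forEach((d, i) => {
      const overlap = iou(g, d);
      if (!used.has(i) && overlap >= bestIou) {
        bestIndex = i;
        bestIou = overlap;
      }
    });
    if (bestIndex >= 0) {
      used.add(bestIndex);
      matched++;
    }
  }
  return {
    matched,
    missed: golden.length - matched,   // golden rectangles nobody found
    extra: detected.length - matched,  // detections with no golden partner
  };
}

// Golden rectangles for images/700288f.jpg, copied from the data above.
const golden = [
  { x1: 1140, x2: 3290, y1: 545, y2: 1923 },
  { x1: 1157, x2: 3313, y1: 2286, y2: 3659 },
];
// Made-up detector output: one near-perfect rectangle and one spurious one.
const detected = [
  { x1: 1145, x2: 3285, y1: 550, y2: 1920 },
  { x1: 0, x2: 400, y1: 0, y2: 300 },
];
console.log(compareToGolden(golden, detected));
```

For this made-up output the comparison reports one match, one missed photo and one extra detection. Summing those counts over every image in the golden set gives a single number to watch while tuning the heuristic.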
You could generate this sort of data by hand using a photo inspector and a text editor. But it would be tremendously tedious. You wouldn’t want to do this for 100 images unless you were being paid. For a personal project, it’s a non-starter.
A little bit of usability work here can go a long way. For this project, I built a simple web tool using my localturk service. It’s a tool which helps you step through repetitive tasks using a web browser. It exposes the exact same API as Amazon’s Mechanical Turk: CSV input + HTML Template → CSV output. But it runs on your own machine and you do the work yourself. No external turkers or exchange of money involved.
My tool shows the original image on the page and asks you to drag rectangles across the individual photos. You can resize or move the rectangles after you draw them:

Click the image to try this tool in your browser.
localturk records the responses in a CSV output file which you can use as your golden data. The whole process is done visually in your browser.
I’d estimate I spent maybe an hour creating that template and half an hour stepping through the 100 photos. This may or may not compare favorably to the photo inspector and text editor process I described above, but it was certainly more enjoyable.
In his Machine Learning class, Andrew Ng says that you should ask “how hard would it be to get 10x more training data?” With this fancier system, it would take a few hours. Or I could upload 1,000 tasks to Mechanical Turk and trade time for money.
How do other ML people generate small amounts of training data?
03.27.13
Posted in Uncategorized at 8:28 pm by danvk
My group recently launched a custom UI for March Madness searches:

The Sweet Sixteen view looks particularly nice on tablets, where you get high resolution team logos and crisp text for the team names. While games are being played, you can follow the scores live in the bracket.
The launch was an interesting experience. I wrote a tweet shortly after we went live. It immediately got picked up by TechCrunch, Search Engine Land and Fred Wilson.
There are some slightly wild CSS tricks going on to mirror the bracket on the right-hand side and to substitute team abbreviations when the full names won’t fit. Fodder for a future danvk.org post!
03.03.13
Posted in Uncategorized at 11:25 am by danvk
I sometimes see code like this to generate DOM structures in JavaScript:
var div = document.createElement("div");
div.innerHTML = "<div id='foo' onclick='document.getElementById(\"foo\").style.display=\"none\";' style='position: absolute; top: 10px; left: 10px;'>" + content + "</div>";
document.body.appendChild(div);
This style gets very confusing very quickly. The issue is that it uses JavaScript to write HTML, CSS and even JavaScript inside HTML!
The key to untangling these knots is to remember this rule:
Keep your HTML in your HTML, your JavaScript in your JavaScript and your CSS in your CSS.
Here’s how I’d rewrite that snippet with this in mind:
HTML:
<div id='foo-template' class='foo' style='display:none;'>
</div>
CSS:
.foo {
  position: absolute;
  top: 10px;
  left: 10px;
}
JavaScript:
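// Copy the hidden template; drop the id so the copy doesn't duplicate it.
var $foo = $('#foo-template').clone().removeAttr('id');
$foo
  .on('click', function() { $(this).hide(); })  // hide the copy when clicked
  .text(content)
  .appendTo(document.body)
  .show();  // undo the template's display:none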
The idea is that JavaScript is a really terrible language for building DOM structures. You can use either the innerHTML technique (in which case you run into quoting issues) or the DOM manipulation APIs (which are quite cumbersome and verbose).
HTML is a great way to define DOM structures! So define your DOM structures there, even the ones that you’ll add dynamically. You can use jQuery’s clone method to make copies of them that you fill out before adding them to the page.
CSS is also a great language for defining styles, so why put them inline in your HTML? Just use a class and move the styles into your CSS.
And really, do you want to write JavaScript that writes HTML that includes JavaScript?
You can see a real-world example of this technique in OldSF’s HTML, JS and CSS.
02.23.13
Posted in Uncategorized at 1:34 pm by danvk
According to Wikipedia, there are twelve living people born in the 1800s:
| # | Name | Sex | Birth date | Age | Residence |
|---|---|---|---|---|---|
| 1 | Jiroemon Kimura | M | 1897 April 19 | 115y 310d | Japan |
| 2 | Misao Okawa | F | 1898 March 5 | 114y 355d | Japan |
| 3 | Maria Redaelli-Granoli | F | 1899 April 3 | 113y 326d | Italy |
| 4 | Elsie Thompson | F | 1899 April 5 | 113y 324d | United States |
| 5 | Jeralean Talley | F | 1899 May 23 | 113y 276d | United States |
| 6 | Susannah Jones | F | 1899 July 6 | 113y 232d | United States |
| 7 | Bernice Madigan | F | 1899 July 24 | 113y 214d | United States |
| 8 | Soledad Mexia | F | 1899 August 13 | 113y 194d | United States |
| 9 | Evelyn Kozak | F | 1899 August 14 | 113y 193d | United States |
| 10 | Mitsue Nagasaki | F | 1899 Sept. 18 | 113y 158d | Japan |
| 11 | Emma Morano-Martinuzzi | F | 1899 Nov. 29 | 113y 86d | Italy |
| 12 | Grace Jones | F | 1899 Dec. 7 | 113y 78d | United Kingdom |
These are the verified people, which means that the Gerontology Research Group has validated at least three documents mentioning their date of birth. Wikipedia lists at least 50 others whose claims do not meet this stringent standard.
So how long will it be until we can completely close the door on the 19th century? Wikipedia gives the odds of surviving your 114th and 115th years as about 30%, in which case we’d expect the last survivor to die in the next three years. On the other hand, if Grace Jones turns out to be another Jeanne Calment, then we may have to wait another ten!
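The three-year figure can be sanity-checked with a little arithmetic. The sketch below assumes, as a deliberate simplification, that each of the twelve people independently has the same 30% chance of surviving any given year, regardless of their current age. The `probAnyoneAlive` helper is a made-up name for illustration.

```javascript
// Probability that at least one of `people` is still alive after `years`,
// assuming everyone independently survives each year with `yearlySurvival`.
function probAnyoneAlive(people, yearlySurvival, years) {
  // Chance that one given person survives all `years` in a row.
  const oneSurvives = Math.pow(yearlySurvival, years);
  // Chance that every single person has died by then.
  const allDead = Math.pow(1 - oneSurvives, people);
  return 1 - allDead;
}

for (let years = 1; years <= 5; years++) {
  const p = probAnyoneAlive(12, 0.3, years);
  console.log(years + " years: " + (100 * p).toFixed(1) + "%");
}
```

Under these assumptions the chance that anyone from the list is still alive falls from roughly two in three after two years to a bit over one in four after three, which fits a last survivor dying within about three years. A real estimate would need age-specific mortality, and it would still say nothing about an outlier like Jeanne Calment.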
02.22.13
Posted in Uncategorized at 7:45 am by danvk
I recently saw this map on the delightful “MapPorn” subreddit:

Please click through to the full image; it’s huge!
What I love about this map is that insights spring right out of it. A few that came to my mind:
- The old world (Asia & Europe) is still more heavily populated than the new.
- Population in the new world tends to be more clustered around cities (and roads) than in Europe or Asia.
- Population density in Russia goes much farther east than I’d realized.
- Moscow is much farther east than any other European city.
- All of Egypt’s population lives along the Nile.
- France and Spain are far more centered around their cities than Germany.
- Population density in the United States drops off sharply around the 100th meridian.
- There are no “empty” spots in India or Eastern China.
- Southwest Africa is quite empty.
- Deserts suck. So does tundra.
Do you see anything when you look at the map? The data comes from the Gridded Population of the World project.