<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>danvk.org</title>
	<atom:link href="http://www.danvk.org/wp/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.danvk.org/wp</link>
	<description>Keepin' static like wool fabric since 2006</description>
	<lastBuildDate>Mon, 14 May 2012 17:04:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Horizontal and Vertical Centering with CSS</title>
		<link>http://www.danvk.org/wp/2012-05-14/horizontal-and-vertical-centering-with-css/</link>
		<comments>http://www.danvk.org/wp/2012-05-14/horizontal-and-vertical-centering-with-css/#comments</comments>
		<pubDate>Mon, 14 May 2012 17:04:47 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=854</guid>
		<description><![CDATA[I recently needed to center some content both vertically and horizontally on a web page. I did not know in advance how large the content was, and I wanted it to work for any size browser window. These two articles have everything you need to know about horizontal centering and vertical centering. The two articles don&#8217;t actually [...]]]></description>
			<content:encoded><![CDATA[<p>I recently needed to center some content both vertically and horizontally on a web page. I did not know in advance how large the content was, and I wanted it to work for any size browser window.</p>
<p>These two articles have everything you need to know about <a href="http://haslayout.net/css-tuts/Horizontal-Centering">horizontal centering</a> and <a href="http://haslayout.net/css-tuts/Vertical-Centering">vertical centering</a>.</p>
<p>The two articles don&#8217;t actually combine the techniques, so I&#8217;ll do that here.</p>
<p>In the bad old days before CSS, you might accomplish this with tables:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;table width=100% height=100%&gt;
  &lt;tr&gt;
    &lt;td valign=middle align=center&gt;
      Content goes here
    &lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
</pre>
<p>Simple enough! In the wonderful world of HTML5, you do the same thing by turning <code>div</code>s into <code>table</code>s using CSS. You need no fewer than <i>three</i> divs to pull this off:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;middle&quot;&gt;
    &lt;div class=&quot;inner&quot;&gt;
      Content goes here
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</pre>
<p>And here&#8217;s the CSS:</p>
<pre class="brush: css; title: ; notranslate">
.container {
  display: table;
  width: 100%;
  height: 100%;
}
.middle {
  display: table-cell;
  vertical-align: middle;
}
.inner {
  display: table;
  margin: 0 auto;
}
</pre>
<p>A few comments on why this works:</p>
<ul>
<li>You can only apply <code>vertical-align: middle</code> to an element with <code>display: table-cell</code>. (Hence <code>.middle</code>)
<li>You can only apply <code>display: table-cell</code> to an element inside of another element with <code>display: table</code>. (Hence <code>.container</code>)
<li>Elements with <code>display: block</code> have 100% width by default. Setting <code>display: table</code> has the side effect of shrinking the div to fit its content, while still keeping it a block-level element. This, in turn, enables the <code>margin: 0 auto</code> trick. (Hence <code>.inner</code>)
</ul>
<p>I believe all three of these divs are genuinely necessary. For the common case that you want to center elements on the entire screen, you can make <code>.container</code> the <code>body</code> tag to get rid of one div.</p>
<p>In the future, this will get slightly easier with <a href="http://www.the-haystack.com/2012/01/04/learn-you-a-flexbox/">display: flexbox</a>, a box model which makes infinitely more sense for layout than the existing CSS model. You can read about how to do horizontal and vertical centering using flexbox <a href="http://www.kirupa.com/html5/centering_vertically_horizontally.htm">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2012-05-14/horizontal-and-vertical-centering-with-css/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Sunrise/Sunset Onebox, Now in Many Languages</title>
		<link>http://www.danvk.org/wp/2012-04-10/the-sunrisesunset-onebox-now-in-many-languages/</link>
		<comments>http://www.danvk.org/wp/2012-04-10/the-sunrisesunset-onebox-now-in-many-languages/#comments</comments>
		<pubDate>Tue, 10 Apr 2012 19:14:48 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[personal]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=842</guid>
		<description><![CDATA[Nearly two years ago, I wrote about launching the Sunrise/Sunset Onebox, which tells you when the sun will rise or set in any location. You trigger it in English by searching for [sunset nyc] or even just [sunset] to get times for your current location. Over the weekend, I launched the onebox in 30+ new [...]]]></description>
			<content:encoded><![CDATA[<p>Nearly two years ago, I <a href="/wp/2010-06-28/sunrisesunset-onebox/">wrote about</a> launching the <a href="http://googleblog.blogspot.com/2010/06/this-week-in-search-62710.html">Sunrise/Sunset Onebox</a>, which tells you when the sun will rise or set in any location.</p>
<p>You trigger it in English by searching for [<a href="https://www.google.com/search?q=sunset+nyc">sunset nyc</a>] or even just [<a href="https://www.google.com/search?q=sunset+nyc">sunset</a>] to get times for your current location.</p>
<p>Over the weekend, I launched the onebox in 30+ new languages. It&#8217;s pretty cool to see your work in a language that you don&#8217;t understand. Here are a few examples:</p>
<p>Arabic: [<a href="http://www.google.com/search?ie=UTF-8&#038;q=%D8%BA%D8%B1%D9%88%D8%A8+%D8%A7%D9%84%D8%B4%D9%85%D8%B3+%D9%81%D9%8A+%D8%A7%D9%84%D9%85%D8%AF%D9%8A%D9%86%D8%A9+%D8%A7%D9%84%D9%85%D9%86%D9%88%D8%B1%D8%A9&#038;hl=ar">غروب الشمس في المدينة المنورة</a>] = [sunset in medina]</p>
<p style="text-align: center;" ><img src="http://www.danvk.org/wp/wp-content/uploads/2012/04/Screen-Shot-2012-04-10-at-2.56.37-PM.png" alt="Onebox triggering for [sunset in medina] in Arabic" title="sunset-medina" width="478" height="62" class="aligncenter size-full wp-image-843" /></p>
<p>Or in Vietnamese: [<a href="http://www.google.com/search?q=m%E1%BA%B7t%20tr%E1%BB%9Di%20m%E1%BB%8Dc%20H%C3%A0%20N%E1%BB%99i&#038;hl=vi">mặt trời mọc Hà Nội</a>] = [sunrise in Hanoi]</p>
<p style="text-align: center;" ><img src="http://www.danvk.org/wp/wp-content/uploads/2012/04/Screen-Shot-2012-04-10-at-3.04.40-PM.png" alt="Onebox triggering for &quot;sunset in hanoi&quot;" title="sunset-hanoi" width="524" height="79" class="aligncenter size-full wp-image-844" /></p>
<p>Or in French: [<a href="http://www.google.com/search?q=coucher+de+soleil+paris&#038;hl=fr">coucher de soleil paris</a>] = [sunset paris]</p>
<p style="text-align: center;" ><img src="http://www.danvk.org/wp/wp-content/uploads/2012/04/Screen-Shot-2012-04-10-at-3.06.09-PM.png" alt="Onebox trigger for sunset in paris" title="sunset-paris" width="472" height="64" class="aligncenter size-full wp-image-845" /></p>
<p>The translated onebox is proving particularly popular in Arabic-speaking countries, where the sunrise is important for prayer times. It will be interesting to see whether there&#8217;s a spike in Hebrew queries on Friday, when Israel observes the sabbath beginning at sundown.</p>
<p>This launch has been more of a slog than I ever would have expected, so it&#8217;s gratifying to see it out there in the wild, being used. The world&#8217;s languages are a baffling mix of left-to-right and right-to-left. Arabic gets a special shout-out here for its plural forms. It has different word endings for quantities of 1, 2-10 and 11+!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2012-04-10/the-sunrisesunset-onebox-now-in-many-languages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Happy People: A Year in the Taiga</title>
		<link>http://www.danvk.org/wp/2012-02-12/happy-people-a-year-in-the-taiga/</link>
		<comments>http://www.danvk.org/wp/2012-02-12/happy-people-a-year-in-the-taiga/#comments</comments>
		<pubDate>Mon, 13 Feb 2012 05:06:42 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[movies]]></category>
		<category><![CDATA[reviews]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=832</guid>
		<description><![CDATA[Last fall, I was excited to read about Werner Herzog and Dmitry Vasyukov&#8217;s new film, Happy People: A Year in the Taiga. I bought tickets for its one-night premiere at the IFC. Raven and I raced from our dinner to catch the 9 PM show&#8230; only to find out that it had been the night [...]]]></description>
			<content:encoded><![CDATA[<p>Last fall, I was excited to read about Werner Herzog and Dmitry Vasyukov&#8217;s new film, <a href="http://www.wernerherzog.com/62.html">Happy People: A Year in the Taiga</a>.</p>
<p><center><iframe width="560" height="315" src="http://www.youtube.com/embed/8_wnpkOVIHQ" frameborder="0" allowfullscreen></iframe></center></p>
<p>I bought tickets for its one-night premiere at the <a href="http://www.ifccenter.com/">IFC</a>. <a href="http://ravenkeller.com/">Raven</a> and I raced from our dinner to catch the 9 PM show&hellip; only to find out that it had been the night before. A tragic mistake for a one-night only show!</p>
<p>I recently found a full copy of the film <a href="http://www.youtube.com/watch?v=V26jauHZgtw">on YouTube</a> and we watched it. (Pro tip: the volume is a little low in the YouTube video. You can visit <a href="http://www.saveyoutube.com/watch?v=V26jauHZgtw">saveyoutube.com</a> to download it to your hard drive. Then watch it in a desktop player like <a href="http://www.videolan.org/vlc/">VLC</a> with the volume turned up past the max.)</p>
<p>Herzog and Vasyukov glamorize life in the Taiga. The fur trappers&#8217; existence is simple. They have few material possessions which they do not make themselves. A rifle, snowmobile and outboard motor are the lone exceptions.  There&#8217;s something immensely satisfying about seeing the hunter making skis and a canoe in the fall, then using them in the winter. They are nearly completely cut off from the modern world. The only intrusion it makes into the film is when a Siberian politician visits on a boat, a curiosity to which the villagers pay little regard.</p>
<p>The men live for the winter hunt, and this is clearly the part of their lives which the filmmakers found most interesting. We hear more about their hunting dogs than we do about their wives or children. The only time we see real emotion from a hunter is when he describes watching a bear kill his favorite dog. Less pleasant things are talked of only briefly: the native people have been largely displaced by ethnic Russians, and those who remain are alcoholics. The protagonist of the movie was brought to <a href="http://maps.google.com/maps?ll=62.465416,89.002519&#038;spn=0.1,0.1&#038;t=h&#038;q=62.465416,89.002519">Bakhta</a> by helicopter thirty years ago to trap for the communist government. They had few supplies. Another man came with him, but he was &#8220;not up to the task&#8221; of survival.</p>
<p>This is a beautiful film which offers a glimpse into an increasingly rare way of life. Herzog and Vasyukov portray it as simple and remote, but I think this is more due to their editing than to the reality of life in Bakhta. What about the women, who never speak in this film? Or the natives? <i>Happy People</i> leaves you respecting the people who live in the Taiga, but wanting to know more about them.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2012-02-12/happy-people-a-year-in-the-taiga/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Accurate hexadecimal to decimal conversion in JavaScript</title>
		<link>http://www.danvk.org/wp/2012-01-20/accurate-hexadecimal-to-decimal-conversion-in-javascript/</link>
		<comments>http://www.danvk.org/wp/2012-01-20/accurate-hexadecimal-to-decimal-conversion-in-javascript/#comments</comments>
		<pubDate>Fri, 20 Jan 2012 23:05:20 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[javascript]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=824</guid>
		<description><![CDATA[A problem came up at work yesterday: I was creating a web page that received 64-bit hex numbers from one API. But it needed to pass them off to another API that expected decimal numbers. Usually this would not be a problem &#8212; JavaScript has built-in functions for converting between hex and decimal: parseInt("1234abcd", 16) [...]]]></description>
			<content:encoded><![CDATA[<p>A problem came up at work yesterday: I was creating a web page that received 64-bit hex numbers from one API. But it needed to pass them off to another API that expected decimal numbers.</p>
<p>Usually this would not be a problem &mdash; JavaScript has built-in functions for converting between hex and decimal:</p>
<p><code>parseInt("1234abcd", 16) = 305441741<br />
(305441741).toString(16) = "1234abcd"</code></p>
<p>Unfortunately, for larger numbers, there&#8217;s a big problem lurking:</p>
<p><code>parseInt("123456789abcd<u>ef</u>", 16) = 81985529216486900<br />
(81985529216486900).toString(16) = "123456789abcd<u>f0</u>"<br />
</code></p>
<p>The last two digits are wrong. Why did these functions stop being inverses of one another?</p>
<p>The answer has to do with how JavaScript stores numbers. It uses 64-bit floating point representation for all numbers, even integers. This means that not every integer larger than 2^53 can be represented exactly. You can see this by evaluating:</p>
<p><code>(Math.pow(2, 53) + 1) - 1 = 9007199254740991</code></p>
<p>That ends with a 1, so whatever it is, it&#8217;s certainly not a power of 2. (It&#8217;s off by one).</p>
<p>To solve this problem, I wrote some very simple hex &lt;-&gt; decimal conversion functions which use arbitrary precision arithmetic. In particular, these will work for 64-bit numbers or 128-bit numbers. The code is only about 65 lines, so it&#8217;s much more lightweight than a full-fledged library for arbitrary precision arithmetic.</p>
<p>The algorithm is pretty cool. You can see a demo, read an explanation and get the code here:<br />
<a href="http://danvk.org/hex2dec.html">http://danvk.org/hex2dec.html</a>.</p>
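<p>For a rough idea of how such a conversion can work, here is a minimal sketch. It is not the code from the page linked above; the function name <code>hexToDec</code> and the digit-array approach are assumptions made for illustration. It keeps the decimal digits in an array and, for each hex digit, multiplies the running total by 16 and adds the digit, carrying in base 10, so no intermediate value ever gets near 2^53:</p>

```javascript
// Convert a hex string to a decimal string without going through Number.
// Decimal digits are kept in an array, least significant digit first.
// (Illustrative sketch; hexToDec is a hypothetical name.)
function hexToDec(hex) {
  const digits = [0];
  for (const ch of hex.toLowerCase()) {
    const value = parseInt(ch, 16);
    if (Number.isNaN(value)) {
      throw new Error("not a hex digit: " + ch);
    }
    // digits = digits * 16 + value, propagating carries in base 10.
    let carry = value;
    for (let i = 0; i !== digits.length; i++) {
      const total = digits[i] * 16 + carry;
      digits[i] = total % 10;
      carry = Math.floor(total / 10);
    }
    // Any leftover carry becomes new, more significant digits.
    while (carry !== 0) {
      digits.push(carry % 10);
      carry = Math.floor(carry / 10);
    }
  }
  return digits.reverse().join("");
}
```

<p>With this sketch, <code>hexToDec("123456789abcdef")</code> returns <code>"81985529216486895"</code>, with the last two digits intact.</p>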
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2012-01-20/accurate-hexadecimal-to-decimal-conversion-in-javascript/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What&#8217;s Going on with Twin Rates?</title>
		<link>http://www.danvk.org/wp/2012-01-14/whats-going-on-with-twin-rates/</link>
		<comments>http://www.danvk.org/wp/2012-01-14/whats-going-on-with-twin-rates/#comments</comments>
		<pubDate>Sat, 14 Jan 2012 23:57:50 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=806</guid>
		<description><![CDATA[I recently built a version of the CDC&#8217;s Vital Statistics database for Google&#8217;s BigQuery service. You can read more in my post on the Google Research Blog. The Natality data set is one of the most fascinating I&#8217;ve ever worked with. It is an electronic record which goes back to 1969. Every single one of [...]]]></description>
			<content:encoded><![CDATA[<p>I recently built a version of the CDC&#8217;s <a href="http://www.cdc.gov/nchs/nvss.htm">Vital Statistics database</a> for Google&#8217;s BigQuery service. You can read more in my <a href="http://googleresearch.blogspot.com/2012/01/cdc-birth-vital-statistics-in-bigquery.html">post</a> on the <a href="http://googleresearch.blogspot.com/">Google Research Blog</a>.</p>
<p>The <a href="http://code.google.com/apis/bigquery/docs/dataset-natality.html">Natality data set</a> is one of the most fascinating I&#8217;ve ever worked with. It is an electronic record which goes back to 1969. Every single one of the 68 million rows in it represents a <i>live human birth</i>. I can&#8217;t imagine any other data set which was more&hellip; laborious&hellip; to create. :)</p>
<p>But beyond the data itself, the processes surrounding it also tell a fascinating story. The yearly <a href="http://www.cdc.gov/nchs/data_access/Vitalstatsonline.htm">user guides</a> are a tour-de-force in how publishing has changed in the last forty years. The early manuals were clearly written on typewriters. To make a table, you spaced things out right, then used a ruler and a pen to draw in the lines. Desktop publishing is so easy now that it&#8217;s easy to forget how much standards have improved in the last few decades.</p>
<p>They&#8217;ve had to balance the statistical benefits of gathering a uniform data set year after year with a need to track a society which has evolved considerably. In 1969, your race was either &#8220;Black&#8221;, &#8220;White&#8221; or &#8220;Other&#8221;. There was a question about whether the child was &#8220;legitimate&#8221;. There were no questions about alcohol, smoking or drug use. And there was no attempt to protect privacy &mdash; most of these early records contain enough information to uniquely identify individuals (though doing so is a <a href="http://www.cdc.gov/nchs/data_access/restrictions.htm">federal crime</a>).</p>
<p>I included <a href="http://goo.gl/yvlJ9">four example analyses</a> on the BigQuery site. I&#8217;ll include one more here: it&#8217;s a chart of the <i>twin rate</i> over thirty years as a function of age.</p>
<p><iframe src="/multiples/multiples.html" width=480 height=380 frameborder=0 scrolling="no"><br />
</iframe></p>
<p>A few takeaways from this chart:</p>
<ul>
<li>The twin rate is clearly a function of age.</li>
<li>It <i>used</i> to be that older women were less likely to have twins.</li>
<li>Starting around 1994, this pattern reversed itself (likely due to <a href="http://en.wikipedia.org/wiki/In_vitro_fertilisation">IVF</a>).</li>
<li>The y-axis is on a log scale, so this effect is truly dramatic.</li>
<li>There has been an overall increase in the twin rate in the last thirty years.</li>
<li>This increase spans all ages.</li>
</ul>
<p>The increase in twin rate is often attributed to IVF, but the last two points indicate that this isn&#8217;t the whole story. IVF clearly has a huge effect on the twin rate for older (40+) women, but it can&#8217;t explain the increase for younger women. A 21-year old mother was 40% more likely to have twins in 2002 than she was in 1971.</p>
<p>My guess is that this is ultimately because of improved neonatal care. Twin pregnancies are more likely to have complications, and these are less likely to lead to miscarriages than in the past. If this interpretation is correct, then there were just as many 21-year olds pregnant with twins forty years ago. It&#8217;s just that this led to fewer births.</p>
<p><i>Chart credits: <a href="http://dygraphs.com/">dygraphs</a> and <a href="http://jqueryui.com/demos/slider/">jQuery UI Slider</a>.</i></p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2012-01-14/whats-going-on-with-twin-rates/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Takeaways from Stanford&#8217;s Machine Learning Class</title>
		<link>http://www.danvk.org/wp/2011-12-19/takeaways-stanfords-machine-learning-class/</link>
		<comments>http://www.danvk.org/wp/2011-12-19/takeaways-stanfords-machine-learning-class/#comments</comments>
		<pubDate>Tue, 20 Dec 2011 00:04:33 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[math]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=782</guid>
		<description><![CDATA[Over the past two months, I&#8217;ve participated in Andrew Ng&#8217;s online Stanford Machine learning class. It&#8217;s a very high-level overview of the field with an emphasis on applications and techniques, rather than theory. Since I just finished the last assignment, it&#8217;s a fine time to write down my thoughts on the class! Overall, I&#8217;ve learned [...]]]></description>
			<content:encoded><![CDATA[<p>Over the past two months, I&#8217;ve participated in Andrew Ng&#8217;s online <a href="http://ml-class.org">Stanford Machine learning class</a>. It&#8217;s a very high-level overview of the field with an emphasis on applications and techniques, rather than theory. Since I just finished the last assignment, it&#8217;s a fine time to write down my thoughts on the class!</p>
<p>Overall, I&#8217;ve learned quite a bit about how <a href="http://en.wikipedia.org/wiki/Machine_learning">ML</a> is used in practice. Some highlights for me:</p>
<style type="text/css">ul.body-list li { padding-bottom: 10px; }</style>
<ul class="body-list">
<li><a href="http://en.wikipedia.org/wiki/Gradient_descent">Gradient descent</a> is a very general optimization technique. If you can calculate a function and its partial derivatives, you can use gradient descent. I was particularly impressed with the way we  used it to train Neural Networks. We learned how the networks operated, but had no need to think about how to train them &mdash; we just used gradient descent.
<li>There are <a href="http://en.wikipedia.org/wiki/BFGS_method">many</a> advanced &#8220;unconstrained optimization&#8221; algorithms which can be used as alternatives to gradient descent. These often have the advantage that you don&#8217;t need to tune parameters like a learning rate.
<li><a href="http://en.wikipedia.org/wiki/Regularization_(machine_learning)">Regularization</a> is used almost universally. I&#8217;d previously had very negative associations with using high-order polynomial features, since I most often saw them used in examples of overfitting. But I realize now that they are quite reasonable to add if you also make good use of regularization.
<li>The <a href="http://en.wikipedia.org/wiki/Backpropagation">backpropagation algorithm</a> for <a href="http://en.wikipedia.org/wiki/Artificial_neural_network">Neural Networks</a> is really just an efficient way to compute partial derivatives (for use by gradient descent and co).
<li>Learning curves (plots of train/test error as a function of the number of examples) are a great way to figure out how to improve your ML algorithm. For example, if your training and test errors are both high, it means that you&#8217;re not overfitting your data set and there&#8217;s no point in gathering more data. What it does mean is that you need to add more features (e.g. the polynomial features which I used to fear) in order to increase your performance.
</ul>
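<p>To make the first point concrete, here is a minimal sketch of gradient descent. It is not code from the class; the function name, the fixed learning rate and the toy function being minimized are all assumptions chosen for illustration. All it needs is a way to compute the partial derivatives:</p>

```javascript
// Minimize a function by repeatedly stepping against its gradient.
// grad(x) must return the array of partial derivatives at the point x.
// (Illustrative sketch; names and parameters are hypothetical.)
function gradientDescent(grad, start, learningRate, steps) {
  let x = start.slice();
  for (let step = 0; step !== steps; step++) {
    const g = grad(x);
    x = x.map((xi, i) => xi - learningRate * g[i]);
  }
  return x;
}

// Toy example: f(a, b) = (a - 3)^2 + (b + 1)^2, which is smallest at (3, -1).
const gradF = ([a, b]) => [2 * (a - 3), 2 * (b + 1)];
const minimum = gradientDescent(gradF, [0, 0], 0.1, 200);
```

<p>Here <code>minimum</code> ends up very close to <code>[3, -1]</code>. The learning rate of 0.1 is exactly the kind of parameter that the more advanced optimizers save you from tuning.</p>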
<p>The other takeaway is that, as in many fields, there are many &#8220;tricks of the trade&#8221; in Machine Learning. These are bits of knowledge that aren&#8217;t part of the core theory, but which are still enormously helpful for solving real-world problems.</p>
<p>As an example, consider the last problem in the course: Photo OCR. The problem is to take an image like this:</p>
<p><img src="http://www.danvk.org/wp/wp-content/uploads/2011/12/photo-ocr-e1324338543864.png" alt="Example of Photo OCR" title="photo-ocr" width="531" height="362" class="aligncenter size-full wp-image-783" /></p>
<p>and extract all the text: &#8220;LULA B&#8217;s ANTIQUE MALL&#8221;, &#8220;LULA B&#8217;s&#8221;, &#8220;OPEN&#8221; and &#8220;Lula B&#8217;s&#8221;. Initially, this seems quite daunting. Machine Learning is clearly relevant here, but how do you break it down into concrete problems which can be attacked using ML techniques? You don&#8217;t know where the text is and you don&#8217;t even have a rough idea of the text&#8217;s size.</p>
<p>This is where the &#8220;tricks&#8221; come in. <a href="http://en.wikipedia.org/wiki/Binary_classification">Binary classifiers</a> are the &#8220;hammer&#8221; of ML. You can write a binary classifier to determine whether a fixed-size rectangle contains text:</p>
<table>
<tr>
<td><i>Positive examples</i></td>
<td><img src="http://www.danvk.org/wp/wp-content/uploads/2011/12/text-negative.png" alt="" title="text-positive" width="383" height="75" class="alignleft size-full wp-image-785" /></td>
</tr>
<tr>
<td><i>Negative examples</i></td>
<td><img src="http://www.danvk.org/wp/wp-content/uploads/2011/12/text-positive.png" alt="" title="text-negative" width="384" height="75" class="alignleft size-full wp-image-784" /></td>
</tr>
</table>
<p>You then run this classifier over thousands of different &#8220;windows&#8221; in the main image. This tells you where all the bits of text are. If you ignore all the non-contiguous areas, you have a pretty good sense of the bounding boxes for the text in the image.</p>
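<p>The sliding-window step can be sketched roughly as follows. This is an illustration under assumptions, not code from the class: the window is square, the stride is fixed, and <code>containsText</code> is a hypothetical callback standing in for the trained classifier.</p>

```javascript
// Slide a fixed-size square window over an image and collect the
// positions where the classifier reports text. containsText is a
// hypothetical callback standing in for a trained binary classifier.
function findTextWindows(imageWidth, imageHeight, windowSize, stride, containsText) {
  const hits = [];
  for (let y = 0; imageHeight >= y + windowSize; y += stride) {
    for (let x = 0; imageWidth >= x + windowSize; x += stride) {
      if (containsText(x, y, windowSize)) {
        hits.push({ x: x, y: y });
      }
    }
  }
  return hits;
}

// Toy stand-in classifier: pretend text only appears in the top-right window.
const toyClassifier = (x, y) => x === 2 && y === 0;
const found = findTextWindows(4, 4, 2, 2, toyClassifier);
```

<p>In a real pipeline you would repeat this at several window sizes, since you don&#8217;t know the size of the text in advance.</p>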
<p>But even given the text boxes, how do you recognize the characters? Time for another trick! We can build a binary classifier to detect a gap between letters in the center of a fixed-size rectangle:</p>
<table>
<tr>
<td><i>Positive examples</i></td>
<td><img src="http://www.danvk.org/wp/wp-content/uploads/2011/12/split-positive.png" alt="" title="split-positive" width="143" height="94" class="alignleft size-full wp-image-791" /></td>
</tr>
<tr>
<td><i>Negative examples</i></td>
<td><img src="http://www.danvk.org/wp/wp-content/uploads/2011/12/split-negative.png" alt="" title="split-negative" width="143" height="94" class="alignleft size-full wp-image-791" /></td>
</tr>
</table>
<p>If we slide this along, it will tell us where each character starts and ends. So we can chop the text box up into character boxes. Once we&#8217;ve done that, classifying characters in a fixed-size rectangle is another concrete problem which can be tackled with Neural Networks or the like.</p>
<p>In an ML class, you&#8217;re presented with this pipeline of ML algorithms for the Photo OCR problem. It makes sense. It reduces the real-world problem into three nice clean, theoretical problems. In the class, you&#8217;d likely spend most of your time talking about those three concrete problems. In retrospect, the pipeline seems as natural as could be.</p>
<p>But if you were given the Photo OCR problem in the real world, you might never come up with this breakdown. Unless you knew the trick! And the only way to learn tricks like this is to see them used. And that&#8217;s my final takeaway from this practical ML class: familiarity with a vastly larger set of ML tricks.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2011-12-19/takeaways-stanfords-machine-learning-class/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Java, Ten Years Later</title>
		<link>http://www.danvk.org/wp/2011-11-05/java-ten-years-later/</link>
		<comments>http://www.danvk.org/wp/2011-11-05/java-ten-years-later/#comments</comments>
		<pubDate>Sat, 05 Nov 2011 20:33:54 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=765</guid>
		<description><![CDATA[It&#8217;s been almost ten years since I&#8217;ve actively used the Java programming language. In the mean time, I&#8217;ve mostly used C++. I&#8217;ve had to pick up a bit of Java again recently. Here are a few of the things that I found surprising or notable. These are all variants on &#8220;that&#8217;s changed in the last [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been almost ten years since I&#8217;ve actively used the <a href="http://en.wikipedia.org/wiki/Java_(programming_language)">Java</a> programming language. In the mean time, I&#8217;ve mostly used <a href="http://en.wikipedia.org/wiki/C%2B%2B">C++</a>. I&#8217;ve had to pick up a bit of Java again recently. Here are a few of the things that I found surprising or notable. These are all variants on &#8220;that&#8217;s changed in the last ten years&#8221; or &#8220;that&#8217;s not how C++ does it.&#8221;</p>
<p><b>The Java compiler enforces what would be conventions in C++.</b><br />
For example, &#8220;public class Foo&#8221; has to be in Foo.java. In C++, this would just be a convention. You can use non-public classes (declared with just &#8220;class&#8221;) when you&#8217;re playing around with test code and want to use only a single file. Similarly, class foo.Bar needs to be in &#8220;foo/Bar.java&#8221;.</p>
<p><b>Java Packages are a more pervasive concept than namespaces in C++.</b><br />
There&#8217;s a &#8220;default package&#8221;, but using this prevents you from loading classes by name: Class.forName(&#8220;Foo&#8221;) won&#8217;t work, but Class.forName(&#8220;package.Foo&#8221;) will. Classes in your current package are auto-imported, which surprised me at first. The default visibility for methods/fields in Java is &#8220;package private&#8221;, which has no analogue in C++.</p>
<p><b>Java keeps much more type information at runtime than C++ does.</b><br />
The reflection features (Class.getMethods(), Method.getParameterTypes(), etc.) have no equivalent in C++. This leads to some seemingly-magical behaviors, e.g. naming a method &#8220;foo&#8221; in a Servlet can cause it to be served at &#8220;/foo&#8221; without you saying anything else. Not all information is kept though: you can get a list of all packages, but not a list of all classes in a package. You can request a class by its name, but you can&#8217;t get a list of all classes. You can get a list of all the method names in a class, but you can&#8217;t get a list of all the parameter names in a method.</p>
<p><b>Java enums are far richer than C/C++ enums.</b><br />
enums in Java are more like classes: they can have constructors, methods, fields, even per-value method implementations. I really like this. Examples:</p>
<p><code>public enum Suit {<br />
&nbsp; CLUB("C"), DIAMOND("D"), HEART("H"), SPADE("S");<br />
&nbsp; private String shortName;<br />
&nbsp; private Suit(String shortName) { this.shortName = shortName; }<br />
&nbsp; public String toString() { return shortName; }<br />
}<br />
</code></p>
<p><b>Java is OK with a two-tier type system.</b><br />
At its core, C++ is an attempt to put user-defined types on an equal footing with built-in types like int and char. This is in no way a goal of Java, which is quite content to have a two-tier system of primitive and non-primitive types. This means that you can&#8217;t do Map&lt;int, int&gt;, for instance. You have to do Map&lt;Integer, Integer&gt;. Autoboxing makes this less painful, but it&#8217;s still a wart in the language that you have to be aware of.</p>
<p>One concrete example of this is the &#8220;array[index]&#8221; notation. In C++, this is also used for maps. There&#8217;s no way to do this in Java, and I really miss it. Compare:</p>
<p><code>map[key] += 1;</code></p>
<p>to</p>
<p><code>map.put(key, 1 + map.get(key));</code></p>
<p>which has more boilerplate and is more error-prone, since you might accidentally do:</p>
<p><code>map.put(key, 1 + other_map.get(key));</code></p>
<p><b>The designers of Java Generics learned from the chaos of C++ templates.</b><br />
Generic classes in Java are always templated on types: no more insane error messages. You can even say what interface the type has to implement. And there&#8217;s no equivalent of method specialization, a C++ feature which is often misused.</p>
<p><b>Variables/fields in Java behave more like C++ pointers than C++ values.</b><br />
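This is a particular gotcha for fields: an object field in Java is null until you assign it, whereas a C++ member is constructed automatically. For example, in C++:</p>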
<p><code>class C {<br />
&nbsp;public:<br />
&nbsp; C() {<br />
&nbsp; &nbsp; // foo_ is already constructed and usable here.<br />
&nbsp; }<br />
&nbsp;private:<br />
&nbsp; Foo foo_;<br />
};<br />
</code></p>
<p>But in Java:</p>
<p><code>class C {<br />
&nbsp; public C() {<br />
&nbsp; &nbsp; // foo is null here. We have to do foo = new Foo();<br />
&nbsp; }<br />
&nbsp; private Foo foo;<br />
}<br />
</code></p>
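<p>A small sketch of the difference, using hypothetical classes <code>Unset</code> and <code>Initialized</code> (and a placeholder <code>Foo</code>). A field initializer runs before the constructor body, which gets you closest to the C++ behavior.</p>

```java
// Hypothetical stand-in for any non-primitive type.
class Foo {}

class Unset {
  Foo foo;    // never assigned, so it stays null
  int count;  // primitives default to 0 rather than null
}

class Initialized {
  // The initializer runs before the constructor body, so foo is usable there.
  final Foo foo = new Foo();
  Foo seenInConstructor;

  Initialized() {
    seenInConstructor = foo;  // already non-null at this point
  }
}

class FieldSketch {
  public static void main(String[] args) {
    System.out.println(new Unset().foo == null);
    System.out.println(new Initialized().seenInConstructor != null);
  }
}
```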
<p><b>Java constructors always require a trailing (), even if they take no parameters.</b><br />
This is a minor gotcha, but one I find myself running into frequently. It&#8217;s &#8220;new Foo()&#8221; instead of &#8220;new Foo&#8221; (which is acceptable in C++).</p>
<p><b>The Java foreach loop is fantastic</b><br />
Compare</p>
<p><code>for (String arg : args) { ... }</code></p>
<p>to</p>
<p><code>for (set&lt;string&gt;::const_iterator it = args.begin(); it != args.end(); ++it) { ... }</code></p>
<p><b>The &#8220;static {}&#8221; construct is nice</b><br />
This lets you write code to initialize static variables. It has no clear analogue in C++. To use the Suit example above,</p>
<p><code>private static HashMap&lt;String, Suit&gt; name_to_suit = new HashMap&lt;String, Suit&gt;();<br />
static {<br />
&nbsp; for (Suit s : Suit.values()) { name_to_suit.put(s.toString(), s); }<br />
}<br />
</code></p>
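<p>Put together, a self-contained version might look like the sketch below. The <code>fromShortName</code> lookup is a hypothetical addition to show the map being used. The map has to be created before the static block runs, and static initializers run in textual order after the enum constants.</p>

```java
import java.util.HashMap;
import java.util.Map;

enum Suit {
  CLUB("C"), DIAMOND("D"), HEART("H"), SPADE("S");

  private final String shortName;
  Suit(String shortName) { this.shortName = shortName; }
  @Override public String toString() { return shortName; }

  // Created first, then filled in by the static block below.
  private static final Map<String, Suit> NAME_TO_SUIT = new HashMap<String, Suit>();
  static {
    for (Suit s : values()) { NAME_TO_SUIT.put(s.toString(), s); }
  }

  // Hypothetical helper: returns null for an unknown short name.
  static Suit fromShortName(String name) { return NAME_TO_SUIT.get(name); }
}

class StaticBlockSketch {
  public static void main(String[] args) {
    System.out.println(Suit.fromShortName("H").name());
  }
}
```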
<p>The new features (Generics, enums, autoboxing) that Java has gained in the last ten years make it much more pleasant to use.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2011-11-05/java-ten-years-later/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Robert Moses, Getting Things Done</title>
		<link>http://www.danvk.org/wp/2011-08-19/robert-moses-getting-things-done/</link>
		<comments>http://www.danvk.org/wp/2011-08-19/robert-moses-getting-things-done/#comments</comments>
		<pubDate>Fri, 19 Aug 2011 23:19:55 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[personal]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=758</guid>
		<description><![CDATA[I recently finished The Power Broker, Robert Caro&#8217;s critically-acclaimed biography of New York Master Builder Robert Moses. At 1200 pages, it&#8217;s an undertaking. But I&#8217;d highly recommend it if you live in the New York area. One passage about Moses&#8217; daily routine struck me: A third feature of Moses&#8217; office was his desk. It wasn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>I recently finished <a href="http://en.wikipedia.org/wiki/The_Power_Broker" title="The Power Broker">The Power Broker</a>, Robert Caro&#8217;s critically-acclaimed biography of New York Master Builder <a href="http://en.wikipedia.org/wiki/Robert_Moses" title="Robert Moses">Robert Moses</a>. At 1200 pages, it&#8217;s an undertaking. But I&#8217;d highly recommend it if you live in the New York area.</p>
<p>One passage about Moses&#8217; daily routine struck me:</p>
<blockquote><p>A third feature of Moses&#8217; office was his desk. It wasn&#8217;t a desk but rather a large table. The reason was simple: Moses did not like to let problems pile up. If there was one on his desk, he wanted it disposed of immediately. Similarly, when he arrived at his desk in the morning, he disposed of the stacks of mail awaiting him by calling in secretaries and going through the stacks, letter by letter, before he went on to anything else. Having a table instead of a desk was an insurance that this procedure would be followed. Since a table has no drawers, there was no place to hide papers; there was no escape from a nagging problem or a difficult-to-answer letter except to get rid of it in one way or another. And there was another advantage: when your desk was a table, you could have conferences at it without even getting up. <i>(p. 268)</i></p></blockquote>
<p>Moses&#8217; approach to snail mail sounds a lot like the &#8220;Getting Things Done&#8221; approach to email: make your inbox a to-do list and keep it empty. Moses wouldn&#8217;t do <i>anything</i> until his mail was cleared. He wouldn&#8217;t let tasks pile up, so he always had a clean plate every day. He even tailored his office to enforce this workflow.</p>
<p>I&#8217;ve been trying the Moses technique on my work inbox recently. When I arrive in the morning, I deal with all the emails waiting for me. No excuses. No starring and leaving the message as a &#8220;to-do&#8221; in the bottom of my inbox. There are many emails/tasks that I&#8217;d prefer to ignore, but it turns out that most of them only require ten minutes of work to deal with completely.</p>
<p>So far, this is working well for me. But will I be able to keep it up? Robert Moses did for forty years, so there&#8217;s hope!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2011-08-19/robert-moses-getting-things-done/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Crosscountry Crosswords</title>
		<link>http://www.danvk.org/wp/2011-03-27/crosscountry-crosswords/</link>
		<comments>http://www.danvk.org/wp/2011-03-27/crosscountry-crosswords/#comments</comments>
		<pubDate>Sun, 27 Mar 2011 21:07:05 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[personal]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=749</guid>
		<description><![CDATA[It&#8217;s been almost a year since I introduced lmnowave, the collaborative crossword puzzle gadget for Google Wave. A lot has happened in the past year, not least the cancelation of Wave. First, to clear up some confusion. It&#8217;s not &#8220;I&#8217;m no wave&#8221;, it&#8217;s &#8220;L-M-N-O-Wave&#8221;, which is a play on &#8220;L-M-N-O-Puz&#8221;, aka lmnopuz, the software on [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.danvk.org/wp/wp-content/uploads/2010/03/logo.png" alt="logo" title="logo" width="167" height="101" align="right" class="alignright size-full wp-image-688" />It&#8217;s been almost a year since I introduced <a href="/wp/2010-03-22/introducing-lmnowave/">lmnowave</a>, the collaborative crossword puzzle gadget for <a href="http://wave.google.com/">Google Wave</a>. A lot has happened in the past year, not least the cancelation of Wave.</p>
<p>First, to clear up some confusion. It&#8217;s not &#8220;I&#8217;m no wave&#8221;, it&#8217;s &#8220;L-M-N-O-Wave&#8221;, which is a play on &#8220;L-M-N-O-Puz&#8221;, aka <a href="http://neugierig.org/software/lmnopuz/">lmnopuz</a>, the software on which my collaborative crossword system is based. Only a few dozen people ever saw lmnopuz, so no one got the joke. And I realized after releasing it that, by changing &#8216;puz&#8217; -&gt; &#8216;wave&#8217;, I&#8217;d taken away any hint of what my wave gadget actually did. A bad name. Oh well.</p>
<p>In August, Google announced that <a href="http://googleblog.blogspot.com/2010/08/update-on-google-wave.html">Wave was canceled</a>. This seemed to be the end of lmnowave. Sure, Wave was still usable. But the life had been sucked out of the project. This was quite disappointing to me, since I&#8217;d spent a fair bit of my own time developing the crossword gadget.</p>
<p>Then, in mid-December, Douwe Osinga <a href="http://googlewave.blogspot.com/2010/12/announcing-google-shared-spaces.html">introduced</a> the oddly-named <a href="http://sharedspaces.googlelabs.com/">Google Shared Spaces</a>. It&#8217;s an attempt to salvage the Wave gadget code, to let it live outside of Wave.</p>
<p>For lmnopuz, it&#8217;s perfect. Here&#8217;s the <a href="http://sharedspaces.googlelabs.com/gallery/app?app_id=96001">lmnowave shared space</a>. You can use it to collaborate on crosswords with your friends, just like you could with lmnowave. In some ways, it&#8217;s even better, since the Wave UI is stripped away and you can focus on your puzzle. To do crosscountry crosswords, my <a href="http://ericaricardo.com/">friend</a> and I open up a shared space and call each other on Skype. The combination works really well.</p>
<p>What does the future hold for lmnowave? It&#8217;s a bit unclear. I may turn it into a Facebook game, or perhaps use it to learn how to write applications for the <a href="http://www.apple.com/mac/app-store/">Mac App Store</a>.</p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2011-03-27/crosscountry-crosswords/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Commacopy</title>
		<link>http://www.danvk.org/wp/2011-03-09/commacopy/</link>
		<comments>http://www.danvk.org/wp/2011-03-09/commacopy/#comments</comments>
		<pubDate>Wed, 09 Mar 2011 15:16:08 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=720</guid>
		<description><![CDATA[At work, I often see web pages that display large numbers like so: num-bytes 1,234,567,890 num-entries 123,456,789 Including the commas in the display makes the numbers easier to read. But it does have a downside. Say you want to calculate the average number of bytes per entry. If you copy/paste the numbers above, the commas [...]]]></description>
			<content:encoded><![CDATA[<p>At work, I often see web pages that display large numbers like so:</p>
<table>
<tr>
<td>num-bytes</td>
<td>1,234,567,890</td>
</tr>
<tr>
<td>num-entries</td>
<td>123,456,789</td>
</tr>
</table>
<p>Including the commas in the display makes the numbers easier to read. But it does have a downside. Say you want to calculate the average number of bytes per entry. If you copy/paste the numbers above, the commas will prevent most programming languages (e.g. <a href="http://en.wikipedia.org/wiki/Python_(programming_language)">Python</a> or <a href="http://en.wikipedia.org/wiki/Bc_programming_language">bc</a>) from interpreting them correctly.</p>
<p>My coworker <a href="http://dsandler.org/wp/">Dan</a> came up with a great solution to this conundrum using CSS. Try copy/pasting these numbers over into the text box:</p>
<style type="text/css">.pre-comma:before { content:","; }</style>
<table>
<tr valign="top">
<td valign="top" width="50%">
<ul>
<li>1<span class='pre-comma'>234</span> <b> or </b> 2<span class='pre-comma'>345</span>
<li>-12<span class='pre-comma'>345</span>.67
<li>-123<span class='pre-comma'>456</span><span class='pre-comma'>789</span></ul>
</td>
<td valign="top"><textarea rows=4 cols=30></textarea></td>
</tr>
</table>
<p>The commas don&#8217;t copy! Best of both worlds!</p>
<p>You can view source to see how it works, but let&#8217;s jump straight to the goodies:</p>
<p><!-- hoisted from https://www.squarefree.com/bookmarklets/ --></p>
<style type="text/css">a.bml { border:1px outset #ddd; padding: 1px; vertical-align: 1px; background: #ddd; text-decoration: none;  font-family: sans-serif; color: darkgreen; }</style>
<p><b>Bookmarklet: </b><a class="bml" href="javascript:var%20s%20=%20document.getElementsByTagName(%27*%27);var%20re%20=%20/[-+]?(\d{1,3})(,\d\d\d)+(\.\d*)?/;var%20changed%20=%20false;for%20(var%20i%20=%200;%20i%20<%20s.length;%20i++)%20{var%20el%20=%20s[i];for%20(var%20j%20=%200;%20j%20<%20el.childNodes.length;%20j++)%20{if%20(el.childNodes[j].nodeType%20==%203)%20{var%20txtEl%20=%20el.childNodes[j];var%20txt%20=%20txtEl.nodeValue;if%20(txt.match(re))%20{changed%20=%20true;var%20new_span%20=%20document.createElement(%22span%22);new_span.innerHTML%20=%20txt.replace(/,(\d\d\d)/g,%22<span%20class=%27pre-comma%27>$1</span>%22);el.replaceChild(new_span,%20txtEl);}}}}if%20(changed)%20{var%20rule%20=%20%22content:%20%27,%27;%22;var%20styleSheetElement%20=%20document.createElement(%22style%22);styleSheetElement.type%20=%20%22text/css%22;document.getElementsByTagName(%22head%22)[0].appendChild(styleSheetElement);for%20(var%20i%20=%200;%20i%20<%20document.styleSheets.length;%20i++)%20{if%20(document.styleSheets[i].disabled)%20continue;var%20mysheet%20=%20document.styleSheets[i];try%20{if%20(mysheet.insertRule)%20{var%20idx%20=%20mysheet.cssRules%20?%20mysheet.cssRules.length%20:%200;mysheet.insertRule(%22.pre-comma:before%20{%20%22%20+%20rule%20+%20%22%20}%22,%20idx);}%20else%20if%20(mysheet.addRule)%20{mysheet.addRule(%22.pre-comma:before%22,%20rule);}return;}%20catch(err)%20{}}}">commacopy</a></span></p>
<p><b>Unobtrusive JavaScript: </b><a href="/commacopy/commacopy.js">commacopy.js</a></p>
<p>To use the bookmarklet, drag it to your browser&#8217;s bookmarks toolbar. If you click it, it will silently convert all numbers containing commas on the current page to the fancy copy/pasteable commas. This should really be a Chrome extension that runs on every page, but I&#8217;ll leave that as an exercise for the reader.</p>
<p>To use the unobtrusive JS, make a copy of <a href="/commacopy/commacopy.js">commacopy.js</a> and include it in your page via:<br />
<code><br />
&lt;script src="commacopy.js" type="text/javascript"&gt;&lt;/script&gt;<br />
</code></p>
<p>commacopy works by converting a number like:<br />
<code><br />
123,456,789<br />
</code></p>
<p>into this HTML:<br />
<code><br />
&lt;style type="text/css"&gt;<br />
.pre-comma:before {<br />
&nbsp;&nbsp;content: ",";<br />
}<br />
&lt;/style&gt;<br />
123&lt;span class='pre-comma'&gt;456&lt;/span&gt;&lt;span class='pre-comma'&gt;789&lt;/span&gt;<br />
</code></p>
<p>The commas are only present in a CSS style, rather than in the text itself. For reasons which aren&#8217;t entirely clear to me, this means that they don&#8217;t make it into the clipboard when you copy/paste them.</p>
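<p>The same rewrite could also be done when the page is generated, rather than in the browser. The sketch below is an assumption, not part of commacopy itself: the <code>toPreCommaHtml</code> name is hypothetical, and it only mirrors the <code>,(\d\d\d)</code> replacement that the bookmarklet performs, without first checking that the text is a well-formed number.</p>

```java
class CommaCopySketch {
  // Replaces each ",ddd" group with a span whose comma comes from CSS.
  // Assumes the page also includes the .pre-comma:before style rule.
  static String toPreCommaHtml(String text) {
    return text.replaceAll(",(\\d\\d\\d)", "<span class='pre-comma'>$1</span>");
  }

  public static void main(String[] args) {
    System.out.println(toPreCommaHtml("123,456,789"));
    System.out.println(toPreCommaHtml("-12,345.67"));
  }
}
```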
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2011-03-09/commacopy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

