<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>danvk.org &#187; programming</title>
	<atom:link href="http://www.danvk.org/wp/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.danvk.org/wp</link>
	<description>Keepin' static like wool fabric since 2006</description>
	<lastBuildDate>Fri, 20 Jan 2012 23:05:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Accurate hexadecimal to decimal conversion in JavaScript</title>
		<link>http://www.danvk.org/wp/2012-01-20/accurate-hexadecimal-to-decimal-conversion-in-javascript/</link>
		<comments>http://www.danvk.org/wp/2012-01-20/accurate-hexadecimal-to-decimal-conversion-in-javascript/#comments</comments>
		<pubDate>Fri, 20 Jan 2012 23:05:20 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[javascript]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=824</guid>
		<description><![CDATA[A problem came up at work yesterday: I was creating a web page that received 64-bit hex numbers from one API. But it needed to pass them off to another API that expected decimal numbers. Usually this would not be a problem &#8212; JavaScript has built-in functions for converting between hex and decimal: parseInt("1234abcd", 16) [...]]]></description>
			<content:encoded><![CDATA[<p>A problem came up at work yesterday: I was creating a web page that received 64-bit hex numbers from one API. But it needed to pass them off to another API that expected decimal numbers.</p>
<p>Usually this would not be a problem &mdash; JavaScript has built-in functions for converting between hex and decimal:</p>
<p><code>parseInt("1234abcd", 16) = 305441741<br />
(305441741).toString(16) = "1234abcd"</code></p>
<p>Unfortunately, for larger numbers, there&#8217;s a big problem lurking:</p>
<p><code>parseInt("123456789abcd<u>ef</u>", 16) = 81985529216486900<br />
(81985529216486900).toString(16) = "123456789abcd<u>f0</u>"<br />
</code></p>
<p>The last two digits are wrong. Why did these functions stop being inverses of one another?</p>
<p>The answer has to do with how JavaScript stores numbers. It uses 64-bit floating point representation for all numbers, even integers. This means that not every integer larger than 2^53 can be represented exactly. You can see this by evaluating:</p>
<p><code>(Math.pow(2, 53) + 1) - 1 = 9007199254740991</code></p>
<p>That ends with a 1, so whatever it is, it&#8217;s certainly not a power of 2. (It&#8217;s off by one: the exact answer is 9007199254740992.)</p>
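<p>A minimal sketch of the same effect, using only standard built-ins. The variable names are mine, chosen for illustration:</p>

```javascript
const limit = Math.pow(2, 53); // 9007199254740992

// Adding 1 lands on a value a double cannot hold, so it rounds back to 2^53.
const collapses = (limit + 1 === limit);

// Number.isSafeInteger reports whether an integer is inside the exact range.
const lastSafe = Number.isSafeInteger(limit - 1);
const firstUnsafe = Number.isSafeInteger(limit);

console.log(collapses, lastSafe, firstUnsafe); // logs: true true false
```

<p>Any 64-bit value at or above that limit needs to be handled as a string or as an array of digits rather than as a Number.</p>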
<p>To solve this problem, I wrote some very simple hex &lt;-&gt; decimal conversion functions which use arbitrary precision arithmetic. In particular, these will work for 64-bit numbers or 128-bit numbers. The code is only about 65 lines, so it&#8217;s much more lightweight than a full-fledged library for arbitrary precision arithmetic.</p>
<p>The algorithm is pretty cool. You can see a demo, read an explanation and get the code here:<br />
<a href="http://danvk.org/hex2dec.html">http://danvk.org/hex2dec.html</a>.</p>
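<p>To give a flavor of the digit-by-digit approach, here is a rough sketch of the hex to decimal direction. This is not the code from the page linked above. The function name <code>hexToDec</code> and the details are my own assumptions for illustration: the decimal result is kept as an array of digits, and each hex digit multiplies that array by 16 and adds itself in.</p>

```javascript
// Convert a hex string of any length to a decimal string.
// Illustrative sketch only; hexToDec is a hypothetical name.
function hexToDec(hex) {
  // Decimal digits, least significant first; starts at zero.
  const digits = [0];
  for (const ch of hex.toLowerCase()) {
    const value = parseInt(ch, 16);
    if (Number.isNaN(value)) throw new Error("not a hex digit: " + ch);
    // digits = digits * 16 + value, carried one decimal digit at a time.
    let carry = value;
    for (let i = 0; i < digits.length; i++) {
      const total = digits[i] * 16 + carry;
      digits[i] = total % 10;
      carry = Math.floor(total / 10);
    }
    while (carry > 0) {
      digits.push(carry % 10);
      carry = Math.floor(carry / 10);
    }
  }
  // Copy before reversing so the working array is left untouched.
  return digits.slice().reverse().join("");
}

console.log(hexToDec("123456789abcdef")); // logs: 81985529216486895
```

<p>Every intermediate value stays far below 2^53, so no precision is lost along the way. The result is a string, which is what you want when handing the number to another API.</p>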
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2012-01-20/accurate-hexadecimal-to-decimal-conversion-in-javascript/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Takeaways from Stanford&#8217;s Machine Learning Class</title>
		<link>http://www.danvk.org/wp/2011-12-19/takeaways-stanfords-machine-learning-class/</link>
		<comments>http://www.danvk.org/wp/2011-12-19/takeaways-stanfords-machine-learning-class/#comments</comments>
		<pubDate>Tue, 20 Dec 2011 00:04:33 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[math]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=782</guid>
		<description><![CDATA[Over the past two months, I&#8217;ve participated in Andrew Ng&#8217;s online Stanford Machine learning class. It&#8217;s a very high-level overview of the field with an emphasis on applications and techniques, rather than theory. Since I just finished the last assignment, it&#8217;s a fine time to write down my thoughts on the class! Overall, I&#8217;ve learned [...]]]></description>
			<content:encoded><![CDATA[<p>Over the past two months, I&#8217;ve participated in Andrew Ng&#8217;s online <a href="http://ml-class.org">Stanford Machine learning class</a>. It&#8217;s a very high-level overview of the field with an emphasis on applications and techniques, rather than theory. Since I just finished the last assignment, it&#8217;s a fine time to write down my thoughts on the class!</p>
<p>Overall, I&#8217;ve learned quite a bit about how <a href="http://en.wikipedia.org/wiki/Machine_learning">ML</a> is used in practice. Some highlights for me:</p>
<style type="text/css">ul.body-list li { padding-bottom: 10px; }</style>
<ul class="body-list">
<li><a href="http://en.wikipedia.org/wiki/Gradient_descent">Gradient descent</a> is a very general optimization technique. If you can calculate a function and its partial derivatives, you can use gradient descent. I was particularly impressed with the way we used it to train Neural Networks. We learned how the networks operated, but had no need to think about how to train them &mdash; we just used gradient descent.
<li>There are <a href="http://en.wikipedia.org/wiki/BFGS_method">many</a> advanced &#8220;unconstrained optimization&#8221; algorithms which can be used as alternatives to gradient descent. These often have the advantage that you don&#8217;t need to tune parameters like a learning rate.
<li><a href="http://en.wikipedia.org/wiki/Regularization_(machine_learning)">Regularization</a> is used almost universally. I&#8217;d previously had very negative associations with using high-order polynomial features, since I most often saw them used in examples of overfitting. But I realize now that they are quite reasonable to add if you also make good use of regularization.
<li>The <a href="http://en.wikipedia.org/wiki/Backpropagation">backpropagation algorithm</a> for <a href="http://en.wikipedia.org/wiki/Artificial_neural_network">Neural Networks</a> is really just an efficient way to compute partial derivatives (for use by gradient descent and co).
<li>Learning curves (plots of train/test error as a function of the number of examples) are a great way to figure out how to improve your ML algorithm. For example, if your training and test errors are both high, it means that you&#8217;re not overfitting your data set and there&#8217;s no point in gathering more data. What it does mean is that you need to add more features (e.g. the polynomial features which I used to fear) in order to increase your performance.
</ul>
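<p>The point about gradient descent in the list above fits in a few lines. This is a toy sketch, not anything from the class. The function name, the fixed learning rate and the example function are all assumptions made for illustration:</p>

```javascript
// Minimize a function by repeatedly stepping against its gradient.
// `gradient` maps a point (array of numbers) to the partial derivatives there.
function gradientDescent(gradient, start, learningRate, steps) {
  let x = start.slice();
  for (let step = 0; step < steps; step++) {
    const g = gradient(x);
    x = x.map((xi, i) => xi - learningRate * g[i]);
  }
  return x;
}

// Example: f(x, y) = (x - 3)^2 + (y + 1)^2, whose gradient is
// (2(x - 3), 2(y + 1)) and whose minimum is at (3, -1).
const minimum = gradientDescent(
  (p) => [2 * (p[0] - 3), 2 * (p[1] + 1)],
  [0, 0], 0.1, 200);
```

<p>Nothing in <code>gradientDescent</code> knows what is being optimized. Swap in the partial derivatives of a neural network&#8217;s cost function and the same loop applies, which is why the learning rate is the main thing left to tune.</p>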
<p>The other takeaway is that, as in many fields, there are many &#8220;tricks of the trade&#8221; in Machine Learning. These are bits of knowledge that aren&#8217;t part of the core theory, but which are still enormously helpful for solving real-world problems.</p>
<p>As an example, consider the last problem in the course: Photo OCR. The problem is to take an image like this:</p>
<p><img src="http://www.danvk.org/wp/wp-content/uploads/2011/12/photo-ocr-e1324338543864.png" alt="Example of Photo OCR" title="photo-ocr" width="531" height="362" class="aligncenter size-full wp-image-783" /></p>
<p>and extract all the text: &#8220;LULA B&#8217;s ANTIQUE MALL&#8221;, &#8220;LULA B&#8217;s&#8221;, &#8220;OPEN&#8221; and &#8220;Lula B&#8217;s&#8221;. Initially, this seems quite daunting. Machine Learning is clearly relevant here, but how do you break it down into concrete problems which can be attacked using ML techniques? You don&#8217;t know where the text is and you don&#8217;t even have a rough idea of the text&#8217;s size.</p>
<p>This is where the &#8220;tricks&#8221; come in. <a href="http://en.wikipedia.org/wiki/Binary_classification">Binary classifiers</a> are the &#8220;hammer&#8221; of ML. You can write a binary classifier to determine whether a fixed-size rectangle contains text:</p>
<table>
<tr>
<td><i>Positive examples</i></td>
<td><img src="http://www.danvk.org/wp/wp-content/uploads/2011/12/text-negative.png" alt="" title="text-positive" width="383" height="75" class="alignleft size-full wp-image-785" /></td>
</tr>
<tr>
<td><i>Negative examples</i></td>
<td><img src="http://www.danvk.org/wp/wp-content/uploads/2011/12/text-positive.png" alt="" title="text-negative" width="384" height="75" class="alignleft size-full wp-image-784" /></td>
</tr>
</table>
<p>You then run this classifier over thousands of different &#8220;windows&#8221; in the main image. This tells you where all the bits of text are. If you ignore all the non-contiguous areas, you have a pretty good sense of the bounding boxes for the text in the image.</p>
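<p>The sliding-window step can be sketched roughly as follows. The image is assumed to be a rectangular 2D array of pixel values, and <code>containsText</code> stands in for the trained binary classifier. Both are placeholders of mine, as are the function and parameter names:</p>

```javascript
// Slide a fixed-size window over a 2D grid and collect the positions
// of the windows that the classifier accepts.
function detectWindows(image, winW, winH, stride, containsText) {
  const hits = [];
  for (let y = 0; y + winH <= image.length; y += stride) {
    for (let x = 0; x + winW <= image[0].length; x += stride) {
      // Cut out the winW-by-winH patch whose top-left corner is (x, y).
      const patch = image.slice(y, y + winH).map((row) => row.slice(x, x + winW));
      if (containsText(patch)) hits.push({ x: x, y: y });
    }
  }
  return hits;
}

// Toy stand-in for a classifier: a patch "contains text" if every pixel is 1.
const allOnes = (patch) => patch.every((row) => row.every((v) => v === 1));
const toyImage = [
  [0, 0, 0, 0],
  [0, 0, 0, 0],
  [0, 0, 1, 1],
  [0, 0, 1, 1],
];
const hits = detectWindows(toyImage, 2, 2, 2, allOnes);
```

<p>A real detector would repeat this at several window sizes, since the size of the text isn&#8217;t known in advance, and then merge neighboring hits into bounding boxes.</p>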
<p>But even given the text boxes, how do you recognize the characters? Time for another trick! We can build a binary classifier to detect a gap between letters in the center of a fixed-size rectangle:</p>
<table>
<tr>
<td><i>Positive examples</i></td>
<td><img src="http://www.danvk.org/wp/wp-content/uploads/2011/12/split-positive.png" alt="" title="split-positive" width="143" height="94" class="alignleft size-full wp-image-791" /></td>
</tr>
<tr>
<td><i>Negative examples</i></td>
<td><img src="http://www.danvk.org/wp/wp-content/uploads/2011/12/split-negative.png" alt="" title="split-negative" width="143" height="94" class="alignleft size-full wp-image-791" /></td>
</tr>
</table>
<p>If we slide this along, it will tell us where each character starts and ends. So we can chop the text box up into character boxes. Once we&#8217;ve done that, classifying characters in a fixed-size rectangle is another concrete problem which can be tackled with Neural Networks or the like.</p>
<p>In an ML class, you&#8217;re presented with this pipeline of ML algorithms for the Photo OCR problem. It makes sense. It reduces the real-world problem into three nice, clean, theoretical problems. In the class, you&#8217;d likely spend most of your time talking about those three concrete problems. In retrospect, the pipeline seems as natural as could be.</p>
<p>But if you were given the Photo OCR problem in the real world, you might never come up with this breakdown. Unless you knew the trick! And the only way to learn tricks like this is to see them used. And that&#8217;s my final takeaway from this practical ML class: familiarity with a vastly larger set of ML tricks.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2011-12-19/takeaways-stanfords-machine-learning-class/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Java, Ten Years Later</title>
		<link>http://www.danvk.org/wp/2011-11-05/java-ten-years-later/</link>
		<comments>http://www.danvk.org/wp/2011-11-05/java-ten-years-later/#comments</comments>
		<pubDate>Sat, 05 Nov 2011 20:33:54 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=765</guid>
		<description><![CDATA[It&#8217;s been almost ten years since I&#8217;ve actively used the Java programming language. In the meantime, I&#8217;ve mostly used C++. I&#8217;ve had to pick up a bit of Java again recently. Here are a few of the things that I found surprising or notable. These are all variants on &#8220;that&#8217;s changed in the last [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been almost ten years since I&#8217;ve actively used the <a href="http://en.wikipedia.org/wiki/Java_(programming_language)">Java</a> programming language. In the meantime, I&#8217;ve mostly used <a href="http://en.wikipedia.org/wiki/C%2B%2B">C++</a>. I&#8217;ve had to pick up a bit of Java again recently. Here are a few of the things that I found surprising or notable. These are all variants on &#8220;that&#8217;s changed in the last ten years&#8221; or &#8220;that&#8217;s not how C++ does it.&#8221;</p>
<p><b>The Java compiler enforces what would be conventions in C++.</b><br />
For example, &#8220;public class Foo&#8221; has to be in Foo.java. In C++, this would just be a convention. You can use a non-public (package-private) class when you&#8217;re playing around with test code and want to use only a single file. Similarly, class foo.Bar needs to be in &#8220;foo/Bar.java&#8221;.</p>
<p><b>Java Packages are a more pervasive concept than namespaces in C++.</b><br />
There&#8217;s a &#8220;default package&#8221;, but using this prevents you from loading classes by name: Class.forName(&#8220;Foo&#8221;) won&#8217;t work, but Class.forName(&#8220;package.Foo&#8221;) will. Classes in your current package are auto-imported, which surprised me at first. The default visibility for methods/fields in Java is &#8220;package private&#8221;, which has no analogue in C++.</p>
<p><b>Java keeps much more type information at runtime than C++ does.</b><br />
The reflection features (Class.getMethods(), Method.getParameterTypes(), etc.) have no equivalent in C++. This leads to some seemingly-magical behaviors, e.g. naming a method &#8220;foo&#8221; in a Servlet can cause it to be served at &#8220;/foo&#8221; without you saying anything else. Not all information is kept though: you can get a list of all packages, but not a list of all classes in a package. You can request a class by its name, but you can&#8217;t get a list of all classes. You can get a list of all the method names in a class, but you can&#8217;t get a list of all the parameter names in a method.</p>
<p><b>Java enums are far richer than C/C++ enums.</b><br />
enums in Java are more like classes: they can have constructors, methods, fields, even per-value method implementations. I really like this. Examples:</p>
<p><code>public enum Suit {<br />
&nbsp; CLUB("C"), DIAMOND("D"), HEART("H"), SPADE("S");<br />
&nbsp; private String shortName;<br />
&nbsp; private Suit(String shortName) { this.shortName = shortName; }<br />
&nbsp; public String toString() { return shortName; }<br />
}<br />
</code></p>
<p><b>Java is OK with a two-tier type system.</b><br />
At its core, C++ is an attempt to put user-defined types on an equal footing with built-in types like int and char. This is in no way a goal of Java, which is quite content to have a two-tier system of primitive and non-primitive types. This means that you can&#8217;t do Map&lt;int, int&gt;, for instance. You have to do Map&lt;Integer, Integer&gt;. Autoboxing makes this less painful, but it&#8217;s still a wart in the language that you have to be aware of.</p>
<p>One concrete example of this is the &#8220;array[index]&#8221; notation. In C++, this is also used for maps. There&#8217;s no way to do this in Java, and I really miss it. Compare:</p>
<p><code>map[key] += 1;</code></p>
<p>to</p>
<p><code>map.put(key, 1 + map.get(key));</code></p>
<p>which has more boilerplate and is more error-prone, since you might accidentally do:</p>
<p><code>map.put(key, 1 + other_map.get(key));</code></p>
<p><b>The designers of Java Generics learned from the chaos of C++ templates.</b><br />
Generic classes in Java are always templated on types: no more insane error messages. You can even say what interface the type has to implement. And there&#8217;s no equivalent of method specialization, a C++ feature which is often misused.</p>
<p><b>Variables/fields in Java behave more like C++ pointers than C++ values.</b><br />
This is a particular gotcha for a field. For example, in C++:</p>
<p><code>class C {<br />
&nbsp;public:<br />
&nbsp; C() {<br />
&nbsp; &nbsp; // foo_ is already constructed and usable here.<br />
&nbsp; }<br />
&nbsp;private:<br />
&nbsp; Foo foo_;<br />
};<br />
</code></p>
<p>But in Java:</p>
<p><code>class C {<br />
&nbsp; public C() {<br />
&nbsp; &nbsp; // foo is null here. We have to do foo = new Foo();<br />
&nbsp; }<br />
&nbsp; private Foo foo;<br />
}<br />
</code></p>
<p><b>Java constructors always require a trailing (), even if they take no parameters.</b><br />
This is a minor gotcha, but one I find myself running into frequently. It&#8217;s &#8220;new Foo()&#8221; instead of &#8220;new Foo&#8221; (which is acceptable in C++).</p>
<p><b>The Java foreach loop is fantastic</b><br />
Compare</p>
<p><code>for (String arg : args) { ... }</code></p>
<p>to</p>
<p><code>for (set&lt;string&gt;::const_iterator it = args.begin(); it != args.end(); ++it) { ... }</code></p>
<p><b>The &#8220;static {}&#8221; construct is nice</b><br />
This lets you write code to initialize static variables. It has no clear analogue in C++. To use the Suit example above,</p>
<p><code>private static HashMap&lt;String, Suit&gt; name_to_suit = new HashMap&lt;String, Suit&gt;();<br />
static {<br />
&nbsp; for (Suit s : Suit.values()) { name_to_suit.put(s.toString(), s); }<br />
}<br />
</code></p>
<p>The new features (Generics, enums, autoboxing) that Java has gained in the last ten years make it much more pleasant to use.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2011-11-05/java-ten-years-later/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Crosscountry Crosswords</title>
		<link>http://www.danvk.org/wp/2011-03-27/crosscountry-crosswords/</link>
		<comments>http://www.danvk.org/wp/2011-03-27/crosscountry-crosswords/#comments</comments>
		<pubDate>Sun, 27 Mar 2011 21:07:05 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[personal]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=749</guid>
		<description><![CDATA[It&#8217;s been almost a year since I introduced lmnowave, the collaborative crossword puzzle gadget for Google Wave. A lot has happened in that past year, not least the cancelation of Wave. First, to clear up some confusion. It&#8217;s not &#8220;I&#8217;m no wave&#8221;, it&#8217;s &#8220;L-M-N-O-Wave&#8221;, which is a play on &#8220;L-M-N-O-Puz&#8221;, aka lmnopuz, the software on [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.danvk.org/wp/wp-content/uploads/2010/03/logo.png" alt="logo" title="logo" width="167" height="101" align="right" class="alignright size-full wp-image-688" />It&#8217;s been almost a year since I introduced <a href="/wp/2010-03-22/introducing-lmnowave/">lmnowave</a>, the collaborative crossword puzzle gadget for <a href="http://wave.google.com/">Google Wave</a>. A lot has happened in that past year, not least the cancelation of Wave.</p>
<p>First, to clear up some confusion. It&#8217;s not &#8220;I&#8217;m no wave&#8221;, it&#8217;s &#8220;L-M-N-O-Wave&#8221;, which is a play on &#8220;L-M-N-O-Puz&#8221;, aka <a href="http://neugierig.org/software/lmnopuz/">lmnopuz</a>, the software on which my collaborative crossword system is based. Only a few dozen people ever saw lmnopuz, so no one got the joke. And I realized after releasing it that, by changing &#8216;puz&#8217; -&gt; &#8216;wave&#8217;, I&#8217;d taken away any hint of what my wave gadget actually did. A bad name. Oh well.</p>
<p>In August, Google announced that <a href="http://googleblog.blogspot.com/2010/08/update-on-google-wave.html">Wave was canceled</a>. This seemed to be the end of lmnowave. Sure, Wave was still usable. But the life had been sucked out of the project. This was quite disappointing to me, since I&#8217;d spent a fair bit of my own time developing the crossword gadget.</p>
<p>Then, in mid-December, Douwe Osinga <a href="http://googlewave.blogspot.com/2010/12/announcing-google-shared-spaces.html">introduced</a> the oddly-named <a href="http://sharedspaces.googlelabs.com/">Google Shared Spaces</a>. It&#8217;s an attempt to salvage the Wave gadget code, to let it live outside of Wave.</p>
<p>For lmnopuz, it&#8217;s perfect. Here&#8217;s the <a href="http://sharedspaces.googlelabs.com/gallery/app?app_id=96001">lmnowave shared space</a>. You can use it to collaborate on crosswords with your friends, just like you could with lmnowave. In some ways, it&#8217;s even better, since the Wave UI is stripped away and you can focus on your puzzle. To do crosscountry crosswords, my <a href="http://ericaricardo.com/">friend</a> and I open up a shared space and call each other on Skype. The combination works really well.</p>
<p>What does the future hold for lmnowave? It&#8217;s a bit unclear. I may turn it into a Facebook game, or perhaps use it to learn how to write applications for the <a href="http://www.apple.com/mac/app-store/">Mac App store</a>.</p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2011-03-27/crosscountry-crosswords/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Commacopy</title>
		<link>http://www.danvk.org/wp/2011-03-09/commacopy/</link>
		<comments>http://www.danvk.org/wp/2011-03-09/commacopy/#comments</comments>
		<pubDate>Wed, 09 Mar 2011 15:16:08 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=720</guid>
		<description><![CDATA[At work, I often see web pages that display large numbers like so: num-bytes 1,234,567,890 num-entries 123,456,789 Including the commas in the display makes the numbers easier to read. But it does have a downside. Say you want to calculate the average number of bytes per entry. If you copy/paste the numbers above, the commas [...]]]></description>
			<content:encoded><![CDATA[<p>At work, I often see web pages that display large numbers like so:</p>
<table>
<tr>
<td>num-bytes</td>
<td>1,234,567,890</td>
</tr>
<tr>
<td>num-entries</td>
<td>123,456,789</td>
</tr>
</table>
<p>Including the commas in the display makes the numbers easier to read. But it does have a downside. Say you want to calculate the average number of bytes per entry. If you copy/paste the numbers above, the commas will prevent most programming languages (e.g. <a href="http://en.wikipedia.org/wiki/Python_(programming_language)">Python</a> or <a href="http://en.wikipedia.org/wiki/Bc_programming_language">bc</a>) from interpreting them correctly.</p>
<p>My coworker <a href="http://dsandler.org/wp/">Dan</a> came up with a great solution to this conundrum using CSS. Try copy/pasting these numbers over into the text box:</p>
<style type="text/css">.pre-comma:before { content:","; }</style>
<table>
<tr valign="top">
<td valign="top" width="50%">
<ul>
<li>1<span class='pre-comma'>234</span> <b> or </b> 2<span class='pre-comma'>345</span>
<li>-12<span class='pre-comma'>345</span>.67
<li>-123<span class='pre-comma'>456</span><span class='pre-comma'>789</span></ul>
</td>
<td valign="top"><textarea rows=4 cols=30></textarea></td>
</tr>
</table>
<p>The commas don&#8217;t copy! Best of both worlds!</p>
<p>You can view source to see how it works, but let&#8217;s jump straight to the goodies:</p>
<p><!-- hoisted from https://www.squarefree.com/bookmarklets/ --></p>
<style type="text/css">a.bml { border:1px outset #ddd; padding: 1px; vertical-align: 1px; background: #ddd; text-decoration: none;  font-family: sans-serif; color: darkgreen; }</style>
<p><b>Bookmarklet: </b><a class="bml" href="javascript:var%20s%20=%20document.getElementsByTagName(%27*%27);var%20re%20=%20/[-+]?(\d{1,3})(,\d\d\d)+(\.\d*)?/;var%20changed%20=%20false;for%20(var%20i%20=%200;%20i%20<%20s.length;%20i++)%20{var%20el%20=%20s[i];for%20(var%20j%20=%200;%20j%20<%20el.childNodes.length;%20j++)%20{if%20(el.childNodes[j].nodeType%20==%203)%20{var%20txtEl%20=%20el.childNodes[j];var%20txt%20=%20txtEl.nodeValue;if%20(txt.match(re))%20{changed%20=%20true;var%20new_span%20=%20document.createElement(%22span%22);new_span.innerHTML%20=%20txt.replace(/,(\d\d\d)/g,%22<span%20class=%27pre-comma%27>$1</span>%22);el.replaceChild(new_span,%20txtEl);}}}}if%20(changed)%20{var%20rule%20=%20%22content:%20%27,%27;%22;var%20styleSheetElement%20=%20document.createElement(%22style%22);styleSheetElement.type%20=%20%22text/css%22;document.getElementsByTagName(%22head%22)[0].appendChild(styleSheetElement);for%20(var%20i%20=%200;%20i%20<%20document.styleSheets.length;%20i++)%20{if%20(document.styleSheets[i].disabled)%20continue;var%20mysheet%20=%20document.styleSheets[i];try%20{if%20(mysheet.insertRule)%20{var%20idx%20=%20mysheet.cssRules%20?%20mysheet.cssRules.length%20:%200;mysheet.insertRule(%22.pre-comma:before%20{%20%22%20+%20rule%20+%20%22%20}%22,%20idx);}%20else%20if%20(mysheet.addRule)%20{mysheet.addRule(%22.pre-comma:before%22,%20rule);}return;}%20catch(err)%20{}}}">commacopy</a></p>
<p><b>Unobtrusive JavaScript: </b><a href="/commacopy/commacopy.js">commacopy.js</a></p>
<p>To use the bookmarklet, drag it to your browser&#8217;s bookmarks toolbar. If you click it, it will silently convert all numbers containing commas on the current page to the fancy copy/pasteable commas. This should really be a Chrome extension that runs on every page, but I&#8217;ll leave that as an exercise for the reader.</p>
<p>To use the unobtrusive JS, make a copy of <a href="/commacopy/commacopy.js">commacopy.js</a> and include it in your page via:<br />
<code><br />
&lt;script src="commacopy.js" type="text/javascript"&gt;&lt;/script&gt;<br />
</code></p>
<p>commacopy works by converting a number like:<br />
<code><br />
123,456,789<br />
</code></p>
<p>into this HTML:<br />
<code><br />
&lt;style type="text/css"&gt;<br />
.pre-comma:before {<br />
&nbsp;&nbsp;content: ",";<br />
}<br />
&lt;/style&gt;<br />
123&lt;span class='pre-comma'&gt;456&lt;/span&gt;&lt;span class='pre-comma'&gt;789&lt;/span&gt;<br />
</code></p>
<p>The commas are only present in a CSS style, rather than in the text itself. For reasons which aren&#8217;t entirely clear to me, this means that they don&#8217;t make it into the clipboard when you copy/paste them.</p>
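<p>Here is a rough sketch of that transformation, written against the standard DOM API. It is not the contents of commacopy.js. The function names and the <code>doc</code> parameter are assumptions of mine, and the sketch puts a trailing decimal part inside the last span, which renders the same way:</p>

```javascript
// Split a comma-grouped number into the pieces of text that stay in the page.
function splitCommaGroups(number) {
  const parts = number.split(",");
  return { head: parts[0], groups: parts.slice(1) };
}

// What a copy/paste sees: the same pieces with no commas between them.
function copiedText(number) {
  const pieces = splitCommaGroups(number);
  return pieces.head + pieces.groups.join("");
}

// Build the replacement node. Each group after the first gets the
// 'pre-comma' class, so the stylesheet draws a comma in front of it.
// `doc` is expected to be a browser Document.
function buildCommaNode(number, doc) {
  const pieces = splitCommaGroups(number);
  const wrapper = doc.createElement("span");
  wrapper.appendChild(doc.createTextNode(pieces.head));
  for (const group of pieces.groups) {
    const span = doc.createElement("span");
    span.className = "pre-comma";
    span.appendChild(doc.createTextNode(group));
    wrapper.appendChild(span);
  }
  return wrapper;
}

console.log(copiedText("123,456,789")); // logs: 123456789
```

<p>My best guess at the clipboard behavior is that CSS-generated content is drawn by the browser without ever becoming part of the document&#8217;s text, so a selection has nothing to pick up there.</p>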
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2011-03-09/commacopy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Introducing lmnowave</title>
		<link>http://www.danvk.org/wp/2010-03-22/introducing-lmnowave/</link>
		<comments>http://www.danvk.org/wp/2010-03-22/introducing-lmnowave/#comments</comments>
		<pubDate>Mon, 22 Mar 2010 07:02:29 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[crosswords]]></category>
		<category><![CDATA[lmnowave]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=668</guid>
		<description><![CDATA[Last winter, a dear friend of mine moved from San Francisco to Brooklyn. With an entire continent between us, my principal crossword puzzle buddy and I looked in vain to the internet for help. Was there truly no good way to do a crossword together online? The New York Times offered an applet, but it [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.danvk.org/wp/wp-content/uploads/2010/03/logo.png" alt="logo" title="logo" width="167" height="101" align="right" class="alignright size-full wp-image-688" />Last winter, a <a href="http://ericaricardo.com/">dear friend</a> of mine moved from San Francisco to Brooklyn. With an entire continent between us, my principal crossword puzzle buddy and I looked in vain to the internet for help. Was there truly no good way to do a crossword together online?</p>
<p>The New York Times <a href="http://select.nytimes.com/premium/xword/puzzles.html">offered</a> an applet, but it proved to be finicky and would only let us do the most recent day&#8217;s puzzle. A friend&#8217;s <a href="http://neugierig.org/software/lmnopuz/">project</a> offered hope, but only led to &#8220;Service Temporarily Unavailable&#8221;.</p>
<p>Enter: <b><a target="_blank" href="https://wave.google.com/wave/#restored:wave:googlewave.com!w%252B8X8AwPsDA.1">lmnowave</a></b>!</p>
<p>lmnowave is a crossword puzzle gadget for Google Wave. To do a crossword puzzle with a friend, you&#8217;ll both need <a href="http://wave.google.com/">Google Wave Accounts</a>.</p>
<p>Once you&#8217;ve got that taken care of, click this big link to get going:</p>
<h2><a target="_blank" href="https://wave.google.com/wave/#restored:wave:googlewave.com!w%252B8X8AwPsDA.1">lmnowave installer</a></h2>
<p>You should see something like this:</p>
<p><img src="http://www.danvk.org/wp/wp-content/uploads/2010/03/installer2.png" alt="lmnowave installer" title="installer" width="458" height="245" class="alignright size-full wp-image-697" /></p>
<p>Click the &#8220;Install Icon&#8221; and create a new wave. You&#8217;ll see a crossword puzzle icon in your toolbar:</p>
<style type="text/css">.bordered { border: solid 1px black; }</style>
<p><img src="http://www.danvk.org/wp/wp-content/uploads/2010/03/add_icon.png" alt="puzzle icon" title="add_icon" width="360" height="93" class="alignright size-full wp-image-675 bordered" /></p>
<p>Click it to add a crossword gadget. It should look like this:</p>
<p><img src="http://www.danvk.org/wp/wp-content/uploads/2010/03/drag_screen1.png" alt="load screen" title="drag_screen" width="429" height="383" class="alignright size-full wp-image-677 bordered" /></p>
<p>If you&#8217;re using Chrome or Safari, you may get a warning about not being able to upload puzzle files. This is fine &mdash; just switch to <a href="http://firefox.com/">Firefox</a> for a minute or try one of the built-in Onion puzzles.</p>
<p>If you have a .puz file on your computer (perhaps from your <a href="http://select.nytimes.com/premium/xword/puzzles.html">Times subscription</a>), drag it onto the big lmnowave icon:</p>
<p><img src="http://www.danvk.org/wp/wp-content/uploads/2010/03/dragging2.png" alt="dragging a puz file" title="dragging" width="481" height="303" class="alignright size-full wp-image-682 bordered" /></p>
<p>The puzzle will load instantly. Now drag a friend into the wave:</p>
<p><img src="http://www.danvk.org/wp/wp-content/uploads/2010/03/adding_erica2.png" alt="Adding a friend" title="adding_erica" width="444" height="361" class="alignright size-full wp-image-683 bordered" /></p>
<p>and you&#8217;re ready to compete or collaborate as you see fit! Each player gets his or her own color, so you can keep track of who&#8217;s filled in each square:</p>
<p><img src="http://www.danvk.org/wp/wp-content/uploads/2010/03/solved_puzzle.png" alt="partially-solved puzzle" title="solved_puzzle" width="313" height="264" class="alignright size-full wp-image-686 bordered" /></p>
<p>lmnowave is an open-source project written entirely in JavaScript. If you&#8217;d like to contribute, <a href="http://github.com/danvk/lmnowave/">check it out</a> on GitHub. Run into a bug or have a feature request? Let me know <a href="http://github.com/danvk/lmnowave/issues">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2010-03-22/introducing-lmnowave/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Crossword Word Frequency</title>
		<link>http://www.danvk.org/wp/2009-12-26/crossword-word-frequency/</link>
		<comments>http://www.danvk.org/wp/2009-12-26/crossword-word-frequency/#comments</comments>
		<pubDate>Sat, 26 Dec 2009 17:45:02 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[math]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=633</guid>
		<description><![CDATA[In a previous post, I discussed downloading several years&#8217; worth of New York Times Crosswords and categorizing them by day of week. Now, some analysis! Here were the most common words over the last 12 years, along with the percentage of puzzles in which they occurred: Percentage Word Length 6.218% ERA 3 5.703% AREA 4 [...]]]></description>
			<content:encoded><![CDATA[<p>In a previous post, I discussed downloading several years&#8217; worth of New York Times Crosswords and categorizing them by day of week. Now, some analysis!</p>
<p>Here were the most common words over the last 12 years, along with the percentage of puzzles in which they occurred:</p>
<table class="thin sortable draggable">
<tr>
<th>Percentage</th>
<th>Word</th>
<th>Length</th>
</tr>
<tr>
<td>6.218%</td>
<td>ERA
<td>3</td>
</tr>
<tr>
<td>5.703%</td>
<td>AREA
<td>4</td>
</tr>
<tr>
<td>5.413%</td>
<td>ERE
<td>3</td>
</tr>
<tr>
<td>5.055%</td>
<td>ELI
<td>3</td>
</tr>
<tr>
<td>4.854%</td>
<td>ONE
<td>3</td>
</tr>
<tr>
<td>4.585%</td>
<td>ALE
<td>3</td>
</tr>
<tr>
<td>4.496%</td>
<td>ORE
<td>3</td>
</tr>
<tr>
<td>4.361%</td>
<td>ERIE
<td>4</td>
</tr>
<tr>
<td>4.339%</td>
<td>ALOE
<td>4</td>
</tr>
<tr>
<td>4.317%</td>
<td>ETA
<td>3</td>
</tr>
<tr>
<td>4.317%</td>
<td>ALI
<td>3</td>
</tr>
<tr>
<td>4.227%</td>
<td>OLE
<td>3</td>
</tr>
<tr>
<td>4.205%</td>
<td>ARE
<td>3</td>
</tr>
<tr>
<td>4.138%</td>
<td>ESS
<td>3</td>
</tr>
<tr>
<td>4.138%</td>
<td>EDEN
<td>4</td>
</tr>
<tr>
<td>4.138%</td>
<td>ATE
<td>3</td>
</tr>
<tr>
<td>4.048%</td>
<td>IRE
<td>3</td>
</tr>
<tr>
<td>4.048%</td>
<td>ARIA
<td>4</td>
</tr>
<tr>
<td>4.004%</td>
<td>ANTE
<td>4</td>
</tr>
<tr>
<td>3.936%</td>
<td>ESE
<td>3</td>
</tr>
<tr>
<td>3.936%</td>
<td>ENE
<td>3</td>
</tr>
<tr>
<td>3.914%</td>
<td>ADO
<td>3</td>
</tr>
<tr>
<td>3.869%</td>
<td>ELSE
<td>4</td>
</tr>
<tr>
<td>3.825%</td>
<td>NEE
<td>3</td>
</tr>
<tr>
<td>3.758%</td>
<td>ACE
<td>3</td>
</tr>
</table>
<p>(You can click column headings to sort.)</p>
<p>So &#8220;ERA&#8221; appears, on average, in about 23 puzzles per year. How about if we break this down by day of week? Follow me past the fold&#8230;</p>
<p><script type=text/javascript src="/dragtable/sorttable.js"></script><br />
<script type=text/javascript src="/dragtable/dragtable.js"></script></p>
<style type=text/css>
  /* Sortable tables */
  table.sortable thead {
    background-color:#eee;
    color:#666666;
    font-weight: bold;
    cursor: default;
  }
  table.thin, table.thin td, table.thin tr, table.thin th {
    border: thin solid black;
    border-collapse: collapse;
  }
</style>
<p><span id="more-633"></span></p>
<p><b>Monday:</b></p>
<table class="thin sortable draggable">
<tr>
<th>Percentage</th>
<th>Word</th>
<th>Length</th>
</tr>
<tr>
<td>9.404%</td>
<td>ALOE
<td>4</td>
</tr>
<tr>
<td>8.777%</td>
<td>AREA
<td>4</td>
</tr>
<tr>
<td>7.837%</td>
<td>ERIE
<td>4</td>
</tr>
<tr>
<td>6.426%</td>
<td>ONE
<td>3</td>
</tr>
<tr>
<td>6.426%</td>
<td>IDEA
<td>4</td>
</tr>
<tr>
<td>6.426%</td>
<td>ARIA
<td>4</td>
</tr>
<tr>
<td>6.270%</td>
<td>ONCE
<td>4</td>
</tr>
<tr>
<td>6.270%</td>
<td>EDEN
<td>4</td>
</tr>
<tr>
<td>6.113%</td>
<td>ERA
<td>3</td>
</tr>
<tr>
<td>6.113%</td>
<td>ELSE
<td>4</td>
</tr>
<tr>
<td>6.113%</td>
<td>ASEA
<td>4</td>
</tr>
<tr>
<td>5.799%</td>
<td>ERE
<td>3</td>
</tr>
<tr>
<td>5.643%</td>
<td>ORE
<td>3</td>
</tr>
<tr>
<td>5.643%</td>
<td>ETAL
<td>4</td>
</tr>
<tr>
<td>5.643%</td>
<td>ARE
<td>3</td>
</tr>
<tr>
<td>5.643%</td>
<td>ANTE
<td>4</td>
</tr>
<tr>
<td>5.486%</td>
<td>OREO
<td>4</td>
</tr>
<tr>
<td>5.486%</td>
<td>ALEE
<td>4</td>
</tr>
<tr>
<td>5.329%</td>
<td>TREE
<td>4</td>
</tr>
<tr>
<td>5.329%</td>
<td>ESS
<td>3</td>
</tr>
<tr>
<td>5.329%</td>
<td>ELI
<td>3</td>
</tr>
<tr>
<td>5.329%</td>
<td>ACRE
<td>4</td>
</tr>
<tr>
<td>5.172%</td>
<td>TSAR
<td>4</td>
</tr>
<tr>
<td>5.172%</td>
<td>ANTI
<td>4</td>
</tr>
<tr>
<td>5.016%</td>
<td>ORAL
<td>4</td>
</tr>
</table>
<p>Four-letter words are more common now, and the percentages are much higher: there&#8217;s less variety in the fill of Monday puzzles. &#8220;ALOE&#8221; and &#8220;ARIA&#8221; are classic crossword words, not to mention &#8220;OREO&#8221;.</p>
<p><b>Saturday:</b></p>
<table class="thin sortable draggable">
<tr>
<th>Percentage</th>
<th>Word</th>
<th>Length</th>
</tr>
<tr>
<td>3.286%</td>
<td>ERA
<td>3</td>
</tr>
<tr>
<td>2.973%</td>
<td>ONE
<td>3</td>
</tr>
<tr>
<td>2.973%</td>
<td>ETE
<td>3</td>
</tr>
<tr>
<td>2.817%</td>
<td>TEN
<td>3</td>
</tr>
<tr>
<td>2.817%</td>
<td>EVE
<td>3</td>
</tr>
<tr>
<td>2.817%</td>
<td>ETA
<td>3</td>
</tr>
<tr>
<td>2.660%</td>
<td>IRE
<td>3</td>
</tr>
<tr>
<td>2.660%</td>
<td>ERR
<td>3</td>
</tr>
<tr>
<td>2.660%</td>
<td>ERE
<td>3</td>
</tr>
<tr>
<td>2.504%</td>
<td>OTIS
<td>4</td>
</tr>
<tr>
<td>2.504%</td>
<td>OLE
<td>3</td>
</tr>
<tr>
<td>2.504%</td>
<td>ENE
<td>3</td>
</tr>
<tr>
<td>2.504%</td>
<td>ELL
<td>3</td>
</tr>
<tr>
<td>2.504%</td>
<td>ELI
<td>3</td>
</tr>
<tr>
<td>2.504%</td>
<td>ARE
<td>3</td>
</tr>
<tr>
<td>2.504%</td>
<td>ARA
<td>3</td>
</tr>
<tr>
<td>2.504%</td>
<td>ALA
<td>3</td>
</tr>
<tr>
<td>2.504%</td>
<td>ACE
<td>3</td>
</tr>
<tr>
<td>2.347%</td>
<td>RTE
<td>3</td>
</tr>
<tr>
<td>2.347%</td>
<td>ICE
<td>3</td>
</tr>
<tr>
<td>2.347%</td>
<td>ATE
<td>3</td>
</tr>
<tr>
<td>2.347%</td>
<td>ALE
<td>3</td>
</tr>
<tr>
<td>2.191%</td>
<td>TSE
<td>3</td>
</tr>
<tr>
<td>2.191%</td>
<td>TERSE
<td>5</td>
</tr>
<tr>
<td>2.191%</td>
<td>SRI
<td>3</td>
</tr>
</table>
<p>Lots of three-letter words and <i>much</i> lower percentages. &#8220;OTIS&#8221; is surprising to me, but I don&#8217;t do many Saturday puzzles, so who am I to say?</p>
<p>It would be really interesting to combine this with some <a href="http://en.wikipedia.org/wiki/Document_frequency">document frequency</a> numbers for the English language. This would find words which are much more common in crosswords than they are in general, i.e. crosswordese.</p>
<p>I&#8217;d include everything necessary to reproduce this here, but the puzzles are not free. See <a href="/xword-freq/">this directory</a> for the program I used to tabulate the statistics and complete word counts, both overall and for each day of the week. The first puzzle in my collection was 2006-10-23 and the last was 2009-01-19.</p>
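<p>The tabulation itself is easy to sketch. The Python below is not the program linked above. It is a minimal illustration that assumes each puzzle has already been reduced to a list of its answer words; the <code>puzzles</code> input format and both function names are made up for this example.</p>

```python
from collections import Counter


def word_percentages(puzzles):
    """Map each word to the percentage of puzzles it appears in.

    `puzzles` is a list of word lists, one per puzzle (a hypothetical input
    format; parsing real puzzle files is not shown here).
    """
    counts = Counter()
    for words in puzzles:
        # A set makes a word count once per puzzle, even if it repeats.
        counts.update(set(words))
    total = len(puzzles)
    return {word: 100.0 * n / total for word, n in counts.items()}


def top_words(puzzles, k=25):
    """Return the k most common (word, percentage) pairs, most common first."""
    pct = word_percentages(puzzles)
    # Sort by descending percentage, breaking ties alphabetically.
    return sorted(pct.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy data standing in for real puzzles.
toy = [["ERA", "AREA", "OREO"], ["ERA", "ALOE"], ["AREA", "ERA"], ["ONE"]]
for word, pct in top_words(toy, k=3):
    print("%.3f%% %s %d" % (pct, word, len(word)))
```

<p>Running the same function over only the Monday or only the Saturday puzzles would give per-day tables like the ones above.</p>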
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2009-12-26/crossword-word-frequency/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Breaking 3&#215;3 Boggle</title>
		<link>http://www.danvk.org/wp/2009-08-08/breaking-3x3-boggle/</link>
		<comments>http://www.danvk.org/wp/2009-08-08/breaking-3x3-boggle/#comments</comments>
		<pubDate>Sat, 08 Aug 2009 17:35:04 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[boggle]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=516</guid>
		<description><![CDATA[Why is finding the highest-scoring Boggle board so difficult? It&#8217;s because there are so many boards to consider: 2^72 for the 4&#215;4 case and 2^40 for the 3&#215;3 case. At 10,000 boards/second the former corresponds to about 2 billion years of compute time, and the latter just two years. Just enumerating all 2^72 boards would [...]]]></description>
			<content:encoded><![CDATA[<p>Why is finding the highest-scoring Boggle board so difficult? It&#8217;s because there are so many boards to consider: 2^72 for the 4&#215;4 case and 2^40 for the 3&#215;3 case. At <a href="http://www.danvk.org/wp/2007-02-10/one-last-boggle-boost/">10,000 boards/second</a> the former corresponds to about 2 billion years of compute time, and the latter just two years. Just enumerating all 2^72 boards would take over 100,000 years.</p>
<p>So we have to come up with a technique that doesn&#8217;t involve looking at every single board. And I&#8217;ve come up with just such a method! This is the &#8220;exciting news&#8221; I alluded to in the last post.</p>
<p>Here&#8217;s the general technique:</p>
<ol>
<li>Find a very high-scoring board (maybe <a href="http://www.danvk.org/wp/2009-02-19/sky-high-boggle-scores-with-simulated-annealing/">this way</a>).</li>
<li>Consider a large class of boards.</li>
<li>Come up with an upper bound on the highest score achieved by any board in the class.</li>
<li>If it&#8217;s lower than the score in step #1, we can eliminate all the boards in the class. If it&#8217;s not, subdivide the class and repeat step #2 with each subclass.</li>
</ol>
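<p>The four steps above can be sketched as a short recursive function. This is only an illustration in Python, not the program used for the real search: <code>break_class</code>, <code>toy_score</code> and <code>toy_bound</code> are hypothetical names, and the toy scoring rule (one point per vowel) stands in for real Boggle scoring so the example stays tiny.</p>

```python
def break_class(board_class, best_score, upper_bound, score):
    """Return every board in board_class that scores at least best_score.

    board_class is a list of strings, one per cell; each string lists the
    letters that cell may take. upper_bound and score are caller-supplied.
    """
    if best_score > upper_bound(board_class):
        return []  # the bound rules out every board in the class at once
    # Otherwise split on the first cell that still has more than one choice.
    for i, letters in enumerate(board_class):
        if len(letters) > 1:
            survivors = []
            for letter in letters:
                sub = board_class[:i] + [letter] + board_class[i + 1:]
                survivors.extend(
                    break_class(sub, best_score, upper_bound, score))
            return survivors
    # Every cell holds one letter: a single board, so score it exactly.
    board = "".join(board_class)
    return [board] if score(board) >= best_score else []


VOWELS = set("aeiou")


def toy_score(board):
    """Toy stand-in for Boggle scoring: one point per vowel on the board."""
    return sum(1 for ch in board if ch in VOWELS)


def toy_bound(board_class):
    """No board can have more vowels than there are cells allowing one."""
    return sum(1 for cell in board_class if VOWELS.intersection(cell))


print(break_class(["ab", "cd", "ae"], 2, toy_bound, toy_score))
# ['aca', 'ace', 'ada', 'ade']
```

<p>The whole sub-class with <code>b</code> in the first cell is discarded by one bound computation, without scoring any of its four boards.</p>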
<p><b>Classes of Boards</b><br />
By &#8220;class of boards&#8221;, I mean something like this:</p>
<style type="text/css">
.board { text-align: center; border-collapse: collapse; }
.board tbody td { border: 1px solid black; border-collapse: collapse; padding: 4px 8px 4px 8px; }
.board tbody td { font-weight: bold; }
.notable { color: red; }
.change td { padding: 2px 5px 2px 5px; }
.mb { font-family: monospace; padding: 0px 4px 0px 4px; }
</style>
<p><center></p>
<table class="board">
<tr>
<td>{a,e,i,o,u}</td>
<td>{a,e,i,o,u}</td>
<td>r</td>
</tr>
<tr>
<td>{b,c,d,f,g,h}</td>
<td>a</td>
<td>t</td>
</tr>
<tr>
<td>d</td>
<td>e</td>
<td>{r,s,t,v}</td>
</tr>
</table>
<p></center></p>
<p>The squares that contain a set of letters can take on <i>any</i> of those letters. So this board is part of that class:</p>
<p><center></p>
<table class="board">
<tr>
<td>a</td>
<td>i</td>
<td>r</td>
</tr>
<tr>
<td>d</td>
<td>a</td>
<td>t</td>
</tr>
<tr>
<td>d</td>
<td>e</td>
<td>s</td>
</tr>
<tfoot>
<tr>
<td colspan=3><a href="/boggle3.php?quick=airdatdes">189 points</a></td>
</tr>
</tfoot>
</table>
<p></center></p>
<p>and so is this:</p>
<p><center></p>
<table class="board">
<tr>
<td>o</td>
<td>u</td>
<td>r</td>
</tr>
<tr>
<td>f</td>
<td>a</td>
<td>t</td>
</tr>
<tr>
<td>d</td>
<td>e</td>
<td>t</td>
</tr>
<tfoot>
<tr>
<td colspan=3><a href="/boggle3.php?quick=ourfatdet">114 points</a></td>
</tr>
</tfoot>
</table>
<p></center></p>
<p>All told, there are 5 * 5 * 6 * 4 = 600 boards that are part of this class, each with its own score. Other fun classes of boards include &#8220;boards with only vowels&#8221; (1,953,125 members) and &#8220;boards with only consonants&#8221; (794,280,046,581 members).</p>
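<p>Those counts are just products of the number of choices in each cell. A quick Python check, representing a class as a list of nine strings (a representation chosen for this sketch, with <code>class_size</code> as a made-up helper name):</p>

```python
from math import prod


def class_size(board_class):
    """Number of distinct boards in a class: product of per-cell choices."""
    return prod(len(cell) for cell in board_class)


VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"  # the 21 non-vowels, y included

example = ["aeiou", "aeiou", "r", "bcdfgh", "a", "t", "d", "e", "rstv"]

print(class_size(example))           # 600
print(class_size([VOWELS] * 9))      # 1953125
print(class_size([CONSONANTS] * 9))  # 794280046581
```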
<p>Follow me past the fold for more&#8230;<br />
<span id="more-516"></span></p>
<p><b>Upper Bounds</b><br />
Now on to step #3 of the general technique: calculating an upper bound. This is going to be easier if we introduce some mathematical notation:</p>
<p><center></p>
<table>
<tr>
<td align=right><i>b</i></td>
<td>=</td>
<td>A boggle board</td>
</tr>
<tr>
<td align=right><i>Score(b)</i></td>
<td>=</td>
<td>sum of the scores of all the words contained on b</td>
</tr>
<tr>
<td align=right><i><b>B</b></i></td>
<td>=</td>
<td>a class of boards, i.e. <img src='http://s.wordpress.com/latex.php?latex=b%20%5Cin%20B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='b \in B' title='b \in B' class='latex' /></td>
</tr>
<tr>
<td align=right><i>Score(<b>B</b>)</i></td>
<td>=</td>
<td><img src='http://s.wordpress.com/latex.php?latex=max%28%5C%7BScore%28b%29%20%7C%20b%20%5Cin%20B%5C%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='max(\{Score(b) | b \in B\})' title='max(\{Score(b) | b \in B\})' class='latex' /></td>
</tr>
</table>
<p></center></p>
<p>An upper bound is a function <img src='http://s.wordpress.com/latex.php?latex=f%28B%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(B)' title='f(B)' class='latex' /> such that <img src='http://s.wordpress.com/latex.php?latex=f%28B%29%20%5Cgeq%20Score%28B%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(B) \geq Score(B)' title='f(B) \geq Score(B)' class='latex' />, i.e. <img src='http://s.wordpress.com/latex.php?latex=f%28B%29%20%5Cgeq%20Score%28b%29%20%5Cforall%20b%20%5Cin%20B&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(B) \geq Score(b) \forall b \in B' title='f(B) \geq Score(b) \forall b \in B' class='latex' />.</p>
<p>There&#8217;s one really easy upper bound: <img src='http://s.wordpress.com/latex.php?latex=Score%28B%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Score(B)' title='Score(B)' class='latex' />! This just enumerates all the boards in the class B, scores each and takes the maximum score. It&#8217;s very expensive to compute for a large class of boards and hence not very practical. You and I both know that a board containing only consonants has hardly any points on it. We don&#8217;t need to enumerate through all 794 billion such boards to determine this.</p>
<p>With upper bounds, there&#8217;s a trade-off between how hard they are to compute and how &#8220;tight&#8221; they are, i.e. how closely they approximate <img src='http://s.wordpress.com/latex.php?latex=Score%28B%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Score(B)' title='Score(B)' class='latex' />. <img src='http://s.wordpress.com/latex.php?latex=Score%28B%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Score(B)' title='Score(B)' class='latex' /> is very tight but is hard to compute. At the other end of the spectrum, we know that all the words on a board are in the dictionary. So we could just sum up the scores of all the words in the dictionary and get a number, say 1,000,000. Then <img src='http://s.wordpress.com/latex.php?latex=f%28B%29%20%3D%201%2C000%2C000&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(B) = 1,000,000' title='f(B) = 1,000,000' class='latex' /> is an upper bound. It is very easy to compute, but is not very tight.</p>
<p>The trick is to hit some sort of sweet spot that strikes a good balance between &#8220;tightness&#8221; and ease of computation. Over the rest of this blog post, I&#8217;ll present two upper bounds that do this. Upper bounds have the nice property that if <img src='http://s.wordpress.com/latex.php?latex=f%28B%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='f(B)' title='f(B)' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=g%28B%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='g(B)' title='g(B)' class='latex' /> are two upper bounds, then <img src='http://s.wordpress.com/latex.php?latex=h%28B%29%20%3D%20min%28f%28B%29%2C%20g%28B%29%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h(B) = min(f(B), g(B))' title='h(B) = min(f(B), g(B))' class='latex' /> is also an upper bound. So by finding two bounds, we&#8217;ll get a third that&#8217;s at least as tight as either one alone.</p>
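<p>The min-of-two-bounds property is easy to check on a toy problem. In the Python sketch below everything is invented for illustration: the score of a board is the number of <i>distinct</i> vowels on it, <code>f</code> counts cells that could hold a vowel, and <code>g</code> counts distinct vowels available anywhere in the class. Neither is a real Boggle bound; the point is only that each one is loose on a different class and their minimum is never below the true maximum.</p>

```python
from itertools import product

VOWELS = set("aeiou")


def score(board):
    """Toy score: number of distinct vowels on the board."""
    return len(VOWELS.intersection(board))


def true_max(board_class):
    """Score(B): brute-force maximum over every board in the class."""
    return max(score("".join(b)) for b in product(*board_class))


def f(board_class):
    """Bound 1: each cell contributes at most one vowel."""
    return sum(1 for cell in board_class if VOWELS.intersection(cell))


def g(board_class):
    """Bound 2: no board has more distinct vowels than the class offers."""
    return len(VOWELS.intersection("".join(board_class)))


def h(board_class):
    """The combined bound: the pointwise minimum of f and g."""
    return min(f(board_class), g(board_class))


for bc in (["a", "a", "a"], ["aeiou", "b", "c"], ["ae", "ae", "b"]):
    print(bc, "Score(B) =", true_max(bc), "f =", f(bc), "g =", g(bc), "h =", h(bc))
```

<p>On <code>["a", "a", "a"]</code> only <code>g</code> is tight, on <code>["aeiou", "b", "c"]</code> only <code>f</code> is, and <code>h</code> matches the true maximum on both.</p>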
<p><b>sum/union</b><br />
The idea of this bound is to find all the words that can possibly occur in a class of boards. Since each word can only be found once, we can add the scores of all these words to get an upper bound.</p>
<p>To get the list of words, we use the same <a href="http://www.danvk.org/wp/2007-02-01/tries-the-perfect-data-structure/">depth-first search strategy</a> as we did to find words on a single board. The wrinkle is that, when we encounter a cell with multiple possible letters, we have to do a separate depth-first search for each.</p>
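<p>Here is a minimal Python sketch of the sum/union idea on a 3&#215;3 class. It is not the real implementation: it uses a plain set of prefixes instead of a Trie, a four-word toy dictionary, standard Boggle length-based scoring, and it ignores the &#8220;Qu&#8221; cube. The names <code>sum_union</code>, <code>neighbors</code> and <code>word_score</code> are made up for the example.</p>

```python
SCORES = {3: 1, 4: 1, 5: 2, 6: 3, 7: 5}


def word_score(word):
    """Standard Boggle scoring by length; 8+ letters score 11, under 3 score 0."""
    return 11 if len(word) >= 8 else SCORES.get(len(word), 0)


def neighbors(i, size=3):
    """Indices of the cells adjacent to cell i on a size x size grid."""
    r, c = divmod(i, size)
    return [rr * size + cc
            for rr in range(max(r - 1, 0), min(r + 2, size))
            for cc in range(max(c - 1, 0), min(c + 2, size))
            if (rr, cc) != (r, c)]


def sum_union(board_class, words):
    """Sum the scores of every word that appears on some board in the class."""
    prefixes = {w[:k] for w in words for k in range(1, len(w) + 1)}
    found = set()  # a set, so each word is counted once

    def search(i, prefix, used):
        for letter in board_class[i]:  # a separate search per possible letter
            candidate = prefix + letter
            if candidate not in prefixes:
                continue  # no dictionary word starts this way
            if candidate in words:
                found.add(candidate)
            for j in neighbors(i):
                if j not in used:
                    search(j, candidate, used | {i})

    for start in range(len(board_class)):
        search(start, "", frozenset())
    return sum(word_score(w) for w in found)


words = {"rat", "date", "dates", "trade", "zebra"}
example = ["aeiou", "aeiou", "r", "bcdfgh", "a", "t", "d", "e", "rstv"]
print(sum_union(example, words))  # 6: rat (1) + date (1) + dates (2) + trade (2)
```

<p>Cells are numbered 0 to 8 in row-major order, so <code>example</code> is the first class shown in this post.</p>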
<p>At first glance, it doesn&#8217;t seem like this would be tractable for a board class like this one (alternating vowels and consonants):</p>
<p><center></p>
<table class="board">
<tr>
<td>{a,e,i,o,u}</td>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
<td>{a,e,i,o,u}</td>
</tr>
<tr>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
<td>{a,e,i,o,u}</td>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
</tr>
<tr>
<td>{a,e,i,o,u}</td>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
<td>{a,e,i,o,u}</td>
</tr>
</table>
<p></center></p>
<p>In addition to the branching from going different directions on each square, there&#8217;s also a huge amount of branching from trying each letter on each square. But we&#8217;re saved by the same lesson we learned in <a href="http://www.danvk.org/wp/2007-01-30/boggle-3-succeed-by-not-being-stupid/">boggle post #3</a>: the dictionary is exceptionally effective at pruning thorny search trees. If we prune search trees like &#8216;bqu&#8217; that don&#8217;t begin words, then there doesn&#8217;t wind up being that much work to do.</p>
<p>We can find all possible words on the above board in just under 1 second. This is about 10,000 times slower than scoring a single conventional board, but it&#8217;s certainly tractable. The resulting score is 195,944. Given that no board scores higher than 545 points, this is a wild overestimate. But at least it&#8217;s a better bound than a million!</p>
<p>This technique does especially well on boards like this one, which contains all consonants:</p>
<p><center></p>
<table class="board">
<tr>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
</tr>
<tr>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
</tr>
<tr>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
</tr>
</table>
<p></center></p>
<p>This takes 0.23 seconds to score and results in a bound of 208 points (it contains words like &#8216;crypt&#8217; and &#8216;gypsy&#8217;). We&#8217;ve already <a href="http://www.danvk.org/wp/2009-08-04/solving-boggle-by-taking-option-three/">found</a> a single board that has <a href="/boggle3.php?quick=perlatdes">545 points</a> on it. So we can eliminate this entire class of 794 billion boards. That&#8217;s a speed of over 3 trillion boards/second! Of course, this board class is not typical.</p>
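<p>The boards-per-second figure is just the size of the class divided by the time taken to bound it. A quick check in Python, using only the numbers quoted above:</p>

```python
boards = 21 ** 9   # all-consonant 3x3 boards: 794,280,046,581
seconds = 0.23     # time to compute the sum/union bound for the class

rate = boards / seconds
print("%.2f trillion boards/second" % (rate / 1e12))
```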
<p>It&#8217;s also worth pointing out why this upper bound isn&#8217;t tight. Consider this class of boards:</p>
<p><center></p>
<table class="board">
<tr>
<td>{a,i}</td>
<td>r</td>
<td>z</td>
</tr>
<tr>
<td>f</td>
<td>z</td>
<td>z</td>
</tr>
<tr>
<td>z</td>
<td>z</td>
<td>z</td>
</tr>
</table>
<p></center></p>
<p>You can find both &#8220;fir&#8221; and &#8220;far&#8221; on boards in this class, but there aren&#8217;t any boards that contain <i>both</i>. So while &#8220;fir&#8221; and &#8220;far&#8221; each contribute a point to the upper bound, they should really contribute only a single point between them. The sum/union bound doesn&#8217;t take into account the relationships between various letter choices. It&#8217;s the best trade-off between computability and &#8220;tightness&#8221; we&#8217;ve seen so far, but it&#8217;s not good enough to make the problem tractable.</p>
<p><b>max/no mark</b><br />
In the sum/union upper bound, we dealt with multiple possible letters on the same square by trying each and adding the resulting scores (taking care not to count any word twice). But why take the sum of all choices when we know that any given board can only take on one of the possibilities? It would result in a much better bound if we took the max of the scores resulting from each possible choice, rather than the sum. This is the idea behind the &#8220;max/no mark&#8221; bound.</p>
<p>This is a huge win over sum/union, especially when there are many squares containing many possible letters. It does have one major drawback, though. The sum/union bound took advantage of the fact that each word could only be found once. With the max/no mark bound, the bookkeeping for this becomes completely intractable. The words we find by making a choice on one square may affect the results of a choice somewhere else. We can&#8217;t make the choices independently. The optimal set of choices becomes an optimization problem in its own right.</p>
<p>Rather than deal with this, max/no mark just throws up its hands. This is what the &#8220;no mark&#8221; refers to. In the past, we&#8217;ve recorded the words we find by <a href="http://www.danvk.org/wp/2007-02-10/one-last-boggle-boost/">marking the Trie</a>. By not marking the Trie with found words, we accept that we&#8217;ll double-count words sometimes. But it still winds up being an excellent upper bound.</p>
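<p>Here is one way to read that description as code. It is a Python sketch under assumptions, not the actual implementation: scores are summed over the directions the search can take, maximized over the letter choices on each cell, and found words are never marked. The dictionary is a toy set, and <code>max_no_mark</code>, <code>neighbors</code> and <code>word_score</code> are names invented for the example.</p>

```python
SCORES = {3: 1, 4: 1, 5: 2, 6: 3, 7: 5}


def word_score(word):
    """Standard Boggle scoring by length; 8+ letters score 11, under 3 score 0."""
    return 11 if len(word) >= 8 else SCORES.get(len(word), 0)


def neighbors(i, size=3):
    """Indices of the cells adjacent to cell i on a size x size grid."""
    r, c = divmod(i, size)
    return [rr * size + cc
            for rr in range(max(r - 1, 0), min(r + 2, size))
            for cc in range(max(c - 1, 0), min(c + 2, size))
            if (rr, cc) != (r, c)]


def max_no_mark(board_class, words):
    """Upper bound: max over letter choices, sum over paths, no word marking."""
    prefixes = {w[:k] for w in words for k in range(1, len(w) + 1)}

    def search(i, prefix, used):
        best = 0
        for letter in board_class[i]:
            candidate = prefix + letter
            if candidate not in prefixes:
                continue
            total = word_score(candidate) if candidate in words else 0
            for j in neighbors(i):
                if j not in used:
                    total += search(j, candidate, used | {i})
            best = max(best, total)  # one letter per cell: take the max
        return best

    return sum(search(s, "", frozenset()) for s in range(len(board_class)))


# The fir/far class: only one of the two words can be on any given board.
fir_far = ["ai", "r", "z", "f", "z", "z", "z", "z", "z"]
print(max_no_mark(fir_far, {"fir", "far"}))  # 1

# A single board where "tea" can be traced along three different paths.
tea = ["t", "e", "a", "e", "a", "z", "z", "z", "z"]
print(max_no_mark(tea, {"tea"}))  # 3, although the board is worth 1 point
```

<p>The first result shows the improvement over sum/union on the fir/far class. The second shows the price of not marking: the same word is counted once per path.</p>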
<p>Let&#8217;s try some of our previous examples:</p>
<p><center></p>
<table class="board">
<tr>
<td>{a,e,i,o,u}</td>
<td>{a,e,i,o,u}</td>
<td>r</td>
</tr>
<tr>
<td>{b,c,d,f,g,h}</td>
<td>a</td>
<td>t</td>
</tr>
<tr>
<td>d</td>
<td>e</td>
<td>{r,s,t,v}</td>
</tr>
<tfoot>
<tr>
<td colspan=3 align=center>sum/union: 2880</td>
</tr>
<tr>
<td colspan=3>max/no mark: 1307</td>
</tr>
</tfoot>
</table>
<p></center></p>
<p>Alternating vowels and consonants:</p>
<p><center></p>
<table class="board">
<tr>
<td>{a,e,i,o,u}</td>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
<td>{a,e,i,o,u}</td>
</tr>
<tr>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
<td>{a,e,i,o,u}</td>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
</tr>
<tr>
<td>{a,e,i,o,u}</td>
<td>{b-d,f-h,j-n,p-t,v-z}</td>
<td>{a,e,i,o,u}</td>
</tr>
<tfoot>
<tr>
<td colspan=3>sum/union: 195944</td>
</tr>
<tr>
<td colspan=3>max/no mark: 15692</td>
</tr>
</tfoot>
</table>
<p></center></p>
<p>A class that can be entirely eliminated:</p>
<p><center></p>
<table class="board">
<tr>
<td>{b,d,f,g,j,k,m,p,v,w,x,z}</td>
<td>a</td>
<td>{s,y}</td>
</tr>
<tr>
<td>{i,o,u}</td>
<td>y</td>
<td>a</td>
</tr>
<tr>
<td>{s,y}</td>
<td>{c,h,l,n,r,t}</td>
<td>{c,h,l,n,r,t}</td>
</tr>
<tfoot>
<tr>
<td colspan=3>sum/union: 2497</td>
</tr>
<tr>
<td colspan=3>max/no mark: 447</td>
</tr>
</tfoot>
</table>
<p></center></p>
<p>max/no mark isn&#8217;t always better than sum/union:</p>
<p><center></p>
<table class="board">
<tr>
<td>{b,d}</td>
<td>a</td>
<td>{b,d}</td>
</tr>
<tr>
<td>a</td>
<td>{b,d}</td>
<td>a</td>
</tr>
<tr>
<td>{b,d}</td>
<td>a</td>
<td>{b,d}</td>
</tr>
<tfoot>
<tr>
<td colspan=3>sum/union: 9</td>
</tr>
<tr>
<td colspan=3>max/no mark: 132</td>
</tr>
</tfoot>
</table>
<p></center></p>
<p>This is something of a worst-case because, while there are relatively few distinct words, there are many different ways to find them.</p>
<p><b>Putting it all together</b><br />
Our two bounds do well in different situations. max/no mark works best when there are lots of choices to be made on particular cells and there are relatively few ways to make any particular word. sum/union works best when there are lots of possibilities but relatively few distinct words. Putting them together results in a bound that&#8217;s good enough to find the best 3&#215;3 boggle board using the technique described at the beginning of this post.</p>
<p>Given an initial class of boards, we wind up with what I call a &#8220;breaking tree&#8221;. If the initial class has an upper bound less than 545 points, then we&#8217;re done. Otherwise, we pick a cell to split and try each possibility.</p>
<p>Here&#8217;s a relatively small breaking tree that results from running <a href="http://code.google.com/p/performance-boggle/source/browse/trunk/3x3/ibucket_breaker.cc">this program</a>:</p>
<pre>
$ ./3x3/ibucket_breaker --best_score 520 --break_class "bdfgjkmpvwxz a sy iou xyz aeiou sy chlnrt chlnrt"
(     0%) (0;1/1) bdfgjkmpvwxz a sy iou xyz aeiou sy chlnrt chlnrt (820, 77760 reps)
                            split cell 4 (xyz) Will evaluate 3 more boards...
(     0%)  (1;1/3) bdfgjkmpvwxz a sy iou x aeiou sy chlnrt chlnrt (475, 25920 reps)
(33.333%)  (1;2/3) bdfgjkmpvwxz a sy iou y aeiou sy chlnrt chlnrt (703, 25920 reps)
                            split cell 5 (aeiou) Will evaluate 5 more boards...
(33.333%)   (2;1/5) bdfgjkmpvwxz a sy iou y a sy chlnrt chlnrt (447, 5184 reps)
(    40%)   (2;2/5) bdfgjkmpvwxz a sy iou y e sy chlnrt chlnrt (524, 5184 reps)
                            split cell 3 (iou) Will evaluate 3 more boards...
(    40%)    (3;1/3) bdfgjkmpvwxz a sy i y e sy chlnrt chlnrt (346, 1728 reps)
(42.222%)    (3;2/3) bdfgjkmpvwxz a sy o y e sy chlnrt chlnrt (431, 1728 reps)
(44.444%)    (3;3/3) bdfgjkmpvwxz a sy u y e sy chlnrt chlnrt (339, 1728 reps)
(46.667%)   (2;3/5) bdfgjkmpvwxz a sy iou y i sy chlnrt chlnrt (378, 5184 reps)
(53.333%)   (2;4/5) bdfgjkmpvwxz a sy iou y o sy chlnrt chlnrt (423, 5184 reps)
(    60%)   (2;5/5) bdfgjkmpvwxz a sy iou y u sy chlnrt chlnrt (318, 5184 reps)
(66.667%)  (1;3/3) bdfgjkmpvwxz a sy iou z aeiou sy chlnrt chlnrt (509, 25920 reps)
</pre>
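<p>In each line, the first number in the trailing parentheses is the upper bound for that class and the second is the number of boards it represents. When the bound drops below 520 (the parameter I set on the command line), the sub-class is fully broken.</p>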
<p>Using this technique and the following partition of the 26 letters:</p>
<ul>
<li>bdfgjvwxz</li>
<li>aeiou</li>
<li>lnrsy</li>
<li>chkmpt</li>
</ul>
<p>I was able to go through all 262,144 (=4^9) board classes in about six hours on a single machine. This resulted in the boards I listed in the <a href="http://www.danvk.org/wp/2009-08-04/solving-boggle-by-taking-option-three/">last post</a>. Six hours is a big improvement over two years!</p>
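<p>The count of board classes comes from giving each of the nine cells one of the four letter buckets. A small Python sketch of that enumeration (the helper names are made up, and the buckets are copied exactly as listed above, where they hold 25 letters between them):</p>

```python
from itertools import product
from math import prod

BUCKETS = ["bdfgjvwxz", "aeiou", "lnrsy", "chkmpt"]


def all_classes(buckets, cells=9):
    """Yield every board class: one bucket chosen independently per cell."""
    return product(buckets, repeat=cells)


num_classes = sum(1 for _ in all_classes(BUCKETS))
print(num_classes)  # 262144

# Each class covers the product of its bucket sizes in individual boards.
total_boards = sum(prod(len(b) for b in c) for c in all_classes(BUCKETS))
print(total_boards == sum(len(b) for b in BUCKETS) ** 9)  # True
```

<p>Each tuple yielded by <code>all_classes</code> has the same shape as the board classes used throughout this post, so it could be handed straight to a bounding function.</p>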
<p>If that same factor (two years to six hours) held for the 4&#215;4 case, then we&#8217;d be down to 380 years of compute time to find the best 4&#215;4 boggle board. Or, equivalently, 138 days on 1000 machines. That&#8217;s still a lot. We&#8217;re not quite there yet, but we&#8217;re getting closer!</p>
<p>Code for the program that went through all possible board classes can be found <a href="http://code.google.com/p/performance-boggle/source/browse/trunk/#trunk/paper">here</a>. While <a href="http://ai.stanford.edu/~chuongdo/boggle/index.html">many</a> <a href="http://ankurdave.com/AnkurDaveExtendedEssay2009.pdf">people</a> have found high-scoring boards, I haven&#8217;t found any previous work on this upper bounding approach. So if you have any ideas/suggestions on how to improve the bound, they&#8217;re probably novel and useful!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2009-08-08/breaking-3x3-boggle/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Chart of time.h Functions</title>
		<link>http://www.danvk.org/wp/2009-02-24/chart-of-timeh-functions/</link>
		<comments>http://www.danvk.org/wp/2009-02-24/chart-of-timeh-functions/#comments</comments>
		<pubDate>Wed, 25 Feb 2009 05:04:02 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/?p=459</guid>
		<description><![CDATA[Here&#8217;s a handy chart of the C Standard Library functions in time.h: The ovals are data types and the rectangles are functions. The three basic types are: time_t: number of seconds since the start of the UNIX epoch. This is always UTC! struct tm: A broken-down date, split into years, months, seconds, etc. In Python, [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a handy chart of the <a href="http://en.wikipedia.org/wiki/C_Standard_Library">C Standard Library</a> functions in <code><a href="http://en.wikipedia.org/wiki/Time.h">time.h</a></code>:</p>
<p><img src="http://www.danvk.org/wp/wp-content/uploads/2009/02/unixtime.png" alt="unixtime" title="unixtime" width="400" height="370" class="aligncenter size-full wp-image-458" /></p>
<p>The ovals are data types and the rectangles are functions. The three basic types are:</p>
<ul>
<li><b>time_t</b>: number of seconds since the start of the UNIX epoch. This is always UTC!</li>
<li><b>struct tm</b>: A broken-down date, split into years, months, seconds, etc. In Python, it&#8217;s a tuple.</li>
<li><b>string</b>: Any string representation of a time, e.g. &#8220;Wed Jun 30 21:49:08 1993&#8221;.</li>
</ul>
<p>Generally you either want a <code>time_t</code> (because it&#8217;s easy to do arithmetic with) or a <code>string</code> (because it&#8217;s pretty to look at). So to get from a <code>time_t</code> to a <code>string</code>, you should use something like <code>strftime("%Y-%m-%d", localtime(time()))</code>. To go the other way, you&#8217;d use <code>mktime(strptime(str, "%Y-%m-%d"))</code>.</p>
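<p>Since Python&#8217;s <code>time</code> module mirrors these functions closely, the two round trips can be written out there as a runnable sketch. The wrapper names <code>format_date</code> and <code>parse_date</code> are made up for this example; the calls inside them are the real <code>time</code> module functions.</p>

```python
import time


def format_date(t):
    """time_t -> struct tm (local zone) -> string."""
    return time.strftime("%Y-%m-%d", time.localtime(t))


def parse_date(text):
    """string -> struct tm -> time_t, reading the date as local midnight."""
    return time.mktime(time.strptime(text, "%Y-%m-%d"))


now = time.time()                 # seconds since the epoch
today = format_date(now)          # e.g. "2009-02-24", depending on the clock
midnight = parse_date(today)      # local midnight at the start of today
print(today, now - midnight)      # seconds elapsed since local midnight
```

<p>Both directions pass through a <code>struct tm</code> (a <code>time.struct_time</code> in Python), which is exactly the detour the chart shows.</p>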
<p>This library has been around <a href="http://books.google.com/books?id=D7FVAAAAMAAJ&#038;q=mktime+date:0-1982&#038;dq=mktime+date:0-1982&#038;lr=&#038;as_brr=0&#038;as_pt=ALLTYPES&#038;ei=8c-kSdfON5POkAS21byrBg&#038;pgis=1">since at least 1982</a>. It&#8217;s been replicated in many other languages (Python, Perl, Ruby). We seem to be stuck with it.</p>
<p>Read on for my rant about why this is all idiotic.<br />
<span id="more-459"></span></p>
<p>Let me just say that I think this is a <i>horrible</i> system. You almost never want to use <code>struct tm</code>. Most of the time, you want to go between <code>strings</code> and <code>time_t</code>. But lonely <code>ctime</code> is the only function that makes this jump, and it doesn&#8217;t let you set the output format or time zone.</p>
<p>The names are not exactly descriptive, either. They all end in &#8220;time&#8221;, which makes some sense. <code>strptime</code> and <code>strftime</code> are even OK, if a bit cryptic. The <code>p</code> stands for &#8220;parse&#8221; and the <code>f</code> stands for &#8220;format&#8221;, &#224; la <code>printf</code>. The parameter order is hard to remember, though. Don&#8217;t use <code>gmtime</code> unless you have a good reason. <code>ctime</code> and <code>asctime</code> are nonsensical, but I don&#8217;t use them much, either. My greatest loathing is reserved for <code>localtime</code> and <code>mktime</code>. I can <i>never</i> remember which of these does which. The only mnemonic I can think of: <code>mktime</code> <i>m</i>a<i>k</i>es a <i>time</i>_t from a struct tm.</p>
<p>For another exercise in head-scratching, follow the role of time zones through this chart. <code>time_t</code> knows no time zones &#8212; it&#8217;s always UTC. To get to <code>struct tm</code>, you need to specify a time zone. This is not made explicit in the struct, however, so you need to do your own bookkeeping. The time zone for conversion isn&#8217;t a parameter or anything sensible like that, either. You just get two choices: GMT (i.e. UTC) or local time. And if you choose <code>gmtime</code>, you&#8217;ll never be able to get back to <code>time_t</code> with the standard library, because the inverse function doesn&#8217;t exist there. (Some systems supply a <code>mkgmtime</code> or <code>timegm</code> function.)</p>
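<p>Python has the same gap in its <code>time</code> module and fills it from a different module: <code>calendar.timegm</code> is the inverse of <code>time.gmtime</code>. A short sketch of the UTC round trip, using this post&#8217;s publication timestamp as the sample value:</p>

```python
import calendar
import time

t = 1235538242                 # 2009-02-25 05:04:02 UTC
utc_tm = time.gmtime(t)        # time_t -> struct tm, in UTC
print(tuple(utc_tm)[:6])       # (2009, 2, 25, 5, 4, 2)

# calendar.timegm reads the struct as UTC, so it undoes gmtime exactly.
print(calendar.timegm(utc_tm) == t)  # True
```

<p>Using <code>time.mktime</code> on the same struct would interpret it as local time instead, which gives the original value back only when the local zone happens to be UTC.</p>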
<p>How would I design it? <code>struct tm</code> would lose its place at the center of everything. There would be sensibly-named functions to go between <code>time_t</code> and <code>string</code>:</p>
<ul>
<li><code>time_t parsetime(format, string[, timezone])</code></li>
<li><code>string formattime(format, time_t[, timezone])</code></li>
</ul>
<p>And if you really need them:</p>
<ul>
<li><code>splittime(time_t, struct tm*)</code></li>
<li><code>time_t packtime(struct tm*)</code></li>
</ul>
<p>Was that really so hard?</p>
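<p>For what it&#8217;s worth, the first two proposed functions can be sketched in a few lines of Python on top of <code>datetime</code>. This is only an illustration of the proposed interface: <code>parsetime</code> and <code>formattime</code> are the hypothetical names from the list above, and passing the time zone as a <code>datetime.tzinfo</code> object that defaults to UTC is an assumption of this sketch.</p>

```python
from datetime import datetime, timedelta, timezone


def formattime(fmt, t, tz=timezone.utc):
    """time_t -> string, rendered as wall-clock time in the zone tz."""
    return datetime.fromtimestamp(t, tz).strftime(fmt)


def parsetime(fmt, text, tz=timezone.utc):
    """string -> time_t, reading the string as wall-clock time in the zone tz."""
    naive = datetime.strptime(text, fmt)
    return naive.replace(tzinfo=tz).timestamp()


FMT = "%Y-%m-%d %H:%M:%S"
pacific = timezone(timedelta(hours=-8))  # fixed UTC-8 offset, no DST rules

t = 1235538242
print(formattime(FMT, t))                              # 2009-02-25 05:04:02
print(formattime(FMT, t, pacific))                     # 2009-02-24 21:04:02
print(parsetime(FMT, "2009-02-24 21:04:02", pacific))  # 1235538242.0
```

<p>No <code>struct tm</code> appears anywhere in the caller&#8217;s code, and the time zone is an explicit argument rather than hidden process state.</p>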
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2009-02-24/chart-of-timeh-functions/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Draggable Table Columns</title>
		<link>http://www.danvk.org/wp/2008-06-12/draggable-table-columns/</link>
		<comments>http://www.danvk.org/wp/2008-06-12/draggable-table-columns/#comments</comments>
		<pubDate>Thu, 12 Jun 2008 07:41:44 +0000</pubDate>
		<dc:creator>danvk</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.danvk.org/wp/2008-06-12/draggable-table-columns/</guid>
		<description><![CDATA[Inspired by the sorttable library, I&#8217;ve done some Javascript hacking over the last day and created dragtable, a complementary library which lets you drag column headers around to rearrange HTML tables. A demo will make everything clear: Name Date Favorite Color Dan 1984-07-12 Blue Alice 1980-07-22 Green Ryan 1990-09-23 Orange Bob 1966-04-21 Red Drag the [...]]]></description>
			<content:encoded><![CDATA[<p>Inspired by the <a href="http://www.kryogenix.org/code/browser/sorttable/">sorttable</a> library, I&#8217;ve done some JavaScript hacking over the last day and created <a href="/wp/dragtable/">dragtable</a>, a complementary library which lets you drag column headers around to rearrange HTML tables. A demo will make everything clear:</p>
<table width=100%>
<tr>
<td align=center>
<table id=table class="thin draggable" cellpadding=2>
<tr>
<th>Name</th>
<th>Date</th>
<th>Favorite Color</th>
</tr>
<tr>
<td>Dan</td>
<td>1984-07-12</td>
<td>Blue</td>
</tr>
<tr>
<td>Alice</td>
<td>1980-07-22</td>
<td>Green</td>
</tr>
<tr>
<td>Ryan</td>
<td>1990-09-23</td>
<td>Orange</td>
</tr>
<tr>
<td>Bob</td>
<td>1966-04-21</td>
<td>Red</td>
</tr>
</table>
</td>
</tr>
</table>
<p>Drag the column headers to rearrange the table. dragtable is incredibly easy to use. To make a table rearrangeable, just add <code>class=draggable</code> to the <code>table</code> tag. And, if you set <code>class="draggable sortable"</code>, you can have a table that&#8217;s simultaneously sortable and rearrangeable! For more details and a download link, check out the <a href="/wp/dragtable/">dragtable</a> page.</p>
<p>I&#8217;m calling this v0.9 since I&#8217;m sure there are plenty of bugs and tweaks left to make. I&#8217;d love to get some feedback, so take it for a spin and tell me what you think!</p>
<p><b>Update:</b> I&#8217;ve added full-column dragging and bumped the version to 1.0. Head on over to the <a href="/dragtable/">dragtable</a> page, grab a copy, and let me know what you think!</p>
<p><script type=text/javascript src="/dragtable/sorttable.js"></script><br />
<script type=text/javascript src="/dragtable/dragtable.js"></script></p>
<style type=text/css>
  /* Sortable tables */
  table.sortable thead {
    background-color:#eee;
    color:#666666;
    font-weight: bold;
    cursor: default;
  }
  table.thin, table.thin td, table.thin tr, table.thin th {
    border: thin solid black;
    border-collapse: collapse;
  }
</style>
]]></content:encoded>
			<wfw:commentRss>http://www.danvk.org/wp/2008-06-12/draggable-table-columns/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
	</channel>
</rss>

