12.26.09

Crossword Word Frequency

Posted in math, programming at 10:45 am by danvk

In a previous post, I discussed downloading several years’ worth of New York Times Crosswords and categorizing them by day of week. Now, some analysis!

Here were the most common words over the last 12 years, along with the percentage of puzzles in which they occurred:

Percentage Word Length
6.218% ERA 3
5.703% AREA 4
5.413% ERE 3
5.055% ELI 3
4.854% ONE 3
4.585% ALE 3
4.496% ORE 3
4.361% ERIE 4
4.339% ALOE 4
4.317% ETA 3
4.317% ALI 3
4.227% OLE 3
4.205% ARE 3
4.138% ESS 3
4.138% EDEN 4
4.138% ATE 3
4.048% IRE 3
4.048% ARIA 4
4.004% ANTE 4
3.936% ESE 3
3.936% ENE 3
3.914% ADO 3
3.869% ELSE 4
3.825% NEE 3
3.758% ACE 3

(you can click column headings to sort.)

So “ERA” appears, on average, in about 23 puzzles per year. How about if we break this down by day of week? Follow me past the fold…


Read the rest of this entry »

08.08.09

Breaking 3×3 Boggle

Posted in boggle, math, programming at 10:35 am by danvk

Why is finding the highest-scoring Boggle board so difficult? It’s because there are so many boards to consider: 2^72 for the 4×4 case and 2^40 for the 3×3 case. At 10,000 boards/second the former corresponds to about 2 billion years of compute time, and the latter just two years. Just enumerating all 2^72 boards would take over 100,000 years.

So we have to come up with a technique that doesn’t involve looking at every single board. And I’ve come up with just such a method! This is the “exciting news” I alluded to in the last post.

Here’s the general technique:

  1. Find a very high-scoring board (maybe this way)
  2. Consider a large class of boards
  3. Come up with an upper bound on the highest score achieved by any board in the class.
  4. If it’s lower than the score in step #1, we can eliminate all the boards in the class. If it’s not, subdivide the class and repeat step #2 with each subclass.

Classes of Boards
By “class of boards”, I mean something like this:

{a,e,i,o,u} {a,e,i,o,u} r
{b,c,d,f,g,h} a t
d e {r,s,t,v}

The squares that contain a set of letters can take on any of those letters. So this board is part of that class:

a i r
d a t
d e s
189 points

and so is this:

o u r
f a t
d e t
114 points

All told, there are 5 * 5 * 6 * 4 = 600 boards that are part of this class, each with its own score. Other fun classes of boards include “boards with only vowels” (1,953,125 members) and “boards with only consonants” (794,280,046,581 members).

Follow me past the fold for more…
Read the rest of this entry »

02.24.09

Chart of time.h Functions

Posted in programming at 10:04 pm by danvk

Here’s a handy chart of the C Standard Library functions in time.h:

unixtime

The ovals are data types and the rectangles are functions. The three basic types are:

  • time_t: number of seconds since the start of the UNIX epoch. This is always UTC!
  • struct tm: A broken-down date, split into years, months, seconds, etc. In Python, it’s a tuple.
  • string: Any string representation of a time, e.g. “Wed Jun 30 21:49:08 1993″.

Generally you either want a time_t (because it’s easy to do arithmetic with) or a string (because it’s pretty to look at). So to get from a time_t to a string, you should use something like strftime("%Y-%m-%d", localtime(time())). To go the other way, you’d use mktime(strptime(str, "%Y-%m-%d")).

This library has been around since at least 1982. It’s been replicated in many other languages (Python, Perl, Ruby). We seem to be stuck with it.

Read on for my rant about why this is all idiotic.
Read the rest of this entry »

06.12.08

Draggable Table Columns

Posted in programming, web at 12:41 am by danvk

Inspired by the sorttable library, I’ve done some Javascript hacking over the last day and created dragtable, a complementary library which lets you drag column headers around to rearrange HTML tables. A demo will make everything clear:

Name Date Favorite Color
Dan 1984-07-12 Blue
Alice 1980-07-22 Green
Ryan 1990-09-23 Orange
Bob 1966-04-21 Red

Drag the column headers to rearrange the table. dragtable is incredibly easy to use. To make a table rearrangeable, just add class=draggable to the table tag. And, if you set class="draggable sortable", you can have a table that’s simultaneously sortable and rearrangable! For more details and a download link, check out the dragtable page.

I’m calling this v0.9 since I’m sure there are plenty of bugs and tweaks left to make. I’d love to get some feedback, so take it for a spin and tell me what you think!

Update: I’ve added full-column dragging and bumped the version to 1.0. Head on over to the dragtable, grab a copy, and let me know what you think!


02.03.08

Reading Old GW-Basic Programs

Posted in personal, programming at 3:50 am by danvk

I found a disk image I’d made of an old hard drive of mine today (circa 1995) and had some fun browsing through my files. Back then, I was programming in a combination of QBASIC and GW-BASIC. It’s easy to read old QBASIC programs, since QB saved code as human-readable text.

Not so, GW-BASIC. To save space, it stored code in a compact, binary format. This seems like an unnecessary optimization now, but back in 1984 it made a lot of sense. GW-BASIC was an interactive environment, and it stored all your code in memory. Memory was a scarce resource at the time, so every byte counted. Hence the binary format.

I wanted to read my old GW-BASIC programs, so I dug around and found this discussion of the GW-BASIC binary file format. It’s incredibly detailed, which let me whip up a decoder in Python over two solid hours of hacking. Without further ado, here it is:

GW-Basic Program Decoder

For a sample decoding, see below the fold.
Read the rest of this entry »

12.24.07

Using Track Parser

Posted in music, programming, web at 12:27 pm by danvk

pitchfork-tracks.png Pitchfork Media has released their two standard year-end lists, the Top 100 Tracks of 2007 and the Top 50 Albums of 2007. As usual, they’ve been lampooned all over the web, including one critique in pie chart form. For me, they made for perfect listening on a long car drive this weekend.

In my case, this list led to a good use of my Track Parser script, which is in all likelihood the most useful program I’ve ever written. It’s an AppleScript for iTunes (i.e. Mac only, sorry) that lets you apply regular expressions to track names/tags. Here’s how I used it today…

Through some strange turn of events (certainly nothing to do with this), I found myself with a playlist of the top 100 tracks. The music was all there, but none of the songs had their “Artist” field filled in! Here’s where my Track Parser script came in.

I googled around and quickly found this page, which has some commentary on the list, as well as what we’re interested in: a copy of all the songs/artists in simple text form. (For what it’s worth, I agree with his reactions.)

I copied the list and ran two regular expressions to get it down to just the artist (s/ ".*//g and ^\d*: if you must know). The tracks are in reverse order of what we want (100 to 1 instead of 1 to 100). So I ran pbpaste | tac | pbcopy to put the #1 track at the top of the list. Or I would have, if Mac OS X had the tac command. Instead, I ran this monstrosity:

pbpaste | perl -ne 'push @x, $_; END { print for reverse @x }' | pbcopy

to do the same thing. In retrospect, I should have just sorted my playlist in reverse track order.

Next I went into iTunes and selected my songs. I ran “Track Parser (Clipboard)” from the Scripts menu, clicked “New Pattern” and put in “%a” to extract the artist from each line. Track Parser handled the rest. Total time: about five minutes.

12.16.07

C++ STL sort weirdness

Posted in programming at 9:59 pm by danvk

I ran into a weird bug at work this past week. What does this code do?

If you said “segfault”, give yourself a pat on the back! Bonus points if you know that changing the “20″ to a “16″ will prevent the segfault.

After spending several hours staring at this, I figured out what was going on. Rather than taking a compare function that returns an int (like qsort or Perl’s sort), it wants a “LessThan” function:

If I’ve ever been happy that C++ silently casts ints to bools, I have now done my penance. I’m still somewhat surprised that std::sort segfaults when given a strange comparison function, rather than returning an unsorted list.

10.26.07

NS-Tower in a Canvas Tag

Posted in programming, web at 2:49 pm by danvk

nsticonw.gif I recently noticed that Rice has unceremoniously purged my Owlnet site, so I’ll be moving some of its content over here. First up: my JavaScript implementation of Nagi-P Software’s NS-Tower.

This is one of the few games I’ve ever seen with only one control: jump. Your character bounces off the walls and you have to power him up for jumps. My record is 282 floors on Hard. Can you beat it? No fair using the JavaScript version, though. More details below (warning: it takes a hard right turn for the nerdy)…
Read the rest of this entry »

10.09.07

A Java Surprise

Posted in boggle, programming at 11:44 pm by danvk

java.png
I’ve always been a Java and Eclipse naysayer, but I’m afraid new experiences are forcing me to reevaluate my skepticism. The last time I used Java was JDK 1.3 on a Sparc workstation back in early 2004. Eclipse was hella slow on that hardware, and somehow my workspace wound up in a temporary directory. This was a very bad thing, because as soon as I logged out, my project was gone forever. So I had good reason to swear off Eclipse.

More generally, Java left off a mighty stink back in 2004. Any GUI that I ran on the Mac would look out of place and felt clunky. Performance was poor. But in retrospect, I suspect much of the rank Java smell was really coming from the design patterns gibberish I was being force-fed at the same time. Why use a simple array when you could use an AbstractListFactory that does the same thing with 10x code bloat?

Regular readers only get one guess what program I wrote to get in the swing of things.
Read the rest of this entry »

07.05.07

“The Cathedral and the Bazaar”

Posted in books, programming, reviews, wikipedia at 10:08 pm by danvk

esr.pngThere’s a long tradition of great titles in the software engineering world. Djikstra’s “Goto Considered Harmful” has spawned thousands of imitators, and even a meta-paper. Fred Brook’s The Mythical Man-Month clicks as soon as you understand the title. Eric S. Raymond’s “The Cathedral and the Bazaar” gives open source software its defining image.

I read “The Cathedral and the Bazaar” as an introduction to the world of open source software for someone interested in joining it. There’s a history lesson to explain where you’ve come from and what you’ve accomplished. There’s arguments and a case study to show that you’re on the right ship. And finally, the essay serves as a call to arms, to get you excited about becoming a contributor.

I found the history lesson most interesting. I’d had some understanding of this before, but lacked much detail. ESR gives a first-person account of UNIX and software development from the late 1970’s to the present. This is the canonical story of open source. It has its heroes and villains, its true believers and false idols. There’s the Moses figure, Richard Stallman, who freed the users of UNIX from the oppressive yoke of restrictive licenses. But like Moses, he couldn’t enter the promise land. Open source stagnated, awaiting its Last True Prophet. This was Linus Torvalds, who created the Linux kernel, the last piece of the open source operating system.

ESR really uses that of tone. I get the sense that he’s intimidated by Richard Stallman and absolutely idolizes Linus Torvalds. The essay drips with hero worship. Linus is the visionary whose vision he’s writing about.

Beyond the hero worship, there is a clear exposition of the open source model. In order to avoid the problem of N^2 channels of communication amongst N contributors, open-source project have a small set of core developers. These core developers have total control over the project. They decide what gets checked in, and where the project goes. It’s a (hopefully) benevolent oligarchy. Outside of that core, there are occasional contributors and legions of testers, who can submit bug reports. Does this strict hierarchy really sound like a Bazaar?

If you want a real Bazaar, think about Wikipedia. Since I’ve never contributed to an open source project, I kept it in mind as a reference point. It works pretty well, but this perspective has the side effect of making open source development look positively Cathedral-like. Think about it. Rather than having a core set of contributors and legions of users/testers, Wikipedia explicitly aims to make all of its users into contributors. It does this by lowering the barriers to entry as low as it conceivably can, even if this leads to vandalism. All that’s needed to contribute is the ability to write in some language. Last time I checked, English had a few more speakers than C++. Rather than just reporting problems, users are empowered to fix them on the spot. See a typo? Just correct it. Want a citation? Find one and plop it in to help future readers.

I enjoyed “The Cathedral and the Bazaar” for the history lesson, but I find its central image misleading. The development process of open-source projects is as well-organized as any commercial venture.

« Previous entries Next Page » Next Page »