JFKBits: 2009

Thursday, July 23, 2009

Speedlink: Engine Yard SHA-1 Contest Winner Writeup

Another blog from the Engine Yard's winning team: Paul Wins an iPhone.

This fills in some of the details left out in the more prominent post, such as the hand-optimized C code for getting 10x improvement on the SHA-1 for ordinary CPUs (not the GPU).

Tuesday, July 21, 2009

A Crack at Cracking the EngineYard's SHA-1 Challenge

That was fun. This is a brief report on my strategy for the just-concluded Engine Yard programming contest. My farm of thirty-something CPU cores, with Polar SSL's SHA-1 hash code and my wrappings managed to come in about #57 with a Hamming distance of 36 for key dijkstra parameters format i10n expo dedicated crispin programming output inspect beta.

I wasn't originally going to enter, but after reading some of the blogs and forum discussions, my interest was piqued and I started thinking about the problem.

I began to get pulled in by counting up the order of magnitude for the search space, and the claim of someone in a discussion forum that he could "solve" the contest in about 5 minutes with about 30,000 nodes. My estimate was that the search space was about 10^63, large enough to make a lot of decisions pretty clear.

Also, I had higher priority activities (work, family, intermittent power outages that were traced back to a tree rubbing away the insulation on one wire of our service line and neutral), so I didn't start coding for the contest until it was already underway, which of course is a bit self-defeating, but I'm happy with where I ended up.

Hardware: I have a collection of nodes at work that are typically unused, most of them of relatively recent vintage meaning somewhere around 3GHz. Mac, Linux, and Windows are all represented, most of them dual core with a couple of 4- or 8-core machines, all of them AMD 64 chips. At the peak I had 37 cores running.

Software: Prototyped in Mathematica, then recoded in C. Polar SSL's SHA1, perfectly acceptable Hamming distance function based on a bytewise bit counting lookup table, and a simple enumeration technique for generating candidate phrases that ignores the capitalization and random character suffix parts of the search space.

Parallelism: Manually start the contest executable by hand:


% nohup contest &

Not scalable, but fine for the dozen-odd machines I had. I did two or three rounds of manually killing all the executables and restarting them with an updated version, which did get old. To distribute the workload, the executable picks a random initial phrase. Random generator is seeded with the random seed based on the current time and process ID.

Tracking minimum: contest executable is compiled with the current best known minimum.

Status reporting: All programs are started from the same network file system directory and write updates there. When an executable finds a new minimum, it writes the phrase to a file using the Hamming distance, hostname, and PID in the filename. My Windows explorer window, sorted by name looks something like this, and I just ignore or delete files with higher scores from nodes that don't know a better score has been found:


36-mylinuxbox-10485.txt
36-my8corewindowsbox-7361.txt
38-mylinuxbox-9587.txt
38-mymacmini-82075.txt
38-an8corelinuxbox-30256.txt
...

Performance: the SHA-1 algorithm (of course) seemed to be the limiting factor for my C code, clocking in somewhere around 1M hashes/sec. My initial simple and stupid method of forming candidate phrases was responsible for cutting performance in about half:

sprintf(phrase,"%s %s %s %s %s %s %s %s %s %s %s %s",
  contestDictionaryWords[word[11]]], contestDictionaryWords[word[10]]], 
  contestDictionaryWords[word[9]]], contestDictionaryWords[word[8]]], 
  contestDictionaryWords[word[7]]], contestDictionaryWords[word[6]]], 
  contestDictionaryWords[word[5]]], contestDictionaryWords[word[4]]], 
  contestDictionaryWords[word[3]]], contestDictionaryWords[word[2]]], 
  contestDictionaryWords[word[1]]], contestDictionaryWords[word[0]]]);

All this copying is horribly slow, but it got me underway. Before retiring for the night on Monday, I refined it so this copy was only done every 1000 times.

When I went back and wrote something that optimally copies only what's changed, I saw around a 50% to 75% speed improvement. One of the tricks in the improvement is to make two copies of the dictionary: one with a space appended to each word (for the first 11 words) and one without the space. Here's how the final phrase initialization code looks:


void initPhrase()
{
  int i, w;
  unsigned char *p;

  initRandom();

  /* Start with a random enumeration to partition work */
  for(i=0; i < 12; i++)
    word[i] = random() % 1000;

  p = phrase;
  for(i=11; i > 0; i--){
    w = word[i];
    strcpy(p, contestDictionaryWordsWithSpace[w]);
    phrasepartend[i] = p += contestDictionaryWordWithSpaceLen[w];
  }

  w = word[0];
  strcpy(p, contestDictionaryWords[w]);
  p += contestDictionaryWordLen[w];
  phraselen = p-phrase;

  printf("initial phrase: %s\n", phrase);
}

Hamming distance: my initial code for this seemed to perform well enough that I never bothered optimizing it further, e.g. performing the XORs a 32-bit word at a time:


unsigned challengeDistance(unsigned char hash[20])
{
  int i, distance;
  distance = 0;
  for(i=0; i < 20; i++){
    distance += numOnes[challengeHash[i] ^ hash[i]];
  }
  return distance;
}

For each byte, xor the hash with the corresponding byte in the challenge hash, and use a precomputed lookup table to count the ones.

Performance reporting: The main program structure is a while(1) loop that generates and evaluates candidate phrases in batches of a few million. These batches are times and current results are printed to the console. I had dozens of console windows where I could keep an eye on progress:


390392448 evals, 1000000.000000 eval/sec, 40 session min, 36 min
400392448 evals, 909090.909091 eval/sec, 40 session min, 36 min
410392448 evals, 909090.909091 eval/sec, 40 session min, 36 min
420392448 evals, 909090.909091 eval/sec, 40 session min, 36 min
430392448 evals, 1000000.000000 eval/sec, 40 session min, 36 min
440392448 evals, 909090.909091 eval/sec, 40 session min, 36 min

Conclusion: I put together a modest-performing program that performed about 100 billion searches during the contest and found a phrase that had a score of 36. One thing I'd like to do is analyze the performance of the SHA-1 hash implementation itself and understand why it takes a few thousand machine cycles, what the limiting factor is, and whether there's a better performing implementation.

Tuesday, April 28, 2009

What Tools do I Need to Program from a Winnebago

Geez, does this guy get it: Patrick Collison blog : Hacking for fun and profit with Mathematica and the Google Analytics API. It's always nice to see Mathematica used in more traditional programming and software prototyping senses.

The maps in Patrick's post prompted me to post a fun result I got yesterday.

Below is a map of the United States showing me where all the comfy cities are --- according to my definition of comfortable. These are the cities that show up in CityData where the current temperature, grabbed from WeatherData, lies within my personally-defined comfort zone of 74-84 degrees Fahenheit (23.2C-28.9C).

The post's title is a stackoverflow.com nod and hints at the real question I had. What I was curious to do but didn't have time is to explore the historic temperatures and answer the question "if I lived in a motor home, where could I go during the year so that I was always living in a temperature range that I specify?" I don't really want to do that, I'm used to the four seasons and lots of weather variety, but I started to wonder what those places would be.

There's absolutely nothing clever to the code for this, it's sheer brute force actually. We just ask for all the U.S. cities in the database, look up their temperature, select those in the right range, and then plot them.


Needs["Units`"];

CityHasDesirableTemperatureQ[city_, low_, high_] := 
 With[{lowC = ConvertTemperature[low, Fahrenheit, Celsius], 
   highC = ConvertTemperature[high, Fahrenheit, Celsius], 
   temp = WeatherData[city, "Temperature"]},
  lowC <= temp <= highC];

allCities = CityData[{All, "UnitedStates"}];

Graphics[{Gray, CountryData["UnitedStates", "Polygon"], 
  PointSize[Large], Red, 
  Point[Reverse[CityData[#, "Coordinates"]]] & /@ 
   Select[allCities, CityHasDesirableTemperatureQ[#, 74, 84] &]}]

Wednesday, March 18, 2009

Ship It

We just shipped the server side of one of my products, the Wolfram Lightweight Grid System.

I am reminded of the immortal words of Steve Bjork, famed CoCo game programmer, when he said you know your program is going to be good because you start to hate it. He seemed to capture that feeling you get when the thrill of seeing something cool come to life is long gone, and you're grinding through all the details that need to be nailed down so you can ship.

Check out some of the screencasts. Also, check out Roman's blog post for some of his longer-term perspective on parallel computing within Mathematica.

Monday, March 09, 2009

Speedlink: Harvest

Harvest is a time-tracking webapp that has a broader scope than Time Tracking Tool, and so far I like it a bit better for my needs. The web interface is just a tad more efficient than that that of TTT.

Harvest is geared for businesses and generating invoices for billable hours, so it has this concept of Timesheets that let you pick tasks from your projects. This matches how I work pretty well, even though I'm not directly billing anybody. On my timesheet I can see at a glance where my time has gone for that day, and how much time total I've spent so far. This is the number one thing I need on an ongoing basis because my schedule is so irregular and I task switch so often between home and work.

I haven't looked into what support it has for planning, and tracking how estimates match actual figures, but I imagine the support is not there.

Saturday, March 07, 2009

Speedlink: Time Tracking Tool

Time Tracking Tool, a program that I'm evaluating. This is pretty close to what I'd want to use. It has the Play/Pause notion, as well as a task hierarchy. When you activate a task, it shows the time you started, to help remind you if (when) you forget to pause it when you leave the computer. That's probably the chief drawback to the whole problem, when you get distracted from a task (phone call, someone stops by your desk to ask a question), by nature it's hard to remember to tell the computer what's going on.

Thursday, March 05, 2009

Stephen Lifts the Lid on Wolfram|Alpha

We finally have said something public about Wolfram|Alpha. The tagline for the product is "Computational Knowledge Engine", and Stephen's blog post introduces the concept this way:

Fifty years ago, when computers were young, people assumed that they’d quickly be able to handle all these kinds of things.

And that one would be able to ask a computer any factual question, and have it compute the answer.

JFKBits