Saturday, March 17, 2007

Progress report: Reproducing the TIOBE PCI

In which reproducing the TIOBE PCI is found to be trickier than it first appeared.

Recently we looked at how TIOBE describes the computation of its Programming Community Index. The description, while detailed, is somewhat elusive, and curiosity overcame me. I've been hacking at a system for reproducing the results.

I've now got results, and they don't match what's on the web. The current disconnect is that I forgot about the phrase "for the last 12 months" that comes at the end of "The search query is executed for the regular Google, MSN, and Yahoo! web search and the Google newsgroups and blogs."

So, before I give you some notes about what I've learned so far, let's compare my results (so far) with what TIOBE published for March:

Position | Language | My calculated ranking | TIOBE ranking


Anonymous said...

Where are the results? Something's amiss!

jfklein said...

Whoops, I didn't realize this draft got published! I decided to postpone publication of my results since there were still some discrepancies, and it looked like it would take a few hours of digging to figure it out, hours which went into much-needed home renovation work.

I had also gotten feedback from one of my hard-core readers (defined as: someone I actually know) that this was not at all an interesting topic, and I have let the project languish since March. The code is still around though.

Since someone thought to leave a comment, I'm going to leave this post here for now.