Monday, July 10, 2006

Idea: Software Provenance Database

You know how Help -> About... tells you an application's version number? Wouldn't it be great if there were a similarly convenient way to find out, manually or programmatically, what version of something you've got, whether it's an operating system, browser, virtual machine, or compiler? The idea here is a database of how-to-tell-what-it-is information. It would store ways of determining version info both manually and programmatically, in as many different languages and environments as possible. For example, on a Mac you can get the OS name and version from a file, which works for both manual and programmatic access in almost any language. In Java you can also call System.getProperty("os.name") to get the OS name (not sure about the version number though). There can be both algorithmic and heuristic ways of determining the version of something. As an example of a heuristic how-to-tell, I've heard that some machines and TCP/IP stacks can be identified by examining their network output with a sniffer. These are the kinds of things that would be useful to have cataloged somewhere.
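To make the Java case concrete, here is a minimal sketch of the kind of programmatic how-to-tell entry such a database might hold. It reads a handful of standard Java system properties, including "os.version", which appears to cover the version number as well. The class name PlatformProbe and its probe() method are made up for illustration; only the property keys come from Java itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects "what am I running on?" facts
// from standard Java system properties.
class PlatformProbe {

    // Standard property keys documented for System.getProperties().
    private static final String[] KEYS = {
        "os.name", "os.version", "os.arch", "java.version", "java.vm.name"
    };

    // Returns the facts in a stable order; a missing property maps to "unknown".
    static Map<String, String> probe() {
        Map<String, String> facts = new LinkedHashMap<>();
        for (String key : KEYS) {
            facts.put(key, System.getProperty(key, "unknown"));
        }
        return facts;
    }

    public static void main(String[] args) {
        // Print one "key = value" line per fact.
        for (Map.Entry<String, String> e : probe().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

The values depend entirely on the machine and JVM that run it, which is exactly why a catalog of what each value means on each platform would be handy.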

Now somebody just needs to find or create such a database.

I did find something called "bitprints", which is a collection of identifying information, such as hash values, for file images. This is certainly one way to identify a piece of software: by characteristics of its bit content, e.g. 1 million bytes long with an MD5 hash value of such-and-such. But I'd want a software provenance database to record as many ways of identifying a piece of software as possible.
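As a rough illustration of identifying a file by its bit content, the sketch below computes the two characteristics mentioned above, length in bytes and MD5 hash, using Java's standard MessageDigest class. The class name BitFingerprint and its methods are hypothetical, and this is not how bitprints itself works, just the general idea. It reads the whole file into memory, which is fine for a sketch but not for very large images.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: fingerprints a file by length and MD5 hash.
class BitFingerprint {

    // Returns the MD5 digest of the given bytes as lowercase hex.
    static String md5Hex(byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Describes content as "<length> bytes, MD5 <hash>".
    static String describe(byte[] data) {
        return data.length + " bytes, MD5 " + md5Hex(data);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BitFingerprint <file>");
            return;
        }
        // Reads the entire file at once; a streaming version would suit big files.
        System.out.println(describe(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

A provenance database could store such a fingerprint alongside the other how-to-tell entries, so that a match on length and hash is one identification route among several.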

Hopefully later I'll post a mockup of a domain model for such a database.

If you have refinements on this idea, or know of places that realize all or part of this idea, please leave a comment.
