Wednesday, December 05, 2007

Type Directed Code Search

We live in a Golden Age. You can buy Ubuntu desktop computers at Wal*Mart for under $200, there are more choices of programming language than ever, and their software libraries are rich and well-designed with lots of orthogonality. Let's say you're learning one of these new libraries, you're feeling all baroque and XMLish, and you set your heart on acquiring an org.w3c.dom.Document object. Where do people get those, anyway? Where's the parser that spits it out? After much digging, you find that a javax.xml.parsers.DocumentBuilder will make you your document. Ah, but that's an abstract class. Now, who makes DocumentBuilders? DocumentBuilderFactory, of course. This kind of searching gets old quickly.
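For reference, the chain you eventually uncover looks roughly like the sketch below. It uses only the standard JAXP classes named above; the class name DocumentChain and the sample XML string are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: factory -> builder -> document.
class DocumentChain {

    // Parses an XML string by walking the chain of producers.
    static Document parse(String xml) throws Exception {
        // Step 1: the factory is obtained from a static method.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Step 2: the factory produces the (abstract) DocumentBuilder.
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Step 3: the builder finally produces the Document.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<greeting>hello</greeting>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Three types and three calls, and nothing in the name Document tells you where to start.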

Maybe this kind of run-around is unavoidable when learning libraries, and documentation and books can certainly help orient you, but there's another tool for our kit which may help in some cases. What might be useful is an index keyed by type that points to the methods using that type, either in their parameters or as their return type. This index could answer the question "where can I find methods that return type X?" or "what are all the methods that take an X as an argument?" Such an index could help not only in learning, as described above, but in any activity where you need to search the code.
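Here is a minimal sketch of such an index, built with Java reflection. The class TypeIndex and its methods add, returning and taking are hypothetical names invented for this illustration, and the sketch only looks at the public methods of classes you hand it explicitly. A real tool would scan whole libraries and would probably also want to consider subtypes and constructors.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical type-keyed index: type -> methods that produce or consume it.
class TypeIndex {
    private final Map<Class<?>, Set<Method>> producers = new HashMap<>();
    private final Map<Class<?>, Set<Method>> consumers = new HashMap<>();

    // Records every public method declared directly on the given class.
    void add(Class<?> cls) {
        for (Method m : cls.getDeclaredMethods()) {
            if (!Modifier.isPublic(m.getModifiers())) {
                continue;
            }
            producers.computeIfAbsent(m.getReturnType(), k -> new LinkedHashSet<>()).add(m);
            for (Class<?> param : m.getParameterTypes()) {
                consumers.computeIfAbsent(param, k -> new LinkedHashSet<>()).add(m);
            }
        }
    }

    // "Where can I find methods that return type X?"
    Set<Method> returning(Class<?> type) {
        return producers.getOrDefault(type, Collections.emptySet());
    }

    // "What are all the methods that take an X as an argument?"
    Set<Method> taking(Class<?> type) {
        return consumers.getOrDefault(type, Collections.emptySet());
    }

    public static void main(String[] args) {
        TypeIndex index = new TypeIndex();
        index.add(javax.xml.parsers.DocumentBuilderFactory.class);
        for (Method m : index.returning(javax.xml.parsers.DocumentBuilder.class)) {
            System.out.println(m);
        }
    }
}
```

With DocumentBuilderFactory indexed, asking for methods returning DocumentBuilder should turn up newDocumentBuilder(), which is exactly the answer the digging above was after.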

Eclipse goes some way toward implementing this: not far enough, but enough to hint at what we could do. In the dialog below I've entered "DocumentBuilder" as the search string, selected "Type" under Search For, and limited the search to "References". Ideally I'd like to limit the search to methods returning the given type, since I want to know which methods generate this type. Also note that "JRE libraries" is checked, because we want to know what in the built-in library generates the DocumentBuilder:

The search results:

This feature seems a natural for Java since it is statically typed and exposes the type information, through reflection, that a tool needs to do the analysis. However, it looks like type inference is coming to the dynamically typed languages too, including Ruby and JavaScript and even Wasabi. Since types are not as much a part of those languages, it may not be as natural to think of searching this way, but the need will grow as more and bigger applications get written in them.

With a type-informed search, you can do debugging research. One thing I've seen with large team projects is the bug that has an exact duplicate lurking elsewhere. For example, you find that the code building a URL object for a web server never sets the port number from the config.getPortNumber() call. Since everybody tests with the well-known default port 80, things appear to work, but a real customer configures a different port and suddenly links break. You track it down, add the right calls, and close the bug, only to have the customer call back with a different broken link: same root cause, but in a different spot in the code. Once you identify a bad pattern like that, you typically want to look for other places it might occur. To find this pattern you could do a text search for URL, but a type-informed search or an AST search is likely to cut down on unrelated hits for "URL" in comments and other areas.
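To make the pattern concrete, here is a small sketch of the buggy and the fixed construction side by side. The Config class is a hypothetical stand-in for whatever object provides getPortNumber() in your project, and the host, path and port values are made up.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo of the "forgot the configured port" pattern.
class PortBugDemo {

    // Stand-in for the project's configuration object (an assumption).
    static final class Config {
        private final int port;

        Config(int port) {
            this.port = port;
        }

        int getPortNumber() {
            return port;
        }
    }

    // Buggy: no port given, so the link implicitly uses the default HTTP port.
    // (Newer JDKs deprecate these URL constructors, but they still work.)
    static URL buggyLink(String host, String path) throws MalformedURLException {
        return new URL("http", host, path);
    }

    // Fixed: the configured port is passed through explicitly.
    static URL fixedLink(Config config, String host, String path) throws MalformedURLException {
        return new URL("http", host, config.getPortNumber(), path);
    }

    public static void main(String[] args) throws MalformedURLException {
        Config config = new Config(8080);
        System.out.println(buggyLink("example.com", "/help.html"));
        System.out.println(fixedLink(config, "example.com", "/help.html"));
    }
}
```

A search keyed on the URL type would list both construction sites, which makes it easy to spot the ones that never consult the configuration.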

In conclusion, this is yet another application of type information. It's a little on the fringes, perhaps not an everyday tool, but for learning a new library, for doing research prior to refactoring or for debugging, it could be a standard feature in IDEs of the future.
