In Java in XML, JFKBits reader Mark kindly pointed us to several tools of the sort I imagined in XML-Based Languages. These tools parse the "front end" syntax and produce an AST or some other representation of the source that is more accessible to other programs. The material on these tools puts the issues and applications more succinctly than I did, so let's sample some quotes.
Why would you want to?
From an FTPOnline article about JSIS, the Java Semantic Interface Specification:
Sample applications include browsing and navigation tools, code formatting and restructuring tools, source code generation tools, code coverage and test generation tools, code analysis and metric reporting tools, style and standard-compliance reporting, UML diagram and round-trip engineering tools, and interactive source code editing, to name just a few.
What's the issue?
From the introduction to JavaML:
The classical plain-text representation of source code is convenient for programmers but requires parsing to uncover the deep structure of the program. While sophisticated software tools parse source code to gain access to the program's structure, many lightweight programming aids such as grep rely instead on only the lexical structure of source code.
The reference to grep is telling: early in my career I worked for two weeks on an Emacs-Lisp program that would convert the structured pseudocode we were writing to C source code. It was a successful program, since it gave everyone a head start on their C source files, but it definitely relied on grep-style (regular expression) parsing and thus had certain inherent limitations.
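To make the lexical-versus-structural distinction concrete, here is a small sketch in Python. It is my own illustration, not something taken from the JavaML material, and the sample source and the choice of looking for calls to `open` are made up for the purpose. It searches the same snippet twice: once with a regular expression, and once by walking the syntax tree that Python's standard `ast` module produces.

```python
import ast
import re

# Made-up sample source: "open(" appears once inside a string literal
# and once as a real function call.
SOURCE = '''
def load(path):
    # remember to open the file first
    message = "cannot open(path)"
    return open(path).read()
'''

# Lexical approach: a regular expression sees only characters, so it
# cannot tell a call from text inside a string.
lexical_hits = [m.start() for m in re.finditer(r"\bopen\(", SOURCE)]

# Structural approach: parse the source and keep only genuine call
# nodes whose callee is the plain name "open".
tree = ast.parse(SOURCE)
structural_hits = [
    node.lineno
    for node in ast.walk(tree)
    if isinstance(node, ast.Call)
    and isinstance(node.func, ast.Name)
    and node.func.id == "open"
]

print("regex matches:", len(lexical_hits))
print("call sites on lines:", structural_hits)
```

The regular expression also reports the occurrence inside the string literal, while the tree walk reports only the actual call. That is the kind of inherent limitation my Emacs-Lisp converter had.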
From the introduction to GCC-XML:
Development tools that work with programming languages benefit from their ability to understand the code with which they work at a level comparable to a compiler. C++ has become a popular and powerful language, but parsing it is a very challenging problem. This has discouraged the development of tools meant to work directly with the language.
There is one open-source C++ parser, the C++ front-end to GCC, which is currently able to deal with the language in its entirety. The purpose of the GCC-XML extension is to generate an XML description of a C++ program from GCC's internal representation. Since XML is easy to parse, other development tools will be able to work with C++ programs without the burden of a complicated C++ parser.
This is particularly nice, considering how difficult it is to construct a C++ parser.
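As a rough sketch of what "XML is easy to parse" buys a tool author, the Python below reads an XML description of two functions and prints their signatures. The element and attribute names (`program`, `function`, `argument`, `returns`, `type`) are invented for illustration and should not be taken as GCC-XML's actual schema. The point is only that a stock XML library does the job a C++ parser would otherwise have to do.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML description of two C++ functions. The element and
# attribute names are assumptions made for this example, not the real
# GCC-XML output format.
DESCRIPTION = """
<program>
  <function name="area" returns="double">
    <argument name="width" type="double"/>
    <argument name="height" type="double"/>
  </function>
  <function name="reset" returns="void"/>
</program>
"""

def signatures(xml_text):
    """Return one C-style signature string per <function> element."""
    root = ET.fromstring(xml_text)
    result = []
    for fn in root.iter("function"):
        args = ", ".join(
            f'{arg.get("type")} {arg.get("name")}'
            for arg in fn.findall("argument")
        )
        result.append(f'{fn.get("returns")} {fn.get("name")}({args})')
    return result

for line in signatures(DESCRIPTION):
    print(line)
```

A tool built this way never touches C++ syntax. It depends only on whatever XML vocabulary the front end emits.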
As I said in responding to Mark's comment, it would be nice if these parser modules, if you will, would become standard practice in new language development. Tool support helps acceptance of a new language, and it seems like providing programmatic access to the AST would lower the barrier to entry for tools development.
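Python is one language that already ships its parser as a standard module (`ast`), so it makes a handy illustration of how low the barrier can get. The sketch below is a toy metric reporter of my own devising, with a made-up sample program: it reports the argument count and the number of top-level statements for each function.

```python
import ast

# Made-up sample program to measure.
SOURCE = '''
def add(a, b):
    return a + b

def greet(name):
    text = "hello " + name
    print(text)
    return text
'''

def function_metrics(source):
    """Map each top-level function name to (argument count, statement count)."""
    tree = ast.parse(source)
    return {
        node.name: (len(node.args.args), len(node.body))
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }

for name, (n_args, n_stmts) in function_metrics(SOURCE).items():
    print(f"{name}: {n_args} argument(s), {n_stmts} statement(s)")
```

The whole tool is about a dozen lines because the hard part, parsing, comes with the language.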