Tuesday, February 20, 2007

Java in XML

From some personal discussions with JFKBits readers about XML-based languages, I've learned that there are precedents for the idea of a standard structured format, such as XML, for a language's Abstract Syntax Trees (ASTs).

One example is the PMD project:


PMD is a Java source code analyzer. It finds unused variables, empty catch blocks, unnecessary object creation, and so forth.

This is a language tool of the sort that I imagined would benefit from working with an XML-based AST.

One of the cool things PMD does is generate the Java AST as XML. PMD itself uses a Java parser generated with JavaCC, but its Eclipse plugin has a "Generate Abstract Syntax Tree" menu option that writes out a complete, detailed .ast file.
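As a rough illustration of why a tool writer might like this, here is a minimal Java sketch that loads such a .ast file with the JDK's standard DOM and XPath APIs and reports every `System.out.println` call with its line and column. It assumes only the element and attribute names visible in the Hello, World AST below (`Name`, `image`, `beginLine`, `beginColumn`). The class name `PrintlnFinder` is made up for this example, and this is not how PMD's own rules are implemented.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Hypothetical PMD-style check that runs against an exported XML AST rather than
 * an in-memory parse tree. Assumes the element/attribute names seen in the
 * Hello, World .ast listing: Name, image, beginLine, beginColumn.
 */
public class PrintlnFinder {

    /** Returns "line:column" for each Name node whose image equals callName. */
    public static List<String> findCalls(InputSource astXml, String callName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(astXml);
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Sketch only: callName is pasted into the expression, so it must not contain a quote.
        NodeList hits = (NodeList) xpath.evaluate(
                "//Name[@image='" + callName + "']", doc, XPathConstants.NODESET);
        List<String> positions = new ArrayList<String>();
        for (int i = 0; i < hits.getLength(); i++) {
            Element name = (Element) hits.item(i);
            positions.add(name.getAttribute("beginLine") + ":" + name.getAttribute("beginColumn"));
        }
        return positions;
    }

    /** Convenience overload for XML already held in a string. */
    public static List<String> findCalls(String astXml, String callName) throws Exception {
        return findCalls(new InputSource(new StringReader(astXml)), callName);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PrintlnFinder HelloWorld.ast");
            System.exit(1);
        }
        // Pass the file as a URI system ID so the parser can honor the XML encoding declaration.
        InputSource in = new InputSource(new File(args[0]).toURI().toString());
        for (String pos : findCalls(in, "System.out.println")) {
            System.out.println("System.out.println call at " + pos);
        }
    }
}
```

Run against the full Hello, World AST, it should report one call at 5:17, the position recorded on the `Name` node. The point is that the query never touches a Java parser; any language that can read XML could do the same.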

The discovery of this feature calls for an example.

Hello, World Java source

public class HelloWorld
{
	public static void main(String[] args)
	{
		System.out.println("Hello, world.");
	}
}

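And just so we know what we're getting ourselves into, here is the beginning of the XML AST that PMD generates for it (the full listing is in the appendix below).

Hello, World XML AST (excerpt)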


<CompilationUnit beginColumn="1" beginLine="1" endColumn="3" endLine="7">
  <TypeDeclaration beginColumn="1" beginLine="1" endColumn="1" endLine="7">
    <ClassOrInterfaceDeclaration abstract="false" beginColumn="8" beginLine="1" endColumn="1" endLine="7" final="false" image="HelloWorld" interface="false" native="false" nested="false" packagePrivate="false" private="false" protected="false" public="true" static="false" strictfp="false" synchronized="false" transient="false" volatile="false">
      <ClassOrInterfaceBody beginColumn="1" beginLine="2" endColumn="1" endLine="7">
        <ClassOrInterfaceBodyDeclaration anonymousInnerClass="false" beginColumn="9" beginLine="3" endColumn="9" endLine="6" enumChild="false">
          <MethodDeclaration abstract="false" beginColumn="23" beginLine="3" block="Block" endColumn="9" endLine="6" final="false" interfaceMember="false" methodName="main" native="false" packagePrivate="false" private="false" protected="false" public="true" resultType="ResultType" static="true" strictfp="false" synchronized="false" syntacticallyAbstract="false" syntacticallyPublic="true" transient="false" void="true" volatile="false">
            ...
        </ClassOrInterfaceBodyDeclaration>
      </ClassOrInterfaceBody>
    </ClassOrInterfaceDeclaration>
  </TypeDeclaration>
</CompilationUnit>

The bloat factor is impressive. The 7-line, 120-byte Java source turned into 52 lines of XML (a bloat factor of 7.4) and 5660 bytes (a bloat factor of 47). I also generated the AST for another program, a very modest 70-line utility that prints details of all the network interfaces, and that came to 1038 lines (a bloat factor of about 14.8). Do I need to repeat that this XML AST is intended for consumption by tools, not humans?
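The bloat factor is simply the size of the generated XML divided by the size of the Java source. Here is a minimal Java sketch of that calculation, assuming a "line" is anything terminated by '\n' (so CRLF and LF files count the same) plus an unterminated last line. The class name `BloatFactor` is hypothetical. With two file arguments (source, then .ast) it measures real files; with none it redoes the arithmetic from the figures above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

/**
 * Bloat factor = size of the generated XML AST / size of the Java source,
 * measured both in lines and in bytes.
 */
public class BloatFactor {

    /** Ratio of generated size to original size. */
    public static double ratio(long generated, long original) {
        if (original <= 0) {
            throw new IllegalArgumentException("original size must be positive");
        }
        return (double) generated / original;
    }

    /** Counts '\n'-terminated lines, plus a final line with no trailing newline. CRLF counts once. */
    public static long countLines(byte[] content) {
        long lines = 0;
        for (byte b : content) {
            if (b == '\n') {
                lines++;
            }
        }
        if (content.length > 0 && content[content.length - 1] != '\n') {
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            byte[] src = Files.readAllBytes(Paths.get(args[0]));
            byte[] ast = Files.readAllBytes(Paths.get(args[1]));
            System.out.printf(Locale.ROOT, "lines: %.1f  bytes: %.1f%n",
                    ratio(countLines(ast), countLines(src)),
                    ratio(ast.length, src.length));
        } else {
            // No files given: recompute the ratios quoted in the post.
            System.out.printf(Locale.ROOT, "Hello, World lines: %.1f%n", ratio(52, 7));
            System.out.printf(Locale.ROOT, "Hello, World bytes: %.1f%n", ratio(5660, 120));
            System.out.printf(Locale.ROOT, "70-line utility lines: %.1f%n", ratio(1038, 70));
        }
    }
}
```

With no arguments it prints 7.4, 47.2 and 14.8. Byte counts are sensitive to indentation and line endings in both files, so the line ratio is the steadier of the two measures.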



Appendix: Full Hello, World AST




<CompilationUnit beginColumn="1" beginLine="1" endColumn="3" endLine="7">
  <TypeDeclaration beginColumn="1" beginLine="1" endColumn="1" endLine="7">
    <ClassOrInterfaceDeclaration abstract="false" beginColumn="8" beginLine="1" endColumn="1" endLine="7" final="false" image="HelloWorld" interface="false" native="false" nested="false" packagePrivate="false" private="false" protected="false" public="true" static="false" strictfp="false" synchronized="false" transient="false" volatile="false">
      <ClassOrInterfaceBody beginColumn="1" beginLine="2" endColumn="1" endLine="7">
        <ClassOrInterfaceBodyDeclaration anonymousInnerClass="false" beginColumn="9" beginLine="3" endColumn="9" endLine="6" enumChild="false">
          <MethodDeclaration abstract="false" beginColumn="23" beginLine="3" block="Block" endColumn="9" endLine="6" final="false" interfaceMember="false" methodName="main" native="false" packagePrivate="false" private="false" protected="false" public="true" resultType="ResultType" static="true" strictfp="false" synchronized="false" syntacticallyAbstract="false" syntacticallyPublic="true" transient="false" void="true" volatile="false">
            <ResultType beginColumn="23" beginLine="3" endColumn="26" endLine="3" void="true"/>
            <MethodDeclarator beginColumn="28" beginLine="3" endColumn="46" endLine="3" image="main" parameterCount="1">
              <FormalParameters beginColumn="32" beginLine="3" endColumn="46" endLine="3" parameterCount="1">
                <FormalParameter abstract="false" array="true" arrayDepth="1" beginColumn="33" beginLine="3" endColumn="45" endLine="3" final="false" native="false" packagePrivate="true" private="false" protected="false" public="false" static="false" strictfp="false" synchronized="false" transient="false" typeNode="Type" varargs="false" volatile="false">
                  <Type array="true" arrayDepth="1" beginColumn="33" beginLine="3" endColumn="40" endLine="3" typeImage="String">
                    <ReferenceType array="true" arrayDepth="1" beginColumn="33" beginLine="3" endColumn="40" endLine="3">
                      <ClassOrInterfaceType beginColumn="33" beginLine="3" endColumn="38" endLine="3" image="String"/>
                    </ReferenceType>
                  </Type>
                  <VariableDeclaratorId array="false" arrayDepth="0" beginColumn="42" beginLine="3" endColumn="45" endLine="3" exceptionBlockParameter="false" image="args" typeNameNode="ReferenceType" typeNode="Type"/>
                </FormalParameter>
              </FormalParameters>
            </MethodDeclarator>
            <Block beginColumn="9" beginLine="4" endColumn="9" endLine="6">
              <BlockStatement allocation="false" beginColumn="17" beginLine="5" endColumn="52" endLine="5">
                <Statement beginColumn="17" beginLine="5" endColumn="52" endLine="5">
                  <StatementExpression beginColumn="17" beginLine="5" endColumn="51" endLine="5">
                    <PrimaryExpression beginColumn="17" beginLine="5" endColumn="51" endLine="5">
                      <PrimaryPrefix beginColumn="17" beginLine="5" endColumn="34" endLine="5">
                        <Name beginColumn="17" beginLine="5" endColumn="34" endLine="5" image="System.out.println"/>
                      </PrimaryPrefix>
                      <PrimarySuffix argumentCount="1" arguments="true" arrayDereference="false" beginColumn="35" beginLine="5" endColumn="51" endLine="5">
                        <Arguments argumentCount="1" beginColumn="35" beginLine="5" endColumn="51" endLine="5">
                          <ArgumentList beginColumn="36" beginLine="5" endColumn="50" endLine="5">
                            <Expression beginColumn="36" beginLine="5" endColumn="50" endLine="5">
                              <PrimaryExpression beginColumn="36" beginLine="5" endColumn="50" endLine="5">
                                <PrimaryPrefix beginColumn="36" beginLine="5" endColumn="50" endLine="5">
                                  <Literal beginColumn="36" beginLine="5" endColumn="50" endLine="5" image="&quot;Hello, world.&quot;" stringLiteral="true"/>
                                </PrimaryPrefix>
                              </PrimaryExpression>
                            </Expression>
                          </ArgumentList>
                        </Arguments>
                      </PrimarySuffix>
                    </PrimaryExpression>
                  </StatementExpression>
                </Statement>
              </BlockStatement>
            </Block>
          </MethodDeclaration>
        </ClassOrInterfaceBodyDeclaration>
      </ClassOrInterfaceBody>
    </ClassOrInterfaceDeclaration>
  </TypeDeclaration>
</CompilationUnit>
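If you do want to read one of these files, a small filter helps. The minimal sketch below, again using only the JDK's DOM API, drops every attribute except `image` and prints one indented line per node, which reduces the listing above to an outline of the tree's shape. The class name `AstOutline` and the choice to keep only `image` are illustrative assumptions, not anything PMD provides.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: prints an XML AST as an indented outline of element names. */
public class AstOutline {

    /** Appends one line per element, two spaces per nesting level, plus the image attribute if present. */
    static void outline(Element e, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(e.getTagName());
        if (e.hasAttribute("image")) {
            out.append(" [").append(e.getAttribute("image")).append(']');
        }
        out.append('\n');
        // Recurse into child elements only; whitespace text nodes are skipped.
        for (Node child = e.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                outline((Element) child, depth + 1, out);
            }
        }
    }

    /** Parses the AST and returns its outline as a string. */
    public static String outline(InputSource in) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in)
                .getDocumentElement();
        StringBuilder out = new StringBuilder();
        outline(root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java AstOutline HelloWorld.ast");
            System.exit(1);
        }
        System.out.print(outline(new InputSource(new File(args[0]).toURI().toString())));
    }
}
```

For the Hello, World AST this keeps all 51 nodes but only the handful of `image` values (`HelloWorld`, `main`, `String`, `args`, `System.out.println` and the string literal), which is most of what a human reader wants from it.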

2 comments:

Mark said...

For Java, there's also JavaML.

For C++, there's at least gcc-xml, and perhaps more (judging from a Google search for "gcc xml").

Then there's the ASIS (Ada Semantic Interface Specification) and JSIS (Java Semantic Interface Specification) approach, which uses an API to access the syntax tree. (Though it appears the main JSIS link has been replaced by a spam blog.)

jfklein said...

Very helpful links! I like these so much I hope to pick out some choice quotes for a follow-up post.

Hopefully this sort of thing will become standard practice in new language development. Tool support often helps a new language gain acceptance, or helps it spread into large-scale projects, and publishing a semantic interface seems like it would lower the barrier to entry for tools development.