Experts may disagree, but I like to agree with LtU that the killer app for functional languages is compiler writing. I propose that one simple reason for the success is structural pattern matching, because it lets you write compact and natural tree traversal code:
datatype Expr = Atom of string | Pair of {car : Expr, cdr : Expr};
fun textFormatter(Atom a) = a
  | textFormatter(Pair{car=x, cdr=y}) =
      concat["(", textFormatter x, ",", textFormatter y, ")"]
Do OO languages have any hope of approaching this kind of conciseness? That led to my question in
Tree Traversal in Object-Oriented Languages "which of these two techniques do you like better and is there a better one?"
In that post, JFKBits reader "peter" commented that my "event model" was just the Visitor Pattern. I'd had a different idea of what the Visitor Pattern was. After some digging, I learned there are two variations, both considered the Visitor Pattern, and today we'll look at what difference the choice makes when writing programming language processors.
Why Tree Traversal Matters
For the programming language tool writer, a tree traversal pattern is like the for loop (or other iteration pattern) for most other programmers -- after all, your bread and butter data structure is the tree, not a flat sequence.
In a word, tree traversal matters if you traverse a lot of trees - sum up the number of traversals you write in a single program or the number of processors you write over time.
You could be writing a series of toy interpreters, each with a single eval loop, like this one for
Scheme and this one for
Lisp, both written in Python.
On the other hand, you could have one large project with many phases. I have a 12,000 line compiler for
Ila written in Standard ML with five major tree traversal areas in the compiler core, plus some utility traversals like error-message pretty printing and a Javadoc-like HTML generator. Writing that compiler impressed on me that the tree traversal pattern matters, especially since there were quite a few node types.
Further, your choice of tree traversal pattern will immediately affect the way you think about your algorithm. I think one variation of visitor makes it unnecessarily difficult, and one makes it quite a bit more natural.
Visitor Pattern in General
The visitor pattern is generally described as a way to
decouple algorithms that operate on a data structure of related node classes from the node classes themselves.
The double-dispatch technique is often described in presentations of visitor. To use the visitor pattern, take the tree object
t to traverse, a visitor object
v corresponding to the algorithm (type checking, syntax highlighting) and call the virtual method
t.accept(v) (this is the first dispatch). The node subclasses (Atom, Pair) each have an implementation of
accept that calls the visitor method corresponding to their particular class (the second dispatch, hence "double dispatch"). That way the visitor can be written as a collection of methods that each know the exact node subtype they're dealing with.
Visitor Pattern for Language Processors
What's often glossed over is the critical question
where is the knowledge about a node's children? In other words, where does the recursion/iteration occur - in the acceptor or the visitor? I originally imagined the visitor pattern was defined to mean recursion occurs in the acceptor. Either way is considered an accepted variation of the pattern, but I claim it makes a substantial difference for most compiler algorithms.
Recursion in the Acceptor
Here's recursion-in-the-acceptor for the Lisp structure above:
public abstract class Expr
{
    public abstract void accept(Visitor v);
}

public class Atom extends Expr
{
    public String value;
    public void accept(Visitor v)
    {
        v.visit(this); // Atom has no children, simple accept
    }
}

public class Pair extends Expr
{
    public Expr car;
    public Expr cdr;
    public void accept(Visitor v)
    {
        car.accept(v); // children visited in the acceptor
        cdr.accept(v);
        v.visit(this); // postorder traversal
    }
}

public interface Visitor
{
    void visit(Atom a);
    void visit(Pair p);
}
Now, given those definitions, how would you write the TextFormatter visitor that produces the same output as the SML example given earlier?
Recursion in the acceptor introduces some problems. One is that it fixes the traversal order -- postorder in this case. Now each visitor needs to know what the traversal order is, a dynamic property that must be remembered when writing it. Also, what if postorder isn't what you need? In particular, to generate the grouping parentheses, which belong to the Pair, you would like code associated with the Pair to run both before and after the Pair's children are visited. Further, you want to generate a comma in between -- that means a visit between the two children. You're forced to write something like this:
import java.util.Stack;

public class TextFormatter implements Visitor
{
    private Stack<String> m_stack = new Stack<String>();
    private String m_result = null;

    public void visit(Atom a)
    {
        m_stack.push(a.value);
    }

    public void visit(Pair p)
    {
        // We need to know the order of children traversal. Hmmm.
        String cdr = m_stack.pop();
        String car = m_stack.pop();
        m_stack.push("(" + car + "," + cdr + ")");
    }

    public String toString()
    {
        if (m_result == null)
            m_result = m_stack.pop();
        return m_result;
    }
}
Using an explicit stack is not always bad -- if you tend to process programs with extremely deep nesting, you may hit your implementation language's control stack limit -- but overall this is not the style in which I'd prefer to write dozens of visitors.
Note that the SAX parser model, which follows recurse-in-the-acceptor, addresses the fixed-order problem by having multiple visit methods for nodes with children: it has
startElement as well as
endElement, and the like.
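Here's a sketch of that idea applied to our Expr tree. The enterPair, betweenChildren, and exitPair event names are my invention (SAX itself only deals in XML elements): with separate events around and between the children, the formatter no longer needs a stack even though the recursion stays in the acceptor.

```java
// SAX-style event visitor: multiple visit methods per node with children,
// so one visitor can act before, between, and after the children.
abstract class Expr {
    abstract void accept(EventVisitor v);
}

class Atom extends Expr {
    String value;
    Atom(String value) { this.value = value; }
    void accept(EventVisitor v) { v.visitAtom(this); }
}

class Pair extends Expr {
    Expr car, cdr;
    Pair(Expr car, Expr cdr) { this.car = car; this.cdr = cdr; }
    void accept(EventVisitor v) {
        v.enterPair(this);       // before the children
        car.accept(v);
        v.betweenChildren(this); // between car and cdr
        cdr.accept(v);
        v.exitPair(this);        // after the children
    }
}

interface EventVisitor {
    void visitAtom(Atom a);
    void enterPair(Pair p);
    void betweenChildren(Pair p);
    void exitPair(Pair p);
}

// With these events, TextFormatter is stackless: each event just appends.
class TextFormatter implements EventVisitor {
    private final StringBuffer buff = new StringBuffer();
    public void visitAtom(Atom a)       { buff.append(a.value); }
    public void enterPair(Pair p)       { buff.append("("); }
    public void betweenChildren(Pair p) { buff.append(","); }
    public void exitPair(Pair p)        { buff.append(")"); }
    public String toString()            { return buff.toString(); }
}

public class Main {
    public static void main(String[] args) {
        Expr e = new Pair(new Atom("a"), new Pair(new Atom("b"), new Atom("c")));
        TextFormatter f = new TextFormatter();
        e.accept(f);
        System.out.println(f); // prints (a,(b,c))
    }
}
```

The price is a fatter visitor interface: every node class with children needs its own family of enter/between/exit methods, and most visitors won't care about most of them.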
The other problem with this approach is that not every visitor needs to visit every node. Some may only need to go so deep. That brings us to the recurse-in-the-visitor approach, which I would consider the recommended one for programming language processors.
Recurse in the Visitor
This approach has the advantage that the
accept method is always the trivial
v.visit(this), and the visitors can be written in a more natural style:
public class TextFormatter implements Visitor
{
    private StringBuffer m_buff = new StringBuffer();
    private String m_result = null;

    public void visit(Atom a)
    {
        m_buff.append(a.value);
    }

    public void visit(Pair p)
    {
        m_buff.append("(");
        p.car.accept(this);
        m_buff.append(",");
        p.cdr.accept(this);
        m_buff.append(")");
    }

    public String toString()
    {
        if (m_result == null)
            m_result = m_buff.toString();
        return m_result;
    }
}
Is this what I'd like to write as a functional programmer? Of course not. I'd prefer to have
accept return a value, so I could write
visit(Pair) more like this:
public void visit(Pair p)
{
    m_buff.append("(" + p.car.accept(this) + "," + p.cdr.accept(this) + ")");
}
Which of course means that
visit should return a type, instead of accumulating results in a member variable. But then we need to know what type to return, which in Java most likely needs to be
Object. It all gets ugly again.
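For what it's worth, Java generics (assuming Java 5 or later, which postdates some of this pattern's literature) offer a way out of the Object ugliness: parameterize the visitor by its result type. This is my sketch, not part of the original discussion:

```java
// Generic visitor: accept and visit return a value of type R,
// so results combine in expressions instead of accumulator fields.
abstract class Expr {
    abstract <R> R accept(Visitor<R> v);
}

class Atom extends Expr {
    String value;
    Atom(String value) { this.value = value; }
    <R> R accept(Visitor<R> v) { return v.visit(this); }
}

class Pair extends Expr {
    Expr car, cdr;
    Pair(Expr car, Expr cdr) { this.car = car; this.cdr = cdr; }
    <R> R accept(Visitor<R> v) { return v.visit(this); }
}

interface Visitor<R> {
    R visit(Atom a);
    R visit(Pair p);
}

// The formatter now reads much like the SML version:
// each case returns its piece of the result directly.
class TextFormatter implements Visitor<String> {
    public String visit(Atom a) { return a.value; }
    public String visit(Pair p) {
        return "(" + p.car.accept(this) + "," + p.cdr.accept(this) + ")";
    }
}

public class Main {
    public static void main(String[] args) {
        Expr e = new Pair(new Atom("a"), new Atom("b"));
        System.out.println(e.accept(new TextFormatter())); // prints (a,b)
    }
}
```

A type checker would be Visitor&lt;Type&gt;, a pretty printer Visitor&lt;String&gt;, and so on, with no casts -- though each node class still needs its accept method.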
Conclusion - Still Not Perfect
So, a visitor with recursion in the visitor is acceptable, but still not ideal, and it doesn't come close to the conciseness of pattern matching. The accept method needs to be added to each node class, which is a pain for rapid prototyping, and the visitor by its nature requires that you accumulate state in member variables instead of using a functional style of combining returned results in a single expression.
Still, I'm just one guy with a certain slice of experience -- is there a better (more compact, less boilerplate) tree traversal technique for OO languages?
Or is this really the ultimate pattern, OO loses this particular battle, and
Nice multi-methods or
Scala case classes or other language extensions become necessary parts of an OO-capable language that supports programming language implementations as well as the functional languages?