Friday, January 04, 2008

Subtleties of Instance Variable Initialization

One of the more mundane corners of programming with objects is the constructor, in particular the initialization of instance variables. In this post we'll find that behind this pedestrian feature lie some subtle questions for language implementers to answer.

Consider the following Java class:

public class R
{
public int x = 1;
public int y = x+1;
}

Pretty clear that x is 1 and y is 2, right? What if you switch the order of the declarations?

public class R
{
public int y = x+1;
public int x = 1;
}

This one won't compile: javac rejects it with an "illegal forward reference" error. The intent is clearly the same, x=1 and y=2, but Java won't allow forward references to fields in initializers. Why?

The short answer is that Java evaluates instance initializers in a fixed order: top to bottom, in the order they appear in the source. But why impose an order at all?
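
The following is a minimal sketch that makes the order visible. InitOrder and its note helper are hypothetical names used only for illustration. Each initialization step records itself, and the trace shows Java running field initializers and instance initializer blocks top to bottom, and then the constructor body.

```java
// Records the order in which instance initialization steps run.
class InitOrder {
    static final StringBuilder trace = new StringBuilder();

    // Hypothetical helper: logs a step name and returns the given value.
    static int note(String step, int value) {
        trace.append(step).append(' ');
        return value;
    }

    int a = note("a", 1);      // field initializer
    { note("block", 0); }      // instance initializer block
    int b = note("b", a + 1);  // runs after a's initializer, so a is already 1

    InitOrder() {
        note("ctor", 0);       // constructor body runs last
    }

    public static void main(String[] args) {
        InitOrder o = new InitOrder();
        System.out.println(trace.toString().trim()); // a block b ctor
        System.out.println("a=" + o.a + " b=" + o.b); // a=1 b=2
    }
}
```

Because b's initializer runs after a's, it can safely use a. Swapping the two declarations would produce exactly the forward reference that javac rejects.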

Before answering that, let's compare this example with a couple of dynamic languages. First, Javascript:
// Construct an object with two different orderings depending on its argument
function DynamicDependence(state)
{
return {
x : (state? 1 : this.y-1),
y : (state? this.x+1 : 2)
}
}
var dd = new DynamicDependence(true)
document.write('x='+dd.x+' y='+dd.y+'<br/>')
var dd2 = new DynamicDependence(false)
document.write('x='+dd2.x+' y='+dd2.y+'<br/>')

You'll get this output:

x=1 y=NaN
x=NaN y=2

Javascript is cool with this in the sense that we get no errors. But we do get NaN, and for both orderings. Even the first one (x=1, then y=x+1), which worked in Java, leaves y as NaN.

Can you begin to spot some subtleties in the way objects are constructed? There's something going on, and it happens when there's a dependence between the instance variables. Let's look at Ruby:

class DynamicDependence
attr_accessor :x, :y
def initialize()
@x = $state? 1 : @y-1
@y = $state? @x+1 : 2
end
def to_s()
"{x=#{@x}, y=#{@y}}\n"
end
end

$state = true
dd = DynamicDependence.new
puts dd
# Emits {x=1, y=2}

$state = false
dd2 = DynamicDependence.new
puts dd2
# DynamicDependence.rb:4:in `initialize': undefined method `-' for nil:NilClass (NoMethodError)

In Ruby, the first ordering works, as it did for Java, and the second ordering fails.
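
For comparison, here is a rough Java port of the Ruby class. As in the Ruby version, the assignments are spelled out in the constructor. This is a sketch: a boolean constructor parameter stands in for the global $state. Java behaves differently again. It neither raises an error like Ruby nor produces NaN like Javascript, because a field that hasn't been assigned yet reads as its default value, which is 0 for an int.

```java
// A Java port of the Ruby example: explicit assignments in the constructor,
// executed in exactly the order written.
class DynamicDependence {
    int x, y; // unassigned int fields start out with the default value 0

    DynamicDependence(boolean state) {
        x = state ? 1 : y - 1; // when state is false, y is still 0 here
        y = state ? x + 1 : 2;
    }

    @Override
    public String toString() {
        return "{x=" + x + ", y=" + y + "}";
    }

    public static void main(String[] args) {
        System.out.println(new DynamicDependence(true));  // {x=1, y=2}
        System.out.println(new DynamicDependence(false)); // {x=-1, y=2}
    }
}
```

So the second ordering compiles and runs, and it silently yields x=-1.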

For clarity's sake, the nice thing about Ruby here is that it doesn't offer the "implicit initialization" syntax that Java does: you spell out all initialization in the constructor, in a sequence you determine. The convenient initializers in Java and Javascript have to be evaluated in some algorithmic order, which is nice unless that order isn't the one you wanted. When everything is spelled out as it is in Ruby, we can see that what happens when initializing these members depends on how the language views an "uninitialized" instance variable. Ruby treats it as nil, which is why @y-1 failed above. Notice I didn't try this example in C++.

Let's now return to the question of why the order of initialization matters. After all, you and I can figure out that if we state "mathematically" that x=1 and y=x+1, the order of the two statements doesn't matter: x=1 and y=2.

What if we wrote the original example this way instead:

public class R
{
public int x() {return 1;}
public int y() {return x()+1;}
}

Suddenly we get the behavior we want, and we can switch the order of declaration. We can declare the y method first, and we get no problems when evaluating x or y.

The trick to all of this is that Java, Javascript, Ruby and most other mainstream languages use call-by-value, or eager, evaluation of the expressions supplied to instance initialization. An order has to be imposed because, to supply a value for a field (instance variable), the language must evaluate the expression supplied, and it must do so at construction time, not later when the field is actually used. The methods in the last example escape this because a method body is evaluated only when the method is called.
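
To see that eager evaluation is what forces the order, here is a sketch of the alternative. Each field holds a thunk that is evaluated the first time the field is read, and the result is then cached (call-by-need). Lazy is a hypothetical helper written for this example, not a JDK class; it relies only on java.util.function.Supplier. The initializer for y refers to this.x rather than plain x, because a qualified reference is not subject to Java's forward-reference check.

```java
import java.util.function.Supplier;

class LazyR {
    // Hypothetical helper (not a JDK class): evaluates its initializer
    // at most once, the first time get() is called, then caches the result.
    static final class Lazy<T> {
        private Supplier<T> init;
        private T value;

        Lazy(Supplier<T> init) { this.init = init; }

        T get() {
            if (init != null) {
                value = init.get();
                init = null; // drop the thunk so it never runs twice
            }
            return value;
        }
    }

    // y is declared before x, yet this works: no initializer runs until get().
    Lazy<Integer> y = new Lazy<>(() -> this.x.get() + 1);
    Lazy<Integer> x = new Lazy<>(() -> 1);

    public static void main(String[] args) {
        LazyR r = new LazyR();
        System.out.println("x=" + r.x.get() + " y=" + r.y.get()); // x=1 y=2
    }
}
```

This sketch is single-threaded only. Laziness removes the ordering constraint but not genuine cycles. If x's initializer also read y, the first get() would recurse until it threw a StackOverflowError.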

Even initializing fields in a fixed order is not enough to prevent problems. For example, we can fool Java's forward-reference check:

public class BreakJava implements BreakJavaInterface
{
public int x = wreck(this);
// Halting Problem =>
// compiler can't in general tell wreck(this) refers to y
// so this call is allowed even though it will cause problems
public int y = 2;
public int gety() {return y;}

private static int wreck(BreakJavaInterface b)
{
return 2 / b.gety(); // y still holds its default 0 here, so this throws ArithmeticException at runtime
}

public static void main(String[] args)
{
BreakJava bj = new BreakJava(); // div-by-zero
}
}

interface BreakJavaInterface
{
abstract int gety();
}

Voila, the foolish programmer was able to hang himself yet again.
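
The helper method isn't even necessary. The forward-reference rule in the Java Language Specification applies only to references by simple name. Qualifying the field with this is therefore enough to get past the compiler, and the initializer then reads the field's default value. SneakyForward is an illustrative name.

```java
// Java's forward-reference check only looks at simple names, so a
// qualified reference such as this.y slips past it.
class SneakyForward {
    int x = this.y + 1; // compiles; y still holds its default value 0 here
    int y = 2;

    public static void main(String[] args) {
        SneakyForward s = new SneakyForward();
        System.out.println("x=" + s.x + " y=" + s.y); // x=1 y=2
    }
}
```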

I know, you've been thinking "who cares" about these funny examples, right? You care if you're designing or implementing an object-based language (with call-by-value evaluation), because you need to decide what happens when the instance variables are initialized. Maybe you say "fine, I'm implementing Ruby, where I don't need to support an instance initialization syntax, and ordinary assignment statements give me the desired behavior for free." Fine, the decision is up to you, but there is a decision to be made, because these subtle differences between languages didn't occur by accident. Even so, I can give you a perfectly natural, non-contrived example of instance initialization with dependences:

// Dirt simple Javascript logger
function Logger(name)
{
return {
ERROR : 1,
WARN : 2,
DEBUG : 3,
TRACE : 4,
id : name,
level : this.DEBUG,
log : function(lvl,msg) {
if(lvl > this.level)
return
document.write(this.id+' '+this.level+' '+msg)
}
}
}
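var logger = new Logger('example')

We would love to define the constants ERROR, WARN, etc. that go with the class, and we'd also like to initialize this.level to DEBUG. But in this case level is going to be undefined, even though it looks like it could work. As a side effect, the check lvl > this.level is always false, because comparing a number with undefined using > always yields false. So the logger never suppresses anything.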
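
For contrast, here is a rough Java translation of the logger. It is a sketch, not a port of any real logging library. The level constants become static fields, which are initialized when the class is initialized, before any instance is created. The instance initializer level = DEBUG therefore really does see 3. The StringBuilder named out is an assumption standing in for document.write.

```java
// Java sketch of the logger: the level constants are static, so they are
// initialized with the class, before any instance initializer runs.
class Logger {
    static final int ERROR = 1, WARN = 2, DEBUG = 3, TRACE = 4;

    final String id;
    int level = DEBUG; // works: DEBUG already holds 3
    // Stand-in for document.write (an assumption for this sketch)
    final StringBuilder out = new StringBuilder();

    Logger(String name) {
        id = name;
    }

    void log(int lvl, String msg) {
        if (lvl > level) return; // drop messages more verbose than the current level
        out.append(id).append(' ').append(level).append(' ').append(msg).append('\n');
    }

    public static void main(String[] args) {
        Logger logger = new Logger("example");
        logger.log(WARN, "shown");
        logger.log(TRACE, "suppressed");
        System.out.print(logger.out); // example 3 shown
    }
}
```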

There's some deeper magic going on here that I hope to explore next time. In the meantime, ask yourself what rules you would like there to be for instance variable (non-method) initialization, and how you would implement them.

1 comment:

Anonymous said...

A related topic:
lenient evaluation