One of the more mundane corners of programming with objects is the constructor, in particular the initializing of instance variables. In this post we'll find that behind this pedestrian feature are some subtle questions to be answered by implementers.
Consider the following Java class:
public class R
{
    public int x = 1;
    public int y = x+1;
}
Pretty clear that x is 1 and y is 2, right? What if you switch the order of the declarations?
public class R
{
    public int y = x+1;
    public int x = 1;
}
This one won't compile. Pretty clear that the intent is the same, x=1 and y=2, but Java won't allow forward references to data members when initializing. Why?
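The short answer is that Java evaluates instance initializers in a fixed order, top to bottom as they appear in the class, so x has not been assigned yet when y's initializer runs. But why does there have to be an order at all?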
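As a rough illustration of that order, here is a minimal sketch that records each initialization step as it happens. The class name InitOrder and the record() helper are made up for this example; the only assumption about Java itself is that field initializers and initializer blocks run in textual order, before the constructor body.

```java
import java.util.ArrayList;
import java.util.List;

public class InitOrder {
    // Declared first so that record() below has a list to write to.
    private final List<String> trace = new ArrayList<>();

    public int x = record("x", 1);

    // An instance initializer block runs in sequence with the field initializers.
    { trace.add("block"); }

    public int y = record("y", x + 1);

    public InitOrder() {
        // The constructor body runs after all of the initializers above.
        trace.add("constructor");
    }

    // Hypothetical helper: notes the field name and value, then returns the value.
    private int record(String name, int value) {
        trace.add(name + "=" + value);
        return value;
    }

    public List<String> trace() {
        return trace;
    }

    public static void main(String[] args) {
        // prints [x=1, block, y=2, constructor]
        System.out.println(new InitOrder().trace());
    }
}
```

Moving the trace declaration below x would make record() run against a null list, which is the same ordering problem in miniature.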
Before answering that, let's compare this example with a couple of dynamic languages. First, Javascript:
// Construct an object with two different orderings depending on its argument
function DynamicDependence(state)
{
    return {
        x : (state? 1 : this.y-1),
        y : (state? this.x+1 : 2)
    }
}
var dd = new DynamicDependence(true)
document.write('x='+dd.x+' y='+dd.y+'<br/>')
var dd2 = new DynamicDependence(false)
document.write('x='+dd2.x+' y='+dd2.y+'<br/>')
You'll get this output:
x=1 y=NaN
x=NaN y=2
Javascript is cool with this in the sense that we get no errors. But we do get NaN, not just for the first example (x=1, then y=x+1) but for both orderings.
Can you begin to spot some subtleties in the way objects are constructed? There's something going on, and it happens when there's a dependence between the instance variables. Let's look at Ruby:
class DynamicDependence
  attr_accessor :x, :y
  def initialize()
    @x = $state ? 1 : @y-1
    @y = $state ? @x+1 : 2
  end
  def to_s()
    "{x=#{@x}, y=#{@y}}\n"
  end
end
$state = true
dd = DynamicDependence.new
puts dd
# Emits {x=1, y=2}
$state = false
dd2 = DynamicDependence.new
puts dd2
# DynamicDependence.rb:4:in `initialize': undefined method `-' for nil:NilClass (NoMethodError)
In Ruby, the first ordering works, as it did for Java, and the second ordering fails.
The nice thing about Ruby, for clarity's sake here, is that it doesn't offer the "implicit initialization" that Java does: you spell out all initialization in the constructor, in a sequence you determine. The convenient initializers in Java and Javascript have to be run in some order chosen by the language, which is nice unless that order isn't the one you wanted. When initialization is spelled out as it is in Ruby, we can see that what happens when initializing these members depends on how the language views an "uninitialized" instance variable. Notice I didn't try this example in C++.
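Java takes its own position on that question: an int field that has not been assigned yet holds the default value 0. A minimal sketch of how to observe this, assuming only that the forward-reference check applies to simple names such as x and not to the qualified form this.x (the class name QualifiedForwardReference is made up for the example):

```java
public class QualifiedForwardReference {
    // Writing "x + 1" here would be rejected as an illegal forward reference.
    // The qualified form this.x is accepted, and it reads the default value 0
    // because x's own initializer has not run yet.
    public int y = this.x + 1;
    public int x = 1;

    public static void main(String[] args) {
        QualifiedForwardReference r = new QualifiedForwardReference();
        System.out.println("x=" + r.x + " y=" + r.y); // prints x=1 y=1
    }
}
```

So where Javascript produces NaN and Ruby raises an error on nil, Java quietly computes with 0 and gives y=1 rather than the intended y=2.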
Let's now return to the question of why order of initialization matters. After all, you and I can figure out that when we state "mathematically" that x=1 and y=x+1, it doesn't matter in what order we write them: x=1 and y=2 either way.
What if we wrote the original example this way instead:
public class R
{
    public int x() {return 1;}
    public int y() {return x()+1;}
}
Suddenly we get the behavior we want, and we can switch the order of declaration. We can declare the y method first, and we get no problems when evaluating x or y.
The trick to all of this is that Java, Javascript, Ruby and virtually every other language out there use call-by-value, or eager, evaluation of the expressions supplied for instance initialization. An order needs to be imposed because, to supply a value for a field (instance variable), the language has to evaluate the supplied expression at construction time, not later when the field is actually used. A method body, by contrast, is not evaluated until it is called, which is why the method version above doesn't care about declaration order.
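To make the contrast with eager evaluation concrete, here is a minimal sketch of deferring evaluation by hand while still using fields. The class name LazyR is made up for this example; IntSupplier is the standard java.util.function interface. Each field holds an unevaluated expression, so nothing is computed at construction time and the declaration order stops mattering.

```java
import java.util.function.IntSupplier;

public class LazyR {
    // y is declared before x on purpose. The lambda body is not run during
    // construction, only when getAsInt() is called later, and by then x has
    // been assigned. The qualified name this.x is used because a simple-name
    // reference to x would be rejected as an illegal forward reference.
    public IntSupplier y = () -> this.x.getAsInt() + 1;
    public IntSupplier x = () -> 1;

    public static void main(String[] args) {
        LazyR r = new LazyR();
        // prints x=1 y=2
        System.out.println("x=" + r.x.getAsInt() + " y=" + r.y.getAsInt());
    }
}
```

This sketch re-evaluates the expression on every call and does no caching, so it is closer to call-by-name than to a memoized lazy value.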
Even initializing fields in a fixed order is not enough to prevent problems. For example, we can fool Java's forward reference checker:
public class BreakJava implements BreakJavaInterface
{
    public int x = wreck(this);
    // Halting Problem =>
    // compiler can't in general tell wreck(this) refers to y
    // so this call is allowed even though it will cause problems
    public int y = 2;
    public int gety() {return y;}
    private static int wreck(BreakJavaInterface b)
    {
        // y still holds its default value 0 when this runs
        return 2 / b.gety(); // div-by-zero exception at runtime
    }
    public static void main(String[] args)
    {
        BreakJava bj = new BreakJava(); // div-by-zero
    }
}
interface BreakJavaInterface
{
    abstract int gety();
}
Voila, the foolish programmer was able to hang himself yet again.
I know, you've been thinking "who cares" about these funny examples, right? You care if you're designing or implementing an object-based language (with call-by-value evaluation), because you need to think about what happens when instance variables are initialized. Maybe you say "fine, I'm implementing Ruby, where I don't need to support an instance initialization syntax, and the usual variable assignment machinery gets me the desired behavior for free." Fine, the decision is up to you, but there is a decision to be made, because these subtle differences between languages didn't occur by accident. Even so, I can give you a perfectly natural, non-contrived example of instance initialization with dependences:
// Dirt simple Javascript logger
function Logger(name)
{
    return {
        ERROR : 1,
        WARN : 2,
        DEBUG : 3,
        TRACE : 4,
        id : name,
        level : this.DEBUG,
        log : function(lvl,msg) {
            if(lvl > this.level)
                return
            document.write(this.id+' '+this.level+' '+msg)
        }
    }
}
var logger = new Logger('example')
We would love to define these constants ERROR, WARN, etc. that go with the class, and we'd also like to initialize this.level to DEBUG. But in this case, level is going to be undefined even though it looks like it could work.
There's some deeper magic going on here that I hope to explore next time. In the meantime, ask yourself what rules you would like there to be for instance variable (non-method) initialization, and how you would implement them.