Following on to "Dependent Iteration", I tried the RC filter simulation on a semi-realistic problem with 1.2 million iterations in a few platforms. The two I want to report on seem to run the gamut of fast to slow:
Language | Total | Time/Iteration |
---|---|---|
Java | 29 ms | 24 ns/iteration |
Python | 1507 ms | 1255 ns/iteration |
I ran the filter function in a loop and averaged the times to report the numbers above. The Java code is reasonably close to the metal here: on a 3GHz CPU at 0.3ns/cycle, 24ns is 72 CPU cycles to perform 3 double precision arithmetic operations and two array lookups and some loop overhead, probably with cache block fetches amortized across the life of the loop:
There must be a better way to do this than my Python code (running on version 2.5.1). Aside from whether this uses single or double-precision arithmetic, I tried using both arrays and lists, and reported the faster (the list version, actually):
public double[] lowpass(double[] x, double dt, double RC) {
double[] y = new double[x.length];
double alpha = dt / (RC + dt);
y[0] = x[0];
for(int i=1; i<x.length; ++i)
y[i] = alpha * x[i] + (1-alpha) * y[i-1];
return y;
}
One interesting thing in these exercises is that for the number of language platforms I'd like to try, it takes a nontrivial amount of time to figure out (a) how to load from a file in one-float-per-line format and (b) how to get wall-clock time within the environment.
def lowpassArray(x,dt,RC):
y=array.array('f',range(len(x)))
y[0]=x[0]
yprev=x[0]
alpha=dt / (RC + dt)
for xi in x[1:]:
y[i] = alpha * xi + (1-alpha) * yprev
yprev=y[i]
return y
def lowpassList(x,dt,RC):
yprev=x[0]
alpha=dt / (RC + dt)
y = []
for xi in x[1:]:
yi=alpha * xi + (1-alpha)*yprev
yprev=yi
y.append(yi)
return y
What about your favorite functional or scripting platform? How fast can you get this dependent iteration loop with an input size of 1.2 million?
2 comments:
I assume the python numbers are for the regular interpreter and not using psyco?
The performance numbers using psyco (http://psyco.sourceforge.net/) would be interesting.
The 1507 ms total, 1255 ns/iteration for Python is indeed for the regular interpreter.
Taking a cue from you, I installed Psyco and got 588ms total, 490 ns/iteration for the list version. Very curiously, Pysco did nothing to improve the array version; it's quite possible I'm not doing something right there.
Post a Comment