Wednesday, February 20, 2008

What to Expect When You're Computing for Speed

I've been surprised how often, when discussing programming system X, I hear a complaint about the performance of X on task T, and on hearing the details of T I respond with something akin to "A five ounce bird cannot carry a one pound coconut." Some tasks inherently ask too much of the hardware, and a few common-sense tests can often tell you what to expect. The other thing is that when you're staring at a screenful of code, all monospaced, all the same font size, it can be hard to spot the performance bottlenecks. Even a simple diagnostic, like logging statements timestamped with a fine-grained timer, is valuable in identifying runtime hotspots.
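
A minimal sketch of that kind of diagnostic in Python, assuming `time.perf_counter` is fine-grained enough for the job. The `log` helper and the two stage functions are hypothetical names made up for illustration:

```python
import time

# Reference point: every timestamp is seconds elapsed since this line ran.
_START = time.perf_counter()

def log(message, sink=print):
    """Emit message prefixed with elapsed seconds; return the elapsed time."""
    elapsed = time.perf_counter() - _START
    sink(f"[{elapsed:12.6f}s] {message}")
    return elapsed

def load_data():
    # Hypothetical stage: stands in for something that waits on I/O.
    time.sleep(0.05)

def crunch():
    # Hypothetical stage: stands in for CPU-bound work.
    return sum(i * i for i in range(200_000))

if __name__ == "__main__":
    log("start")
    load_data()
    log("data loaded")
    crunch()
    log("crunch done")
```

The gap between consecutive timestamps shows roughly where the time goes, which is often enough to decide where to look next.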

If anything, this is why you study computer architecture: to understand the basic operation and limitations of modern hardware, and to build a reasonable mental model of performance as you create the software architecture sitting atop the hardware. Planning a cross-country trip differs based on whether you're using an RV, a Camry, or an F-16, and what kind of load you expect to carry. Know your platform, and know your basic performance requirements or performance-sensitive use cases.

If you do get into trouble, here are some things to try:

1. I/O tends to trump everything. Listen to the disk. Is it working? A lot? Look at your disk's rotation speed. Can you get a faster disk? I once kept an older computer with a CPU half as fast for testing because it had a fast hard drive that made all the difference. Think about other types of I/O as well, such as HTTP requests to web services or access to network-mounted files. Would a fatter network pipe help?

2. Try to have enough main memory for the problem at hand. Is the CPU busy all the time when you expect it to be? Or is the CPU idle because the machine is swapping?

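3. Don't ignore cache effects. Compared with cache, main memory tends to be slow and narrow (not many bits transferred at once), so code that keeps missing the cache spends much of its time waiting on memory.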

4. Use wall clock timings as your best measure of performance, rather than CPU-only times, to account for I/O performance. If your machine's background noise level of other activities is high, you may consider that part of the problem to be addressed.
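
To make item 4 concrete, here is a small sketch that measures both wall clock time and CPU time for the same call, using Python's `time.perf_counter` and `time.process_time`. The `timed` helper and the two workloads are hypothetical, and `time.sleep` is only a stand-in for waiting on a disk or network:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, wall_seconds, cpu_seconds)."""
    wall0 = time.perf_counter()   # wall clock: includes time spent waiting
    cpu0 = time.process_time()    # CPU time of this process: excludes sleeping
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return result, wall, cpu

def waits_on_io():
    # Hypothetical stand-in for a disk or network wait.
    time.sleep(0.2)

def burns_cpu():
    # Hypothetical CPU-bound workload.
    return sum(i * i for i in range(500_000))

if __name__ == "__main__":
    for fn in (waits_on_io, burns_cpu):
        _, wall, cpu = timed(fn)
        print(f"{fn.__name__:12s} wall={wall:.3f}s cpu={cpu:.3f}s")
```

For the waiting workload the CPU figure stays near zero while the wall clock figure reflects the full delay, which is the time a CPU-only measurement would hide.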

Sometimes issues do come down to something silly in a language implementation. Don't be afraid to add what are essentially tests of the language and its implementation to your product test suite. Is method dispatch for an object with hundreds of fields disproportionately slow? It matters for your application, so it matters to you. If you modify the open-source implementation of your Python interpreter to fix such a problem, you now have a regression test for whenever someone runs the test suite against an updated Python that may address the issue.
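
One way such a test might look, sketched with the standard `timeit` module. Everything here is an assumption for illustration: the helper names, the field counts, and especially the `MAX_RATIO` tolerance, which is deliberately loose because timing tests are noisy and would need tuning for a real suite:

```python
import timeit

def make_instance(n_fields):
    """Build an instance of a fresh class with n_fields attributes and one method."""
    class Wide:
        def __init__(self):
            for i in range(n_fields):
                setattr(self, f"field_{i}", i)

        def ping(self):
            return 1

    return Wide()

def dispatch_seconds(obj, calls=200_000, repeats=5):
    """Best-of-repeats wall time for `calls` invocations of obj.ping()."""
    return min(timeit.repeat("obj.ping()", globals={"obj": obj},
                             number=calls, repeat=repeats))

def dispatch_ratio(wide_fields=500, narrow_fields=2):
    """How many times slower method dispatch is on the wide object."""
    narrow = dispatch_seconds(make_instance(narrow_fields))
    wide = dispatch_seconds(make_instance(wide_fields))
    return wide / narrow

MAX_RATIO = 3.0  # assumed tolerance, not a measured figure

def test_wide_object_dispatch_is_not_disproportionately_slow():
    ratio = dispatch_ratio()
    assert ratio < MAX_RATIO, f"dispatch on wide object is {ratio:.1f}x slower"

if __name__ == "__main__":
    print(f"wide/narrow dispatch ratio: {dispatch_ratio():.2f}")
```

Comparing a ratio rather than an absolute time keeps the check meaningful across machines of different speeds.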
