Monday, June 12, 2006

Softcivil engineering

In his recent essay "Make Vendors Liable for Bugs", Bruce Schneier argues that software liability would improve computer security by aligning the industry's capability for improving security with an interest in doing so. That is, the industry has the capability to improve security but lacks a compelling interest in doing so, so let's create that interest by holding vendors liable for security issues. I'd like to apply this argument to the tension between software quality and time-to-market. To gain some insight, we'll compare software development with other technical fields.

"Towers are popular in Japan: You ride to the top, look down at the city far below you, wonder if the tower is constructed according to sound engineering principles, then ride the elevator back down as soon as possible. That's what I do, anyway." -- Dave Barry, Dave Barry Does Japan

Civil engineering projects, as in Dave Barry's example, often have a public safety aspect that many software projects lack. Some software is, of course, used in ways that obviously require a high degree of quality, and it is engineered well: flight control software, medical diagnostic equipment, and public safety communications equipment. But a lot of software, and I think particularly of web commerce, carries a great deal of public trust while showing a marked lack of quality.

What if civil engineering were as easy to do as software? What if an engineer could sit down at his workstation, bang out an XML document describing a hierarchy of bridge members, hit a key, and suddenly have a bridge constructed over the nearby test river? The engineer and twenty of his testing friends would drive freight trucks over it, iron out some vibration problems, and their manager would say "ship it." Even with more extensive testing, that quick turnaround from conceptual design to reality would say "book the customers now! more testing just increases cost and drives profits down!" a lot louder than "but it might not work all the time over the next 50 years of use."

Public safety concerns aside, one reason civil engineering isn't done this way is the cost of construction. There's no "XmlToBridge" converter. Nor is there a chance for a version 2.0: nobody gets to build a basic bridge and then replace it with a refined bridge with more features in a year or two. Knowing this up front helps everyone involved accept that the design phase is critical. The construction cost also gives a quantitative bound on how much time it is reasonable to spend on design.

In software, the cost of production is the cost of the build process, which could be a few hours for a large product or, in the case of a hosted web application written in an interpreted language, virtually nothing. This low cost of production and copying is part of what makes software, and computers in general, practical, useful, and cheap compared with any kind of "hardware" alternative. But if we're trying to align capability for good quality with interest in good quality, the cost of production is working against us.

Hence my claim is that one of software's key advantages, rapid change, is also one of the reasons software quality is poor. It lowers the apparent cost of production to the point where that cost is no longer a barrier that motivates you to double-check your work. Internally, a software team can recognize this and take action. Maybe we need to revert to punch-card programming. Rather than trending towards continuous integration (an automated edit-compile-test system), maybe we need a mandatory waiting period between design and coding. I'm not sure what would be effective. But any proposal you come up with to increase quality by investing more time hits software vendors' bottom line directly by increasing that production cost. That's why, if we want higher quality, we've got to quantify the costs of lower quality and transfer some of those costs to the software vendors, making it easier for them to decide that quality is worth it.
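
To make the cost-transfer argument concrete, here is a toy sketch in Python. Nothing in it comes from Schneier's essay or from real data: the function names, the liability_share parameter, and every number are hypothetical, chosen only to illustrate how shifting part of the cost of defects onto the vendor could change which release strategy is cheaper for the vendor.

```python
def vendor_cost(extra_quality_cost, defect_probability, defect_damage, liability_share):
    """Expected cost to the vendor of one release strategy.

    liability_share is the fraction of the damage from a defect that the
    vendor must pay (0.0 = customers absorb everything, 1.0 = vendor pays all).
    This is a hypothetical model, not an established formula.
    """
    expected_damage = defect_probability * defect_damage
    return extra_quality_cost + liability_share * expected_damage


def prefers_quality(liability_share):
    """Return True if extra testing is cheaper for the vendor than shipping now."""
    # All figures below are made up, in arbitrary currency units.
    damage = 1_000_000
    ship_now = vendor_cost(0, 0.20, damage, liability_share)
    test_more = vendor_cost(50_000, 0.02, damage, liability_share)
    return test_more < ship_now


if __name__ == "__main__":
    for share in (0.0, 0.25, 0.5):
        print(share, prefers_quality(share))
```

With no liability, the extra testing is pure cost to the vendor and shipping early always wins in this model. Only once the vendor carries a large enough share of the damage does the slower, more careful release become the cheaper choice.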
