Monday, April 30, 2007

My 1976 Dijkstra book arrives, reminds me of 1773 born sailor

Following the recommendation of a commenter to Proofs, Education and Open Problems, I ordered Dijkstra's "A Discipline of Programming" which arrived Saturday. I was able to sneak in a few minutes of reading while the baby was asleep and found it a delight. For your listening pleasure, I supply some selections.

On the language used in the book

...I came to the conclusion that none of the existing programming languages, nor a subset of them, would suit my purpose; on the other hand I knew myself so unready for the design of a new programming language that I had taken a vow not to do so for the next five years... I have tried to resolve this conflict by only designing a mini-language suitable for my purposes... --EWD, Preface, "A Discipline of Programming"
I had to smile at the several points here, not the least of which was the conclusion that none of the current languages was suitable to the task at hand. A lesser man than I might be tempted to ask "why not use MIX?" (Knuth is listed in the acknowledgments). "Predicate transformers" are at the heart of the (mini-)language he did design.

On Dijkstra's life's work

After having devoted a considerable number of years of my scientific life to clarifying the programmer's task, with the aim of making it intellectually better manageable, I found this effort at clarification to my amazement (and annoyance) repeatedly rewarded by the accusation that "I had made programming difficult." --EWD, Preface, "A Discipline of Programming"
After visualizing the expectations of parents, students, and employers from his depictions in "The Cruelty of Really Teaching Computer Science", this accusation could serve as a good summary, in the spirit of shooting the (particularly helpful) messenger: indeed, fully correct programming is difficult. (Or do I presume too much there? What do I know about fully correct programming?)

On manufacturing defects (revisited)


Finally, the book opens with an illustration that partly addresses my question about bugs appearing in an implementation of a provably correct algorithm.

In illustrating what is a computation by starting with something that is not, we are asked to imagine two pieces of cardboard: "The top one displays the text 'GCD(111,259)='; in order to let the mechanism produce the answer, we pick up the top one and place it to the left of the bottom one, on which we can now read the text '37.'" The major drawback Dijkstra points out is this:
No matter how carefully we inspect the construction of the mechanism, our confidence that it produces the correct answer can only be based on our faith in the manufacturer: he may have made an error, either in the design of his machine or in the production of our particular copy." --EWD, Chapter 0 "Executional Abstraction", "A Discipline of Programming"
I feel this satisfactorily addresses my question about whether Dijkstra would call implementation bugs "manufacturing defects." Sort of nice to see that random question addressed in the first example of the first chapter1.

This drawback, that of faith in the manufacturer of mathematical results, is not a new one. Newberry Medal winner "Carry on Mr. Bowditch" accounts the life and times of 18th century sailor/navigator/mathematician Nathaniel Bowditch who bemoans the errors he finds in Moore's navigation charts. He fusses and fumes over these, saying "mathematics is nothing if not correct." Later has cause for grief when one of those errors causes a good friend to be lost to sea during a wreck2. When Dijkstra invokes "faith in the manufacturer" I can't help but recall these scenes, and the conviction in saving lives that drives Bowditch to publish his own navigational guide, checking and rechecking each figure. I see a future day when a combination of spectacular public software failures and a quality champion or twelve will bring us to a new level of software quality.



1 Not to be confused with chapter[1], being the second chapter counted.

2 What happens next? Read the book!

Tuesday, April 24, 2007

Speedlink: Proof Designer

Just mentioning my discovery today that "How to Prove It," mentioned in Proofs, Education and Open Problems, has a companion Java applet Proof Designer. It appears to be a kind of proof assistant, helping you keep track of hypotheses, the goals and progress, with documentation and access to the various proof strategies outlined in the book.

Friday, April 20, 2007

Adding value with programming languages and their tools

Languages of any kind are a two-edged sword. With the cooperative edge, you communicate. With the political edge, you control, using the utility of and dependence on communication to suit your interests. We who are involved in developing language processors and related tools can do well to keep these two edges in mind as we seek to add value with our work.

What is adding value

Clearly programming languages add value in terms of productivity. Even if we're fighting about the finer points of static versus dynamic typing, or Perl vs. Python vs. Pure LISP, almost anything is better than the truly low level of pointing and grunting the computer uses. For most purposes today a FORTRAN or C compiler is better than an assembler.

I've been dwelling on the proposition "programming languages as notation" (my wording) as summarized by Tablizer thus:
Most of the CS literature fits the pattern found in a typical book or chapter on Boolean algebra. Generally this is the order and nature of the presentation:

  1. Givens - lay out base idioms or assumptions
  2. Play around with those idioms and assumptions to create a math or notation
  3. Show (somewhat) practical examples using the new notation or math
  4. Introduce or reference related or derived topics
Step 2 is where the value is added, and it boils down down to a "math or notation". In my experience with programming languages, I'd reinterpret that as meaning
  • math: give a mapping from one semantic domain to another (direct language translation)
  • notation: programming language syntax
With that as a working definition, we can see analytically how the first FORTRAN compiler added value. It gave programmers a more familiar notation (syntax) in that they could write expressions ("FORmulas") with infix arithmetic operators and it provided the math (mapping) from that syntax to the lower level assembly or machine language.

We've been getting lots of new languages since FORTRAN, C and LISP. Justifications for new languages have included these ideas, "closing the semantic gap" between programmer and application. The hope is to achieve similar leaps of productivity as those initial leaps from the baseline of panel switches and card readers. It's the churn to gimme a better notation. The notation isn't the only thing of course, we want a decent mapping from the syntax to the semantic domain, to meet our performance requirements, and later, our software quality requirements: "let's eliminate whole categories of errors".

One curious result of the productivity leaps made by our newer languages is that they just make us want to write even bigger programs. We're not satisfied to write the old applications in an elegant, maintainable way using the new idioms and language features. We say "well, now we can do that project we only dreamed of," and proceed to applications with ever-increasing scope. (An entertaining take on this can be found in the Law of Leaky Abstractions).

With that said, I want to present a few ideas where language tools can be applied to the productivity challenge. These are opportunities for people with knowledge of compiler and interpreter internals but does not involve yet another new language. With acknowledgment of the irony in presenting these after just bemoaning application bloat, here are a few ideas for areas that could be improved in the practice of programming:

#1: Automatic refactoring support, like Danny Dig's refactoring engine which is included in Eclipse 3.2. This helps make progress in the old can't-refactor-because-it's-not-backwards-compatible problem by providing automatic tools to upgrade client code when shipping libraries with changes in APIs.

#2: Code visualization - tools to understand the static structure and dynamic flow of code . We can also improve in historical visualization, or how code has changed between revisions.

#3: Data visualization - Debuggers let you examine and even edit data structures at runtime, why not provide similar capabilities at code writing time? Why not let programs be specified by something like Emacs record-and-play macros where you "edit" the data structures? I think part of what makes programming difficult because it's hard to automate something without seeing what's being automated. While visual languages have not really taken off, I think it's an untapped potential, for some domains anyway.

#4: Automatic test generation - automatic creation of tests based on static inspection, to supplement hand-written tests.

#5: Machine-learning helpers for programming - instead of programming directly, use a tool that accepts examples of inputs and outputs and generates rules as starter code.

#6: Integration of coding environments with networked repositories - This means things like direct interfacing of language and tools help with team wikis to help share experience between programmers. Microsoft tools have started to do this, and there is room for growth. Instead of static coding style checkers definitions, what about augmenting them with networked information? Worse than Failure's Code Snippet of the Day could be formally encoded to flag catch ridiculous code patterns immediately.

#7: Easier ways to search code by pattern, not just textually, but by structure

These tools don't need to apply to new languages. As we found with AnnoDomini, a Y2K tool for fixing COBOL programs employing type theory and itself written using a functional language, tools can follow many years after their target language.

What adding value is not

We started by mentioning the two-edged sword of languages. The second side of languages, their use as a political tool, is not necessarily a way to add value, but it is to control. This phenomenon is studied in sociolinguistics, and you may have personal knowledge of it. What languages should be taught in schools? What languages can you get in trouble for speaking? What is the "official" language, or languages? What are the motivations for the proscribing or forbidding other people to use or not use a language?

While you think about that, let's talk about plugs and sockets. Let's say we make laptops. We will need to make a power supply, something that has two plugs. One is the AC side that plugs into the wall outlet using a standard connector. The other is the DC side that plugs into the laptop. For this DC connector, we have a choice of going standard, with an off-the-shelf standard size and shape, or going proprietary, designing a unique size and shape, possibly protected by patent. (For the record, I don't actually make laptop power supplies. So if they don't, for some reason, ever ever use standard connectors, I'd love to know, but I don't think it affects the analogy.) We can see the two edges of the language blade. As long as the DC plug and the laptops connector match, they cooperate, it works; the two side "communicate" power. The shape largely doesn't matter to the function of communication. But the choice of shape does affect what other laptops the power supply works with. By choosing a standard shape, we open the possibility of wider interconnectivity. By choosing proprietary, we restrict the possibilities. If someone needs a new power supply, now you have to come to us to buy it, or to a another vendor who licenses our connector patent.

The open vs. closed DC connector choice will sound like a familiar dichotomy from many technical fields, perhaps most familiar as the operating system question. I bring it up for the programming language audience as a reminder that any programming language represents a particular shape of interface, and we need to be aware of its potential for use as a political tool.

Back to human languages. We need to be aware of who is agitating for adoption of a language, and why they are doing it. Hear these motives with an ear for the programming language arena. "You may not use that language spoken by our enemies" emphasizes the enmity on a personal level. "You must now speak our language, vanquished foe" is part of a takeover plan. Language promotion can be sinister, as when members of an ethnic faction seek independence from a larger nation. They may appeal to ethnic identity as leverage to gain power, promoting the household ethnic language over the national one taught in schools. Language promotion need not be sinister. In an extremely fragmented situation it can achieve the communication productivity we spoke of before. Papua New Guinea is actively promoting Tok Pisin as a means of unifying its citizens who natively speak thousands of quite different languages, a situation created in part by its poor network connectivity. No, that's not it, I mean by the rugged terrain and isolated mountain villages. You may see language promotion taking place, and wonder what the language itself does for people. But we need to look for the bigger picture, which includes understanding who is behind the promotion and what they are trying to achieve. "You must use this language because you belong to this group" plays up uniqueness of the group, as with Finnish. Modern Finnish was developed in the 19th century as part of a growing nationalism to help distinguish it from neighboring Sweden and Russia. The emphasis on uniqueness, of preserving culture is interesting. Cultural forms involving language include expressions, wise sayings, jokes, stories, songs and poetry. These form a body of useful works. Preserving them is valuable for those steeped in the culture. You could say they have a vested interest in the culture.

Among other reasons, I think one reason languages are given away for free by businesses is that it helps spread adoption of the interface, creating a hook to create business in other forms. Once a customer has paid to develop working code in a language, it represents a vested interest in that language. We can think of the vast collection of COBOL applications as the prime example.

New languages have had it tough. In the 80's and 90's, without an absolutely killer feature, such as infix expressions when the alternative is per line arithmetic, new languages were being developed, but perhaps not rapidly. I'd venture that new languages are only good for new projects and new people (called "youth"). The rise of the web has brought with it a slew of opportunities for both those factors to coexist, giving rising to a growth in the number of available languages to use both browser side and server side. As the web matures, I see companies still trying to take advantage of the momentum to adopt new languages (*cough* Adobe Flex *cough*), and while it's exciting I'm starting to see the other side. Is it really a boost in productivity if we're retooling our businesses and schools to use a new language and environment every few years? Or are we moving sandpiles around, taking the same basic program (no pun intended) from one person's vested interests to another? Be aware of the other edge of the language blade. Adopting or creating a new language may not be adding value. Or if it is, you need to know, adding value for whom?


For more reading:
Cylindrical types of DC connectors at Wikipedia

Language politics at Wikipedia

Culture of Finland at Wikipedia

Proofs, Education and Open Problems

Announced on the Types mailing list today: http://tlca.di.unito.it/opltlca/, a convenient list of open problems for research in the area of typed lambda calculus.

After reading Dijkstra's writings about formal proofs being essential to computer science, I feel ill at ease to call myself a computer scientist. After reading his derision of software engineering, I feel equally uncomfortable calling myself a software engineer. Maybe I should try to learn a bit more of my impulse purchase How to Prove It by Daniel Velleman. Then I can take on the whole list, starting with a direct proof of the confluence of βη  -strong reduction!

I was comparing notes with my wife on the role of proofs in education. She was introduced to proofs in sixth grade. She didn't understand how I could be so lacking in familiarity with proof technique, which is what "How to Prove It" teaches. My experience was simply the straightforward geometry proofs that I think are common to U.S. education, and my undergrad computer science degree was also relatively light on proofs, or at least on the level of rigor expected from my graduate school experience. Thus I frequently lament my weak background in proofs.

The Dijkstra essay has been Redditized a couple of times with good discussion. I characterize what Dijkstra wants to accomplish as this: sitting in a library in a comfortable chair with a stack of blank paper and pencils, crafting both algorithm and its proof. This is the Dijkstra picture of computer science, I gather, and he appears to scoff at testing. If someone wants to implement the algorithm, it has to be keyed in, perhaps translated to a programming language, and then what do you suppose should be done to ensure no errors were made in translating or keying this embodiment of the provably correct algorithm? Why, I suppose the program should be tested. I wonder if Dijkstra would call such errors "manufacturing defects."

One Reddit reader claims, that the problem with formal proofs of algorithms is that they run into the halting problem. That is, you end up needing to run the program being proved to understand it. I don't know if I buy it, but it's an intriguing idea, because a proof is done on a static representation of an algorithm.

I would like to improve my understanding of formal methods, so I'll hopefully get around to reading How to Prove It, and take a deeper look at Hoare logic. I'd also like to work on understanding the connection between programs, tests, and proofs because they do seem related in a deeper way than is usually acknowledged.

As I was reminded recently, "clear thinking is useful in any field", and whether you endorse formal proofs as the computing cure-all or if you're a web monkey, anything you can do to advance the frontier of knowledge, understanding and wisdom is a welcome contribution.

Wednesday, April 11, 2007

Dijkstra computer inversion quotes

For a long while I've been aware of this Edsger Dijkstra's twisty aphorism:

Computer Science is no more about computers than astronomy is about telescopes.
The way I've interpreted this is that computer science uses computers, certainly, but the ideas, especially those of algorithms, are applicable in other areas and perhaps other embodiments of the things we call "computers."

I say this quote is "twisty" because it smacks the common understanding onto its head, and makes you feel a bit silly for associating computers with something called computer science.

Today I ran across another Dijkstra gem in the same vein when answering his own question "What is programming?":
We can view the program as what turns the general-purpose computer into a special-purpose symbol manipulator, and does so without the need to change a single wire... I prefer to describe it the other way round: the program is an abstract symbol manipulator, which can be turned into a concrete one by supplying a computer to it.

--From "The Cruelty of Really Teaching Computer Science"
You can see the shadow of the first quotation in the second one. Modern digital computers considered a passing fad in the march of computer science progress, and something that could give concrete life to the abstract algorithm specification we call a program.

Tuesday, April 03, 2007

Algorithm Derby

How many of us have had this idea? You have a problem to solve, and more than one way to do it. You're not sure which is fastest, so you hold a race. Let all your algorithms take a whack at it on parallel processors and use whichever result shows up first.

Well, the patent for that idea was issued to NCR in 1997.

From the abstract:

A parallel search method for a parallel processing computer which employs different and competing search procedures to search a collection of data. A search problem is formulated by a data management application. The competing search procedures are created from the formulation of the search problem and pre-existing rules. The competing procedures are executed. Additionally, execution of the competing search procedures may be monitored to obtain information about the effectiveness of the competing procedures, and the competing procedures may be changed based upon the information. In those search cases where a time limit is necessary, a timer is started at the start of execution, and a solution is returned to the data management application.


At this point I have no word on whether NCR is licensing this patent or not.