Thursday, August 31, 2006

Web Security as it Really Is

I've been reading "Web Security: A Matter of Trust", the Summer 1997 issue of O'Reilly's "World Wide Web Journal". Wanting to understand the magic behind https: a little more, I read "Cryptography and the Web" by Simson Garfinkel and Gene Spafford, and "Introducing SSL and Certificates Using SSLeay" by Frederick J. Hirsch.

My knowledge of SSL and web security could be summed up like this: surf to the front page of a site you use securely, fill out your username and correct password, click "Login" (or hit tab-space, if you're a mouse avoider like me), and voila, you're securely surfing the space.

Meanwhile, my wife and I have been looking to buy a car, and I noticed something interesting about a popular car research website. I was logged in and happily doing my research, and happened to glance up at the URL. It started with "http:". With Ethereal, the free network sniffer, capturing packets, I re-accessed the front login page and examined the output. Sure enough, there was my email address, which the site uses for usernames, and my password, in very clear Courier New text. In crafting an email to their customer service, I came up with the following list of pros and cons for not using https: on this particular site:

  1. Pro: An attacker who captures your credentials and logs in can't really do much that reflects badly on you. They can do car research, and I guess they can send nasty email to customer service. The site actually does use SSL when you originally order the service and use your credit card.

  2. Pro: For such a popular site, maybe the server overhead of using https: on all the information queries is too burdensome.

  3. Con: Since the usernames are email addresses, they can be collected and sold.

  4. Con: Some people use the same password everywhere, so it still compromises a user's privacy and security to send it in the clear. Knowing one password may help gain entry elsewhere, especially when a username in other systems can be guessed from the email address used on the car site.

This experience wasn't so bad. Plus, I like to be trusting; it wears me out to be suspicious of everyone and everything.

Then my heart skipped a beat when I noticed the same phenomenon happening with Google Mail. My wife had mentioned that she thought Gmail has an http: address after you log in, but I didn't think it possible (my faith in Google at work). But when I checked, she was right.

Ethereal was quickly called to the scene. I was horrified when the sniffer trace revealed the email addresses of my quick contacts in clear text. But that ended up being all I found. The rest of the trace was loaded with SSL and TLS traffic. Google's extensive AJAX programming is apparently moving the sensitive pieces, such as credentials and mail content, via JavaScript calls to https: addresses, while serving the main page that frames the content as quick, static, clear-text http.

A nice design, very economical, securing only what needs to be. Indeed, why waste Google server time encrypting who knows how many copies of the same Gmail page with unique keys? Still, it seems to be moving in the wrong direction when a user needs a network sniffer, or an inspection of the HTML and JavaScript code, to find out what is secured and what's not. Remember, Google is not securing the list of email addresses found in my contacts list. I wonder if that's how my Gmail address finally got compromised to the spammers?

Tuesday, August 15, 2006

Kingdom of Nouns Response #1

My good friend recently mentioned the Kingdom of Nouns essay, hereafter referred to as KofN, which sparked a flurry of discussion back in March 2006.

Being both a Java programmer and a functional language programmer (I liked to tell people I wrote more lines in Standard ML during college than in any other language), I found the essay provocative and interesting. Rather than responding in kind to Steve Yegge's 3,164 word essay, I've decided to post little thoughts, this being the first.

My overall response to KofN is that Steve is right that Java is lacking in the way it works with functions, but his reasons are wrong, and the remedies he gives are not quite the remedies needed.

Nouns, Verbs, and Computation

Let's talk first about where the nouns and verbs come from. After all, this is programming we're talking about, and not linguistics. While linguistics, the study of human languages, often has compelling ideas that can be applied to computer languages, the terms noun and verb in KofN come from design analysis. When designing a new system you break down your problem domain by thinking of its nouns, which you make correspond to data, and then of its verbs, which become functions or code. Simple: nouns=data, verbs=code.

The basic statement from KofN I want to examine now is this one:

Nouns are things, and where would we be without things? But they're just things, that's all: the means to an end, or the ends themselves, or precious possessions, or names for the objects we observe around us. There's a building. Here's a rock. Any child can point out the nouns. It's the changes happening to those nouns that make them interesting.

I really struggle to find the point of this paragraph, the "topic sentence" as the writing teachers say. It has plenty of insinuations, like "any child can point out the nouns", which help build the emotional impact. But I think the point is that nouns are not interesting by themselves, and that they're the means to an end. Very well, if that's the point, then let me respond by invoking the simplest description of computation as

  1. Receive input

  2. Do some calculation on the input

  3. Emit some output

This doesn't say anything about functional vs. procedural, or about any particular programming language. This is how computers work. At every clock cycle (or for analog computers at every instant in time) the computer is the embodiment of a function, turning inputs into outputs. If it's a desktop or server, maybe it's sitting in an idle loop waiting for input. What people care about, the whole reason the computer is there, is the output, or information.

KofN states that "nouns are a means to an end", but I think it's the other way around. The end of a computation is the output, and the means is the function. If the point in bringing this up is to promote verbs in relation to nouns, I think the case is overstated.

In light of the above model of computation, I think the programmer's job can be phrased as modeling real-world nouns as computer data structures, and implementing the functions, the verbs, that transform those data structures. Both are important, of course. But I don't think nouns are uninteresting to the end user, and I don't think the "end" of a computation is the code, except maybe for the computer itself, if it subscribes to the life-is-a-journey-not-a-destination philosophy.

The question at hand in all this is how a programming language should be designed to help programmers get things done. The point of KofN seems to be to call for the option of invoking functions as f(x,y) rather than forcing f to be "owned" by the noun x, as in x.f(y). There's certainly a strong precedent for invoking functions as f(x,y). The static methods in java.lang.Math bear testament to that.
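To make the two call shapes concrete, here is a small sketch that uses only the standard library. Math.max is a static method invoked as f(x,y), while BigInteger.max is an instance method invoked as x.f(y). The class name CallShapes and its two helper methods are made up for this illustration.

```java
import java.math.BigInteger;

// Hypothetical demo class; the name and helpers exist only for illustration.
class CallShapes {
    // f(x, y): a static method that neither operand "owns".
    static int largerInt(int x, int y) {
        return Math.max(x, y);
    }

    // x.f(y): the same idea, but the method belongs to the noun x.
    static BigInteger largerBig(BigInteger x, BigInteger y) {
        return x.max(y);
    }

    public static void main(String[] args) {
        System.out.println(largerInt(3, 7));
        System.out.println(largerBig(BigInteger.valueOf(3), BigInteger.valueOf(7)));
    }
}
```

Both calls compute the same thing; the difference is only in who the language says the verb belongs to.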

With that thought I leave you until next time.

Contract Violation in Tomcat or "Hey, I was still using that!"

This is a bug story. I don't have all the detail yet. But I know what I saw.

If you're really impatient, you can skip to the end ("Light Dawns").

Scene of the Crime

Picture a Java Server Page (JSP) which runs some code to check the server to see if some heavy computing job is done yet. If it is, the HTML page returned will immediately redirect to a page showing the results of the job. Otherwise the page will wait a few seconds and refresh. A polling loop.

The JSP takes two parameters, a job identifier and a status string, passed in the query string (for example, jobid=28&whatsup=Searching%20for%20ETs...).

Now assuming that the job will take a long time, what would you expect to see on your browser? Probably not this sequence:

Fetch #1 of the page:

Waiting on job 28.
Searching for ETs...

and three seconds later, fetch #2 of the page:

Waiting on job 28.
Searching for ETs...

and three seconds later, fetch #3 of the page:

Waiting on job null.

Here was a program that, given the same inputs, produced one output the first two times it was invoked and then a different output the third time. Either my senses deceived me, the program had changed for some mysterious reason, or there were more inputs involved than I had in mind.

An Investigation is Opened

In trying to debug this, I found that while the HttpServletRequest's getParameter method was returning null, the getQueryString showed the right stuff, "jobid=28&whatsup=Searching%20for%20ETs...". I searched the web, and found there are categories of problems in Java servlet land where by forwarding and redirecting and various contortions you can end up losing your query string. But still, why should the result change after it was just working seconds earlier?
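As a rough cross-check of what getParameter ought to have returned, the raw query string can be decoded by hand. The sketch below uses only the standard library (URLDecoder.decode with a Charset argument needs Java 10 or later); QueryStringCheck and parse are hypothetical names, not part of the servlet API, and the parsing is deliberately simplistic.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the servlet API.
class QueryStringCheck {
    // Splits "a=1&b=2" into decoded name/value pairs, keeping the first value per name.
    static Map<String, String> parse(String queryString) {
        Map<String, String> params = new LinkedHashMap<>();
        if (queryString == null || queryString.isEmpty()) {
            return params;
        }
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.putIfAbsent(
                URLDecoder.decode(name, StandardCharsets.UTF_8),
                URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = parse("jobid=28&whatsup=Searching%20for%20ETs...");
        System.out.println(params.get("jobid"));   // 28
        System.out.println(params.get("whatsup")); // Searching for ETs...
    }
}
```

If a hand-decoded value is present while getParameter returns null for the same name, the query string arrived intact and the problem lies in the request object's own parameter state.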

The Plot Thickens

The above simulated screenshots are not what I saw originally. In my JSP I had a null check, and redirected to an error page in that case. This stopped the looping. But when I removed the null check and let the polling loop continue, I found that the null condition went away. In other words, on one request the query string had the right stuff, but the request parameter map was empty. On the next request, the request parameter map was populated again.

A Correlation

This waiting page is what gets returned by a request to initiate the heavy duty compute job. What I was trying to do was emulate a common metaphor where the user makes a request of the server which will take some time to process, and a "waiting" page gets immediately returned. I decided to run Tomcat in the debugger and step through this initial request, where the waiting page URL got formulated and all that. Let's see roughly what that code looked like:

void handleSetiSearch(HttpServletRequest request, HttpServletResponse response) {
    String id = allocateJobId();
    String waitingPage = "poller.jsp?jobid=" + id;

    // Start heavy computing job in another thread
    SearchParameters params = getSearchParameters(request);
    TimerTask performSetiSearch = new SetiSearch(params, request);
    Timer timer = allocateTimer();
    timer.schedule(performSetiSearch, 0);

    // ... send waitingPage back to the browser (omitted)
}

This is the method that is invoked from the servlet's doGet or doPost methods, and handles the web request. Unimportant details have been reduced to a method call above.

I put a breakpoint here and in the SetiSearch code (not the real code I was using, if you've been wondering). Once the method shown above had returned and the browser had its poller.jsp page, and as long as the debugger stayed stopped at the beginning of the SetiSearch code, the browser would loop the poller page for a long time without mishap.

Only when I started stepping through the compute job code did the poller page suddenly get a null parameter.

You now have all the clues, Sherlock. If you're a Tomcat developer or a very experienced Java webapp developer, you're probably screaming the answer. But let everybody else think for a few moments to see if they can spot the error.

Light Dawns

The insight is that although the web request is finished once the handleSetiSearch method returns, the SetiSearch object, running in a separate thread after the request has been handled, still has the request object in its hot little hands.

And guess what? Tomcat recycles HttpServletRequest objects.

As this forum discussion shows, back in Tomcat 3 they were finding a significant performance boost by recycling request and response objects, instead of always allocating and garbage collecting them.

My suspicions were confirmed when I started printing the request objects and noticed that the same request object used to handle the original "SETI search" request had been recycled into the poller.jsp request.

So, although I can't explain why I saw what I saw, I know that when I extracted, copied, cloned, and otherwise slurped all the information out of the request object that the big compute job required, and did not pass on the request object itself, all my troubles went away.
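The fix can be sketched without a servlet container at all. In the sketch below, PooledRequest is a hypothetical stand-in for a container-managed request object; it is not Tomcat's implementation and only mimics the "clear and reuse" behavior. It runs in a single thread so the outcome is deterministic, with recycle() standing in for whatever the container does once the handler returns.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a container-managed request object.
// It is NOT Tomcat's implementation; it only mimics "clear and reuse".
class PooledRequest {
    private final Map<String, String> params = new HashMap<>();

    void setParameter(String name, String value) { params.put(name, value); }
    String getParameter(String name) { return params.get(name); }

    // Copy everything a background job will need while the request is still live.
    Map<String, String> snapshotParameters() { return Map.copyOf(params); }

    // Assumed container behavior once the request has been handled.
    void recycle() { params.clear(); }
}

class RecycleDemo {
    public static void main(String[] args) {
        PooledRequest request = new PooledRequest();
        request.setParameter("jobid", "28");

        // Safe: take a copy before the handler returns.
        Map<String, String> copied = request.snapshotParameters();

        // Handler returns; the container reuses the object for another request.
        request.recycle();

        System.out.println(request.getParameter("jobid")); // null: the held reference sees recycled state
        System.out.println(copied.get("jobid"));           // 28: the copy is unaffected
    }
}
```

A background job that holds only the copied map never notices the recycling, which matches what I saw once the request object itself stopped being passed along.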


Whose mistake is this? Mine, of course. But it's a gray area. I'm sure the contract of "don't use a request or response object after the request is handled" is stated in some documentation somewhere. I just hope they add the line "Even if you're still post-processing the request."

So, even though Java has automatic garbage collection, this experience teaches me that a program can still end up doing its own memory management.

Afterword: Functional languages

Could this have happened if I was writing in a functional language? Not nearly as likely, if requests are represented in a "usual" way of data constructed on the fly, and state updates modeled by creating new copies. The performance issue solved by Tomcat's management of its own memory would fall to the language implementation. For example, consider this SML snippet:

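(* Sets a request attribute and forwards to a display JSP *)
fun valueAdd (request as HttpRequest {params = p, attrs = a}, response) =
  let
    val newValue = database.getValue ()
    val request2 = HttpRequest {params = p, attrs = Hash.set (a, "value", newValue)}
  in
    forward (request2, response)  (* hypothetical helper; the forwarding call was not shown *)
  end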

The memory requirements are (1) the creation of a new HttpRequest record and, depending on the hash implementation, (2) a new hash table with an additional value. This is so even though request and request2 probably share the same params value. Depending on analysis of the call chain, the system could possibly decide that request is never used again by any valueAdd caller, and could update the original request object in place to create request2. But I'm not up enough on functional language implementations to know if such things are being done. Chez Scheme has a lot of performance enhancement built in. Does anybody know how Ruby would fare in a situation like this? Is there any language out there that would be up to the task of recycling very frequently allocated data, as needed for web server request and response objects, without trying to reclaim an object that is still in use after the immediate request was handled?
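For comparison, the copy-on-update idea can be imitated in Java with an immutable record (Java 16 or later). This is only a sketch: Request, withAttr, and SharingDemo are hypothetical names that mirror the SML snippet, not any real servlet API. It shows the two allocations counted above, and that params is shared rather than copied.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable request; names mirror the SML sketch, not any real API.
record Request(Map<String, String> params, Map<String, String> attrs) {
    // Returns a new Request; the original is left untouched.
    Request withAttr(String name, String value) {
        Map<String, String> newAttrs = new HashMap<>(attrs); // (2) a new table with one more entry
        newAttrs.put(name, value);
        // (1) a new record that shares the existing params map
        return new Request(params, Collections.unmodifiableMap(newAttrs));
    }
}

class SharingDemo {
    public static void main(String[] args) {
        Request request = new Request(Map.of("jobid", "28"), Map.of());
        Request request2 = request.withAttr("value", "42");

        System.out.println(request.attrs().containsKey("value"));  // false
        System.out.println(request2.attrs().get("value"));         // 42
        System.out.println(request.params() == request2.params()); // true
    }
}
```

Because nothing is ever updated in place, a background thread holding request keeps seeing the values it started with; reclaiming the old copies is left to the garbage collector.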

Monday, August 14, 2006

Paper and pen considered important personal coding tools

When I'm writing code, I like to keep a pen and paper handy. I find that by writing down the types of a function, or the fragment of code I'm thinking about, it somehow helps me get to the point where I'm ready to type in the code editor.

Today for example, I was contemplating some refactoring and found myself sketching a call tree, to confirm my plan would work. In my notes I also see a space where I've done some brainstorming, writing four possible names for a new class I wanted to write. There's also a spot where I jotted down the six related actions that my servlet handles which I was about to change, to help me make sure I methodically covered all of them at each stage.

Does anyone else work this way? I don't think I used to do it like that. I think it came about after grinding through my master's work where I was writing a 10,000+ line SML program and frequently needed to write down the types of functions I was planning to write.