Tuesday, August 15, 2006

Contract Violation in Tomcat or "Hey, I was still using that!"

This is a bug story. I don't have all the detail yet. But I know what I saw.

If you're really impatient, you can skip to the end ("Light Dawns").

Scene of the Crime

Picture a Java Server Page (JSP) which runs some code to check the server to see if some heavy computing job is done yet. If it is, the HTML page returned will immediately redirect to a page showing the results of the job. Otherwise the page will wait a few seconds and refresh. A polling loop.

The JSP takes two parameters, a job identifier and a status string, like this:


Now assuming that the job will take a long time, what would you expect to see on your browser? Probably not this sequence:

Fetch #1 of http://seti.example.com/search/poller.jsp?jobid=28&whatsup=Searching%20for%20ETs...

Waiting on job 28.
Searching for ETs...

and three seconds later, Fetch #2 of

Waiting on job 28.
Searching for ETs...

and three seconds later, Fetch #3 of http://seti.example.com/search/poller.jsp?jobid=28&whatsup=Searching%20for%20ETs...

Waiting on job null.

A program with certain inputs that produced one output the first two times it was invoked, and then a different output. Either my senses deceived me, the program had changed for some mysterious reason, or there were more inputs involved than I had in mind.

An Investigation is Opened

In trying to debug this, I found that while the HttpServletRequest's getParameter method was returning null, the getQueryString showed the right stuff, "jobid=28&whatsup=Searching%20for%20ETs...". I searched the web, and found there are categories of problems in Java servlet land where by forwarding and redirecting and various contortions you can end up losing your query string. But still, why should the result change after it was just working seconds earlier?

The Plot Thickens

The above simulated screenshots are not what I saw originally. In my JSP I had a null check, and redirected to an error page in that case. This stopped the looping. But when I removed the null check and let the polling loop continue, I found that the null condition went away. In other words, on one request the query string had the right stuff, but the request parameter map was empty. On the next request, the request parameter map was populated again.

A Correlation

This waiting page is what gets returned by a request to initiate the heavy duty compute job. What I was trying to do was emulate a common metaphor where the user makes a request of the server which will take some time to process, and a "waiting" page gets immediately returned. I decided to run Tomcat in the debugger and step through this initial request, where the waiting page URL got formulated and all that. Let's see roughly what that code looked like:

void handleSetiSearch(HttpServletRequest request, HttpServletResponse response)
String id = allocateJobId();
String waitingPage = "poller.jsp?jobid"+id

// Start heavy computing job in another thread
SearchParameters params = getSearchParameters(request);
TimerTask performSetiSearch = new SetiSearch(params,request);
Timer timer = allocateTimer();
timer.schedule(performSetiSearch, 0);

This is the method that is invoked from the servlet's doGet or doPost methods, and handles the web request. Unimportant details have been reduced to a method call above.

I put a breakpoint here and in the SetiSearch code (not the real code I was using, if you've been wondering). After the above shown method returned, so that the browser got its poller.jsp page, but with the debugger stopped at the beginning of the SetiSearch code, the browser would loop the poller page for a long time without mishap.

Only when I started stepping through the compute job code did the poller page suddenly get a null parameter.

You now have all the clues, Sherlock. If you're a Tomcat developer or a very experienced Java webapp developer, you're probably screaming the answer. But let everybody else think for a few moments to see if they can spot the error.

Light Dawns

The insight is in noticing that while the web request is handled when the handleSetiSearch method returns, the SetiSearch object running in a separate thread after the request is handled still has the request object in its hot little hands.

And guess what? Tomcat recycles HttpServletRequest objects.

As this forum discussion shows, back in Tomcat 3 they were finding a significant performance boost by recycling request and response objects, instead of always allocating and garbage collecting them.

My suspicions were confirmed when I started printing the request objects and noticed that the same request object used to handle the original "SETI search" request had been recycled into the poller.jsp request.

So, although I can't explain why I saw what I saw, I know that when I extracted, copied, cloned, and otherwise slurped all the information out of the request object that the big compute job required, and did not pass on the request object itself, all my troubles went away.


Whose mistake is this? Mine, of course. But it's a gray line. I'm sure the contract of "don't use a request or response object after the request is handled" is stated in some documentation somewhere. I just hope they add the line "Even if you're still post-processing the request."

So, even though Java has automatic garbage collection, this experience teaches me that memory management by the program can still be done.

Afterword: Functional languages

Could this have happened if I was writing in a functional language? Not nearly as likely, if requests are represented in a "usual" way of data constructed on the fly, and state updates modeled by creating new copies. The performance issue solved by Tomcat's management of its own memory would fall to the language implementation. For example, consider this SML snippet:

(* Sets a request attribute and forwards to a display JSP *)
fun valueAdd(request as HttpRequest{params=p,attrs=a},response) =
val newValue = database.getValue()
val request2 = HttpRequest{params=p, attrs=Hash.set(a,"value",newValue)}

The memory requirements are (1) the creation of a new HttpRequest record and depending on the hash implementation, (2) a new hash table with an additional value. This even though request and request2 probably share the same params value. Depending on analysis of the call chain, the system could possibly decide that request is never used by any valueAdd caller, and could perform an update in place of the original request object to create request2. But I'm not up on functional language implementations to know if such things are being done. Chez Scheme has a lot of performance enhancement built in. Does anybody know how Ruby would fare in a situation like this? Is there any language out there that would be up to the task of recycling very frequently allocated data as needed for web server request and response objects that would not try to reclaim an object in use after the immediate request was handled?

No comments: