Tuesday, March 13, 2007

Types of garbage

At some point, many a Java programmer has had a little bulb go on, and exclaimed with dropped jaw "I thought Java couldn't have memory leaks!"

Well, you don't have leaks like in C or C++, where a leak occurs simply by forgetting to call free or delete. In those languages, heap objects leak by default:


new std::string("hello");

is a leak. In Java, objects get deleted by default:

new String("hello");

is instant garbage to be collected. You have to do something to remember an object, like assign it to a variable or store it in a live object. And that's actually the type of "leak" that can happen, where the programmer stores a reference to an object in a long-lived structure but never uses the object again.
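To make that kind of "leak" concrete, here is a minimal sketch in Java. The class RequestLog and its methods are hypothetical names made up for illustration; the only point is that a static list keeps every buffer reachable even though nothing ever reads the buffers again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: a long-lived structure that remembers every payload it sees.
class RequestLog {
    // Static field, so the list stays reachable for as long as the class is loaded.
    private static final List<byte[]> HISTORY = new ArrayList<>();

    static int handle(byte[] payload) {
        HISTORY.add(payload);      // stored "just in case", but never read again
        return payload.length;
    }

    static int retained() {
        return HISTORY.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handle(new byte[1024]);
        }
        // Every buffer is still reachable through HISTORY, so the collector
        // has to keep all of them even though the program is done with them.
        System.out.println(retained() + " buffers still reachable");
    }
}
```

Dropping the HISTORY.add call (or clearing the list once the entries are no longer needed) would let the collector reclaim each buffer as soon as handle returns.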

I was pleased to discover that the Wikipedia articles on garbage collection and on garbage give terms to distinguish these two cases:

Syntactic garbage

Memory that is unreachable from variables in the program.

Semantic garbage

Memory that is unusable by further execution of the program.


The latter category cannot be determined algorithmically in the general case (halting problem type of stuff: it's a bad idea to want an algorithm to tell you how the execution of a program will turn out), so garbage collection algorithms tend to focus on the syntactic variety of garbage.
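A small sketch of why the semantic variety is so hard to pin down. The class and method names here are hypothetical, and the Collatz loop is just a stand-in for arbitrary program logic. Whether table is ever used again depends on whether reachesOne returns, so a tool that could decide "is table semantic garbage right after it is filled in?" would also be deciding whether that loop halts.

```java
// Hypothetical example: the future use of `table` hinges on a loop terminating.
class SemanticGarbageSketch {
    // Stand-in for arbitrary logic: the Collatz iteration starting from n.
    // It returns true if the loop ever reaches 1; otherwise it never returns.
    static boolean reachesOne(long n) {
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        }
        return true;
    }

    static int run(long seed) {
        int[] table = new int[1_000_000];
        table[0] = 42;
        // From here on, `table` is only needed if reachesOne(seed) returns.
        // It is referenced by a later use the whole time, so it is never
        // syntactic garbage at this point.
        if (reachesOne(seed)) {
            return table[0];
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(run(27));
    }
}
```

For a seed where the loop does finish, the array turns out to have been needed after all; for a seed where it never finished, the array would have been semantic garbage all along, yet never syntactic garbage.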

The next time someone asks you to take out the trash, you have fuel for a smart remark. Something like "which did you mean, the objects that we're not using because they're in the trashcan, or the objects that we could use but we haven't been using, and we never will, but we don't know that for sure now, so we'd better not discard them yet..." Let me know how far you get with this type of reasoning as a way of avoiding your GC chores.
