When writing "RC Filter with OCaml" I was surprised that list append performed so poorly (is not tail-recursive). As explained earlier, the RC algorithm is defined as processing elements from the input array from first to last, but in doing so new elements in the result list must be appended, not prepended (cons). Since OCaml didn't have a constant-time way to append to a list, I was forced to prepend and reverse the result later.
It depends on the list data structure I think. If lists are represented with a simple cons cell, namely a pair of head and tail pointers, then there's no other way to do it. But lists could be represented with a list header containing information such as length and last element pointer to speed up certain operations, as well as the pointer to a chain of traditional cons cells:
Representing lists with the header makes several operations constant-time: appending or prepending an element, concatenating two lists (called append in ML), and getting the list length. It does so at an obvious memory overhead cost, as well as the time cost of an extra level of memory indirection: getting the head or tail has to read header->first as well. As I found in the generated assembly, OCaml calls a hd function anyway, so as for it I don't think that's a concern. And I imagine the Ideal Functional Language Optimizing Compiler (IFLOC) would identify places in the source where the simpler LinkedList would work and just use that.
struct ConsCell {
struct ConsCell* hd;
struct ConsCell* tl;
};
typedef struct ConsCell* LinkedList;
struct ListHeader {
unsigned length;
ConsCell* first;
ConsCell* last;
};
typedef struct ListHeader* List;
2 comments:
With that optimized representation you break the persistent style, a fundamental property of the purely functional version which allows sharing. As usual there is a tradeoff between two optimizations, in fact.
mrpingouin: Yes, I think that's well-put, and I should have mentioned it in the post. The optimizations mentioned with the list header structure cannot be used in general, but only in cases where no sharing can occur, where the compiler can determine single ownership of the list, for example.
In the case of an anonymous function passed to fold or map, like fold_left (fun ys -> fun xi -> ys @ [f(xi,hd ys)]) x0 xs, it can be seen that the accumulated result list is not shared and the compiler can take advantage of destructive updates.
I'm interested to learn of actual systems that do this type of optimization though. When I started this series I had in mind Chez Scheme and OCaml as possibly compiling the RC filter benchmark to something close to C speeds.
Post a Comment