In looking at OCaml-generated x86 assembly last week, I noticed that having a language which keeps undefined identifiers in symbolic form can be handy. I started making notes in a Mathematica notebook to keep track of some heap shuffling code I was trying to understand. To explain the technique, let's use this OCaml snippet:

where the floating-point arithmetic becomes this x86 code:

((1.0 -. alpha) *. (Array.get y i))

In Mathematica, I translated lines of assembly into real assignments, using the literal register names for my variable names, except at the very beginning of the assembly fragement. For example, when starting the above fragment,

fld1

fsub REAL8 PTR [eax]

fmul REAL8 PTR [esi]

`eax`points to the

`alpha`value, so I would write the read reference to

`eax`as

`alpha`,

*without defining alpha*. Similarly

`esi`points to

`Array.get y i`, so I was able to write this sequence:

It was interesting, a little startling actually, to see the original expression re-emerge by simulating running the code. I didn't get very far into it, but you can also do things like simulate writes to memory with a register transfer definition:

In a traditional language,

Mem[eax] = 2301

`eax`

*has*to be bound to a value. But in a symbolic language,

`eax`can be a symbolic expression like

`_caml_young_ptr + 4`.

Do my partial-evaluation readers have any remarks about this?

## 4 comments:

Putting jumps etc. aside and staying within basic blocks, pretty much all low-level representations of code are just serialized expression graphs, as you've seen. It's similar and in some ways even simpler on machines like the JVM and CLR: the IR is just a post-order serializing traversal of the expression tree.

Arkon and I are writing a disassembler which will use similar ideas.

Check out http://www.ragestorm.net/distorm/ for the previous version of the disassembler, and my blog for posts about the subject.

We didn't yet write a lot about it, but it's coming.

Barry: The surprising thing here for me is not to see the mapping from high-level to low-level as just that, but to see (1) the reverse mapping pops out by keeping some parts symbolic, and (2) a somewhat novel use for a symbolic language like Mathematica, as opposed to doing it all by hand. Sure (1) is not surprising if you think about it for a second, but for something with practical implications I am a bit surprised I hadn't heard of it before. Anyone can do it by hand, but since this is tedious and error prone, (2) is relevant.

I'm not familiar with Mathematic but I know this can be done with Maple as well. Just have to watch out for the built-in automatic simplification, the result may not be an accurate representation of what is actually being computed. For example if you input (a + a) Maple will respond with 2a. The symbols are supposed to represent machine numbers but Maple will assume they are real numbers which may lead to incorrect results.

And why are you asking for your partial-evaluation readers to remark, are you trying to get at something specific?

Post a Comment