Saturday, May 20, 2006

Quine Dissected

A program that prints its own source code is one of the fun puzzles in computer science. I attempted this problem a couple years ago, and now I wanted to introspect a bit on the process that led me to a solution. Solutions for C++ and Standard ML are found at the end of the post.

A first attempt in some language would look something like this:


or rather it would start like this, because of course you get stuck. Your brain's Recursion Detector goes off, and you realize you just can't go on like this, into the infinite reflections of opposing mirrors, because you'll miss lunch, and your editor won't be able to hold it all anyway.

So we take a step back. We take stock of what we know:

1. The program has to start some way.
2. Similarly, the program has to end some way.
3. We assume that our technique will involve quoting something from our program. There may be quines that don't quote anything, but we'll try it.

Armed with these observations, let's go back to our program.


This looks good for the form of a program that is going to print something static (not based on any input). However, we already tried it, and ran into the recursion problem. But let's look at the program at or around the point things went wrong:


After this, our string starts recursing. You could perhaps say it starts recursing at the second Print, but that is the first time the program appears in a quoted string.

My thought was to put a variable marker ("S" for "self") in the quoted string to mark the recursion:


My idea was to write a function, let's call it f, which takes the string above and prints it contents, except when it hits the S. At that point don't print it but instead skip it and then print the remainder. Of course the function definition would need to precede the Print call, but as long as the function includes no quotes, it's static text and falls into Observation #1, the program must start some way.

I wrote a version like this, but then realized the marker was a little arbitrary, and that really you could simply split the program into two parts, the part before the recursion point, and the part after. Why not then make a function which takes two strings.

If our program had the form A"A","B"B, where A and B were arbitrary strings of source code which did not attempt to contain their own quoted source code, then we could write a function taking the strings A and B which would print this pattern: A+quote(A)+","+quote(B)+B. This function would then be printing the pattern of the source, without having to include the quoted form of the source itself. The function quote returns a quoted form of its argument string, using the type of quotation marks appropriate to the language, and including any escaping needed (e.g. if the given string includes quotation marks itself).

From here, we can deductively arrive at an actual implementation. We decompose A and B a little bit:

  • Apreamble: The code for f plus any setup or overhead needed at the beginning of a program in the language (e.g. #includes or Program SelfPrinter).

  • Bpostamble: Code to finish up the "main" part of the program and finish the program.

Hence A=[Apreamble]f(, and B=)[Bpostamble]. (If a function call looks different in the language, the f() syntax is easily changed.) Does this work? Does it solve the problem? Let's substitute these parts into the original form:

Original form:


Substitutions from A applied:


If Apreamble and Bpostamble don't include any "recursive strings", this will work. A full example will be a little difficult to read, due to the repetitions inherent in the pattern.

Enough symbolics, let's get to a real example, in Standard ML/New Jersey. The function f is given by this definition, with some supporting definitions:

val q="\""
fun quote s = concat[q,String.toString a,q]
fun f(a,b)=print(concat[a,quote a,",",quote b,b])

The code, with the string literals removed looks like this:

val q="\""
fun quote s = concat[q,String.toString s,q]
fun f(a,b)=print(concat[a,quote a,",",quote b,b])

Hopefully now you can see how A can be replaced by all the code leading up to, but not including, the quotation mark before A, and likewise B by the code after the quotation mark following B.

Finally then, here are full solutions following this pattern for SML and then C++.

SML Quine

val q="\""
fun quote s = concat[q,String.toString s,q]
fun f(a,b)=print(concat[a,quote a,",",quote b,b])
f("let\n val q=\"\\\"\"\n fun quote s = concat[q,String.toString s,q]\n fun f(a,b)=print(concat[a,quote a,\",\",quote b,b])\nin\n f(",")\nend\n")

C++ Quine

#include <iostream>
#include <string>
typedef std::string str;
str str2lit(str s)
str r;
for(str::iterator i=s.begin(); i != s.end(); ++i)
if(*i == '\n') r += "\\n";
else if(*i == '\\') r += "\\\\";
else if(*i == '\"') r += "\\\"";
else r += *i;
return r;
void p(str a,str b){ std::cout << a << str2lit(a) << "\",\"" << str2lit(b) << b; }
int main()
p("#include <iostream>\n#include <string>\ntypedef std::string str;\nstr str2lit(str s)\n{\n str r;\n for(str::iterator i=s.begin(); i != s.end(); ++i)\n if(*i == '\\n') r += \"\\\\n\";\n else if(*i == '\\\\') r += \"\\\\\\\\\";\n else if(*i == '\\\"') r += \"\\\\\\\"\";\n else r += *i;\n return r;\n}\nvoid p(str a,str b){ std::cout << a << str2lit(a) << \"\\\",\\\"\" << str2lit(b) << b; }\nint main()\n{\n p(\"","\");\n return 0;\n}\n");
return 0;

1 comment:

Anonymous said...

why pointers?

I forget what /n is