Gödel, Escher, Bach

When I read Gödel, Escher, Bach by Douglas Hofstadter, I went slowly and tried to work all of the exercises and puzzles. This page has all that I did that’s worthy of not being lost. There is suprisingly little on the web for some of these topics, for a book as old as GEB.

BLooP and FLooP

I wrote an interpreter for the BLooP and FLooP languages, which has its own page.

Flip sequences

Download: gflip.py hflip.py

As described on page 137, the recurrences

G(0) = 0
G(n) = n − G(G(n − 1)) for n > 0
H(0) = 0
H(n) = n − H(H(H(n − 1))) for n > 0

give rise to tree structures. (The node G(n) is the parent of node n for n > 0; likewise for H.) The challenge is to flip the trees from left to right, renumber the nodes left to right, and find an algebraic formula for the elements of the sequence newly implictly defined by the tree.

After trying on my own, I turned to the On-Line Encyclopedia of Integer Sequences to see if it had been solved. At the time, it hadn’t. G and H are A005206 and A005374 respectively.

The G-flip sequence was already in the database as A123070. Despite what the comments say, they are not an answer because they don’t have an algebraic form for the sequence. I contributed gflip.py to the database entry.

The H-flip entry was not in the database, so I made my mark on mathematics by submitting A154951. It includes hflip.py.

I have a conjecture for the definition of the G-flip sequence:

G′(0) = 0
G′(1) = 1
G′(n) = (n − 1) − G(G(n − 2) − 1) for n > 1

It is correct to at least n = 10,000. You can write the last recurrence more perspicuously by defining a helper:

X(0) = 0
X(n) = (n − 1) − G(X(n − 1)) for n > 0
G′(n) = (n − 1) − X(X(n − 1)) for n > 1

In 2015, Pierre Letouzey proved a recurrence for G-flip in terms of G-flip itself:

G′(0) = 0
G′(1) = 1
G′(2) = 1
G′(3) = 2
G′(n) = n + 1 − G′(1 + G′(n − 1))

oeis-py.py formats Python programs for submission to the OEIS. seq.py is the program I used to fool around with these sequences and find the conjectured G′.

Self-reproducing programs

Download: eniuq.py crab.py

Some ideas about self-reproducing programs are on pages 498 and 499. This is a quine in Python using the eniuq technique.

def eniuq(t):
	print t + "(" + repr(t) + ")"

eniuq('def eniuq(t):\n\tprint t + "(" + repr(t) + ")"\n\neniuq')

This is a “crab” program that prints itself backwards. There is a blank line at the beginning to to balance the one that print adds at the end.


def reverse(s):
	return "".join(reversed(list(s)))

def crab(t):
	print reverse(t + "(" + repr(t) + ")\n")

crab('def reverse(s):\n\treturn "".join(reversed(list(s)))\n\ndef crab(t):\n\tprint reverse(t + "(" + repr(t) + ")\\n")\n\ncrab')

Here is a crab program in the BLooP-like language. It assumes the existence of a REVERSE procedure but not any string concatenation operators such as those used by the Python version. You can see how similar it is to Hofstadter’s ENIUQ.

DEFINE PROCEDURE ''CRAB'' [TEMPLATE]: PRINT [PERIOD, RIGHT-BRACKET,
QUOTE-MARK, REVERSE [TEMPLATE], QUOTE-MARK, LEFT-BRACKET, REVERSE [TEMPLATE]].
CRAB['DEFINE PROCEDURE ''CRAB'' [TEMPLATE]: PRINT [PERIOD, RIGHT-BRACKET,
QUOTE-MARK, REVERSE [TEMPLATE], QUOTE-MARK, LEFT-BRACKET, REVERSE [TEMPLATE]].
CRAB'].

The program would produce this output when executed.

.]'BARC
.]]ETALPMET[ ESREVER ,TEKCARB-TFEL ,KRAM-ETOUQ ,]ETALPMET[ ESREVER ,KRAM-ETOUQ
,TEKCARB-THGIR ,DOIREP[ TNIRP :]ETALPMET[ ''BARC'' ERUDECORP ENIFED'[BARC
.]]ETALPMET[ ESREVER ,TEKCARB-TFEL ,KRAM-ETOUQ ,]ETALPMET[ ESREVER ,KRAM-ETOUQ
,TEKCARB-THGIR ,DOIREP[ TNIRP :]ETALPMET[ ''BARC'' ERUDECORP ENIFED

Quine link

There is a super quine page by David Madore.

Typogenetical self-rep

Download: typo.py.
Try ./typo.py --help for usage instructions.

The hardest challenge was the Typogenetical self-rep on page 512:

...it would be most interesting to devise a self-replicating strand. This would mean something along the following lines. A single strand is written down. A ribosome acts on it, to produce any or all of the enzymes which are coded for in the strand. Then the enzymes are brought into contact with the original strand, and allowed to work on it. This yields a set of “daughter strands”. The daughter strands themselves pass through the ribosomes, to yield a second generation of enzymes, which act on the daughter strands; and the cycle goes on and on. This can go on for any number of stages; the hope is that eventually, among the strands which are present at some point, there will be found two copies of the original strand (one of the copies may be, in fact, the original strand).

I spent fruitless hours with pencil and paper, thinking there must be a short, self-evident solution. To cut to the chase, I wrote a program that found these seven self-reps: CGTATCTCCG, CGTATCTCTG, CGTCTCTAAG, CGTCTCTAGG, CGTTTCTTTG, CGTTTTTCTG, and CGTTTTTTTG.

(I admit to being so frustrated that I did a web search to see if a solution was even known. When I saw that there were some (linked at the end), it doubled my resolve to find an original of my own, or at least to find someone else’s through the sweat of my brow.)

The program works by brute force, testing all strings of length 1, then of length 2, and so on. It runs each strand through two generations and looks for two copies of the original string. The rules for this game are left vague by Hofstadter, so people have been forced to formalize it, and in so doing decide for themselves just what qualifies as a self-rep. I first tried searching for unimpeachable strings: those that coded for a single unique enzyme at each generation, with enzyme binding always happening at the leftmost position. But this wasn’t finding results. So I resorted to trying every possible pairing of enzymes with strings, at each possible binding site, with the possibility that any enzyme is unused. Assuming that there are no errors in the program, it is possible to state that there are no strings shorter than ten units that self-replicate in two or fewer generations, at least under my own interpretation of the typogenetic rules.

What that means is that there are many possible evolutions for each of these strings, but at least one of them results in self-replication. I’ll show the steps by which each one self-replicates. They all work in a similar way, complementing an entire string and then complementing the complement to make the original. For enzymes I will write cop–rpy–rpu–rpu–cop(C) to indicate that it binds to C.

CGTATCTCCG

The string codes for the enzyme cop–rpy–rpu–rpu–cop(C). Let it bind to the C at the beginning. cop turns copy mode on. rpy goes to T, rpu goes to A, then rpu goes to the final G. The entire string is complemented. The final cop has no effect because copy mode is already active. At the end of the first generation we have the strings

CGTATCTCCG CGGAGATACG

CGTATCTCCG again codes for cop–rpy–rpu–rpu–cop(C), and CGGAGATACG codes for cop–ina–ina–rpy–cop(G). Discard the second enzyme and let cop–rpy–rpu–rpu–cop(C) bind to the first C in CGGAGATACG. After cop turns on copy mode, rpy finds T, rpu finds A, then rpu finds the final G. The whole string is complemented, making another copy of the original string. After the second generation there are the strings

CGTATCTCCG CGGAGATACG CGTATCTCCG

If discarding an enzyme that could have bound to a string is unsatisfying, this string can self-replicate without doing that in three generations. Let cop–rpy–rpu–rpu–cop(C) bind to the first C in CGGAGATACG as before, and cop–ina–ina–rpy–cop(G) bind to the final G in CGTATCTCCG. This latter binding turns on copy mode, inserts two As with their complement Ts, then falls of the right end. At the end of this second generation are the strings

CGTATCTCCG CGTATCTCCG CGTATCTCCGAA TTC

We have the same old cop–rpy–rpu–rpu–cop(C) and cop–ina–ina–rpy–cop(G), along with another copy of cop–ina–ina–rpy–cop(G) because the AA codes for nothing, and lpu(A) from TTC. We need to create another copy of CGTATCTCCG without destroying the one we already have, so let cop–rpy–rpu–rpu–cop(C) bind to the first C in CGTATCTCCG to produce the copy, and let lpu(A) bind to the A in CGTATCTCCG to do nothing. The other two enzymes can bind how they may.

CGTATCTCTG

The string codes for cop–rpy–rpu–rpu–lpy(C), which we let bind to the first C. The final lpy is a no-op just like the cop was in the previous example, so the net effect is to produce the complement, leaving the strings

CGTATCTCTG CAGAGATACG

The new string codes for mvr–ina–ina–rpy–cop(G). Now we don’t have the dilemma of whether to discard an enzyme. We can bind the new enzyme to the last G in CGTATCTCTG; the mvr ends its action without making any changes. The other enzyme is a copier like before; applying it to the second string makes another copy of the first.

CGTATCTCTG CAGAGATACG CGTATCTCTG

CGTCTCTAAG

The enzyme is cop–rpu–rpu–rpy–del(C); bind it to the leftmost C. After copy mode, rpu finds C, rpu finds A, and rpy falls off the end. This makes the complement CTTAGAGACG, which codes for off–rpy–ina–ina-cop(G). Apply the new enzyme to the end of the first string, where rpy falls off the end, leaving it unchanged. The original enzyme binds to the beginning of the second string and makes a copy of the original.

CGTCTCTAGG

The enzyme here is cop–rpu–rpu–rpy–ing(C). It works just like the previous example, making the complement CCTAGAGACG that codes for mvl–rpy–ina–ina–cop(G). This can bind to the last or second-to-last G in the original string, where the mvl–rpy combination falls off the end. The new enzyme makes a copy of the original string.

CGTTTCTTTG

This codes for cop–lpu–rpu–lpu–lpy(G). All the previous initial enzymes have bound to C, so this one has to work differently, copying right-to-left instead of left-to-right. It binds to the final G. lpu goes leftward to the G in the second unit, rpu goes back to the end, lpu goes back to the second unit, then lpy goes to the initial C, traversing the whole string with copy mode on. This leaves the original string and its complement:

CGTTTCTTTG CAAAGAAACG

The new string codes for three enzymes, split by the two AAs: mvr, ina, and cop. GEB does not say how enzymes of only one segment bind, but it doesn’t matter here. We of course bind the original enzyme to the final G of the new string to make a copy of the original. We may then either discard the short enzymes, or consider them to bind to A by default, so they have nowhere to bind in the original string.

CGTTTTTTTG

This makes the enzyme cop–lpu–lpu–lpu–lpy(G). As in the previous example, a copy is made right-to-left by binding to the final G. The complement, CAAAAAAACG, codes for mvr and cop. Just as in the previous example, we may disregard these short enzymes and use the original enzyme on the complement string.

One can imagine this string being constructed by hand. But then, why is it not shorter? Why exactly seven Ts? The answer in this case is “binding preference.” The enzyme from CGTTTTTG would bind to T, not G.

I tested more strings up to CGCACGGATAC at depth 2 and up to CGTCTA at depth 4 without finding any more self-reps. There's nothing special about those strings; it's just where I decided to kill the script.

Other self-reps

The review of chapters XVI and XVII by Kai-Mikael Jää-Aro has a good explanation of CGTTTTTTTG. I saw this one before I ran the full-fledged version of my program, so I knew it would terminate soon after it got to CGAAAAAAAA. I was surprised when it found self-reps that are slightly lexicographically smaller.

There is a typogenetics simulator written in Scheme. It was used to find palindromic strings that self-replicate in one generation. The shortest one, CAAAAAAACGTTTTTTTG, is of length 18, which my program would probably never reach.