Captain Comic randomizer seed algorithm

David Fifield <david@bamsoftware.com>

Last modified: 2022-04-10

Video game randomizers typically work by seeding a pseudorandom number generator, then calling into the PRNG whenever the randomization algorithm needs a random number. The same seed always results in the same randomized game, as long as the randomization algorithm makes the same calls to the PRNG in the same order.

The seed in the Captain Comic randomizer works differently. Rather than being the input for a PRNG, the seed directly encodes the item assignment. For every seed, there is one item assignment, and for every item assignment, there is one seed.

There's nothing wrong with the PRNG approach. The seed algorithm used in the Captain Comic randomizer is only possible because there's a fairly small space of randomized games—only about 47 bits. In more complicated games, a seed that directly encodes all the randomized parameters would be unreasonably long. The technique used in the Captain Comic randomizer has a couple of nice properties, though. First, it guarantees that the probability distribution over randomized item assignments is uniform: when you find an item in a certain location, you don't gain any information from that discovery beyond what can be inferred from the logic graph. In a PRNG-based algorithm, not all item assignments are equally likely, so you could (at least in theory) narrow down the set of possible seeds every time you see an item by eliminating the seeds that don't put that item at that location, and after observing a few items have a conditional probability for the remaining items that is far from uniform. Second, it means there is a seed for any item assignment you could want. For example there is a seed—225A396DE0DC8—that yields the vanilla item assignment; I would guess that most randomizers don't have that property.

Episode 10 of the live coding videos is devoted to elaborating the seed algorithm. Episode 32 is about making the seed representation less transparent.

Short description of the algorithm

The implementation of the algorithm is the function multiset_unrank in src/seed.rs of the source code.

Start with the input r, which is an integer representing a rank. r is derived from the seed (think of the seed as a string encoding of the rank).
Initialize a list slots to [0, ..., 23]. This is the list of item slots that have not yet been assigned. slots will be empty when the algorithm is done.
For each item i and its multiplicity m:
- Set n to the length of slots (the number of item slots remaining).
- Set b = $(\binom{n}{m})$ ; i.e., the number of combinations of n things, taken m at a time.
- Set comborank = r mod b; i.e., the remainder of dividing r by b.
- Set combo = combination_unrank(comborank, m). combo is now a list of m distinct integers between 0 and n−1.
- Assign item i to the slots of slots indexed by combo, and remove those slots from slots.
- Set r = r / b, rounded down.

The subroutine combination_unrank:

Start with the input comborank and m.
Initialize combo to an empty list.
While m > 0:
- Set c to the maximum integer such that $(\binom{c}{m})$ ≤ comborank.
- Append c to combo.
- Set comborank = comborank − $(\binom{c}{m})$ .
- Set m = m − 1.
Return combo.

Background

There are 24 stages in The Adventures of Captain Comic, with one item slot per stage.

There are 24 items in the game, not all distinct. There are 8 unique items (Corkscrew, Door Key, Boots, Lantern, Teleport Wand, Gems, Crown, Gold), 5 Blastola Colas, and 11 Shields.

The items with their multiplicities form a multiset. The output of the randomizer is a permutation of this multiset; i.e., an assignment of one item to each of the 24 item slots. We are looking for a way to assign a number to each distinct permutation. That number is called a rank and the process of recovering a permutation from a rank is unranking. A seed encodes a rank which determines the item assignment.

Warm-up: factorial number system

There's a straightforward way to rank and unrank permutations, the factorial number system.

Let's try unranking an item assignment from the rank r = 123456789012345678901234. We start with 24 empty item slots, numbered 0 to 23.

To place the Corkscrew, we take r mod 24, which will be a number between 0 and 23. In this case, 123456789012345678901234 mod 24 = 10, so we put the Corkscrew in item slot 10. Then we renumber the 23 remaining slots from 0 to 22, and divide out the Corkscrew's contribution to the rank by dividing r by 24 and rounding down. The new r = 123456789012345678901234 / 24 = 5144032875514403287551.

The next item to place is the Door Key, and there are 23 slots it might go in. Now we take r mod 23 and find 5144032875514403287551 mod 23 = 16, so the Door Key goes in the item slot that is now numbered 16. We divide r by 23 to get the new r = 5144032875514403287551 / 23 = 223653603283234925545. We renumber the 22 remaining slots from 0 to 21.

We repeat this process for every item. The assigned slot refers to the renumbered slots at each step; e.g. when the assigned slot is 0, it means to choose the leftmost slot that has not yet been assigned.

`r`	# slots	assigned slot
123456789012345678901234	24	10
5144032875514403287551	23	16
223653603283234925545	22	19
10166072876510678433	21	12
484098708405270401	20	1
24204935420263520	19	13
1273943969487553	18	13
70774664971530	17	10
4163215586560	16	0
260200974160	15	10
17346731610	14	12
1239052257	13	1
95311712	12	8
7942642	11	4
722058	10	8
72205	9	7
8022	8	6
1002	7	1
143	6	5
23	5	3
4	4	0
1	3	1
0	2	0
0	1	0

Here is the item assignment that results from the initial rank r = 123456789012345678901234:

The problem with using the factorial number system for unranking a seed is that it treats all the items as distinct—and they are not. There are multiple ranks that result in the same permutation, because the algorithm thinks each Blastola Cola is different from the other Blastola Colas, and each Shield is different from the other Shields. Because of this multiple counting, seeds are longer than they need to be: the number of seeds is 24! = 620448401733239439360000 ≈ 79 bits. Next, we will see that by not multiply counting indistinguishable permutations, we can reduce the size of the seed space to 24! / 5! / 11! = 129529505064960 ≈ 47 bits.

Permutations of a multiset

Only a small change is needed to make the above algorithm work for a multiset. Instead of assigning one item at a time, we will assign one group of distinct items at a time. We'll start by assigning the 8 unique items—up to this point the algorithm is unchanged. When we come to the point that we need to assign the 5 Blastola Colas into 16 remaining slots, we use what's left of r to select one of the $(\binom{16}{5})$ = 4368 possible combinations of 16 items taken 5 at a time. We convert an integer into a combination using the combinatorial number system, which provides an unrank function for combinations. I won't go into detail on the combinatorial number system, but you can see the combination_unrank subroutine in the algorithm description.

We'll work through an example of unranking r = 123456789012345. We start with 24 empty item slots, numbered 0 to 23.

We handle each unique item as a "group" of 1 item. There are $(\binom{24}{1})$ = 24 ways to choose 1 slot from 24 slots. To place the Corkscrew, we take r mod 24 = 9. We place the Corkscrew in item slot 9, renumber the remaining slots from 0 to 22, and divide out the Corkscrew's contribution to the rank by dividing r by 24 and rounding down. The new r = 123456789012345 / 24 = 5144032875514.

The process continues like this for all 8 unique items. Up to this point, the algorithm has been the same as the one using the factorial number system.

Now r = 4163, and we have to assign the 5 Blastola Colas into 16 remaining slots. The number of ways to do the assignment is $(\binom{16}{5})$ = 4368. We compute r mod 4368 = 4163, and therefore call combination_unrank(4163, 5) to get [15, 14, 10, 9, 3], and assign Blastola Colas to those five slots. Then we renumber the remaining slots and set r = 4163 / 4368 = 0 (rounded down).

Now we have to assign the final item group of 11 Shields to 11 remaining slots. $(\binom{11}{11})$ = 1 tells us there is only one way to do that. combination_unrank(0, 11) gives us that single combination, [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]. We assign the Shields to those slots and we are done.

Here is the trace of unranking 123456789012345.

item group	`r`	# combinations	assigned slots
1×	123456789012345	24	[9]
1×	5144032875514	23	[5]
1×	223653603283	22	[11]
1×	10166072876	21	[8]
1×	484098708	20	[8]
1×	24204935	19	[18]
1×	1273943	18	[11]
1×	70774	17	[3]
5×	4163	4368	[15, 14, 10, 9, 3]
11×	0	1	[10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

The number of distinct item assignments, and therefore the number of ranks needed to represent them all, is 24! / 5! / 11! = 129529505064960. Another way to understand this number is as $(\binom{24}{1})$ × $(\binom{23}{1})$ × $(\binom{22}{1})$ × $(\binom{21}{1})$ × $(\binom{20}{1})$ × $(\binom{19}{1})$ × $(\binom{18}{1})$ × $(\binom{17}{1})$ × $(\binom{16}{5})$ × $(\binom{11}{11})$ .

Seed encoding

A rank is an integer; for input to the program we have to encode it as a string. I defined two seed encodings, which are distinguished by their first character.

1XXXXXXXXXXXX

Seeds that start with 1 directly encode a rank as 12 hexadecimal digits.

For example, the seed 1070032C6453A contains hexadecimal 070032C6453A which represents the rank 7697433249082.

2XXXXXXXXXXXX

Seeds that start with 2 are obfuscated. Like the 1 format, there are 12 hexadecimal digits representing an integer n. To recover the rank, you take r = (43ⁿ mod 129529505064961) − 1.

For example, the seed 225A396DE0DC8 contains hexadecimal 25A396DE0DC8 which represents n = 41384541031880. Then (43ⁿ mod 129529505064961) − 1 = 7697433249082 recovers the rank.

Originally, I only had the 1-format seeds. But encoding the rank so directly makes it possible to mentally calculate some information about the item assignment. To find the location of the Corkscrew, you just have to take the rank mod 24, which is just about possible to do mentally when the rank is represented as a hexadecimal numeral. The number of distinct item assignments is x = 129529505064960, and x + 1 just happens to be a prime number, which means the multiplicative group of integers mod x + 1 is a cyclic group. 43 is a generator of this group, so 43ⁿ for 0 ≤ n < x gives a permutation of the integers between 1 and x + 1 inclusive. From there we subtract 1 to put it back into the range 0 to x − 1.

The randomizer now generates 2-format seeds by default, but it still accepts 1-format seeds. So that's your reward for reading this page, you know that the randomizer accepts an additional seed format not documented on the main page. Episode 32 of the live coding videos covers the implementation of 2-format seeds.

If you want the 2-format seed for a specific item assignment, you have to do a discrete log, but a discrete log on 48 bits is no big deal. I got the rank of the vanilla item assignment, 7697433249082, by running the algorithm in reverse. Then I converted it to a 2-format seed as follows:

sage: x = 129529505064960
sage: K = GF(x+1, 'b')
sage: K
Finite Field of size 129529505064961
sage: b = K.multiplicative_generator()
sage: b
43
sage: b.multiplicative_order()
129529505064960
sage: r = 7697433249082
sage: "%012X" % r
'070032C6453A'
sage: "%012X" % discrete_log(r+1, b, x)
'25A396DE0DC8'

References

Donald E. Knuth, The Art of Computer Programming.
- Exercise 1.2.6–56 and solution, gives an algorithm to find the combinatorial representation of n.
- Section 6.5, Retrieval on Secondary Keys, Superimposed coding, shows how to convert a combinatorial representation to n.
- Theorem 7.2.1.3L, interpretation of the combinatorial representation.
James McCaffrey, "Generating the mth Lexicographical Element of a Mathematical Combination."

Appendix: A lexicographic algorithm

While developing the randomizer, I found that the rank and unrank methods of Permutations_mset in SageMath 9.0 used an inefficient algorithm (inherited from EnumeratedSets.rank and EnumeratedSets.unrank). unrank(r), for example, works by enumerating every permutation starting from 0 and taking the rth one. I wanted to improve them; however Sage's functions are specified to number permutations in lexicographic order, and the algorithm I developed for the randomizer produces an order that is not lexicographic, so it's not a drop-in replacement. After some more research I came up with more efficient algorithms suitable for Sage. The changes are in Sage ticket #29942 and became part of Sage 9.2.