Skip to content

8. Solving the Teleporter

David Clamage edited this page Jun 24, 2016 · 1 revision

I go to 178b, set breakpoints, step through, and eventually come up with this simple transcription of what's going on:

main()
{
	r00 = 4
	r01 = 1
	Func_178B()
	if (r00 == 6)
	{
		(continue at 157A)
	}
}

Func_178B()
{
	if (r00 == 0)
	{
		r00 = r01 + 1
		return
	}
	if (r01 == 0)
	{
		r00 -= 1
		r01 = r07
		Func_178B()
		return
	}
	tmp = r00
	r01 -= 1
	Func_178B()
	r01 = r00
	r00 = tmp
	r00 -= 1
	Func_178B()
}

This can then be massaged into this C++ code:

uint16_t mystery(uint16_t a, uint16_t b)
{
	if (a == 0)
	{
		return (b + 1) & 0x7FFF;
	}

	if (b == 0)
	{
		return mystery((a - 1) & 0x7FFF, r7);
	}

	return mystery((a - 1) & 0x7FFF, mystery(a, (b - 1) & 0x7FFF));
}

Notice that because this needs to emulate how math on the VM works, all adds and subtracts need to truncate to 15 bits.

Well, maybe I'm lucky, but I've seen this algorithm before, and so had Trevor. However, there's two key differences:

  1. The truncated math operations
  2. The b == 0 case uses the r7 value instead of 1 for the second argument.

Knowing that the arguments the VM passes are (4, 1), I look up on wiki that the answer should be 65533 for the r7=1 case. I test this and get 32765, which is 65533 truncated to 15-bits. So far so good. Now I run it with r7=2 and... stack overflow. Well now, this must be why it said you need to optimize.

Trevor's the mathier of us two, so he pointed out some important properties of this function:

  1. a only ever decreases, so we don't have to worry about a > 4
  2. Values are always non-negative, so we only need to truncate the result of the one add, and not any subtracts.
  3. Changing that second b==0 argument from 1 to something else increases the execution time, but it's still guaranteed to terminate with a value.
  4. The same inputs to the function are going to be re-visited over and over, so perhaps a caching scheme could be beneficial.
  5. Unlike the real Ackermann function, the modular math means there are only a finite number of possible inputs. For (4,1) this is going to be 5 * 2^15 = 163840. This means with a good caching scheme, we'll only have to evaluate the function a maximum of this many times.

I then set out to do two things:

  1. Change the recursion to a loop
  2. Add a caching scheme to avoid recalculating known results.

Here's the result:

static uint16_t mystery(uint16_t a, uint16_t b, uint16_t mysteryVal)
{
	std::vector<std::pair<uint16_t, uint16_t>> stack;
	stack.emplace_back(a, b);

	std::array<uint16_t, 5 * (1 << 15)> cache = {};
	std::vector<std::vector<uint32_t>> cacheStack;
	cacheStack.emplace_back(1, key(a, b));

	for (;;)
	{
		uint16_t a = stack.back().first;
		uint16_t b = stack.back().second;
		stack.pop_back();
		if (a == 0)
		{
			const uint16_t r = (b + 1) & 0x7FFF;
			if (stack.empty())
			{
				return r;
			}

			for (uint32_t curKey : cacheStack.back())
			{
				cache[curKey] = r;
			}
			cacheStack.pop_back();

			for (auto itr = cacheStack.back().rbegin(); itr != cacheStack.back().rend(); itr++)
			{
				if ((*itr & 0x7FFF) == 0x7FFF)
				{
					*itr &= ~0x7FFF;
					*itr |= r;
					break;
				}
			}

			assert(stack.back().second == 0x7FFF);
			stack.back().second = r;
			continue;
		}

		const uint16_t cached = cache[key(a, b)];
		if (cached != 0)
		{
			const uint16_t r = cached;
			if (stack.empty())
			{
				return r;
			}
			for (uint32_t curKey : cacheStack.back())
			{
				cache[curKey] = r;
			}
			cacheStack.pop_back();

			for (auto itr = cacheStack.back().rbegin(); itr != cacheStack.back().rend(); itr++)
			{
				if ((*itr & 0x7FFF) == 0x7FFF)
				{
					*itr &= ~0x7FFF;
					*itr |= r;
					break;
				}
			}

			assert(stack.back().second == 0x7FFF);
			stack.back().second = r;
			continue;
		}

		if (b == 0)
		{
			cacheStack.back().push_back(key(a - 1, mysteryVal));
			stack.emplace_back(a - 1, mysteryVal);
			continue;
		}

		cacheStack.back().push_back(key(a - 1, 0x7FFF));
		cacheStack.emplace_back(1, key(a, b - 1));
		stack.emplace_back<uint16_t, uint16_t>(a - 1, 0x7FFF);
		stack.emplace_back<uint16_t, uint16_t>(std::move(a), b - 1);
	}
	return (uint16_t)-1;
}

Quite a bit more complex! But also much, much faster. I made a loop to call this with all possible r7 values and sprinkled in some multi-threading for good measure, and it pops out a result for r7 in less than a minute! A billion years down to 1 minute... not bad, right?

So, armed with this correct r7 value, I use the teleporter and...

7/8 codes complete!

9. The Orb

Clone this wiki locally