Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deleting/clearing a virtual collection will leak all vref-backed keys #9959

Closed
warner opened this issue Aug 26, 2024 · 2 comments
Closed

deleting/clearing a virtual collection will leak all vref-backed keys #9959

warner opened this issue Aug 26, 2024 · 2 comments
Labels
bug Something isn't working liveslots requires vat-upgrade to deploy changes SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Aug 26, 2024

While writing a test for #9939, I discovered a second problem with virtual collections and vref-backed keys, on top of #9956 (where retired keys don't get removed from the collection).

In this new one, deleting a collection, or using .clear(), will fail to decref all their vref-based keys.

The problem is this section of clearInternalFull():

      for (const dbKey of enumerateKeysStartEnd(syscall, start, end)) {
        const value = JSON.parse(syscall.vatstoreGet(dbKey));
        doMoreGC =
          value.slots.map(vrm.removeReachableVref).some(b => b) || doMoreGC;
        syscall.vatstoreDelete(dbKey);
        if (isEncodedRemotable(dbKey)) {
          const keyVref = vrefFromDBKey(dbKey);
          if (hasWeakKeys) {
            vrm.removeRecognizableVref(keyVref, `${collectionID}`, true);
          } else {
            doMoreGC = vrm.removeReachableVref(keyVref) || doMoreGC;
          }
          syscall.vatstoreDelete(prefix(`|${keyVref}`));
        }
      }

This code is meant to walk through the backing store of the collection (e.g. from makeScalarBigMapStore() or makeScalarBigWeakSetStore()), look at each entry to determine the key and value it encodes, release the refcounts for any slots in the values, and then if the key was also a vref, either release its refcount (for strong collections) or remove it as a recognizer (for weak collections). Then we delete the backing store key.

The vatstore is a string-to-string key/value table, where syscall.vatstore{Get,Set,Delete,GetNextNey} is the API that liveslots can use. From the vat side of the syscall boundary, liveslots can use any string it likes (the kernel is going to apply a v${vatID}.vs. prefix to that key, to separate one vat from another, but of course the vat cannot see that, or avoid having it be applied).

Liveslots uses a vc. prefix for everything involving virtual collections, and also applies a collectionID prefix. So everything involved with collection 6 would have a vc.6. prefix.

Userspace can use all sorts of things as keys in their MapStore: strings, numbers, BigInts, and vref-backed passables like Presences and Remotables and Representatives. But the backing store only accepts strings. So keys are encoded, using makeEncodePassable from the @endo/marshal package. This uses a one-character prefix to indicate the type, and then some clever encoding to get the sort order right. A string like foo is encoded into sfoo. A BigInt is encoded with an n (for negative) or p (for positive, noting that p sorts after n), a fixed-length size prefix, and then a decimal expansion of the value, with negative values inverted so they sort as expected. And so on.

The vref-backed passables must get converted to "ordinals" before we can get a string for them, because we don't want their sort order to be influenced by the exact vref we receive (which would reveal the order of assignment, and which would yield different iteration orders in different vats). We use a fixed length for the ordinal field, so they always iterate in their insertion order. And, we append the actual vref, so we don't need to do a second DB fetch to learn the vref during iteration. So the first one that gets inserted (e.g. with a vref of o-1) is assigned ordinal 1, and the key we use would be r0000000001:o-1

When clearInternalFull() walks the keys, it uses enumerateKeysStartEnd, from vc.6. (inclusive) to vc.6.{ (exclusive). This is the range that gets all keys whose encodePassable encoding starts with a letter or digit (i.e. all of them), and excludes the range that starts with vc.6.|, which is where we store metadata about the collection, rather than collection members. For example, vc.6.|schemata is where we store the keyShape/valueShape for the collection, vc.6.|entryCount is where we store the .length, and vc.6.|o-1 is where we store the ordinal number we've assigned to vref o-1.

The loop then uses isEncodedRemotable to ask whether this key is a encodePassable representation of something with an ordinal: it looks for the r prefix (as opposed to the s for strings, or p for positive BigInts). If true, it extracts the vref (the part after the :, although it relies upon the fixed-length of the ordinal integer rather than splitting at a colon), and decrements the refcount for it (or deletes the recognizer). It also deletes the matching ordinal-assignment entry (vc.6.|o-1 -> 1).

Now, let's call the r0000000001:o-1 the "passable key", vc.6.r0000000001:o-1 (with the collection-specific prefix) as the "vatstore key", and v44.vs.vc.6.r0000000001:o-1 as the "kvStore key" (which the vat never sees).

enumerateKeysStartEnd takes vatstore keys as bounds, and yields all the vatstore keys in between. We store this in a variable named dbKey. We need this kind of key to use in vatstoreGet and vatstoreDelete.

But, that is not the value we should be giving to isEncodedRemotable, which wants a passable key. isEncodedRemotable() is looking for a string that starts with r. When we pass vc.6.r0000000001:o-1 to isEncodedRemotable(), it will think we're asking about an encoding that starts with v, which is actually how we encode a JavaScript null value (plus an ignored nonsense suffix of c.6.r0000000001:0-1), which is not an encoded remotable. So it will always return false, which means we'll always skip the clause that decrements refcounts and deletes the ordinal-assignment entry.

Even if isEncodedRemotable() returned true, we make the second mistake of passing our dbKey to vrefFromDBKey(), which is designed to remove the r0000000001: prefix from the passable key:

    const vrefFromDBKey = dbKey => dbKey.substring(BIGINT_TAG_LEN + 2);

By giving it a vatstore key instead, it will set keyVref to 0001:o-1. When we pass that to removeRecognizableVref or removeRecognizableVref, the first thing it will probably do is call parseVatSlot on it, which will throw an error because it expects o-1 or o+2 or the like. That would probably show up as an error during collection.clear() (in the best possible case), or an error during bringOutYourDead, which might kill the vat.

Fix

The immediate fix is to strip the vc.6. prefix from dbKey, and use the remaining suffix for the operations that expect a passable key.

    function removePrefix(dbKey) {
      assert(dbKey.startsWith(dbKeyPrefix), dbKey);
      return dbKey.substring(dbKeyPrefix.length);
    }
        syscall.vatstoreDelete(dbKey);
        const dbEntryKey = removePrefix(dbKey);
        if (isEncodedRemotable(dbEntryKey)) {
          const keyVref = vrefFromDBKey(dbEntryKey);
          if (hasWeakKeys) {
            vrm.removeRecognizableVref(keyVref, `${collectionID}`, true);
          } else {
            doMoreGC = vrm.removeReachableVref(keyVref) || doMoreGC;
          }
          syscall.vatstoreDelete(prefix(`|${keyVref}`));
        }

The deeper improvements are:

  • define the types ("passable key" vs "vatstore key") at the top of the file, with explanations
  • include the name of the type in the variables we used to hold those values
  • improve the unit tests to exercise .clear() and look critically at the vatstore to ensure the data is removed as we expect, plus the refcount changes. I suspect we have tests that call .clear() and assert that the normal collection APIs return sensible values (.length = 0 and .keys() returning nothing), but nothing to look at the backing store or the refcount of vref-based keys.

We can (and should) define TypeScript types for these things, but I don't think it would help here, because structural types mean that VatstoreKey (a string) is merely an alias for the same underlying String type as PassableKey. If this were Rust, we could have two different kinds of strings, and the compiler would catch our attempts to use them incorrectly.

Impact

This bug will leak objects used as keys in collections when the collection is cleared using clearInternalFull. We use this function when the collection is deleted, or when .clear() is called on a (non-weak) collection without providing a keyPatt or valuePatt which would perform a limited clear. (Note that weak collections do not have .clear() at all).

When clear() is given any pattern, it falls back to a loop using .keys(), which deletes each item one at a time with deleteInternal(), which accepts a passable key, and does not have the bug.

So far, we haven't been in the habit of clearing collections much, or allowing them to be deleted, so it's possible that we haven't triggered this bug before.

Measurement

The walkthrough above shows that isEncodedRemotable failures mean we won't decrement the key's vref (or remove it as a recognizer), nor will we delete the matching ordinal assignment. This means we can look for r0000000001:${vref} keys without a matching |${vref} key to count how many of these have been mishandled.

We could/should also build a tool to scan the whole vatstore and extract the durable reference graph, then compare it against the refcounts to see if they match. Entries which have been mishandled this way will have had their r0000000001:${vref} key removed, but their refcount will not have been decremented. This would appear as a surplus of refcounts (e.g. refCount=1 but no actual reference edges found in the graph).

Remediation

Messy but it could be worse. We could take advantage of the leftover ordinal records to get a list of the problematic vrefs, decref them, and delete the ordinal record. We'd do this with a vatstore version bump, so we know if/when it has been done, to avoid doing the remediation more than once.

@warner warner added bug Something isn't working SwingSet package: SwingSet liveslots requires vat-upgrade to deploy changes labels Aug 26, 2024
@Agoric Agoric deleted a comment Aug 26, 2024
@Agoric Agoric deleted a comment Aug 26, 2024
@Agoric Agoric deleted a comment Aug 26, 2024
warner added a commit that referenced this issue Aug 26, 2024
These tests will fail without the fixes in the next commit

One new test is disabled because of additional pending bugs that
interfere with the test setup (a combo of #9956 and #9959).
@warner
Copy link
Member Author

warner commented Aug 26, 2024

oh, this is a dup of #8756 . I knew this problem sounded familiar

@warner
Copy link
Member Author

warner commented Aug 26, 2024

closing as a dup

@warner warner closed this as not planned Won't fix, can't repro, duplicate, stale Aug 26, 2024
mergify bot pushed a commit that referenced this issue Aug 30, 2024
These tests will fail without the fixes in the next commit

One new test is disabled because of additional pending bugs that
interfere with the test setup (a combo of #9956 and #9959), which will
be re-enabled in PR #9961 (to address #8756 and others)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working liveslots requires vat-upgrade to deploy changes SwingSet package: SwingSet
Projects
None yet
Development

No branches or pull requests

2 participants
@warner and others