swingset hierarchical object identifiers #455

Closed
warner opened this issue Jan 27, 2020 · 6 comments
@warner
Member

warner commented Jan 27, 2020

We had a long meeting today to discuss Dean's idea (which I tried and failed to capture back in #54 (comment)) about nested object identifiers. (for internal reference, the recording of our meeting is in the Agoric internal drive, Engineering / 2020-01-24 hierarchical object refs in swingset)

The problem this addresses is the vat-side liveslots tables. All of the kernel-side tables are presumed to live primarily on disk, so they can be of arbitrarily large size without causing memory problems. The vat-side liveslots table maps from inbound o+4-type reference identifiers to local Javascript objects/promises that can be the targets of inbound messages or arguments to the same. This same (bidirectional) table is also used to serialize objects/promises in outbound arguments. We only talked about object references, but I suspect promise references can be handled in basically the same way.

[Image: IMG_6199]

kernel changes

Kernel-side object references (generally spelled koNN, and used to index a table that tracks which vat owns each, so messages can be routed to the correct vat) remain unchanged. The kernel-side c-lists map these kernel-wide koNN identifiers to vat-specific o+NN or o-NN identifiers. These vat-side IDs will be changed to have a hierarchical identifier: o+NN.MM (or beyond: o+NN.MM.SS.QQ etc). The kernel does not parse or interpret any of that: o+1.2 and o+1.3 are entirely different objects, as far as it knows. It's important to note that one client vat having access to e.g. o+1.2 has no bearing on it getting access to o+1.3: each sub-id has an entirely separate identity and access-control measures.
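To make the "kernel does not parse" point concrete, here is a minimal sketch of a per-vat c-list that treats vat-side identifiers as opaque strings. `makeCList` and the specific `ko20`/`ko21` numbers are hypothetical names chosen for illustration, not the real kernel API.

```javascript
// Hypothetical per-vat c-list: kernel IDs on one side, vat IDs on the other.
// The vat-side ID is stored and compared as an opaque string; the dot in
// 'o+1.2' is never inspected here.
function makeCList() {
  const kernelToVat = new Map();
  const vatToKernel = new Map();
  return {
    add(kernelRef, vatRef) {
      kernelToVat.set(kernelRef, vatRef);
      vatToKernel.set(vatRef, kernelRef);
    },
    toVat: kernelRef => kernelToVat.get(kernelRef),
    toKernel: vatRef => vatToKernel.get(vatRef),
  };
}

const clist = makeCList();
clist.add('ko20', 'o+1.2');
clist.add('ko21', 'o+1.3');

// Holding 'o+1.2' says nothing about 'o+1.4': there is simply no row for it.
const unknown = clist.toKernel('o+1.4');
```

Because each sub-id gets its own row, access control stays per-object even though the identifiers share a prefix.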

inbound liveslots

Within liveslots is where things get interesting. The inbound table lookup will parse the kernel-provided identifier into an initial portion and a tail (the car and the cdr, for us Lisp fans). In this simple case, o+1.2 gets split into o+1 and 2 (but more complex nested cases, like o+1.2.3.4, are conceivable).
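The head/tail split described above could look roughly like this. `splitRef` is a hypothetical helper name; the only assumption is that the first `.` separates the container ID from the rest.

```javascript
// Split a vat-side reference at the first dot.
// The head names the container; the tail (if any) is passed on unparsed.
function splitRef(vref) {
  const dot = vref.indexOf('.');
  if (dot === -1) {
    return { head: vref, tail: undefined };
  }
  return { head: vref.slice(0, dot), tail: vref.slice(dot + 1) };
}

const simple = splitRef('o+1.2');
const nested = splitRef('o+1.2.3.4');
const flat = splitRef('o+4');
```

Here `simple` is `{ head: 'o+1', tail: '2' }` and `nested` keeps `'2.3.4'` as a single tail, leaving any further parsing to the container.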

The inbound deserialization code then looks o+1 up in the table to get a "container" object. It then invokes a .hydrate() method on that container object and provides tail (2) as an argument. (in the complex case, it would call .hydrate('2.3.4'), and recursive lookup/construction would be performed).

The container object is responsible for creating a new representative object to serve as the target for (or argument of) an inbound message. It will use some TBD database syscall to fetch the data necessary to construct this object, and will then invoke a constructor function provided back when the object identifier was created. The resulting object should be short-lived: it might be retained by a Promise .then or two, but it should not be held in any long-lived table.

The liveslots layer will maintain a WeakMap from this generated object to the kernel-provided object ID (e.g. o+1.2). If this object is sent outbound, this weakmap is used to serialize it back into the same object ID. This weakmap should remain small: only objects used by in-flight operations should appear here, whereas objects that are referenced by other vats but not actually involved in current operations should not be instantiated or tracked by the weakmap.
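A minimal sketch of the inbound path, assuming a `containers` table keyed by the head of the identifier and a container object exposing `.hydrate(tail)`. `makeInbound`, `deserialize`, and `serialize` are hypothetical names, and the toy container below stands in for one that would really read from a database.

```javascript
// Hypothetical inbound/outbound translation for hierarchical references.
function makeInbound(containers) {
  // representative -> the ID it was built from; entries vanish when the
  // representative is garbage-collected, so this stays small.
  const refForRepresentative = new WeakMap();

  function deserialize(vref) {
    const dot = vref.indexOf('.');
    if (dot === -1) {
      throw new Error(`not a hierarchical reference: ${vref}`);
    }
    const container = containers.get(vref.slice(0, dot));
    const representative = container.hydrate(vref.slice(dot + 1));
    refForRepresentative.set(representative, vref);
    return representative;
  }

  function serialize(representative) {
    return refForRepresentative.get(representative);
  }

  return { deserialize, serialize };
}

// Toy container: builds a fresh frozen object on every hydrate() call.
const containers = new Map();
containers.set('o+1', { hydrate: tail => Object.freeze({ tail }) });

const inbound = makeInbound(containers);
const rep = inbound.deserialize('o+1.2');
const repAgain = inbound.deserialize('o+1.2');
```

Note that `rep` and `repAgain` are distinct objects that both serialize back to `'o+1.2'`.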

creating containers and items

The process starts when application-level code invokes some liveslots facility to create a "Container", perhaps c = liveslots.createContainer(). At this time, the liveslots layer (probably) allocates an object ID for the container (e.g. o+1).

Later, some application-level code can create a new item within this container (e.g. o = c.create(initialData, constructor)). This allocates a sub-id (e.g. .2), uses the DB interface to store the initial data under that ID, then invokes the constructor function with the initial data to build the object representative. This representative is added to the WeakMap, then returned. If/when application code includes the representative in a message (or promise resolution), the WeakMap will recognize it and serialize it as o+1.2 to the kernel. When the application-level code in the vat stops referencing the representative, it will be GCed and removed. At that point, no Javascript object exists that represents the item. The kernel-side c-list will have a row that includes the o+1.2 identifier, and there will be a corresponding koNN kernel object table entry. The vat-side liveslots table (on disk, somehow) will have an entry mapping o+1.2 to the current data of that object. But there will be no actual Javascript Object for it, until a new message arrives referencing o+1.2.

The constructors could be tracked in a WeakMap (and assigned an integer), so the DB record could record the fact that o+1.2 is associated with constructor number 3, and then map from 3 to the specific Function object. This way we don't need to track N separate constructor functions for N virtual target objects (which would defeat the purpose). Or we could require that each container have exactly one constructor function (which might be easier to reason about, and would perhaps make upgrade / schema changes easier in the future).

At some point in the future, o+1.2 is delivered back into the vat. Liveslots maps o+1 to the container object, upon which .hydrate('2') is called. This looks up 2 in the DB, which gets us the constructor function and the current data contents. A new representative is built with the constructor, and is either delivered the message (as a target) or referenced as an argument to some other message.
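The create/hydrate round trip might be sketched as follows. Everything here is an assumption for illustration: the DB syscall is still TBD, so a plain `Map` stands in for it, constructors are numbered via an array rather than a WeakMap, and `refFor` is a hypothetical way to peek at the representative-to-ID mapping.

```javascript
// Hypothetical container: stores item data in a Map (stand-in for the DB)
// and rebuilds a fresh representative on demand.
function createContainer(containerRef, db = new Map()) {
  const constructors = []; // constructor number -> function
  const refs = new WeakMap(); // representative -> 'o+N.M' string
  let nextSubId = 1;

  function constructorNumber(ctor) {
    let n = constructors.indexOf(ctor);
    if (n === -1) {
      n = constructors.push(ctor) - 1;
    }
    return n;
  }

  function hydrate(subId) {
    const { ctorNum, data } = db.get(subId);
    const representative = constructors[ctorNum](data);
    refs.set(representative, `${containerRef}.${subId}`);
    return representative;
  }

  return {
    create(initialData, ctor) {
      const subId = String(nextSubId);
      nextSubId += 1;
      // The record holds a small integer, not the function itself.
      db.set(subId, { ctorNum: constructorNumber(ctor), data: initialData });
      return hydrate(subId);
    },
    hydrate,
    refFor: representative => refs.get(representative),
  };
}

const makeGreeter = data =>
  Object.freeze({ greet: () => `hello ${data.name}` });

const c = createContainer('o+1');
const first = c.create({ name: 'alice' }, makeGreeter);
const again = c.hydrate('1'); // what an inbound 'o+1.1' would trigger
```

Once `first` is dropped by application code, only the `db` row remains until the next `hydrate('1')`.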

object identity

We very much want to avoid using WeakRefs (Javascript doesn't have them yet, but they've been proposed), because it is difficult to make the finalizers run in a deterministic order (necessary for consensus-based swingset hosts). As a result, we aren't going to guarantee that the same object reference appearing in two successive inbound messages will be deserialized to the same object representative. (We might decide that two copies of the same reference appearing in the same message would get the same representative, by using one WeakMap per inbound message, and discarding it at the end of the crank, but we didn't come to a firm conclusion about it).
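The per-message idea could be sketched like this. It is only an illustration of the undecided option: `deserializeMessage` and `hydrate` are hypothetical names, and because the lookup is keyed by the identifier string, the sketch uses a plain `Map` that is discarded when the function returns.

```javascript
// Deserialize all references in one inbound message, reusing a
// representative when the same ID appears twice in that message.
function deserializeMessage(vrefs, hydrate) {
  const seen = new Map(); // lives only for this one message
  return vrefs.map(vref => {
    if (!seen.has(vref)) {
      seen.set(vref, hydrate(vref));
    }
    return seen.get(vref);
  });
}

const hydrate = vref => Object.freeze({ vref });

const [a, b, other] = deserializeMessage(['o+1.2', 'o+1.2', 'o+1.3'], hydrate);
const [later] = deserializeMessage(['o+1.2'], hydrate);
```

Within the first message `a` and `b` are the same object, while `later` (from a second message) is a different representative for the same virtual object.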

Therefore application code should not be performing EQ on these objects or using them as the keys of any tables. They are to be used for their encapsulated data, not their identity.

We didn't use this terminology during the meeting, but I'm now starting to like the idea of calling these things "representatives", in contrast to the virtual object that they represent. The virtual object lives only in the database. Each time it gets referenced, the liveslots deserialization creates a new short-lived representative to perform the object's functions.

This raises an interesting question of how (or whether) multiple representatives for the same virtual object might coordinate with each other.

object constructors

Our basic assumption is that these objects will implement behavior that needs to read the object state from the DB, modify it in some way, then write the modified state back (after which point the object representative can be dropped). We talked about the "React pattern", in which the container.create(initialData, constructor) call would get a constructor function that looked something like this:

function makeCounter(useState, initialData) {
  const [count, updateCount] = useState(initialData);
  return harden({
    increment() {
      updateCount(count+1);
    },
    decrement() {
      updateCount(count-1);
    },
  });
}

Some ideas here:

  • the schema is positional: each call to useState() gets a successive slot of the database record
  • the constructor cannot tell whether it's being called for the first time (in which case useState() is creating a new record and filling it with initialData), or the second/etc time (in which case useState() is reading the current state from the DB and ignoring initialData)
  • a fresh makeCounter() call is made for each message (one per representative)
  • the behavior inside the returned object must be prepared for count to not change during its operation: we define some granularity or window of time, and exactly one read happens at the beginning of that window, and any write won't happen until the end
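The bullets above can be sketched as follows, under heavy assumptions: `makeRepresentative` and `commit` are hypothetical, the database record is a plain `{ slots: [] }` object, and `Object.freeze` stands in for `harden`. The "window" is modelled as everything between construction and `commit()`.

```javascript
// Positional useState: the Nth call reads (or initializes) slot N.
function makeRepresentative(record, construct, initialData) {
  let position = 0;
  const pendingWrites = [];

  function useState(initial) {
    const slot = position;
    position += 1;
    if (slot >= record.slots.length) {
      record.slots.push(initial); // first construction: fill the slot
    }
    const value = record.slots[slot]; // the one read for this window
    const update = newValue => {
      pendingWrites.push([slot, newValue]); // deferred until commit()
    };
    return [value, update];
  }

  const representative = construct(useState, initialData);

  function commit() {
    for (const [slot, newValue] of pendingWrites) {
      record.slots[slot] = newValue;
    }
    pendingWrites.length = 0;
  }

  return { representative, commit };
}

function makeCounter(useState, initialData) {
  const [count, updateCount] = useState(initialData);
  return Object.freeze({
    increment() {
      updateCount(count + 1);
    },
  });
}

const record = { slots: [] };

const first = makeRepresentative(record, makeCounter, 10);
first.representative.increment();
first.commit();
const afterFirst = record.slots[0];

// Second representative: initialData is ignored, state comes from the record.
const second = makeRepresentative(record, makeCounter, 10);
second.representative.increment();
second.representative.increment(); // count is stale, so this repeats the same write
second.commit();
const afterDouble = record.slots[0];
```

The double `increment()` on `second` only advances the stored count by one, which is the "count does not change during its operation" behavior the last bullet warns about.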

We talked about giving the code an accessor for its state instead, so it could survive being used for multiple operations, but didn't come up with anything conclusive:

  increment() {
    updateCount(getCount()+1);
  }
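For comparison, a minimal sketch of the accessor variant, assuming the representative is handed `getCount`/`updateCount` functions that read and write the current state directly. A local variable stands in for the stored state, and `Object.freeze` stands in for `harden`.

```javascript
// Accessor-style counter: each operation re-reads the state.
function makeCounter(getCount, updateCount) {
  return Object.freeze({
    increment() {
      updateCount(getCount() + 1);
    },
  });
}

let stored = 0; // stand-in for the DB slot
const counter = makeCounter(
  () => stored,
  value => {
    stored = value;
  },
);

counter.increment();
counter.increment();
```

With an accessor, the same representative can be used for two operations and `stored` ends up at 2.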
@warner warner added the SwingSet package: SwingSet label Jan 27, 2020
@FUDCo
Contributor

FUDCo commented Jan 31, 2020

I think the concept embodied by your "representative" idea is expressed by the word "plenipotentiary", defined (in the context of diplomacy) as "a person, especially a diplomat, invested with the full power of independent action on behalf of their government, typically in a foreign country", i.e., a thing that has the full power to act as the object without actually being the object. That said, "plenipotentiary" is a mouthful and also hard to type.

@FUDCo
Contributor

FUDCo commented Jan 31, 2020

I don't care for the notional constructor API, at least as it is used in the makeCounter example. The only use for initialData is to pass it to useState, but the two are passed in together as parameters. It looks to me like a chance to just make a mistake writing boilerplate. If there's no other use for initialData it seems like it would be cleaner for it to be closed over by useState. I also think that the positional schema idea is an opportunity to make coding errors. (n.b. I don't think either of these concerns is fundamental to the soundness of the underlying idea)

@warner
Member Author

warner commented Jan 31, 2020

Hm, is there any way we could prototype the constructor/dehydrator part without first figuring out the data model / useState part? I think we have a lot of experimentation to do (and need a lot of user/developer feedback) before we can feel we've gotten that second part right.

@warner
Member Author

warner commented Jul 29, 2020

In today's meeting we sketched out some approaches for the user-level API.

Dean's initial version

const purse = (me, c) => ({
    deposit(other) {
      const otherBalance = c.get(other).balance;
      c.update(me, { balance: me.balance + otherBalance});
      c.update(other, { balance: 0 });
    }
});

const c = liveslots.createContainer(purse);

function mint(initialBalance) {
  return c.create({ balance: initialBalance});
}

Dean's explicit state version, just balance

🚀 🚀 🚀 Current winner

const purse = (me, c) => ({
    deposit(other) {
      const myBalance = c.get(me);
      const otherBalance = c.get(other);
      c.update(me, myBalance + otherBalance);
      c.update(other, 0);
    }
});

const c = liveslots.createContainer(purse);

function mint(initialBalance) {
  return c.create(initialBalance);
}
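To see the "explicit state, just balance" shape run end to end, here is a toy in-memory container. It is a sketch under assumptions, not the liveslots implementation: `createContainer` is written locally, `me` is taken to be the item's key, `other` is a representative, and `c.get`/`c.update` accept either one.

```javascript
// Toy container: state lives in a Map, representatives are rebuilt on demand.
function createContainer(makeBehavior) {
  const state = new Map(); // key -> state (stand-in for the DB)
  const keyFor = new WeakMap(); // representative -> key
  let nextKey = 1;

  // Accept either a key or a representative.
  const resolveKey = x => (typeof x === 'object' ? keyFor.get(x) : x);

  const c = {
    get: x => state.get(resolveKey(x)),
    update: (x, value) => {
      state.set(resolveKey(x), value);
    },
    hydrate(key) {
      const representative = Object.freeze(makeBehavior(key, c));
      keyFor.set(representative, key);
      return representative;
    },
    create(initialState) {
      const key = nextKey;
      nextKey += 1;
      state.set(key, initialState);
      return c.hydrate(key);
    },
  };
  return c;
}

const purse = (me, c) => ({
  deposit(other) {
    const myBalance = c.get(me);
    const otherBalance = c.get(other);
    c.update(me, myBalance + otherBalance);
    c.update(other, 0);
  },
});

const c = createContainer(purse);
const p1 = c.create(10);
const p2 = c.create(5);
p1.deposit(p2);

// A fresh representative for the first purse sees the same stored balance.
const p1Again = c.hydrate(1);
```

Since the purse behavior holds no state of its own, `p1Again` and `p1` are interchangeable for everything except identity.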

Dean's "me" is external version

const purse = (me, c) => ({
    deposit(other) {
      const otherBalance = c.get(other).balance;
      const myBalance = c.get(me).balance;
      c.update(me, { balance: myBalance + otherBalance});
      c.update(other, { balance: 0 });
    }
});

const c = liveslots.createContainer(purse);

function mint(initialBalance) {
  return c.create({ balance: initialBalance});
}

Promise version

// Dean's explicit state version, just balance, promise
const purse = (me, c) => ({
    async deposit(other) {
      const o = await other;
      // AWAIT /////
      const myBalance = c.get(me);
      const otherBalance = c.get(o);
      c.update(me, myBalance + otherBalance);
      c.update(o, 0);
    }
});

const c = liveslots.createContainer(purse);

function mint(initialBalance) {
  return c.create(initialBalance);
}

@warner
Member Author

warner commented Jul 29, 2020

Another idea Dean presented was how to save space by effectively compressing the kernel tables. In my original thinking, the exporting vat has a clist entry like ko1 <-> o2.3 for each Purse, the kernel has a kernel-object-table entry like ko1 -> vat-mint for each Purse, and the client vat has a clist entry like o4 <-> ko1 for each Purse. This maintains the correct isolation between objects (vat-client cannot access any Purse that hasn't been granted to at least one object within vat-client), and the only kernel change is to the shape of vat-exported identifiers in clists, but there are still three tables of size N: exporting vat, kernel-object-table, importing vat. These tables are in secondary storage, so it's not like we're consuming RAM (and the whole hierarchical-identifier/slowcap thing lets vat-mint keep them out of RAM too), but we can still do better.

Dean's first point was to make the kernel more aware of this scheme. The kernel object table only exists to figure out where to route messages to a given object, and we could effectively compress it by allowing kernel object IDs to be hierarchical as well. In this form, the client vat clist would say o4 <-> ko1.2, the kernel object table would say ko1 -> vat-mint (compressing all kernel object IDs for a given Container into a single entry), and the vat-mint clist would say ko1.2 <-> o2.3. This changes the shape of kernel object identifiers, as well as the kernel-side column of each clist.
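The compressed kernel object table could be sketched as a prefix lookup. `ownerOf` is a hypothetical helper; the table contents mirror the `ko1 -> vat-mint` example above.

```javascript
// One row per Container, not per Purse.
const kernelObjectTable = new Map([['ko1', 'vat-mint']]);

// Route by the prefix before the first dot; the suffix is ignored here.
function ownerOf(kref) {
  const dot = kref.indexOf('.');
  const prefix = dot === -1 ? kref : kref.slice(0, dot);
  return kernelObjectTable.get(prefix);
}

const ownerOfPurse2 = ownerOf('ko1.2');
const ownerOfPurse99 = ownerOf('ko1.99');
```

Both lookups resolve to `'vat-mint'` from a single table row, while the per-Purse c-list rows in each vat still decide who may name `ko1.2` at all.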

So far, the comms vat sees a zillion separate identifiers just like any other client vat (although we should be able to use this same Container scheme to keep that state out of RAM). But in the second step, we could choose to give the comms vat more power, by letting clist entries point to an entire prefix, and allowing the vat to make up whatever suffix it wants. Here, the comms vat clist would say o5 <-> ko1, or maybe o4.* <-> ko1.* (to be more explicit). The comms vat still has internal clists for each remote machine, which would point to e.g. o4.1, but now the comms vat does a syscall referencing o4.1, and its clist translates it into ko1.1, the kernel says "ko1.* lives on vat-mint", the vat-mint clist translates ko1.1 into o2.1 (the suffix remaining the same all the way through), and then vat-mint uses o2.1 to look up the Purse data in the Container.
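The prefix-style translation walk-through above (o4.1 to ko1.1 to o2.1) can be sketched as follows. `makePrefixCList` is a hypothetical helper that maps only the prefix and carries the suffix through unchanged.

```javascript
// A c-list whose rows cover a whole prefix: 'o4.*' <-> 'ko1.*'.
function makePrefixCList(entries) {
  const prefixes = new Map(entries);
  return ref => {
    const dot = ref.indexOf('.');
    const prefix = dot === -1 ? ref : ref.slice(0, dot);
    const suffix = dot === -1 ? '' : ref.slice(dot); // includes the dot
    const target = prefixes.get(prefix);
    return target === undefined ? undefined : `${target}${suffix}`;
  };
}

const commsToKernel = makePrefixCList([['o4', 'ko1']]);
const kernelToMint = makePrefixCList([['ko1', 'o2']]);
const kernelObjectTable = new Map([['ko1', 'vat-mint']]);

// The comms vat makes up suffix '.1' itself; no per-Purse row is needed.
const kref = commsToKernel('o4.1');
const owner = kernelObjectTable.get(kref.split('.')[0]);
const mintRef = kernelToMint(kref);
```

The suffix `.1` survives every hop, so vat-mint can use it directly to look up the Purse data in its Container.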

In this approach, the comms vat can access Purses that nobody has sent through it, but we already rely upon the comms vat for a lot. The benefit is that the comms vat clist is now compressed (one entry per Issuer, not per Purse), and we aren't adding vat-mint c-list entries for each Purse that goes out to an external machine.

The comms vat must still maintain internal clists of size N, because we wouldn't want to grant the remote machine access to all purses.

warner added a commit that referenced this issue Oct 21, 2020
Liveslots does not yet provide any `vatGlobals`, but this ensures that any
ones it does provide in the future will be added to the `endowments` on the
vat's Compartment. We'll use this to populate `makeExternalStore` -type
functions, which want to be globals (because threading them from `vatPowers`
into modules that need them would be too annoying), but must also be in
cahoots with closely-held WeakMaps and "Representative" state management code
from liveslots, as described in #455 and #1846.

closes #1867
@warner
Member Author

warner commented Nov 10, 2020

Everything we talked about here was implemented in #1907
