Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(marshal): cheap capData stringify and parse #1558

Closed
wants to merge 2 commits into from

Conversation

erights
Copy link
Contributor

@erights erights commented Apr 19, 2023

Given that capData is a CapData whose slots are themselves simple strings (see def in PR), and whose body is a smallcaps string, then stringifyCapData(capData) cheaply stringifies the capData as a whole to a JSON-parsable string, while avoiding the double encoding of the body. This should make the body considerable shorter and more readable.

parseCapData cheaply does the reverse of course.

Example:

  roundTrips(t, { body: '#[]', slots: ['a'] }, '{"slots":["a"],"#body":[]}');

@arirubinstein this uses a regexp. Are both directions safe against malicious input? @gibson042 , does this seem fastcheck-able?

Everyone, assume this is not urgent. But it is small and, if robust, would be pleasant.

@erights erights self-assigned this Apr 19, 2023
@erights
Copy link
Contributor Author

erights commented Apr 19, 2023

Won't really be cheap unless we can remove the calls to JSON.parse(scBody), which is there just to validate. I conjecture that it is robust even with these checks removed. But I am not yet confident of that.

};
harden(stringifyCapData);

const DSL = /^\{"slots":(\[[^\]]*\]),"#body":(.*)\}$/;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend to move to a less greedy search and string boundaries if possible
\A\{"slots":(\[[^\]]+?\]),"#body":(.+?)\}\Z

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had never seen \A or \Z before. Searching for it in JS RegExp documentation finally found it at https://docstore.mik.ua/orelly/webprog/jscript/ch10_01.htm in a section titled
"Perl RegExp Features Not Supported in JavaScript"

But good point about newlines. Where the (.*) appears, we should accept newlines along with everything else. IIUC, adding the s flag (aka "dotAll") on the end will give "." that meaning. Of course there are other ways to get the same effect by changing the pattern rather than adding a flag. What do you recommend?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imposing an ordering constraint on the entries of a JSON object (which is defined to be unordered) would definitely merit explanation, as would the existence of this regular expression—I assume it's in anticipation of being able to look at slots without even parsing the body?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For checking the } at the end, s.endsWith('}') seems more straightforward. And playing around with RegExr suggest that \] doesn't match ].

Suggested change
const DSL = /^\{"slots":(\[[^\]]*\]),"#body":(.*)\}$/;
const DSL = /^{"slots":(\[[^\]]*]),"#body":/;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And playing around with RegExr suggest that \] doesn't match ].

image

harden(assertSimpleString);

/**
* @param {CapData<SimpleString>} capData
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since SimpleString = String this won't give a type error for plain string.

if you want to ensure that assertSimpleString was called first, the SimpleString type could have a tag so it's a superset of a regular string: The Opaque utility from type-fest makes this easy

/** @typedef {string} SimpleString */

/**
* A SimpleString is one where JSON.strigify-ing the string just
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* A SimpleString is one where JSON.strigify-ing the string just
* A SimpleString is one where JSON.stringify-ing the string just

@mhofman
Copy link
Contributor

mhofman commented Apr 19, 2023

While the approach is interesting, before we merge this I'd like to have a discussion about the use cases.

@FUDCo
Copy link
Contributor

FUDCo commented Apr 19, 2023

While getting rid of the double encoding is definitely a benefit, I'm not sure I buy the "this is cheap" assertion. All the pre-flight and post-flight checking seem to me to add up to considerably more overhead than what this replaces (though of course we'd need to do some benchmarking to be sure). We might decide that the gain in clarity and simplicity in the encoding is worth it, but if so we should make that decision with our eyes open.

I'd also second @mhofman's concern about use cases. This smells to me like it may be overly fixated on one particular use case (albeit an important one) -- Note that I don't as yet have a good justification for that sensation, I just have it.

@erights erights marked this pull request as draft April 19, 2023 20:39
@erights
Copy link
Contributor Author

erights commented Apr 19, 2023

Converted to draft until we understand

  • if this is needed?
  • if it is safe with the extra JSON.parse(scBody) validation step?
  • if it is cheap when that check is absent?

We do not yet have answers for any of those.

In the meantime, I will continue to keep this warm -- to revise and respond to reviewer comments as if this is an Open PR.

Thanks.

@erights
Copy link
Contributor Author

erights commented Apr 19, 2023

Agoric/agoric-sdk#6646 has the following code

          return JSON.stringify(readonly.serialize(val));
...
          const data = JSON.parse(str);
          return readonly.unserialize(data);

Would these cases be better without the double encoding? Should we care?

Copy link
Contributor

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely looks possible to cover with fastcheck, and #body seems like a fine way to differentiate a non-serialized payload (thereby reducing the size and increasing the human-friendliness of messages) but I share the thoughts of others regarding this approach to getting there.

};
harden(stringifyCapData);

const DSL = /^\{"slots":(\[[^\]]*\]),"#body":(.*)\}$/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imposing an ordering constraint on the entries of a JSON object (which is defined to be unordered) would definitely merit explanation, as would the existence of this regular expression—I assume it's in anticipation of being able to look at slots without even parsing the body?

slots: [...slots],
} = capData;
assert(typeof body === 'string');
assert(body.startsWith('#'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this, I hope it's in a way that expands the definition of "CapData" from { body: string, slots: Copyable[] } to e.g. { body: string, slots: Copyable[] } | { "#body": Copyable<encoding: 'smallcaps'>, slots: Copyable[] } and allows for direct production of the latter (Passable → CapData<format: 'smallcapsObject'> from e.g. marshaller.toCapData(passable), with subsequent JSON.stringify as warranted [and possibly after embedding in another structure]) rather than imposing a multi-step and non-embeddable process (Passable → CapData<bodyFormat: 'smallcapsString'> → JSONText).

@mhofman
Copy link
Contributor

mhofman commented Apr 20, 2023

If we do this, I hope it's in a way that expands the definition of "CapData" from { body: string, slots: Copyable[] } to e.g. { body: string, slots: Copyable[] } | { "#body": Copyable<encoding: 'smallcaps'>, slots: Copyable[] } and allows for direct production of the latter (Passable → CapData<format: 'smallcapsObject'> from e.g. marshaller.toCapData(passable), with subsequent JSON.stringify as warranted [and possibly after embedding in another structure]) rather than imposing a multi-step and non-embeddable process (Passable → CapData<bodyFormat: 'smallcapsString'> → JSONText).

Mark and I discussed this PR yesterday. The disconnect was indeed how CapData is usually nested into a larger structure, and where the serialization through JSON parse is usually disconnected and in a completely unrelated module (e.g. just before sending it over a socket).

However this serialization effectively is a regular JSON serialization of an object with a differently named #body property, and is actually really close to the generalized CapData definition described above and in #1478

@erights
Copy link
Contributor Author

erights commented Apr 29, 2023

In light of @mhofman's compelling arguments that this will not be needed, closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants