-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(marshal): cheap capData stringify and parse #1558
Conversation
Won't really be cheap unless we can remove the calls to |
}; | ||
harden(stringifyCapData); | ||
|
||
const DSL = /^\{"slots":(\[[^\]]*\]),"#body":(.*)\}$/; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend to move to a less greedy search and string boundaries if possible
\A\{"slots":(\[[^\]]+?\]),"#body":(.+?)\}\Z
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had never seen \A
or \Z
before. Searching for it in JS RegExp documentation finally found it at https://docstore.mik.ua/orelly/webprog/jscript/ch10_01.htm in a section titled
"Perl RegExp Features Not Supported in JavaScript"
But good point about newlines. Where the (.*)
appears, we should accept newlines along with everything else. IIUC, adding the s
flag (aka "dotAll") on the end will give "." that meaning. Of course there are other ways to get the same effect by changing the pattern rather than adding a flag. What do you recommend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imposing an ordering constraint on the entries of a JSON object (which is defined to be unordered) would definitely merit explanation, as would the existence of this regular expression—I assume it's in anticipation of being able to look at slots without even parsing the body?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For checking the }
at the end, s.endsWith('}')
seems more straightforward. And playing around with RegExr suggest that \]
doesn't match ]
.
const DSL = /^\{"slots":(\[[^\]]*\]),"#body":(.*)\}$/; | |
const DSL = /^{"slots":(\[[^\]]*]),"#body":/; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And playing around with RegExr suggest that
\]
doesn't match]
.
harden(assertSimpleString); | ||
|
||
/** | ||
* @param {CapData<SimpleString>} capData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since SimpleString = String
this won't give a type error for plain string.
if you want to ensure that assertSimpleString
was called first, the SimpleString type could have a tag so it's a superset of a regular string: The Opaque
utility from type-fest makes this easy
/** @typedef {string} SimpleString */ | ||
|
||
/** | ||
* A SimpleString is one where JSON.strigify-ing the string just |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* A SimpleString is one where JSON.strigify-ing the string just | |
* A SimpleString is one where JSON.stringify-ing the string just |
While the approach is interesting, before we merge this I'd like to have a discussion about the use cases. |
While getting rid of the double encoding is definitely a benefit, I'm not sure I buy the "this is cheap" assertion. All the pre-flight and post-flight checking seem to me to add up to considerably more overhead than what this replaces (though of course we'd need to do some benchmarking to be sure). We might decide that the gain in clarity and simplicity in the encoding is worth it, but if so we should make that decision with our eyes open. I'd also second @mhofman's concern about use cases. This smells to me like it may be overly fixated on one particular use case (albeit an important one) -- Note that I don't as yet have a good justification for that sensation, I just have it. |
Converted to draft until we understand
We do not yet have answers for any of those. In the meantime, I will continue to keep this warm -- to revise and respond to reviewer comments as if this is an Open PR. Thanks. |
Agoric/agoric-sdk#6646 has the following code return JSON.stringify(readonly.serialize(val));
...
const data = JSON.parse(str);
return readonly.unserialize(data); Would these cases be better without the double encoding? Should we care? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely looks possible to cover with fastcheck, and #body
seems like a fine way to differentiate a non-serialized payload (thereby reducing the size and increasing the human-friendliness of messages) but I share the thoughts of others regarding this approach to getting there.
}; | ||
harden(stringifyCapData); | ||
|
||
const DSL = /^\{"slots":(\[[^\]]*\]),"#body":(.*)\}$/; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imposing an ordering constraint on the entries of a JSON object (which is defined to be unordered) would definitely merit explanation, as would the existence of this regular expression—I assume it's in anticipation of being able to look at slots without even parsing the body?
slots: [...slots], | ||
} = capData; | ||
assert(typeof body === 'string'); | ||
assert(body.startsWith('#')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this, I hope it's in a way that expands the definition of "CapData" from { body: string, slots: Copyable[] }
to e.g. { body: string, slots: Copyable[] } | { "#body": Copyable<encoding: 'smallcaps'>, slots: Copyable[] }
and allows for direct production of the latter (Passable → CapData<format: 'smallcapsObject'> from e.g. marshaller.toCapData(passable)
, with subsequent JSON.stringify
as warranted [and possibly after embedding in another structure]) rather than imposing a multi-step and non-embeddable process (Passable → CapData<bodyFormat: 'smallcapsString'> → JSONText).
Mark and I discussed this PR yesterday. The disconnect was indeed how CapData is usually nested into a larger structure, and where the serialization through JSON parse is usually disconnected and in a completely unrelated module (e.g. just before sending it over a socket). However this serialization effectively is a regular JSON serialization of an object with a differently named |
In light of @mhofman's compelling arguments that this will not be needed, closing. |
Given that
capData
is a CapData whoseslots
are themselves simple strings (see def in PR), and whosebody
is a smallcaps string, thenstringifyCapData(capData)
cheaply stringifies the capData as a whole to a JSON-parsable string, while avoiding the double encoding of the body. This should make the body considerable shorter and more readable.parseCapData
cheaply does the reverse of course.Example:
@arirubinstein this uses a regexp. Are both directions safe against malicious input? @gibson042 , does this seem fastcheck-able?
Everyone, assume this is not urgent. But it is small and, if robust, would be pleasant.