Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*New* Add parseYaml() Function #142

Merged
merged 1 commit into from
Aug 18, 2023
Merged

*New* Add parseYaml() Function #142

merged 1 commit into from
Aug 18, 2023

Conversation

agrawroh
Copy link
Member

@agrawroh agrawroh commented Jan 15, 2022

Description

This PR adds a new parseYaml() function to sjsonnet.

JSONNET Reference Doc


Signed-off-by: Rohit Agrawal rohit.agrawal@databricks.com

@agrawroh agrawroh added the enhancement New feature or request label Jan 15, 2022
@agrawroh agrawroh requested a review from lihaoyi January 15, 2022 15:18
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
@lihaoyi-databricks
Copy link
Contributor

@szeiger could you review this? since you guys own sjsonnet these days :)

@szeiger
Copy link
Collaborator

szeiger commented Jan 17, 2022

Can we at least skip the intermediate JSON serialization and translate the SnakeYAML AST directly? If we add YAML parsing and use it in our codebase, I expect performance to be an issue. Ideally, we'd parse directly to an Sjsonnet AST (like in ValVisitor for parsing JSON) but I don't know if SnakeYAML supports that. I tried to look it up but the project appears to be inaccessible in a private BitBucket repository, which is a bit worrisome for a new dependency. Do you know more about the status of SnakeYAML?

@agrawroh
Copy link
Member Author

agrawroh commented Jan 17, 2022

Can we at least skip the intermediate JSON serialization and translate the SnakeYAML AST directly? If we add YAML parsing and use it in our codebase, I expect performance to be an issue. Ideally, we'd parse directly to an Sjsonnet AST (like in ValVisitor for parsing JSON) but I don't know if SnakeYAML supports that. I tried to look it up but the project appears to be inaccessible in a private BitBucket repository, which is a bit worrisome for a new dependency. Do you know more about the status of SnakeYAML?

@szeiger I don't know much about SnakeYAML except that it appears in almost all the threads I could find for parsing YAML in scala. It's also used at some other places in our universe. I could also find this other library which converts SnakeYAML AST to Circle AST (JSON) [Ref]
My use-case was to parse and validate the GOC file(s) which are YAMLs inside my service's JSONNET template. I am not sure if we'd be using the YAML parsing at a lot of other places but the performance could be a concern.

JSONNET seems to be using this CPP library called RapidYAML for implementing the parseYaml() function.

Any suggestions on how we can implement this?

@szeiger
Copy link
Collaborator

szeiger commented Jan 17, 2022

@agrawroh I also found most recommendations pointing to SnakeYAML. I was surprised to see that it doesn't have a public repo or docs. Maybe this is a recent change?

Transforming the AST for at least decent performance is probably easy. It looks like SnakeYAML already uses normal Java maps for objects. The primitives are probably simple wrappers, similar to a JSON AST. This shouldn't take more than a few lines of code for a huge performance improvement.

The ability to easily skip the SnakeYAML AST entirely depends on the library's design (which is why I was looking for docs). uPickle makes this very easy (which allows us to implement a JSON to Sjsonnet AST parser with no intermediate JSON AST in 46 LOC). It may be harder to do and the expected pay-off is much smaller.

@agrawroh
Copy link
Member Author

@szeiger BTW Did you try using this link to access SnakeYAML repository?
https://bitbucket.org/snakeyaml/snakeyaml/src/master/

@szeiger
Copy link
Collaborator

szeiger commented Jan 17, 2022

Thanks, that link works. Every page I found so far referred to https://bitbucket.org/asomov/snakeyaml

@agrawroh agrawroh closed this Sep 16, 2022
@yywandb yywandb reopened this Aug 18, 2023
Copy link
Collaborator

@szeiger szeiger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stamping to get this unblocked. My concerns about the performance still stand. We should at least skip the intermediate JSON serialization and deserialization. If this ends up being used with large YAML files, we're likely to run into performance issues with the current design.

@agrawroh
Copy link
Member Author

Stamping to get this unblocked. My concerns about the performance still stand. We should at least skip the intermediate JSON serialization and deserialization. If this ends up being used with large YAML files, we're likely to run into performance issues with the current design.

Will try to get this in to unblock Cloud Team. I will have a few free cycles next month. Would definitely try to work on the performance optimizations for this as well as compression which started taking a lot of time.

@agrawroh agrawroh merged commit 82a6f97 into master Aug 18, 2023
1 check passed
@lihaoyi-databricks lihaoyi-databricks mentioned this pull request Aug 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants