Final Project – SCIF Compiler Blogpost #543

Open
wants to merge 8 commits into
base: 2025sp

KabirSamsi
Contributor

Resolves #529.

Owner

@sampsyo sampsyo left a comment

Given your extreme time constraints, it's great that you were able to accomplish some useful things for the SCIF project. It looks like you did about 2/3 of the original proposal from #529: multi-contract compilation, general software-engineering/documentation improvements, but not improved error messages. By their nature, it was very hard to think of a way to do a systematic evaluation of this kind of project, but it's great that you added some tests for the multi-contract stuff.

I have a few important questions/suggestions here, mostly around the theme of being as specific as you can about what you actually did.

Comment on lines 6 to 8
Kabir Samsi is a third-year undergrad, interested in building programming languages and compilers that can
target new areas.br>
Stephanie Ma is a first-year MS student interested in PL and compilers.<br>

Owner

Missing a < on one BR tag. And you don't want one at the end.


As both a language design and implementation paper, the [SCIF technical report](https://arxiv.org/abs/2407.01204) extensively discusses the SCIF compiler and the correctness and performance of the Solidity code it generates for SCIF programs. As such, its authors hope to eventually publish the compiler as a research artifact similar to what is required for a [PLDI Research Artifact](https://pldi25.sigplan.org/track/pldi-2025-pldi-research-artifacts) or [OOPSLA Artifact](https://2025.splashcon.org/track/splash-2025-oopsla-artifacts). The [ACM discusses different badges](https://www.acm.org/publications/policies/artifact-review-and-badging-current) an artifact can be awarded. The goal for SCIF is that the compiler can be easily set up and run by any open-source contributor and that it can easily validate results described in papers.

In our project, we focused on a few primary aspects – adding on the much-desired feature of defining **multiple contracts in one file**, and allowing this to work with our the compiler's current control flow and functionality; improving the quality of error messaging for malformed files; and improving the structure of the compiler's build system.

Owner

and allowing this to work with our the compiler's current control flow and functionality

I'm not sure what this means… is there something more specific you can say about the requirements here?


The existing compiler is frustrating to set up and finicky to run. Furthermore, it is hard for potential contributors to know whether a change introduces a regression or an improvement to the codebase. We aim to improve the experience for users and contributors.

Owner

Is this a separate goal from the concrete ones listed above, or is it just context for the goals you have already listed? Please try to keep it specific—it can be confusing to state general goals that aren't clearly tied to specific objectives.


SCIF presently uses [Cup](https://www2.cs.tum.edu/projects/cup/) as its parsing mechanism to define its grammar. Previously we defined a grammar mechanism that would allow for any number of imports, followed by either a single contract definition or a single interface definition – this would be parsed as a `SourceFile`, a superclass of both `ContractFile` and `InterfaceFile`.

Our new infrastructure now parses a single source file importing multiple contracts into a list of `Sourcefiles`.It does so by initially creating a new term, `SourceFiles` without imports defined initially. Subsequently, we then

Owner

Sourcefiles -> SourceFiles
Missing space after the "."


Owner

It seems weird that there is a thing called SourceFile, which sounds singular, but there are actually multiple SourceFiles per actual file?


Though we initially parse and then verify each file separately, this is only for the purpose of intermediate representation and validation – in the target language, we fuse these contracts and their interfaces back together in Solidity.

To do so, we mark the 'first' `ContractFile` or `InterfaceFile` defined in our series of source files, based on which was defined first in the original code, with the `firstInFile` tag. Subsequently, in the resultant Solidity code, we remove imports for all but the `firstInFile` attribute. Code generation then only uses `firstInFile` to generate the relevant import information, while all subsequently fused files only retain the contracts and interfaces defined within them.
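
For readers following the thread, the `firstInFile` scheme described in the paragraph above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `ParsedUnit`, `fuse`, and the sample import path are hypothetical names, not taken from the SCIF codebase, and the real code generator may work differently.

```java
import java.util.List;

public class FirstInFileSketch {
    /** Hypothetical stand-in for one parsed contract or interface, with the imports the parser attached to it. */
    static final class ParsedUnit {
        final List<String> imports;
        final String body;
        final boolean firstInFile;

        ParsedUnit(List<String> imports, String body, boolean firstInFile) {
            this.imports = imports;
            this.body = body;
            this.firstInFile = firstInFile;
        }
    }

    /** Fuse every unit parsed from one original file into a single Solidity source string. */
    static String fuse(List<ParsedUnit> units) {
        StringBuilder out = new StringBuilder();
        for (ParsedUnit unit : units) {
            // Only the unit tagged firstInFile contributes import lines.
            if (unit.firstInFile) {
                for (String path : unit.imports) {
                    out.append("import \"").append(path).append("\";\n");
                }
            }
            // Every unit contributes its contract or interface body.
            out.append(unit.body).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Both units carry the same import list, as they came from the same file.
        List<String> shared = List.of("./Token.sol");
        String fused = fuse(List.of(
            new ParsedUnit(shared, "contract A {}", true),
            new ParsedUnit(shared, "contract B {}", false)));
        System.out.print(fused);
    }
}
```

In this sketch every unit holds a copy of the same import list, and the flag exists only to suppress the duplicates at emission time.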

Owner

Why is 'first' in quotes? If you actually want the quotes, please use double quotes (not single quotes).


Owner

I don't understand the goal here. Why is it a good idea to only emit imports for the firstInFile contract? Is the point that a given (actual) file might contain several ContractFiles, but your parser associates the same set of imports with all of them?

If so, it seems like this is masking a deeper problem: contracts are not 1-1 with files anymore. So what you really want here is to separate the concept of a Contract from a ContractFile, and associate the imports with the file itself and not the contract.
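
One possible shape for that separation, as a minimal sketch: the file owns the import list once, and each contract or interface is a plain definition with no import data of its own. All names here (`Definition`, `SourceUnit`, `emit`) are hypothetical and not taken from the SCIF codebase.

```java
import java.util.List;

public class FileLevelImportsSketch {
    /** A contract or interface on its own; it carries no import information. */
    static final class Definition {
        final String name;
        final String body;

        Definition(String name, String body) {
            this.name = name;
            this.body = body;
        }
    }

    /** One actual file: a single import list plus any number of definitions. */
    static final class SourceUnit {
        final List<String> imports;
        final List<Definition> definitions;

        SourceUnit(List<String> imports, List<Definition> definitions) {
            this.imports = imports;
            this.definitions = definitions;
        }

        /** Emit the imports once, then each definition in source order. */
        String emit() {
            StringBuilder out = new StringBuilder();
            for (String path : imports) {
                out.append("import \"").append(path).append("\";\n");
            }
            for (Definition def : definitions) {
                out.append(def.body).append('\n');
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        SourceUnit unit = new SourceUnit(
            List.of("./Token.sol"),
            List.of(new Definition("A", "contract A {}"),
                    new Definition("B", "contract B {}")));
        System.out.print(unit.emit());
    }
}
```

With this layout no per-contract flag is needed, because the imports are stored and emitted at the file level only.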


### Build System for Research Artifact

SCIF uses [SHErrLoc](https://www.cs.cornell.edu/projects/SHErrLoc/). It is written in Java but uses a different build system and uses different versions of the same dependencies that SCIF uses. Previously, SHErrLoc was duplicated in the repo and not properly linked as a submodule, causing conflicts with compiled bytecode class versions. The build system also did not properly include SHErrLoc, and conflicting versions of CUP could cause sporadic compilation and runtime issues. We work to fix this.

Owner

We work to fix this.

What did you do, specifically?


## Challenges

Our two biggest challenges can both generally be summarized as having needed to work in a time crunch, due to changing our project track later in the game; and the difficulties of adding on optimizations onto a codebase which was not entirely our own. We were able to get over the time hurdle by readjusting the scope of the project and working concurrently in a couple of sprints; it was also helpful to divide up tasks based on expertise and familiarity with the SCIF project. Working with the codebase was challenging due to the somewhat sparse documentation in places; to that end, we have made improving the documentation of methods part of our goal for this compiler going forward.

Owner

adding on optimizations

I didn't see any actual optimizations here. Do you maybe mean, like, changes, more generically?


Owner

improving the quality of error messaging for malformed files

Unless I somehow missed it, the rest of your post does not discuss this. Please either omit it from this list (if you did not actually do it) or add a section about it.

@sampsyo sampsyo added the 2025sp label May 16, 2025
Owner

sampsyo commented May 28, 2025

Hi, @KabirSamsi @calciiium @noschiff! I would love to publish your blog post. Can you please wrap up the revisions discussed above so I can hit the green button?

Owner

sampsyo commented Jun 5, 2025

Just one last reminder about the above, @KabirSamsi @calciiium @noschiff—if you don't want to wrap up the small additions above, we can just close this PR.

Successfully merging this pull request may close these issues.

Project Proposal: SCIF Compiler Improvement
4 participants