[MNG-8258] activate Reproducible Builds by default #1726
@hboutemy I think you need to update the following ones instead: |
wow, we have so many root poms? I'm lost in all these copies |
The whole model builder using the v3 api is kept to not break some plugins too much, but it's not the one used by default. We should mark it as deprecated clearly... |
Alternatively, if the build time is considered a problem, why not just exclude it completely? It is not part of the JAR file specification as far as I can see (I don't see it in the list of attribute names). If we fix a value in a |
Do I understand correctly that then maven defines a default timestamp for the build?!? This looks quite odd to be honest. Should the default not be something like... well
I think one should give an example on how to do that then? |
I agree that it would be better to not include that information if it's not provided. Is there any easy way to do that ? |
I agree too here if we could have another solution. |
So this will be then only explicit opt-out? |
There is a hidden config property |
if anybody knows how to create a zip that does not contain any timestamp, I'm all ears.
no, you have 2 options:
if you want, we can use this: the impact is that to rebuild the exact same jar, we'll need to download the reference binary, extract the value used by the release manager, then inject it into the rebuild recipe
if you prefer, we can put 1/1/1970, or any other conventional value that you prefer and that looks "more common" |
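To illustrate why a fixed timestamp matters for the ZIP container, here is a minimal Java sketch using only `java.util.zip`. It is not the code used by Maven or its archiver; the class name `FixedTimestampZip`, the helper `build` and the sample timestamp values are assumptions made for this example. It stamps every entry with the same value and writes entries in sorted order, so two runs over the same input should yield the same bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class FixedTimestampZip {

    /** Builds a ZIP in memory, stamping every entry with the same fixed time (hypothetical helper). */
    public static byte[] build(Map<String, String> files, long fixedEpochMillis) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            // Sort entries by name: a stable entry order is also needed for byte equality.
            for (Map.Entry<String, String> file : new TreeMap<>(files).entrySet()) {
                ZipEntry entry = new ZipEntry(file.getKey());
                // ZipEntry.setTime converts to MS-DOS time using the JVM default time zone,
                // so a real tool would also have to pin the time zone.
                entry.setTime(fixedEpochMillis);
                zip.putNextEntry(entry);
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = Map.of("a.txt", "hello", "b.txt", "world");
        long fixed = 946684800000L; // 2000-01-01T00:00:00Z, an arbitrary value chosen for this sketch
        byte[] first = build(files, fixed);
        byte[] second = build(files, fixed);
        System.out.println("same bytes with fixed timestamp: " + Arrays.equals(first, second));

        byte[] other = build(files, fixed + 86_400_000L); // one day later
        System.out.println("same bytes with another timestamp: " + Arrays.equals(first, other));
    }
}
```

The second comparison shows the other side of the trade-off discussed here: as soon as the stamped value differs, the archives differ even though every entry's content is identical.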
The |
that's not the result I get:
or if we don't trust
|
I thought that we were talking about the content of the
The JAR file created by the
|
Nope, this is about "reproducible builds". In short, if you build a given source state (let's assume a git tag) and I then rebuild the same tag (on the same OS/Java, though this has some leeway), I should end up with the same (binary-wise) JAR output as you. In other words, if you do |
Just tested, I thought that Maven was adding automatically the For the time stamp of the ZIP file itself, maybe it could be set to the time stamp of the most recent entry? |
I'm neither talking about the content of |
Ah okay. I guess that whether they could be set to the timestamp of the source files or git commit has already been discussed then. |
But is this then not more the Another one would be as you described to download the real jar first and then extract the used timestamp value from there, then inject it into the reproducible build. |
quite works, but it is complex and does not give one simple workflow: as a developer, I want to build my source code twice and get the same output (which will also help build-cache). I don't see how we can be less basic than a fixed timestamp by default in Maven core: perhaps a less strange default value could lower bad feelings about it, something like |
To be honest I never wanted that in the last 10+ years :-D Also, if it is really about zip timestamps, then I think it is something that should be handled in the jar-plugin (or even the archiver component). E.g. for me a more sensible default would be to use the last modification time of the oldest file (there are even options to sync git time with local time) and maybe give a warning that it is not 100% portable (which is not even the case for Linux/Windows or different JVMs already anyway). |
Yes, I think this would be more obvious, but why not |
yeah,
all that is independent improvement that we can work out in a separate stream |
lgtm
Correct me if I'm wrong, but it seems to me that one of the main goals of reproducible builds is security: allowing developers to verify that the released JAR files do not contain altered byte code (e.g. malicious code injected by a compromised compiler). For this goal, the timestamp of ZIP entries does not matter. Only the content of ZIP entries matters. In my understanding, a verification focused on what matters is called "semantically reproducible build" or "semantic equivalency". Microsoft seems to propose a tool for semantic equivalency, at least for NPM packages.
Are we pushing for bit-for-bit reproducible builds because we have no easy tool for semantic equivalency? If yes, what about instead developing a new Maven plugin or modifying
This proposal would allow the following workflow during release: the release manager deploys the JAR files on a staging repository and gives the URL to other developers. Other developers would use that URL with the above-cited new plugin, which would automatically build the project and compare it semantically with the JARs on the staging repository.
Same for me. What I want is a security check. Actually, I would rather not have bit-for-bit reproducibility, as I would find it more useful to keep the (non-standard) |
Tycho has exactly this kind of "semantic equivalency" here: it is not used for "reproducibility"; instead it is used to check whether an artifact differs only in version, in which case the artifact is not deployed. Additionally, if it differs but the version has not changed, one gets an error / warning that the version needs to be incremented (this is similar to the use case here: if I build the same version, the jar should be "semantically equivalent", but bit-for-bit equivalence is not important). It can currently even detect if a file differs only in line endings (e.g. |
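As a rough illustration of the "semantic equivalency" idea, the following Java sketch compares two archives by entry name and content digest while ignoring ZIP timestamps. It is only a sketch under assumptions: the class `SemanticZipCompare` and its methods are hypothetical, it is not the Microsoft or Tycho tooling mentioned in this thread, and a real comparison would likely need more normalization (manifest attributes, line endings, and so on).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class SemanticZipCompare {

    /** Maps each entry name to a SHA-256 digest of its content; entry timestamps are ignored. */
    public static Map<String, String> contentDigests(byte[] zipBytes)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> digests = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                MessageDigest sha = MessageDigest.getInstance("SHA-256");
                // readAllBytes() stops at the end of the current entry.
                byte[] digest = sha.digest(zip.readAllBytes());
                digests.put(entry.getName(), Base64.getEncoder().encodeToString(digest));
            }
        }
        return digests;
    }

    /** True when both archives hold the same entry names with the same content. */
    public static boolean semanticallyEqual(byte[] a, byte[] b)
            throws IOException, NoSuchAlgorithmException {
        return contentDigests(a).equals(contentDigests(b));
    }

    /** Demo helper (hypothetical): a one-entry ZIP stamped with the given time. */
    public static byte[] zipOf(String name, String content, long epochMillis) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setTime(epochMillis);
            zip.putNextEntry(entry);
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Same content, timestamps one year apart (values chosen arbitrarily for the demo).
        byte[] release = zipOf("A.class", "bytecode", 946684800000L);
        byte[] rebuild = zipOf("A.class", "bytecode", 978307200000L);
        System.out.println("bit-for-bit equal:  " + Arrays.equals(release, rebuild));
        System.out.println("semantically equal: " + semanticallyEqual(release, rebuild));
    }
}
```

A check like this answers the security question (was the content altered?) without requiring identical timestamps, but it does not give downstream build steps a byte-stable input.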
I disagree. Having stable binaries allows some optimisation downstream in the build. I'd really like to keep that. This allows the compiler to skip work when the input and dependencies have not changed, same for resources, which cascades to the entire build. If the generated jar for a dependency is changed (with a different timestamp in the zip), the compiler needs to recompile, for example. |
+1
One can always use the file modification time. E.g. as far as I know, Maven already tries in some places not to overwrite a file if its bytes are the same; the same would only need to be applied to the jar (which can actually be seen as a FileSystem where individual items might or might not be updated / deleted / added).
But now, with a fixed timestamp by default, how will one know that the dependency has "changed"? Especially for this case "semantic equivalence" would pay off, e.g. compilation need not be performed if only a resource or a property file changed in a jar, only when class files change. One can even go a step further and say that recompilation is only needed if a |
I agree with this goal, but I don't think that we need reproducible builds for that. By default, Relying on reproducible builds to avoid unnecessary recomputation is useful only if the previous step has already done unnecessary recomputation anyway, since it rewrote an identical JAR file. So the goal has been half-missed, and would be more efficiently achieved by the approach proposed in the previous paragraph. |
Yes, that's what we do, we don't overwrite if nothing has changed. But if you change the timestamp of the zip entries, the binary zip file will differ, and maven will overwrite. Which would break the whole thing.
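The "don't overwrite if nothing has changed" behaviour can be sketched in a few lines of Java. This is not Maven's actual implementation; `WriteIfChanged` and its `write` method are hypothetical names used only to show the idea, and the byte arrays stand in for a generated jar.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class WriteIfChanged {

    /** Writes content to target only when the bytes differ; returns true if the file was written. */
    public static boolean write(Path target, byte[] content) throws IOException {
        if (Files.isRegularFile(target) && Arrays.equals(Files.readAllBytes(target), content)) {
            // Identical bytes: leave the file (and its modification time) untouched,
            // so downstream steps can treat it as unchanged input.
            return false;
        }
        Files.write(target, content);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempFile("demo", ".jar");
        byte[] build1 = {1, 2, 3};
        byte[] build2 = {1, 2, 3}; // byte-identical rebuild
        byte[] build3 = {1, 2, 4}; // e.g. only an embedded timestamp changed

        System.out.println("first write happened:  " + write(jar, build1));
        System.out.println("second write happened: " + write(jar, build2));
        System.out.println("third write happened:  " + write(jar, build3));
        Files.delete(jar);
    }
}
```

The skip only triggers when the regenerated archive is byte-identical, which is why a zip entry timestamp that changes on every build defeats it.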
This is not the timestamp of the files afaik. When you copy a file, Maven does not set the timestamp to the value we're talking about here. This is irrelevant here. |
I don't think so. Try it on maven. Just run In all cases, even if we have smarter plugins, if the input data has changed somehow, you will have to run again. The only way to avoid that is to not change the input. And dependencies are part of the input. So even if we have a smarter api, we'll need stable artifacts during a build, else we'll lose any possibility of optimisation. |
projects can opt-out if they want or override with their preferred timestamp value, but by default, having Reproducible Builds is a nice improvement
Following this checklist to help us incorporate your
contribution quickly and easily:
Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
Format the pull request title like [MNG-XXX] SUMMARY, where you replace MNG-XXX and SUMMARY with the appropriate JIRA issue.
Also format the first line of the commit message like [MNG-XXX] SUMMARY. Best practice is to use the JIRA issue title in both the pull request title and in the first line of the commit message.
Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
If your pull request is about ~20 lines of code you don't need to sign an Individual Contributor License Agreement. If you are unsure, please ask on the developers list.
To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.
I hereby declare this contribution to be licenced under the Apache License Version 2.0, January 2004
In any other case, please file an Apache Individual Contributor License Agreement.