enable reproducible builds #24366

Merged (1 commit) on Apr 8, 2023

Conversation

hboutemy
Contributor

@hboutemy hboutemy commented Apr 1, 2023

This PR just applies the fixes proposed by mvn artifact:check-buildplan, because release 7.0.3 is not reproducible at all (outputTimestamp was dropped entirely instead of being given an initial value)

see https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/org/glassfish/main/README.md

@hboutemy
Contributor Author

hboutemy commented Apr 2, 2023

PR updated to use a current timestamp: it better reflects the value that you'd have had when switching to 7.0.4-SNAPSHOT (FYI, I did it simply by running mvn versions:set)

this PR is about Reproducible Builds: https://reproducible-builds.org/ = the objective is to have same binary as output

having a reproducible timestamp is a necessity to get the same output binary from 2 builds
And from that reproducible timestamp, many recent plugins do their job to ensure reproducible binary output (

you can get the result with versions:set, maven-release-plugin, or by using the git commit timestamp, as you wish (none of these strategies has to be given up): you don't need to change your current release practice
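To illustrate why a fixed timestamp matters for archive output, here is a minimal sketch in plain Java. It is not how any Maven plugin is implemented; the class name, entry name, and timestamp values are made up for the example. It builds the same one-entry zip twice and compares the bytes, once with the same entry time and once with entry times one day apart.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of GlassFish or of any Maven plugin.
public class ZipTimestampDemo {

    /** Builds a one-entry archive in memory; the entry is stamped with the given time. */
    static byte[] buildArchive(long entryTimeMillis) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            ZipEntry entry = new ZipEntry("hello.txt");
            // Without this call, ZipOutputStream stamps the entry with the current time.
            entry.setTime(entryTimeMillis);
            zip.putNextEntry(entry);
            zip.write("same content".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        long fixed = 1_680_393_600_000L;  // 2023-04-02T00:00:00Z, an arbitrary example value
        long later = fixed + 86_400_000L; // one day later, standing in for "a second build"

        System.out.println("fixed timestamp, identical bytes: "
                + Arrays.equals(buildArchive(fixed), buildArchive(fixed)));
        System.out.println("moving timestamp, identical bytes: "
                + Arrays.equals(buildArchive(fixed), buildArchive(later)));
    }
}
```

The content is identical in both cases; only the entry time stored in the archive differs, and that alone is enough to change the checksum.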

@dmatej
Contributor

dmatej commented Apr 2, 2023

the objective is to have same binary as output

Hmmm, originally I wanted to say that binaries are still not the same and the date doesn't ensure that, but to my surprise the simplest GJULE.jar always has the same SHA after your change and differs without it. But I think I understand you. Is it related to something new in Maven 4?

So I tried that also with the distributed glassfish.zip:
First run: 119157795 B, sha512: 2551d2e01864e12c0eaf0748de3c95cb945705c49abd25c6a5de4695790172f7a687d9f1c892e6edb9761bb737352ccabacf42e045639f0ada61b9a8beef8f2e
Second run: 119157586 B, sha512: 344d81b146dcc8787a447c9b069101dd2b6cbd2e3ed9add01c30d6ff3c75e63a723fdb382dbe53d214ec7faa7f65b0408e812a5dba7b670bb62cf1a9c0a54afa
Third run: 119157663 B, sha512: 1809afa52dfd4f2ec3e38a02dde8840b2e7dfc6696384e754b59c8896abbca88b8d55a7a01dab3eb52aa1d488167319cbc978e4c4df010576b57b2124ed4d4b2

So no, we still don't have the same binary output. And this artifact is the most important.
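A comparison like the one above can be scripted. The following is only a sketch of one way to do it in Java; the class name and the paths in the usage comment are placeholders, not something taken from the GlassFish build.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for comparing two build outputs by checksum.
public class ArtifactDigest {

    /** Streams the input through SHA-512 and returns the digest as lowercase hex. */
    static String sha512(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-512");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available in this JVM", e);
        }
    }

    /** Convenience overload that hashes a file on disk. */
    static String sha512(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return sha512(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage (placeholder paths): java ArtifactDigest run1/glassfish.zip run2/glassfish.zip
        String first = sha512(Path.of(args[0]));
        String second = sha512(Path.of(args[1]));
        System.out.println(first);
        System.out.println(second);
        System.out.println(first.equals(second) ? "identical" : "different");
    }
}
```

A checksum only says whether two outputs differ, not why; finding the cause needs a content-level diff of the two archives.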

I still have doubts about this idea, but I understand now why @pzygielo wrote that he doesn't need the timestamp at all. After we merge this PR, we simply have to remove also all usages from properties and the code, because it provides false information and is worthless.

As a result we lose some information just to pass mvn artifact:check-buildplan, which in fact doesn't verify that the artifact will be the same in every build of the same commit - it passed on this PR, but I proved that the zip file is different in every build.

So then we have to find out why the zip changes between builds. Alright, we can merge this as a temporary step, since it is not the final fix. I don't like that I am losing the simplest possible way to check if I started the right artifact, but what can I do ...

@dmatej
Contributor

dmatej commented Apr 2, 2023

@hboutemy Thank you for your patience, btw. ;-)

@dmatej dmatej added this to the 7.0.4 milestone Apr 2, 2023
@hboutemy
Contributor Author

hboutemy commented Apr 2, 2023

But I think I understand you. Is it related to something new in Maven 4?

everything happens at the plugin level, nothing at the level of Maven itself: if you want to see more details, the hard-core ones are here https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74682318

artifact:check-buildplan is a quick check of basics, with help for quick fixes. You're right, it's not a real check.
Really checking is covered in https://maven.apache.org/guides/mini/guide-reproducible-builds.html#how-to-test-my-maven-build-reproducibility
And for studying differences between different outputs, diffoscope is our friend https://diffoscope.org/

I have reproduced more than 1200 releases so far https://github.com/jvm-repo-rebuild/reproducible-central , from the smallest projects to the biggest ones
Big projects usually require a few releases to fix issues step by step: see for example https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/org/apache/nifi/nifi/README.md

@hboutemy
Contributor Author

hboutemy commented Apr 2, 2023

I ran a real test with this PR applied: mvn clean install -Pfastest && mvn clean verify -Pfastest artifact:compare

many of the remaining issues are just

diffoscope target/reference/console-jca-plugin-7.0.4-SNAPSHOT.jar appserver/admingui/jca/target/console-jca-plugin.jar
--- target/reference/console-jca-plugin-7.0.4-SNAPSHOT.jar
+++ appserver/admingui/jca/target/console-jca-plugin.jar
├── META-INF/hk2-locator/default
│ @@ -1,7 +1,7 @@
│  #
│ -# Generated on Mon Apr 03 00:02:42 CEST 2023 by hk2-inhabitant-generator
│ +# Generated on Mon Apr 03 00:05:12 CEST 2023 by hk2-inhabitant-generator
│  #

easy to solve: I can provide more PRs after this first one to make sure release 7.0.4 has those simple issues fixed
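As a rough illustration of this class of issue (this is a hypothetical sketch, not the hk2-inhabitant-generator code and not necessarily how it was or will be fixed), a generated header stops varying between builds once its date is passed in as a fixed build input rather than read from the clock:

```java
import java.time.Instant;

// Hypothetical sketch; the real generator's code is not shown in this conversation.
public class GeneratedHeader {

    /** Renders a header comment whose date comes from the caller instead of the system clock. */
    static String header(Instant generatedOn) {
        return "#\n# Generated on " + generatedOn + " by hk2-inhabitant-generator\n#\n";
    }

    public static void main(String[] args) {
        // Non-reproducible: every run embeds a different instant.
        System.out.print(header(Instant.now()));

        // Reproducible: the instant is a fixed build input (example value, not from the PR).
        Instant fixed = Instant.parse("2023-04-02T00:00:00Z");
        System.out.print(header(fixed));
    }
}
```

Dropping the date from the header entirely would be another option with the same effect on reproducibility.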

but as you mentioned in #24367, you have a choice to make: do you want Reproducible Builds or not?

@dmatej dmatej requested review from arjantijms and hs536 April 3, 2023 06:12
@hs536
Contributor

hs536 commented Apr 3, 2023

IMHO, setting the value of project.build.outputTimestamp at the time of release on a release branch will solve both problems.

from (current release commit process):

master
+-@someone: Something latest commit
+-@glassfish-bot: Prepare release 7.0.4
+-@glassfish-bot: Prepare next development cycle for 7.0.5-SNAPSHOT

to:

master
+-@someone: Something latest commit
|  +-@glassfish-bot: Prepare release 7.0.4 (* set timestamp and create tag here)
+-@glassfish-bot: Prepare next development cycle for 7.0.5-SNAPSHOT

What do you think?

@hs536
Contributor

hs536 commented Apr 3, 2023

IMHO, setting the value of project.build.outputTimestamp at the time of release on a release branch will solve both problems.

This can be done with a few changes to Jenkins' settings. (This does not solve the problem of the zip's sha changing for each build, though)

@dmatej
Contributor

dmatej commented Apr 3, 2023

IMHO, setting the value of project.build.outputTimestamp at the time of release on a release branch will solve both problems.

This can be done with a few changes to Jenkins' settings. (This does not solve the problem of the zip's sha changing for each build, though)

Better would be to use the maven release plugin, because it can do that automatically. But if I am the only person who uses this "feature" in logs, it is not worth the effort.
I would not like to implement it in Jenkins; I would like to make it simpler, so use the standard release:prepare and release:perform.
Another argument (and an even more important one) is that some tools may be affected, as Herve wrote; see #24367, where I tried to simplify the description of the issue. Basically, some tools check that built artifacts did not change. But if the value is generated, they do change -> i.e. the editor will rebuild the whole workspace every time. Then it would be an issue even for snapshots; @hboutemy is right.

@hboutemy
Contributor Author

hboutemy commented Apr 3, 2023

I suppose the current release process uses mvn versions:set: it is perfectly supported and gives the same result as maven-release-plugin, so you are not forced to change anything

@hs536
Contributor

hs536 commented Apr 4, 2023

My intention may not have been clear. I propose that we do not explicitly set a value for "project.build.outputTimestamp" during development, and only set it for release. This way, SNAPSHOT versions can still get the usual timestamp.

@hboutemy
Contributor Author

hboutemy commented Apr 4, 2023

oh, now I understand better: that would mean non-Reproducible Builds during SNAPSHOT, and Reproducible Builds only on releases. This is not a scenario I expected; it will require adding and removing configuration back and forth: looks complex.
If you absolutely want to have a moving timestamp even during development, I suppose the best option is then to use the last commit timestamp
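For the last-commit option, a small sketch of the conversion step, under the assumption that the commit time is available as epoch seconds (for example from git log -1 --format=%ct) and that an ISO-8601 UTC string is wanted for project.build.outputTimestamp. The class and method names are made up for the example.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; how the value is then fed into the build is left open.
public class CommitTimestamp {

    /** Formats a commit time given in epoch seconds as an ISO-8601 UTC timestamp. */
    static String toOutputTimestamp(long commitEpochSeconds) {
        return DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochSecond(commitEpochSeconds));
    }

    public static void main(String[] args) {
        // Usage: java CommitTimestamp <epoch-seconds>
        System.out.println(toOutputTimestamp(Long.parseLong(args[0])));
    }
}
```

Because the value depends only on the commit, two builds of the same commit get the same timestamp, while the timestamp still moves as development goes on.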

@dmatej
Contributor

dmatej commented Apr 4, 2023

Yes, that was my idea too, but if tools like Eclipse IDE check checksums of generated artifacts, it would still be wrong: a non-reproducible build. For example, if m2e used this instead of its current monitoring of classes for quick rebuilds of the workspace:

  • old way, builds not reproducible: I change something in one dependency, used by many others -> direct dependents are rebuilt too -> their dependents -> ... ... -> everything is rebuilt.
  • reproducible builds: I change something in one dependency, used by many others -> they are rebuilt too -> but their checksum did not change -> no need to rebuild transitive dependents, we are done.
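The two scenarios above can be modelled with a toy sketch. This is purely illustrative: it does not describe how m2e or any other tool works today, and the module names are invented. A rebuild propagates to a module's dependents only when that module's artifact checksum actually changed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Toy model of checksum-based rebuild pruning; not taken from any real tool.
public class IncrementalRebuild {

    /**
     * Returns the modules that get rebuilt, in breadth-first order.
     *
     * @param changed         the module whose sources were edited
     * @param dependents      module -> modules that directly depend on it
     * @param artifactChanged whether a module's artifact checksum differs after its rebuild
     */
    static List<String> rebuild(String changed,
                                Map<String, List<String>> dependents,
                                Predicate<String> artifactChanged) {
        List<String> rebuilt = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(changed);
        seen.add(changed);
        while (!queue.isEmpty()) {
            String module = queue.poll();
            rebuilt.add(module);
            if (!artifactChanged.test(module)) {
                continue; // same checksum as before: dependents are still valid
            }
            for (String dependent : dependents.getOrDefault(module, List.of())) {
                if (seen.add(dependent)) {
                    queue.add(dependent);
                }
            }
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        // Invented chain: app depends on core, core depends on api.
        Map<String, List<String>> dependents = Map.of("api", List.of("core"), "core", List.of("app"));

        // Non-reproducible: every rebuild yields a new checksum, so everything is rebuilt.
        System.out.println(rebuild("api", dependents, module -> true));

        // Reproducible: only the edited module's artifact really changed.
        System.out.println(rebuild("api", dependents, module -> module.equals("api")));
    }
}
```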

I believe this is maybe more important than reproducible builds of releases, which are always built just once, uploaded, and not changed forever, that is ensured by rules of Maven Central. If it works how I described here :-)
