Skip to content

8360037: Refactor ImageReader in preparation for Valhalla support #26054

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: pr/26044
Choose a base branch
from

Conversation

david-beaumont
Copy link
Contributor

@david-beaumont david-beaumont commented Jun 30, 2025

Refactoring ImageReader to make it easy to add preview mode functionality for Valhalla.

This PR is a large change to ImageReader (effectively a rewrite) but reduces the surface area of the API significantly, reduces code complexity and increases performance/memory efficiency. The need for this sort of change comes from the need to support "preview mode" in Valhalla for classes loaded from core modules.

Preview Mode

In the proposed preview mode support for Valhalla, a resource with the name /modules/<module>/<path> may come from one of two locations in the underlying jimage file; /<module>/<path> or /<module>/META-INF/preview/<path>. Furthermore, directories (represented as directory nodes via the API) will have a potentially different number of child nodes depending on whether preview mode is in effect, and some directories may only exist at all in preview mode.

Furthermore, the directories and symbolic link nodes in /packages/... will also be affected by the existence of new package directories. To retain consistency and avoid "surprises" later, all of this needs to be taken into account.

Below is a summary of the main changes for mainline JDK to better support preview mode later:

1: Improved encapsulation for preview mode

The current ImageReader code exposes the data from the jimage file in several ways; via the Node abstraction, but also with methods which return an ImageLocation instance directly. In preview mode it would not be acceptable to return the ImageLocation, since that would leak the existence of resources in the META-INF/preview namespace and lets users see resource nodes with names that don't match the underlying ImageLocation from which their contents come.

As such, the PR removes all methods which can leak ImageLocation or other "underlying" semantics about the jimage file. Luckily most of these are either used minimally and easily migrated to using nodes, or they were not being used at all. This PR also removes any methods which were already unused across the OpenJDK codebase (if I discover any issues with over-trimming of the API during full CI testing, it will be easy to address).

2. Simplification of directory child node handling

The current ImageReader code attempts to create parent directories "on demand" for any child nodes it creates. This results in parent directories having a non-empty but incomplete set of child nodes present. This is referred to as the directory being "incomplete" and requires users of the API to understand this and potentially "complete" these nodes by calling back into the ImageReader API.

This approach is an attempt to provide lazy node creation (which is a necessary trait since there are ~31,000 nodes in a typical jimage hierarchy) but isn't actually necessary since a child node has no "get parent" functionality and simply not creating parent nodes at the point is functionally equivalent.

The partially complete directory nodes also pose an issue for preview mode support, making it more complex to reason about where and when preview child nodes should be applied.

This PR simplifies directory node handling in two ways:

  1. Directory nodes are either empty (incomplete) or have all child nodes present.
  2. Directory nodes no longer have an API to return their (possibly incomplete) child nodes to users.

This results in simpler to reason about code for directory management and simpler changes later to support preview mode. Removing the on-demand parent directory creation and the partial child list management also has a noticeable performance improvement (based on the new ImageReaderBenchmark in pr/26044).

3. More comments and code assertions

Since ImageReader is a critical bit of internal tooling for OpenJDK, I made sure to add significant comments explaining the behaviour, as well as adding many new in-code assert statements. The version of ImageReader in this PR is 70 lines shorter than the current version, but if you account for new comments, it's really a reduction of almost 40% (640 -> 400 lines of code) and has over 200 more comment lines.

4. New tests

I added a new unit test for ImageReader since, until now, its API was not really being tested directly. These tests are useful now, but will really help when preview mode is added in Valhalla, since then there will need to be significant edge-case testing. I'm happy to improve or change tests in this PR, but they definitely cover the main cases.

5. Adding TODO comments for review discussion.

There are obviously some open questions about the exact design of the APIs, and some questions around behaviour. To make reviewing easier, I've added inline TODOs which are there to illicit feedback. I will account for an remove all of these before the PR is integrated, but I want reviewers to read and comment on them (even if there is no change expected).

Performance Improvements

On my laptop (not objectively interesting, but good for comparison) I am seeing a significant performance improvement over all benchmarks and a reduction in timing variability. These benchmarks are awkward because of the need to test things in a "cold start" state, so timings will have larger variability than expected, but the performance improvements are consistent and non trivial:

Run with make test TEST="micro:jdk.internal.jrtfs.ImageReaderBenchmark" using the benchmark currently being added in pr/26044:

Current version:

Benchmark                                            Mode  Cnt   Score    Error  Units
ImageReaderBenchmark.warmCache_CountAllNodes         avgt    5   0.784 ±  0.023  ms/op
ImageReaderBenchmark.coldStart_CountOnly               ss    5  37.910 ± 28.554  ms/op
ImageReaderBenchmark.coldStart_InitAndCount            ss    5  37.471 ± 18.775  ms/op
ImageReaderBenchmark.coldStart_LoadJavacInitClasses    ss    5   4.279 ±  7.959  ms/op

This PR:

Benchmark                                            Mode  Cnt   Score   Error  Units
ImageReaderBenchmark.warmCache_CountAllNodes         avgt   10   0.880 ± 0.086  ms/op
ImageReaderBenchmark.coldStart_CountOnly               ss   10  14.303 ± 3.975  ms/op
ImageReaderBenchmark.coldStart_InitAndCount            ss   10  14.032 ± 3.068  ms/op
ImageReaderBenchmark.coldStart_LoadJavacInitClasses    ss   10   2.530 ± 0.348  ms/op

Shows a >2.5x speedup for traversing all nodes in a "cold start" state, and ~1.7x speedup for loading the core set of classes needed to start javac.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Integration blocker

 ⚠️ Dependency #26044 must be integrated first

Issue

  • JDK-8360037: Refactor ImageReader in preparation for Valhalla support (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26054/head:pull/26054
$ git checkout pull/26054

Update a local copy of the PR:
$ git checkout pull/26054
$ git pull https://git.openjdk.org/jdk.git pull/26054/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26054

View PR using the GUI difftool:
$ git pr show -t 26054

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26054.diff

Using Webrev

Link to Webrev Comment

* Stacking JDK-8359808 (dependent PR) changes on top temporarily.
* Removing the last misleading/incorrect API from ImageReader.
* WIP - looking at adding tests.
* Removed all APIs which leak ImageLocation to ImageReader users.
* Tidying comments, removing dead methods and adding more todos for review discussion.
* Moving and merging methods for better ordering, and tidying comments.
* Tidying tests.
* Reworking to fix symbolic link node creation.
* Minor tidy up of comments.
* Updating tests for new API.
* Stacking JDK-8359808 (dependent PR) changes on top temporarily.
@bridgekeeper
Copy link

bridgekeeper bot commented Jun 30, 2025

👋 Welcome back david-beaumont! A progress list of the required criteria for merging this PR into pr/26044 will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jun 30, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Jun 30, 2025

@david-beaumont this pull request can not be integrated into pr/26044 due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout jdk_8360037_reader
git fetch https://git.openjdk.org/jdk.git pr/26044
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge pr/26044"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Jun 30, 2025
@openjdk
Copy link

openjdk bot commented Jun 30, 2025

@david-beaumont The following labels will be automatically applied to this pull request:

  • build
  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added build build-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Jun 30, 2025
Copy link
Contributor Author

@david-beaumont david-beaumont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything marked "TODO" is for discussion during the coming review. I don't intend to leave any of these in the code after, but they raise specific issues I'd like address.

@@ -92,143 +131,93 @@ private void requireOpen() {
}
}

// directory management interface
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A large number of the methods below were either:

  1. Unused or effectively no-ops.
  2. Breaking encapsulation and leaking underlying types such as ImageLocation (not a big problem now, but prevents cleanly implementing the preview mode logic for Valhalla).

The new API has essentially 3 top level methods here:

  1. findNode()
  2. getResource(Node)
  3. getResourceBuffer(Node)

Any navigation of the node hierarchy is done via getChildNames() and unlike now, no user can obtain a reference to an "incomplete" node.

Copy link
Contributor Author

@david-beaumont david-beaumont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With these pre-preview comments, I think this is finally ready for review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore this file, it's part of the PR to add the benchmark. I'll merge and sort everything out once that's in.

@@ -119,9 +119,9 @@ byte[] getContent() throws IOException {
}

@Override
public List<Node> getChildren() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExplodedImage makes a custom subclass of Node to "fake" the ImageReader behaviour. While this will need to change for preview mode, for now it's enough to just modify the one affected API.

@@ -49,7 +49,7 @@ final class JrtFileAttributes implements BasicFileAttributes {
//-------- basic attributes --------
@Override
public FileTime creationTime() {
return node.creationTime();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no benefit to having all these one-line getter methods duplicated in ImageReader since it already provides the file attributes object directly.

@@ -225,19 +223,19 @@ Iterator<Path> iteratorOf(JrtPath path, DirectoryStream.Filter<? super Path> fil
throw new NotDirectoryException(path.getName());
}
if (filter == null) {
return node.getChildren()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same logic, just based on the child names instead of the nodes.

@@ -58,7 +58,6 @@ static SystemImage open() throws IOException {
if (modulesImageExists) {
// open a .jimage and build directory structure
final ImageReader image = ImageReader.open(moduleImageFile);
image.getRootDirectory();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method no longer serves any purpose (it used to initialize the root directory entries, but it's not needed now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just uses child names rather than nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this class is best reviewed as if it were a new implementation, rather than trying to reason about the specific changes between the versions. Hopefully the comments will make the design clear and the reduced complexity/lines-of-code will help it be understood in a more stand-alone way.

return reader.getHeader();
}

/**
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method and the getResourceBuffer() method below are, I think, only called in one place and could, in theory, be factored out of this class altogether.

@@ -263,8 +252,8 @@ public void close(ImageReader image) throws IOException {

if (openers.isEmpty()) {
close();
// TODO (review note): Should this be synchronized by "this" ??
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I genuinely this this might be an existing issue.

protected Node(String name, BasicFileAttributes fileAttrs) {
this.name = Objects.requireNonNull(name);
this.fileAttrs = Objects.requireNonNull(fileAttrs);
}

/**
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these were either never called or no longer needed in the new implementation.

@david-beaumont david-beaumont marked this pull request as ready for review July 1, 2025 14:13
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 1, 2025
@mlbridge
Copy link

mlbridge bot commented Jul 1, 2025

Webrevs

@AlanBateman
Copy link
Contributor

/reviewers 2

@openjdk
Copy link

openjdk bot commented Jul 1, 2025

@AlanBateman
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

import jdk.internal.jimage.ImageReader;
import jdk.internal.jimage.ImageReaderFactory;
import jdk.internal.access.JavaNetUriAccess;
import jdk.internal.access.SharedSecrets;
import jdk.internal.util.StaticProperty;
import jdk.internal.module.ModuleHashes.HashSupplier;

import static java.util.Objects.requireNonNull;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing Objects.requireNonNull are okay, no need to change them all as it is nothing to do with the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 1, jvmArgs = {"--add-exports", "java.base/jdk.internal.jimage=ALL-UNNAMED"})
public class ImageReaderBenchmark {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the benchmark from the other PR, did you mean to include it?

@@ -253,6 +248,26 @@ private static ModuleFinder ofModuleInfos() {
return new SystemModuleFinder(mrefs, nameToModule);
}

private static Stream<ModuleInfo.Attributes> getModuleAttributes() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename this to allModuleAttributes and add a method description to match the other methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return getNode(reader, "/modules")
.getChildNames()
.map(mn -> getNode(reader, mn + "/module-info.class"))
// This fails with ISE if the node isn't a resource (corrupt JImage).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you drop this comment and check getNode to thrown an Error (ISE isn't right when we have a broken image).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getNode() is called twice and while it can throw the error, there's no nice way for getResourceBuffer() to be that strict (it's a separate API).
After fiddling a bit I reworked the module attribute loading into a separate helper and caught & escalated any exceptions relating to that into Error.
The only question now is that ModuleInfo.read() has at least one other runtime exception (InvalidModuleDescriptorException) it can throw which we could catch here. Should I add that to the list?

String nodeName = "/modules/" + module + "/" + name;
Optional<ImageReader.Node> node = Optional.ofNullable(reader.findNode(nodeName));
if (node.isPresent() && !node.get().isResource()) {
throw new IllegalStateException("Not a resource node: " + node.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IllegalStateException is problematic here, maybe you mean to add an assert?

Copy link
Contributor Author

@david-beaumont david-beaumont Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called externally so cannot trust the given name really references a resource.
I can't therefore make it an assertion, but I can just return empty if it didn't point at a resource (even if it did point at a directory etc.). Or I can make it an Error.

Copy link
Contributor Author

@david-beaumont david-beaumont Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it return empty for now and documented it as such.

this.modulesStringOffset = findLocation("/modules/java.base").getModuleOffset();
this.packagesStringOffset = findLocation("/packages/java.lang").getModuleOffset();

// Node creation is very lazy, se can just make the top-level directories
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo:

Suggested change
// Node creation is very lazy, se can just make the top-level directories
// Node creation is very lazy, so can just make the top-level directories

this.nodes = new HashMap<>();
// TODO (review note): These should exist under all circumstances, but there's
// probably a more robust way of getting it these offsets.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// probably a more robust way of getting it these offsets.
// probably a more robust way of getting these offsets.

Comment on lines +543 to +544
// A node is completed when all its direct children have been built. As
// such, non-directory nodes are always complete.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might flow better:

Suggested change
// A node is completed when all its direct children have been built. As
// such, non-directory nodes are always complete.
// A node is completed when all its direct children have been built.
// As such, non-directory nodes are always complete.

}

/**
* Returns the JRY file system compatible name of this node (e.g.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "JRY"

@@ -274,7 +289,7 @@ private static class SystemModuleFinder implements ModuleFinder {

@Override
public Optional<ModuleReference> find(String name) {
Objects.requireNonNull(name);
requireNonNull(name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal preference is avoid static imports making the code readable by retaining the "Objects." qualifier otherwise the non-local method call appears out of thin air. YMMV.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, given a couple of qualified ones existed here I'll qualify them all.
Personally I've evolved my opinion on this from preferring qualified use to thinking that the static import is more concise (and no less clear given the very unambiguous name).

private final Path image = buildJImage(IMAGE_ENTRIES);

@Test
public void testModuleDirectories() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, please, it will make failures easier to identity.

@magicus
Copy link
Member

magicus commented Jul 1, 2025

/label -build

@openjdk openjdk bot removed the build build-dev@openjdk.org label Jul 1, 2025
@openjdk
Copy link

openjdk bot commented Jul 1, 2025

@magicus
The build label was successfully removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org merge-conflict Pull request has merge conflict with target branch rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

5 participants