-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8360037: Refactor ImageReader in preparation for Valhalla support #26054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: pr/26044
Are you sure you want to change the base?
8360037: Refactor ImageReader in preparation for Valhalla support #26054
Conversation
* Stacking JDK-8359808 (dependent PR) changes on top temporarily. * Removing the last misleading/incorrect API from ImageReader. * WIP - looking at adding tests. * Removed all APIs which leak ImageLocation to ImageReader users.
* Tidying comments, removing dead methods and adding more todos for review discussion. * Moving and merging methods for better ordering, and tidying comments. * Tidying tests. * Reworking to fix symbolic link node creation. * Minor tidy up of comments. * Updating tests for new API. * Stacking JDK-8359808 (dependent PR) changes on top temporarily.
👋 Welcome back david-beaumont! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@david-beaumont this pull request can not be integrated into git checkout jdk_8360037_reader
git fetch https://git.openjdk.org/jdk.git pr/26044
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge pr/26044"
git push |
@david-beaumont The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything marked "TODO" is for discussion during the coming review. I don't intend to leave any of these in the code after, but they raise specific issues I'd like address.
@@ -92,143 +131,93 @@ private void requireOpen() { | |||
} | |||
} | |||
|
|||
// directory management interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A large number of the methods below were either:
- Unused or effectively no-ops.
- Breaking encapsulation and leaking underlying types such as
ImageLocation
(not a big problem now, but prevents cleanly implementing the preview mode logic for Valhalla).
The new API has essentially 3 top level methods here:
- findNode()
- getResource(Node)
- getResourceBuffer(Node)
Any navigation of the node hierarchy is done via getChildNames()
and unlike now, no user can obtain a reference to an "incomplete" node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With these pre-preview comments, I think this is finally ready for review.
make/test/BuildMicrobenchmark.gmk
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore this file, it's part of the PR to add the benchmark. I'll merge and sort everything out once that's in.
@@ -119,9 +119,9 @@ byte[] getContent() throws IOException { | |||
} | |||
|
|||
@Override | |||
public List<Node> getChildren() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExplodedImage makes a custom subclass of Node to "fake" the ImageReader behaviour. While this will need to change for preview mode, for now it's enough to just modify the one affected API.
@@ -49,7 +49,7 @@ final class JrtFileAttributes implements BasicFileAttributes { | |||
//-------- basic attributes -------- | |||
@Override | |||
public FileTime creationTime() { | |||
return node.creationTime(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no benefit to having all these one-line getter methods duplicated in ImageReader
since it already provides the file attributes object directly.
@@ -225,19 +223,19 @@ Iterator<Path> iteratorOf(JrtPath path, DirectoryStream.Filter<? super Path> fil | |||
throw new NotDirectoryException(path.getName()); | |||
} | |||
if (filter == null) { | |||
return node.getChildren() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same logic, just based on the child names instead of the nodes.
@@ -58,7 +58,6 @@ static SystemImage open() throws IOException { | |||
if (modulesImageExists) { | |||
// open a .jimage and build directory structure | |||
final ImageReader image = ImageReader.open(moduleImageFile); | |||
image.getRootDirectory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method no longer serves any purpose (it used to initialize the root directory entries, but it's not needed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just uses child names rather than nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this class is best reviewed as if it were a new implementation, rather than trying to reason about the specific changes between the versions. Hopefully the comments will make the design clear and the reduced complexity/lines-of-code will help it be understood in a more stand-alone way.
return reader.getHeader(); | ||
} | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method and the getResourceBuffer()
method below are, I think, only called in one place and could, in theory, be factored out of this class altogether.
@@ -263,8 +252,8 @@ public void close(ImageReader image) throws IOException { | |||
|
|||
if (openers.isEmpty()) { | |||
close(); | |||
// TODO (review note): Should this be synchronized by "this" ?? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I genuinely this this might be an existing issue.
protected Node(String name, BasicFileAttributes fileAttrs) { | ||
this.name = Objects.requireNonNull(name); | ||
this.fileAttrs = Objects.requireNonNull(fileAttrs); | ||
} | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these were either never called or no longer needed in the new implementation.
Webrevs
|
/reviewers 2 |
@AlanBateman |
import jdk.internal.jimage.ImageReader; | ||
import jdk.internal.jimage.ImageReaderFactory; | ||
import jdk.internal.access.JavaNetUriAccess; | ||
import jdk.internal.access.SharedSecrets; | ||
import jdk.internal.util.StaticProperty; | ||
import jdk.internal.module.ModuleHashes.HashSupplier; | ||
|
||
import static java.util.Objects.requireNonNull; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing Objects.requireNonNull are okay, no need to change them all as it is nothing to do with the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@State(Scope.Benchmark) | ||
@OutputTimeUnit(TimeUnit.MILLISECONDS) | ||
@Fork(value = 1, jvmArgs = {"--add-exports", "java.base/jdk.internal.jimage=ALL-UNNAMED"}) | ||
public class ImageReaderBenchmark { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the benchmark from the other PR, did you mean to include it?
@@ -253,6 +248,26 @@ private static ModuleFinder ofModuleInfos() { | |||
return new SystemModuleFinder(mrefs, nameToModule); | |||
} | |||
|
|||
private static Stream<ModuleInfo.Attributes> getModuleAttributes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename this to allModuleAttributes and add a method description to match the other methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return getNode(reader, "/modules") | ||
.getChildNames() | ||
.map(mn -> getNode(reader, mn + "/module-info.class")) | ||
// This fails with ISE if the node isn't a resource (corrupt JImage). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you drop this comment and check getNode to thrown an Error (ISE isn't right when we have a broken image).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getNode()
is called twice and while it can throw the error, there's no nice way for getResourceBuffer()
to be that strict (it's a separate API).
After fiddling a bit I reworked the module attribute loading into a separate helper and caught & escalated any exceptions relating to that into Error
.
The only question now is that ModuleInfo.read()
has at least one other runtime exception (InvalidModuleDescriptorException
) it can throw which we could catch here. Should I add that to the list?
String nodeName = "/modules/" + module + "/" + name; | ||
Optional<ImageReader.Node> node = Optional.ofNullable(reader.findNode(nodeName)); | ||
if (node.isPresent() && !node.get().isResource()) { | ||
throw new IllegalStateException("Not a resource node: " + node.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IllegalStateException is problematic here, maybe you mean to add an assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is called externally so cannot trust the given name really references a resource.
I can't therefore make it an assertion, but I can just return empty if it didn't point at a resource (even if it did point at a directory etc.). Or I can make it an Error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it return empty for now and documented it as such.
this.modulesStringOffset = findLocation("/modules/java.base").getModuleOffset(); | ||
this.packagesStringOffset = findLocation("/packages/java.lang").getModuleOffset(); | ||
|
||
// Node creation is very lazy, se can just make the top-level directories |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo:
// Node creation is very lazy, se can just make the top-level directories | |
// Node creation is very lazy, so can just make the top-level directories |
this.nodes = new HashMap<>(); | ||
// TODO (review note): These should exist under all circumstances, but there's | ||
// probably a more robust way of getting it these offsets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// probably a more robust way of getting it these offsets. | |
// probably a more robust way of getting these offsets. |
// A node is completed when all its direct children have been built. As | ||
// such, non-directory nodes are always complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might flow better:
// A node is completed when all its direct children have been built. As | |
// such, non-directory nodes are always complete. | |
// A node is completed when all its direct children have been built. | |
// As such, non-directory nodes are always complete. |
} | ||
|
||
/** | ||
* Returns the JRY file system compatible name of this node (e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: "JRY"
@@ -274,7 +289,7 @@ private static class SystemModuleFinder implements ModuleFinder { | |||
|
|||
@Override | |||
public Optional<ModuleReference> find(String name) { | |||
Objects.requireNonNull(name); | |||
requireNonNull(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal preference is avoid static imports making the code readable by retaining the "Objects." qualifier otherwise the non-local method call appears out of thin air. YMMV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, given a couple of qualified ones existed here I'll qualify them all.
Personally I've evolved my opinion on this from preferring qualified use to thinking that the static import is more concise (and no less clear given the very unambiguous name).
private final Path image = buildJImage(IMAGE_ENTRIES); | ||
|
||
@Test | ||
public void testModuleDirectories() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, please, it will make failures easier to identity.
/label -build |
@magicus |
Refactoring
ImageReader
to make it easy to add preview mode functionality for Valhalla.This PR is a large change to
ImageReader
(effectively a rewrite) but reduces the surface area of the API significantly, reduces code complexity and increases performance/memory efficiency. The need for this sort of change comes from the need to support "preview mode" in Valhalla for classes loaded from core modules.Preview Mode
In the proposed preview mode support for Valhalla, a resource with the name
/modules/<module>/<path>
may come from one of two locations in the underlying jimage file;/<module>/<path>
or/<module>/META-INF/preview/<path>
. Furthermore, directories (represented as directory nodes via the API) will have a potentially different number of child nodes depending on whether preview mode is in effect, and some directories may only exist at all in preview mode.Furthermore, the directories and symbolic link nodes in
/packages/...
will also be affected by the existence of new package directories. To retain consistency and avoid "surprises" later, all of this needs to be taken into account.Below is a summary of the main changes for mainline JDK to better support preview mode later:
1: Improved encapsulation for preview mode
The current
ImageReader
code exposes the data from the jimage file in several ways; via theNode
abstraction, but also with methods which return anImageLocation
instance directly. In preview mode it would not be acceptable to return theImageLocation
, since that would leak the existence of resources in theMETA-INF/preview
namespace and lets users see resource nodes with names that don't match the underlyingImageLocation
from which their contents come.As such, the PR removes all methods which can leak
ImageLocation
or other "underlying" semantics about the jimage file. Luckily most of these are either used minimally and easily migrated to using nodes, or they were not being used at all. This PR also removes any methods which were already unused across the OpenJDK codebase (if I discover any issues with over-trimming of the API during full CI testing, it will be easy to address).2. Simplification of directory child node handling
The current
ImageReader
code attempts to create parent directories "on demand" for any child nodes it creates. This results in parent directories having a non-empty but incomplete set of child nodes present. This is referred to as the directory being "incomplete" and requires users of the API to understand this and potentially "complete" these nodes by calling back into theImageReader
API.This approach is an attempt to provide lazy node creation (which is a necessary trait since there are ~31,000 nodes in a typical jimage hierarchy) but isn't actually necessary since a child node has no "get parent" functionality and simply not creating parent nodes at the point is functionally equivalent.
The partially complete directory nodes also pose an issue for preview mode support, making it more complex to reason about where and when preview child nodes should be applied.
This PR simplifies directory node handling in two ways:
This results in simpler to reason about code for directory management and simpler changes later to support preview mode. Removing the on-demand parent directory creation and the partial child list management also has a noticeable performance improvement (based on the new
ImageReaderBenchmark
in pr/26044).3. More comments and code assertions
Since
ImageReader
is a critical bit of internal tooling for OpenJDK, I made sure to add significant comments explaining the behaviour, as well as adding many new in-code assert statements. The version ofImageReader
in this PR is 70 lines shorter than the current version, but if you account for new comments, it's really a reduction of almost 40% (640 -> 400 lines of code) and has over 200 more comment lines.4. New tests
I added a new unit test for
ImageReader
since, until now, its API was not really being tested directly. These tests are useful now, but will really help when preview mode is added in Valhalla, since then there will need to be significant edge-case testing. I'm happy to improve or change tests in this PR, but they definitely cover the main cases.5. Adding TODO comments for review discussion.
There are obviously some open questions about the exact design of the APIs, and some questions around behaviour. To make reviewing easier, I've added inline TODOs which are there to illicit feedback. I will account for an remove all of these before the PR is integrated, but I want reviewers to read and comment on them (even if there is no change expected).
Performance Improvements
On my laptop (not objectively interesting, but good for comparison) I am seeing a significant performance improvement over all benchmarks and a reduction in timing variability. These benchmarks are awkward because of the need to test things in a "cold start" state, so timings will have larger variability than expected, but the performance improvements are consistent and non trivial:
Run with
make test TEST="micro:jdk.internal.jrtfs.ImageReaderBenchmark"
using the benchmark currently being added inpr/26044
:Current version:
This PR:
Shows a >2.5x speedup for traversing all nodes in a "cold start" state, and ~1.7x speedup for loading the core set of classes needed to start javac.
Progress
Integration blocker
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26054/head:pull/26054
$ git checkout pull/26054
Update a local copy of the PR:
$ git checkout pull/26054
$ git pull https://git.openjdk.org/jdk.git pull/26054/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26054
View PR using the GUI difftool:
$ git pr show -t 26054
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26054.diff
Using Webrev
Link to Webrev Comment