Skip to content

Commit 13c6001

Browse files
JDK-8180450 article (#4)
* JDK-8180450 article * Apply suggestions from code review Co-authored-by: John O'Hara <johara@redhat.com> * Apply youtube embed suggestion * Apply suggestions from code review Co-authored-by: John O'Hara <johara@redhat.com> * Adding mentioning JDK work and adjusting stack size * changed jdk master branch to proper commit * added Infinispan team to the party * Add scary header image * Some additional changes Co-authored-by: John O'Hara <johara@redhat.com>
1 parent 4ab077e commit 13c6001

File tree

5 files changed

+213
-0
lines changed

5 files changed

+213
-0
lines changed
Loading
Loading
Loading
Lines changed: 213 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,213 @@
1+
---
2+
title: "Journey to a N-fold improvement due to a stealthy scalability issue"
3+
date: 2022-11-16T00:00:00Z
4+
categories: ['performance', 'benchmarking', 'methodology']
5+
summary: 'Learn about fixing a severe JIT scalability issue'
6+
image: 'type_pollution.png'
7+
related: ['']
8+
authors:
9+
- Francesco Nigro
10+
---
11+
12+
This article is a detailed and complementary content to *Quarkus Insights #107: Quarkus Runtime performance - a peek into JVM internals*:
13+
++++
14+
<div class="imageblock right text-center">
15+
<div class="content" style="text-align: center;">
16+
<iframe width="560" height="315" src="https://www.youtube.com/embed/G40VfIsnCdo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
17+
</div>
18+
</div>
19+
++++
20+
This episode treats a breed of issues caused by the scary (was registered at Halloween on purpose) JDK issue; https://bugs.openjdk.org/browse/JDK-8180450["JDK-8180450: secondary_super_cache does not scale well"]
21+
that can causes massive performance degradation on several middleware frameworks e.g. Netty, Vert-x, Mutiny, Hibernate, Quarkus...
22+
23+
== What's JDK-8180450?
24+
25+
For those who haven't yet watched https://youtu.be/G40VfIsnCdo[Quarkus Insights #107], `JDK-8180450` is a severe scalability penalty happening while
26+
performing type checks (`instanceof`, `checkcast` and similar) against interfaces (so-called secondary super types).
27+
28+
This scalability issue is triggered by massive concurrent updates to `Klass::_secondary_super_cache`
29+
from multiple threads, which in turn causes https://en.wikipedia.org/wiki/False_sharing[False Sharing] with its surrounding fields eg `Klass::_secondary_supers`.
30+
31+
This https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/hotspot/share/oops/klass.hpp#L133[JDK 11 snippet] shows both fields and
32+
what they are used for:
33+
34+
{{< highlight cpp >}}
35+
class Klass : public Metadata {
36+
// ...
37+
// Cache of last observed secondary supertype
38+
Klass* _secondary_super_cache;
39+
// Array of all secondary supertypes
40+
Array<Klass*>* _secondary_supers;
41+
// ...
42+
{{< / highlight >}}
43+
44+
To make it more concrete:
45+
46+
* https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Integer.html[`Integer`] 's implement both `Comparable` and `Serializable`
47+
* `_secondary_supers` contains both `Comparable` and `Serializable`,
48+
* `_secondary_super_cache` is used byt he JVM like this:
49+
{{< highlight java >}}
50+
Object o = new Integer(1);
51+
// _seconday_super_cache is empty
52+
// search Comparable on [ Comparable, Serializable ]
53+
if (o instanceof Comparable) {
54+
// _seconday_super_cache := Comparable -> cache HIT!
55+
useComparable((Comparable) o);
56+
}
57+
{{< / highlight >}}
58+
The JVM does not need to search for `Comparable` in the `_secondary_supers` array when evaluating `checkcast` in the `useComparable`, instead it can
59+
use the `_secondary_super_cache` field, saving useless work.
60+
61+
So, *What causes massive amounts of updates to the cache field*?
62+
63+
The answer is simpler then it seems: making it ping-pong between multiple values contained in `_secondary_supers`!
64+
65+
The code below shows an example of this dangerous code pattern:
66+
{{< highlight java>}}
67+
// o.getClass() == Integer.class
68+
public static void compareAndSerialize(Object o) {
69+
if (o instanceof Comparable) {
70+
// _seconday_super_cache := Comparable
71+
useComparable((Comparable) o);
72+
}
73+
if (o instanceof Serializable) {
74+
// _seconday_super_cache := Serializable
75+
useSerializable((Serializable) o);
76+
}
77+
}
78+
{{< / highlight >}}
79+
The comments in the code show what happens if this method is called by a single thread, causing two `_secondary_super_cache`
80+
invalidations for every `compareAndSerializable` method execution, but
81+
82+
*What happens if there are more than one thread*?
83+
84+
`Integer::_secondary_super_cache` is shared among all threads, leading to an increased chance to invalidate it at *every type check* (including `checkcast` 's on `useComparable`/`useSerializable`!).
85+
86+
== Why's that scary?
87+
88+
As mentioned on https://youtu.be/G40VfIsnCdo[Quarkus Insights #107], the Red Hat Middleware Team uses different versions of
89+
https://www.techempower.com/[Techempower Benchmark] to validate https://quarkus.io[Quarkus]'s performance regressions.
90+
Hitting such scalability bottleneck in our lab has caused a performance degradation quite significant while running on our `Intel(R) Xeon(R) Gold 5218 CPU` with 64 logical cores (32 without Hyper-Threading, splitted in 2 https://en.wikipedia.org/wiki/Non-uniform_memory_access[NUMA] nodes):
91+
92+
image::64_cores.png[64 cores plaintext results]
93+
94+
But what about running the same test with just 2 cores?
95+
96+
image::2_cores.png[2 cores plaintext results]
97+
98+
As shown, even with few cores, if the mentioned bad code pattern is hit in the hot path, the effects are still dramatic!
99+
100+
=== A stealthy performance killer!
101+
102+
The scariest part has yet to come: what we discovered while investigating the https://github.com/netty/netty/issues/12708[most severe case]
103+
affecting our Quarkus results, is that "traditional" Java profiling tools didn't seem to help diagnosing it
104+
(eg https://access.redhat.com/documentation/en-us/openjdk/11/html/using_jdk_flight_recorder_with_openjdk/index[Java Flight Recorder], https://github.com/jvm-profiling-tools/async-profiler[Async Profiler]).
105+
Furthermore, a few weeks after the Quarkus Insights recording, Netflix has published an https://netflixtechblog.com/seeing-through-hardware-counters-a-journey-to-threefold-performance-increase-2721924a2822[article]
106+
highlighting the same problem while using traditional Java Profilers.
107+
108+
Specifically, https://access.redhat.com/documentation/en-us/openjdk/11/html/using_jdk_flight_recorder_with_openjdk/index[Java Flight Recorder] wasn't reporting
109+
any guilty stack trace pointing to the source of the issue, while https://github.com/jvm-profiling-tools/async-profiler[Async Profiler] was reporting
110+
broken stack traces (with the right cost/samples count). In both cases the bad code pattern was unnoticed, although
111+
the latter provided some hint that something incorrect was occurring.
112+
113+
image::async_profiler_2_8_broken.png[Async Profiler 2.8 plaintext CPU flamegraph]
114+
115+
In the above https://netflixtechblog.com/java-in-flames-e763b3d32166[flamegraph], the bottom left of the graph shows
116+
some costy methods, but failing to report the correct methods: not useful to fix it!
117+
118+
We've further investigated and discovered that the JFR mechanism to collect stack traces
119+
(implemented as http://psy-lob-saw.blogspot.com/2016/06/the-pros-and-cons-of-agct.html[AsyncGetCallTrace])
120+
isn't able to collect meaningful ones. This is due to https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5486[JDK's type check code],
121+
which modifies the stack size and by consequence the x86 register used to track the https://en.wikipedia.org/wiki/Stack_register[Stack Pointer]
122+
(more on this https://github.com/jvm-profiling-tools/async-profiler/issues/673#issuecomment-1304504745[Async Profiler issue's comment]).
123+
124+
Thanks to this investigation, a recent https://bugs.openjdk.org/browse/JDK-8287425[JDK change] fixing the mentioned behaviour
125+
(which has been https://github.com/openjdk/jdk11u-dev/pull/1524[back-ported to JDK 11]) ,
126+
making this issue more easily spotted with existing (`AsyncGetCallTrace`-based) Java Profilers: a proper "Community Happy End".
127+
128+
==== ...and False Sharing?
129+
130+
In addition to the above issue, https://en.wikipedia.org/wiki/False_sharing[False Sharing] confuses profiling tools even further.
131+
132+
For every invalidation of `Klass::_secondary_super_cache`, a search over `Klass::_secondary_supers` is performed,
133+
causing a severe penalty (if stored in the same cache-line of the "noisy" `Klass` field) and impacting other innocent code paths too:
134+
135+
{{< highlight java>}}
136+
// o.getClass() == Integer.class
137+
// _secondary_super_cache always != CharSequence
138+
// slow path search on _secondary_supers := [ Comparable, Serializable ]
139+
if (o instanceof CharSequence) {
140+
// never happen
141+
} else {
142+
// always happen
143+
}
144+
{{< / highlight >}}
145+
The snippet above shows that performing a type check for a non-implemented interface causes the JVM to perform a slow path search
146+
over `_secondary_supers`. This in turn forces loading of the related field and causes
147+
a ping-pong of cache coherency traffic. This has the effect of slowing down the innocent load instruction and causes a profiler
148+
to highlight it as a cpu bottleneck.
149+
150+
The issue is detailed in this https://youtu.be/G40VfIsnCdo?t=1411[Quarkus Insights part]
151+
152+
=== OpenJDK compiler is (too) good
153+
154+
Some noticeable optimization strategies of OpenJDK are:
155+
156+
* https://wiki.openjdk.org/display/HotSpot/Inlining[Inlining]
157+
* https://wiki.openjdk.org/display/HotSpot/TypeProfile[Type Profile]
158+
159+
Thanks to both optimizations, the JDK Compiler can drop type-check operations preventing the issue from happening.
160+
Unlucky runtime conditions can then cause such optimizations to fail (as mentioned in https://youtu.be/G40VfIsnCdo?t=1692[this Quarkus Insights part]),
161+
unexpectedly biting end-users.
162+
163+
This smart compiler behaviour, together with the previous mentioned profiler issues, has meant this issue has lay dormant and unnoticed for some time,
164+
especially in benchmarking environments, where overly simplistic use cases were leaving the compiler more
165+
room to optimize and hide it.
166+
167+
== What's next?
168+
In order to support our developers and the Java Community to diagnose this stealthy and severe performance issue,
169+
we've developed a https://github.com/raphw/byte-buddy[ByteBuddy]-powered https://github.com/RedHatPerf/type-pollution-agent[Java Agent]. This has helped fix the
170+
most severe https://bugs.openjdk.org/browse/JDK-8180450?focusedCommentId=14536886&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14536886[code paths] of our products
171+
(and the widely used https://github.com/netty/netty/issues/12708[Netty Framework]) and hopefully will be used by other
172+
players of the Java Community to diagnose if their code can be affected by it.
173+
174+
Thanks to the reports of our developers and the rest of Java Community (including Netflix and Intel engineers),
175+
the https://bugs.openjdk.org/browse/JDK-8180450[scalability JDK issue] now has raised a priority and a fix will be targeted for the Java 21 release.
176+
177+
As a separate effort we've evaluated with the OpenJDK Red Hat team alternatives fixes to the infamous JDK issue as well, ending up rejecting ad-hoc flags to disable such optimization (as it was designed to be)
178+
because too harmful for existing users, which unconsciously have benefited from it till now. We are trusty that our feedbacks
179+
will help making a better decision for the Java Community, present and future.
180+
181+
=== Is it still worthy to fix our code?
182+
183+
As usual: *Depends(TM)*.
184+
185+
As mentioned in Quarkus Insights, some fixes have helped to use different and more idiomatic code
186+
solutions to replace the misuse (often abuse) of the existing type check-based logic, often leading to better performance and,
187+
moreover, way better code.
188+
189+
In addition, there isn't yet (at the time this article has been published) any mention of back-porting plans,
190+
meaning that legacy or just slow adopters (and existing users too) will be left behind, running code with suboptimal (at best) performance.
191+
192+
=== but I don't know yet how to use the Agent!
193+
194+
In the next articles of our blog we'll present more concrete examples of fixed code using
195+
https://github.com/RedHatPerf/type-pollution-agent[the agent] (presented on this https://youtu.be/G40VfIsnCdo?t=2235[Quarkus Insights part] too)
196+
and the impact of the OpenJDK compiler optimization strategies (briefly mentioned <<_openjdk_compilers_are_too_good,here>>).
197+
198+
== Special Thanks
199+
====
200+
* Andrew Haley
201+
* Andrew Dinn
202+
* Rafael Winterhalter
203+
* Sanne Grinovero
204+
* Norman Maurer
205+
* Andrei Pangin
206+
* Johannes Bechberger
207+
* Quarkus, Hibernate, Vert-x, Mutiny and Infinispan teams for the patience and the efforts to try the agent and fix our products trusting our tools
208+
* Netflix & Intel team and their https://netflixtechblog.com/seeing-through-hardware-counters-a-journey-to-threefold-performance-increase-2721924a2822[awesome article]
209+
mentioned in <<_a_stealthy_performance_killer,this>> section
210+
====
211+
212+
213+
Loading

0 commit comments

Comments
 (0)