|
| 1 | +--- |
| 2 | +title: "Journey to a N-fold improvement due to a stealthy scalability issue" |
| 3 | +date: 2022-11-16T00:00:00Z |
| 4 | +categories: ['performance', 'benchmarking', 'methodology'] |
| 5 | +summary: 'Learn about fixing a severe JIT scalability issue' |
| 6 | +image: 'type_pollution.png' |
| 7 | +related: [''] |
| 8 | +authors: |
| 9 | + - Francesco Nigro |
| 10 | +--- |
| 11 | + |
| 12 | +This article is a detailed and complementary content to *Quarkus Insights #107: Quarkus Runtime performance - a peek into JVM internals*: |
| 13 | +++++ |
| 14 | +<div class="imageblock right text-center"> |
| 15 | +<div class="content" style="text-align: center;"> |
| 16 | +<iframe width="560" height="315" src="https://www.youtube.com/embed/G40VfIsnCdo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> |
| 17 | +</div> |
| 18 | +</div> |
| 19 | +++++ |
| 20 | +This episode treats a breed of issues caused by the scary (was registered at Halloween on purpose) JDK issue; https://bugs.openjdk.org/browse/JDK-8180450["JDK-8180450: secondary_super_cache does not scale well"] |
| 21 | +that can causes massive performance degradation on several middleware frameworks e.g. Netty, Vert-x, Mutiny, Hibernate, Quarkus... |
| 22 | + |
| 23 | +== What's JDK-8180450? |
| 24 | + |
| 25 | +For those who haven't yet watched https://youtu.be/G40VfIsnCdo[Quarkus Insights #107], `JDK-8180450` is a severe scalability penalty happening while |
| 26 | +performing type checks (`instanceof`, `checkcast` and similar) against interfaces (so-called secondary super types). |
| 27 | + |
| 28 | +This scalability issue is triggered by massive concurrent updates to `Klass::_secondary_super_cache` |
| 29 | +from multiple threads, which in turn causes https://en.wikipedia.org/wiki/False_sharing[False Sharing] with its surrounding fields eg `Klass::_secondary_supers`. |
| 30 | + |
| 31 | +This https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/hotspot/share/oops/klass.hpp#L133[JDK 11 snippet] shows both fields and |
| 32 | +what they are used for: |
| 33 | + |
| 34 | +{{< highlight cpp >}} |
| 35 | +class Klass : public Metadata { |
| 36 | + // ... |
| 37 | + // Cache of last observed secondary supertype |
| 38 | + Klass* _secondary_super_cache; |
| 39 | + // Array of all secondary supertypes |
| 40 | + Array<Klass*>* _secondary_supers; |
| 41 | + // ... |
| 42 | +{{< / highlight >}} |
| 43 | + |
| 44 | +To make it more concrete: |
| 45 | + |
| 46 | +* https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Integer.html[`Integer`] 's implement both `Comparable` and `Serializable` |
| 47 | +* `_secondary_supers` contains both `Comparable` and `Serializable`, |
| 48 | +* `_secondary_super_cache` is used byt he JVM like this: |
| 49 | +{{< highlight java >}} |
| 50 | + Object o = new Integer(1); |
| 51 | + // _seconday_super_cache is empty |
| 52 | + // search Comparable on [ Comparable, Serializable ] |
| 53 | + if (o instanceof Comparable) { |
| 54 | + // _seconday_super_cache := Comparable -> cache HIT! |
| 55 | + useComparable((Comparable) o); |
| 56 | + } |
| 57 | +{{< / highlight >}} |
| 58 | +The JVM does not need to search for `Comparable` in the `_secondary_supers` array when evaluating `checkcast` in the `useComparable`, instead it can |
| 59 | +use the `_secondary_super_cache` field, saving useless work. |
| 60 | + |
| 61 | +So, *What causes massive amounts of updates to the cache field*? |
| 62 | + |
| 63 | +The answer is simpler then it seems: making it ping-pong between multiple values contained in `_secondary_supers`! |
| 64 | + |
| 65 | +The code below shows an example of this dangerous code pattern: |
| 66 | +{{< highlight java>}} |
| 67 | + // o.getClass() == Integer.class |
| 68 | + public static void compareAndSerialize(Object o) { |
| 69 | + if (o instanceof Comparable) { |
| 70 | + // _seconday_super_cache := Comparable |
| 71 | + useComparable((Comparable) o); |
| 72 | + } |
| 73 | + if (o instanceof Serializable) { |
| 74 | + // _seconday_super_cache := Serializable |
| 75 | + useSerializable((Serializable) o); |
| 76 | + } |
| 77 | + } |
| 78 | +{{< / highlight >}} |
| 79 | +The comments in the code show what happens if this method is called by a single thread, causing two `_secondary_super_cache` |
| 80 | +invalidations for every `compareAndSerializable` method execution, but |
| 81 | + |
| 82 | +*What happens if there are more than one thread*? |
| 83 | + |
| 84 | +`Integer::_secondary_super_cache` is shared among all threads, leading to an increased chance to invalidate it at *every type check* (including `checkcast` 's on `useComparable`/`useSerializable`!). |
| 85 | + |
| 86 | +== Why's that scary? |
| 87 | + |
| 88 | +As mentioned on https://youtu.be/G40VfIsnCdo[Quarkus Insights #107], the Red Hat Middleware Team uses different versions of |
| 89 | +https://www.techempower.com/[Techempower Benchmark] to validate https://quarkus.io[Quarkus]'s performance regressions. |
| 90 | +Hitting such scalability bottleneck in our lab has caused a performance degradation quite significant while running on our `Intel(R) Xeon(R) Gold 5218 CPU` with 64 logical cores (32 without Hyper-Threading, splitted in 2 https://en.wikipedia.org/wiki/Non-uniform_memory_access[NUMA] nodes): |
| 91 | + |
| 92 | +image::64_cores.png[64 cores plaintext results] |
| 93 | + |
| 94 | +But what about running the same test with just 2 cores? |
| 95 | + |
| 96 | +image::2_cores.png[2 cores plaintext results] |
| 97 | + |
| 98 | +As shown, even with few cores, if the mentioned bad code pattern is hit in the hot path, the effects are still dramatic! |
| 99 | + |
| 100 | +=== A stealthy performance killer! |
| 101 | + |
| 102 | +The scariest part has yet to come: what we discovered while investigating the https://github.com/netty/netty/issues/12708[most severe case] |
| 103 | +affecting our Quarkus results, is that "traditional" Java profiling tools didn't seem to help diagnosing it |
| 104 | +(eg https://access.redhat.com/documentation/en-us/openjdk/11/html/using_jdk_flight_recorder_with_openjdk/index[Java Flight Recorder], https://github.com/jvm-profiling-tools/async-profiler[Async Profiler]). |
| 105 | +Furthermore, a few weeks after the Quarkus Insights recording, Netflix has published an https://netflixtechblog.com/seeing-through-hardware-counters-a-journey-to-threefold-performance-increase-2721924a2822[article] |
| 106 | +highlighting the same problem while using traditional Java Profilers. |
| 107 | + |
| 108 | +Specifically, https://access.redhat.com/documentation/en-us/openjdk/11/html/using_jdk_flight_recorder_with_openjdk/index[Java Flight Recorder] wasn't reporting |
| 109 | +any guilty stack trace pointing to the source of the issue, while https://github.com/jvm-profiling-tools/async-profiler[Async Profiler] was reporting |
| 110 | +broken stack traces (with the right cost/samples count). In both cases the bad code pattern was unnoticed, although |
| 111 | +the latter provided some hint that something incorrect was occurring. |
| 112 | + |
| 113 | +image::async_profiler_2_8_broken.png[Async Profiler 2.8 plaintext CPU flamegraph] |
| 114 | + |
| 115 | +In the above https://netflixtechblog.com/java-in-flames-e763b3d32166[flamegraph], the bottom left of the graph shows |
| 116 | +some costy methods, but failing to report the correct methods: not useful to fix it! |
| 117 | + |
| 118 | +We've further investigated and discovered that the JFR mechanism to collect stack traces |
| 119 | +(implemented as http://psy-lob-saw.blogspot.com/2016/06/the-pros-and-cons-of-agct.html[AsyncGetCallTrace]) |
| 120 | +isn't able to collect meaningful ones. This is due to https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5486[JDK's type check code], |
| 121 | +which modifies the stack size and by consequence the x86 register used to track the https://en.wikipedia.org/wiki/Stack_register[Stack Pointer] |
| 122 | +(more on this https://github.com/jvm-profiling-tools/async-profiler/issues/673#issuecomment-1304504745[Async Profiler issue's comment]). |
| 123 | + |
| 124 | +Thanks to this investigation, a recent https://bugs.openjdk.org/browse/JDK-8287425[JDK change] fixing the mentioned behaviour |
| 125 | +(which has been https://github.com/openjdk/jdk11u-dev/pull/1524[back-ported to JDK 11]) , |
| 126 | +making this issue more easily spotted with existing (`AsyncGetCallTrace`-based) Java Profilers: a proper "Community Happy End". |
| 127 | + |
| 128 | +==== ...and False Sharing? |
| 129 | + |
| 130 | +In addition to the above issue, https://en.wikipedia.org/wiki/False_sharing[False Sharing] confuses profiling tools even further. |
| 131 | + |
| 132 | +For every invalidation of `Klass::_secondary_super_cache`, a search over `Klass::_secondary_supers` is performed, |
| 133 | +causing a severe penalty (if stored in the same cache-line of the "noisy" `Klass` field) and impacting other innocent code paths too: |
| 134 | + |
| 135 | +{{< highlight java>}} |
| 136 | + // o.getClass() == Integer.class |
| 137 | + // _secondary_super_cache always != CharSequence |
| 138 | + // slow path search on _secondary_supers := [ Comparable, Serializable ] |
| 139 | + if (o instanceof CharSequence) { |
| 140 | + // never happen |
| 141 | + } else { |
| 142 | + // always happen |
| 143 | + } |
| 144 | +{{< / highlight >}} |
| 145 | +The snippet above shows that performing a type check for a non-implemented interface causes the JVM to perform a slow path search |
| 146 | +over `_secondary_supers`. This in turn forces loading of the related field and causes |
| 147 | +a ping-pong of cache coherency traffic. This has the effect of slowing down the innocent load instruction and causes a profiler |
| 148 | +to highlight it as a cpu bottleneck. |
| 149 | + |
| 150 | +The issue is detailed in this https://youtu.be/G40VfIsnCdo?t=1411[Quarkus Insights part] |
| 151 | + |
| 152 | +=== OpenJDK compiler is (too) good |
| 153 | + |
| 154 | +Some noticeable optimization strategies of OpenJDK are: |
| 155 | + |
| 156 | +* https://wiki.openjdk.org/display/HotSpot/Inlining[Inlining] |
| 157 | +* https://wiki.openjdk.org/display/HotSpot/TypeProfile[Type Profile] |
| 158 | + |
| 159 | +Thanks to both optimizations, the JDK Compiler can drop type-check operations preventing the issue from happening. |
| 160 | +Unlucky runtime conditions can then cause such optimizations to fail (as mentioned in https://youtu.be/G40VfIsnCdo?t=1692[this Quarkus Insights part]), |
| 161 | +unexpectedly biting end-users. |
| 162 | + |
| 163 | +This smart compiler behaviour, together with the previous mentioned profiler issues, has meant this issue has lay dormant and unnoticed for some time, |
| 164 | +especially in benchmarking environments, where overly simplistic use cases were leaving the compiler more |
| 165 | +room to optimize and hide it. |
| 166 | + |
| 167 | +== What's next? |
| 168 | +In order to support our developers and the Java Community to diagnose this stealthy and severe performance issue, |
| 169 | +we've developed a https://github.com/raphw/byte-buddy[ByteBuddy]-powered https://github.com/RedHatPerf/type-pollution-agent[Java Agent]. This has helped fix the |
| 170 | +most severe https://bugs.openjdk.org/browse/JDK-8180450?focusedCommentId=14536886&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14536886[code paths] of our products |
| 171 | +(and the widely used https://github.com/netty/netty/issues/12708[Netty Framework]) and hopefully will be used by other |
| 172 | +players of the Java Community to diagnose if their code can be affected by it. |
| 173 | + |
| 174 | +Thanks to the reports of our developers and the rest of Java Community (including Netflix and Intel engineers), |
| 175 | +the https://bugs.openjdk.org/browse/JDK-8180450[scalability JDK issue] now has raised a priority and a fix will be targeted for the Java 21 release. |
| 176 | + |
| 177 | +As a separate effort we've evaluated with the OpenJDK Red Hat team alternatives fixes to the infamous JDK issue as well, ending up rejecting ad-hoc flags to disable such optimization (as it was designed to be) |
| 178 | +because too harmful for existing users, which unconsciously have benefited from it till now. We are trusty that our feedbacks |
| 179 | +will help making a better decision for the Java Community, present and future. |
| 180 | + |
| 181 | +=== Is it still worthy to fix our code? |
| 182 | + |
| 183 | +As usual: *Depends(TM)*. |
| 184 | + |
| 185 | +As mentioned in Quarkus Insights, some fixes have helped to use different and more idiomatic code |
| 186 | +solutions to replace the misuse (often abuse) of the existing type check-based logic, often leading to better performance and, |
| 187 | +moreover, way better code. |
| 188 | + |
| 189 | +In addition, there isn't yet (at the time this article has been published) any mention of back-porting plans, |
| 190 | +meaning that legacy or just slow adopters (and existing users too) will be left behind, running code with suboptimal (at best) performance. |
| 191 | + |
| 192 | +=== but I don't know yet how to use the Agent! |
| 193 | + |
| 194 | +In the next articles of our blog we'll present more concrete examples of fixed code using |
| 195 | +https://github.com/RedHatPerf/type-pollution-agent[the agent] (presented on this https://youtu.be/G40VfIsnCdo?t=2235[Quarkus Insights part] too) |
| 196 | +and the impact of the OpenJDK compiler optimization strategies (briefly mentioned <<_openjdk_compilers_are_too_good,here>>). |
| 197 | + |
| 198 | +== Special Thanks |
| 199 | +==== |
| 200 | +* Andrew Haley |
| 201 | +* Andrew Dinn |
| 202 | +* Rafael Winterhalter |
| 203 | +* Sanne Grinovero |
| 204 | +* Norman Maurer |
| 205 | +* Andrei Pangin |
| 206 | +* Johannes Bechberger |
| 207 | +* Quarkus, Hibernate, Vert-x, Mutiny and Infinispan teams for the patience and the efforts to try the agent and fix our products trusting our tools |
| 208 | +* Netflix & Intel team and their https://netflixtechblog.com/seeing-through-hardware-counters-a-journey-to-threefold-performance-increase-2721924a2822[awesome article] |
| 209 | +mentioned in <<_a_stealthy_performance_killer,this>> section |
| 210 | +==== |
| 211 | + |
| 212 | + |
| 213 | + |
0 commit comments