[Java][CI] Enable JDK 21 #36994

danepitkin · 2023-08-02T18:12:01Z

Describe the enhancement requested

Java 21 is the next long-term support (LTS) version after Java 17 and is scheduled to be released in September 2023.

Component(s)

Java

jarohen · 2023-09-28T16:54:23Z

I was getting Unhandled java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available

Thankfully fixed in #35053 before I got here (but not yet released).

raulcd · 2023-10-10T12:52:34Z

@davisusanibar @danepitkin @lidavidm what is the status of this one? Is this a release blocker?

danepitkin · 2023-10-10T16:04:27Z

We can push it to Arrow v15. It would have been really nice to verify Java 21 in CI, but the docker image from Eclipse Temurin is still not ready.

danepitkin · 2023-10-10T16:21:03Z

Actually, Eclipse Temurin JDK is starting to be published now. Docker images will take a little bit longer, but maybe it will be ready in time! 🤞

### Rationale for this change

Verify JDK 21 in CI in time for the Arrow v14 release.

### What changes are included in this PR?

* Bump latest Java version from 20 -> 21 in CI

### Are these changes tested?

Yes, via CI.

### Are there any user-facing changes?

No.

* Closes: #36994

Authored-by: Dane Pitkin <dane@voltrondata.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>

### What changes were proposed in this pull request?

This PR upgrades Apache Arrow from 13.0.0 to 14.0.0.

### Why are the changes needed?

The Apache Arrow 14.0.0 release brings a number of enhancements and bug fixes.

In terms of bug fixes, the release addresses several critical issues that were causing failures in integration jobs with Spark ([GH-36332](apache/arrow#36332)) and problems with importing empty data arrays ([GH-37056](apache/arrow#37056)). It also optimizes the process of appending variable length vectors ([GH-37829](apache/arrow#37829)) and includes C++ libraries for MacOS AARCH 64 in Java-Jars ([GH-38076](apache/arrow#38076)).

The new features and improvements focus on enhancing the handling and manipulation of data. This includes the introduction of DefaultVectorComparators for large types ([GH-25659](apache/arrow#25659)), support for extended expressions in ScannerBuilder ([GH-34252](apache/arrow#34252)), and the exposure of the VectorAppender class ([GH-37246](apache/arrow#37246)).

The release also brings enhancements to the development and testing process, with the CI environment now using JDK 21 ([GH-36994](apache/arrow#36994)). In addition, the release introduces vector validation consistent with C++, ensuring consistency across different languages ([GH-37702](apache/arrow#37702)).

Furthermore, the usability of VarChar writers and binary writers has been improved with the addition of extra input methods ([GH-37705](apache/arrow#37705)), and VarCharWriter now supports writing from `Text` and `String` ([GH-37706](apache/arrow#37706)). The release also adds typed getters for StructVector, improving the ease of accessing data ([GH-37863](apache/arrow#37863)).

The full release notes are as follows:

- https://arrow.apache.org/release/14.0.0.html

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43650 from LuciferYang/arrow-14.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

Midhunpottammal · 2024-03-05T09:36:21Z

While enabling Arrow with Spark on Java 21, I get an unhandled java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available. Is this issue fixed, or how should I proceed?

raulcd · 2024-03-05T09:52:13Z

@Midhunpottammal can you share more details? Which versions of Arrow and Spark are you using, and what is the stack trace?

Midhunpottammal · 2024-03-05T10:07:27Z

> @Midhunpottammal can you share more details? Which versions of Arrow and Spark are you using, and what is the stack trace?

@raulcd
java -version

java version "21.0.2" 2024-01-16 LTS
Java(TM) SE Runtime Environment (build 21.0.2+13-LTS-58)
Java HotSpot(TM) 64-Bit Server VM (build 21.0.2+13-LTS-58, mixed mode, sharing)

Spark - Version

pyspark==3.5.0

pyarrow -Version

pyarrow 15.0.0

PySpark code

import os
import time
import pandas as pd
from pyspark.sql import SparkSession
extra_java_options = os.getenv("SPARK_EXECUTOR_EXTRA_JAVA_OPTIONS", "")
spark = SparkSession.builder \
    .appName("ArrowPySparkExample") \
    .getOrCreate()
spark.conf.set("Dio.netty.tryReflectionSetAccessible", "true")
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
pdf = pd.DataFrame(["midhun"])
df = spark.createDataFrame(pdf)
result_pdf = df.select("*").toPandas()

Error Log
TaskContextImpl: Error in TaskCompletionListener
java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (49152)
Allocator(toArrowBatchIterator) 0/49152/49152/9223372036854775807 (res/actual/peak/limit)

at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:476)
at org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.$anonfun$new$2(ArrowConverters.scala:97)
at org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.$anonfun$new$2$adapted(ArrowConverters.scala:95)
at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:132)
at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1(TaskContextImpl.scala:144)
at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1$adapted(TaskContextImpl.scala:144)
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:199)
at org.apache.spark.TaskContextImpl.invokeTaskCompletionListeners(TaskContextImpl.scala:144)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:137)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:172)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)

24/03/05 14:02:59 ERROR Executor: Exception in task 11.0 in stage 0.0 (TID 11)
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
at org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
at org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)

Full Stack Trace link

iskandari · 2024-03-06T20:59:07Z

Facing the same issue:

pyarrow 15.0.0
pyspark 3.5.0
java = openjdk/21.0.1

java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available

danepitkin · 2024-03-06T22:12:15Z

I don't think you can set spark.conf.set("Dio.netty.tryReflectionSetAccessible", "true") at runtime here. Try setting this configuration option before you run pyspark.
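A minimal sketch of what "before you run pyspark" could look like, assuming no JVM has been started yet in the Python process. The flag is passed as a `-D` JVM system property through Spark's `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` settings rather than through a runtime `spark.conf.set` call. The helper names `jvm_option_confs` and `build_session` are hypothetical and exist only for illustration; whether the flag is sufficient depends on the Spark and Java versions in use.

```python
import os

# JVM system property discussed in this thread. It has to reach the JVM at
# launch time, so it is passed as a -D option instead of spark.conf.set().
NETTY_FLAG = "-Dio.netty.tryReflectionSetAccessible=true"


def jvm_option_confs(existing: str = "") -> dict:
    """Return Spark confs that pass NETTY_FLAG to driver and executor JVMs.

    `existing` holds any JVM options that should be preserved.
    """
    opts = f"{existing} {NETTY_FLAG}".strip()
    return {
        "spark.driver.extraJavaOptions": opts,
        "spark.executor.extraJavaOptions": opts,
    }


def build_session(app_name: str = "ArrowPySparkExample"):
    """Create a SparkSession with the JVM options set before the JVM starts."""
    # Imported lazily so jvm_option_confs() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    extra = os.getenv("SPARK_EXECUTOR_EXTRA_JAVA_OPTIONS", "")
    for key, value in jvm_option_confs(extra).items():
        builder = builder.config(key, value)
    # Arrow-based conversion is a SQL conf and may be set at build time too.
    builder = builder.config("spark.sql.execution.arrow.pyspark.enabled", "true")
    return builder.getOrCreate()
```

Calling `build_session()` in a fresh interpreter, before any other SparkSession exists, is the intended usage; options set this way are ignored if a JVM is already running.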

Midhunpottammal · 2024-03-07T06:02:14Z

@iskandari @danepitkin @raulcd I managed to get Arrow working with a lower version of Java in Spark 3.5.0. Here's my stack:

pyarrow==15.0.0
pyspark==3.5.0
java == Java(TM) SE Runtime Environment (build 17.0.10+11-LTS-240)

When I try to move to Java version 21, I encounter the same error

danepitkin · 2024-03-08T14:33:20Z

It turns out Spark 3.X does not support Java 21, but Spark 4.0 does. Resolved issue here #40287
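For anyone who wants to fail fast instead of hitting the exception above, here is a rough sketch of a version guard. It parses the first line of `java -version` output, like the banner posted earlier in this thread, and compares the Java feature version with the newest one Spark 3.5 is assumed to support (17, per the Spark 3.5 documentation). The names `java_major`, `supported_by_spark_3_5`, and `SPARK_3_5_MAX_JAVA` are hypothetical.

```python
import re

# Assumption: Spark 3.5 documents support for Java 8, 11 and 17 only.
SPARK_3_5_MAX_JAVA = 17


def java_major(banner: str) -> int:
    """Extract the Java feature version from a `java -version` banner line."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError(f"unrecognised java -version output: {banner!r}")
    first, second = int(match.group(1)), match.group(2)
    # Pre-9 JVMs report "1.8.0_x"; the feature version is the second field.
    if first == 1 and second is not None:
        return int(second)
    return first


def supported_by_spark_3_5(banner: str) -> bool:
    """Return True if the banner's Java version is within the assumed range."""
    return java_major(banner) <= SPARK_3_5_MAX_JAVA
```

With the banner from the report above, `java_major('java version "21.0.2" 2024-01-16 LTS')` yields 21, so `supported_by_spark_3_5` returns `False`.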

danepitkin added the Type: enhancement label Aug 2, 2023

github-actions bot added the Component: Java label Aug 2, 2023

danepitkin added this to the 14.0.0 milestone Aug 3, 2023

danepitkin added the Priority: Blocker label Sep 19, 2023

jarohen mentioned this issue Sep 26, 2023

Upgrade minimum Java version to 21 xtdb/xtdb#2798

Open

10 tasks

danepitkin mentioned this issue Sep 28, 2023

[Java][CI]: Enable support for JDK21 #37914

Closed

github-actions bot mentioned this issue Sep 28, 2023

GH-36994: [Java][CI] Enable support for JDK21 #37915

Closed

danepitkin changed the title ~~[Java] Support Java 21~~ [Java] Support Java 21 in CI Oct 3, 2023

danepitkin changed the title ~~[Java] Support Java 21 in CI~~ [Java][CI] Enable Java 21 Oct 10, 2023

danepitkin changed the title ~~[Java][CI] Enable Java 21~~ [Java][CI] Enable JDK 21 Oct 10, 2023

danepitkin modified the milestones: 14.0.0, 15.0.0 Oct 10, 2023

danepitkin removed the Priority: Blocker label Oct 10, 2023

danepitkin modified the milestones: 15.0.0, 14.0.0 Oct 11, 2023

github-actions bot mentioned this issue Oct 11, 2023

GH-36994: [Java] Use JDK 21 in CI #38219

Merged

github-actions bot assigned danepitkin Oct 11, 2023

raulcd added the Priority: Blocker label Oct 13, 2023

raulcd closed this as completed in #38219 Oct 17, 2023

LuciferYang mentioned this issue Nov 4, 2023

[SPARK-45781][BUILD] Upgrade Arrow to 14.0.0 apache/spark#43650

Closed

pan3793 mentioned this issue Apr 11, 2024

[TASK][EASY] Upgrade Arrow from 12.0.0 to 15.0.2 apache/kyuubi#6293

Closed

4 tasks
