[Java][CI] Enable JDK 21 #36994

Closed
danepitkin opened this issue Aug 2, 2023 · 11 comments · Fixed by #38219
@danepitkin
Member

Describe the enhancement requested

Java 21 is the next long-term support (LTS) version after Java 17 and is scheduled to be released in September 2023.

Component(s)

Java

danepitkin added this to the 14.0.0 milestone Aug 3, 2023
danepitkin added the Priority: Blocker label Sep 19, 2023
@jarohen
Contributor

jarohen commented Sep 28, 2023

I was getting Unhandled java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available

Thankfully fixed in #35053 before I got here (but not yet released).

danepitkin changed the title from "[Java] Support Java 21" to "[Java] Support Java 21 in CI" Oct 3, 2023
@raulcd
Member

raulcd commented Oct 10, 2023

@davisusanibar @danepitkin @lidavidm what is the status of this one? Is this a release blocker?

@danepitkin
Member Author

We can push it to Arrow v15. It would have been really nice to verify Java 21 in CI, but the docker image from Eclipse Temurin is still not ready.

danepitkin changed the title from "[Java] Support Java 21 in CI" to "[Java][CI] Enable Java 21" Oct 10, 2023
danepitkin changed the title from "[Java][CI] Enable Java 21" to "[Java][CI] Enable JDK 21" Oct 10, 2023
danepitkin changed the milestone from 14.0.0 to 15.0.0 Oct 10, 2023
danepitkin removed the Priority: Blocker label Oct 10, 2023
@danepitkin
Member Author

Actually, Eclipse Temurin JDK is starting to be published now. Docker images will take a little bit longer, but maybe it will be ready in time! 🤞

danepitkin changed the milestone from 15.0.0 to 14.0.0 Oct 11, 2023
raulcd added the Priority: Blocker label Oct 13, 2023
raulcd pushed a commit that referenced this issue Oct 17, 2023
### Rationale for this change

Verify JDK 21 in CI in time for the Arrow v14 release.

### What changes are included in this PR?

* Bump latest Java version from 20 -> 21 in CI

### Are these changes tested?

Yes, via CI.

### Are there any user-facing changes?

No.
* Closes: #36994

Authored-by: Dane Pitkin <dane@voltrondata.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 23, 2023
dongjoon-hyun pushed a commit to apache/spark that referenced this issue Nov 4, 2023
### What changes were proposed in this pull request?
This PR upgrades Apache Arrow from 13.0.0 to 14.0.0.

### Why are the changes needed?
The Apache Arrow 14.0.0 release brings a number of enhancements and bug fixes.

In terms of bug fixes, the release addresses several critical issues that were causing failures in integration jobs with Spark ([GH-36332](apache/arrow#36332)) and problems with importing empty data arrays ([GH-37056](apache/arrow#37056)). It also optimizes the process of appending variable-length vectors ([GH-37829](apache/arrow#37829)) and includes C++ libraries for macOS AArch64 in the Java JARs ([GH-38076](apache/arrow#38076)).

The new features and improvements focus on enhancing the handling and manipulation of data. This includes the introduction of DefaultVectorComparators for large types ([GH-25659](apache/arrow#25659)), support for extended expressions in ScannerBuilder ([GH-34252](apache/arrow#34252)), and the exposure of the VectorAppender class ([GH-37246](apache/arrow#37246)).

The release also brings enhancements to the development and testing process, with the CI environment now using JDK 21 ([GH-36994](apache/arrow#36994)). In addition, the release introduces vector validation that is consistent with the C++ implementation ([GH-37702](apache/arrow#37702)).

Furthermore, the usability of VarChar writers and binary writers has been improved with the addition of extra input methods ([GH-37705](apache/arrow#37705)), and VarCharWriter now supports writing from `Text` and `String` ([GH-37706](apache/arrow#37706)). The release also adds typed getters for StructVector, making data easier to access ([GH-37863](apache/arrow#37863)).

The full release notes are as follows:
- https://arrow.apache.org/release/14.0.0.html

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43650 from LuciferYang/arrow-14.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
@Midhunpottammal

While enabling Arrow with Spark on Java 21, I get an unhandled java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available. Has this issue been fixed, or how should I proceed?

@raulcd
Member

raulcd commented Mar 5, 2024

@Midhunpottammal can you share more details? which version of Arrow and spark are you using and the stack trace.

@Midhunpottammal

Midhunpottammal commented Mar 5, 2024

> @Midhunpottammal can you share more details? which version of Arrow and spark are you using and the stack trace.

@raulcd
java -version

java version "21.0.2" 2024-01-16 LTS
Java(TM) SE Runtime Environment (build 21.0.2+13-LTS-58)
Java HotSpot(TM) 64-Bit Server VM (build 21.0.2+13-LTS-58, mixed mode, sharing)

Spark version

pyspark==3.5.0

pyarrow version

pyarrow 15.0.0

PySpark code

import os
import time
import pandas as pd
from pyspark.sql import SparkSession
extra_java_options = os.getenv("SPARK_EXECUTOR_EXTRA_JAVA_OPTIONS", "")
spark = SparkSession.builder \
    .appName("ArrowPySparkExample") \
    .getOrCreate()
spark.conf.set("Dio.netty.tryReflectionSetAccessible", "true")
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
pdf = pd.DataFrame(["midhun"])
df = spark.createDataFrame(pdf)
result_pdf = df.select("*").toPandas()

Error Log
TaskContextImpl: Error in TaskCompletionListener
java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (49152)
Allocator(toArrowBatchIterator) 0/49152/49152/9223372036854775807 (res/actual/peak/limit)

at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:476)
at org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.$anonfun$new$2(ArrowConverters.scala:97)
at org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.$anonfun$new$2$adapted(ArrowConverters.scala:95)
at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:132)
at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1(TaskContextImpl.scala:144)
at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1$adapted(TaskContextImpl.scala:144)
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:199)
at org.apache.spark.TaskContextImpl.invokeTaskCompletionListeners(TaskContextImpl.scala:144)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:137)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:172)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)

24/03/05 14:02:59 ERROR Executor: Exception in task 11.0 in stage 0.0 (TID 11)
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
at org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
at org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)

Full Stack Trace link


@iskandari

Facing the same issue:

pyarrow 15.0.0
pyspark 3.5.0
java = openjdk/21.0.1
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available

@danepitkin
Member Author

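I don't think spark.conf.set("Dio.netty.tryReflectionSetAccessible", "true") has any effect at runtime here. io.netty.tryReflectionSetAccessible is a JVM system property (normally passed as -Dio.netty.tryReflectionSetAccessible=true), so it has to be in place before the JVM starts. Try setting this configuration option before you run pyspark.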
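As a rough illustration of what "before you run pyspark" could look like, here is a minimal sketch that builds the JVM options up front. The helper names (`with_netty_flag`, `arrow_jvm_conf`, `build_submit_args`) are hypothetical and not part of PySpark or Arrow. The sketch assumes the standard Spark keys `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` and the `PYSPARK_SUBMIT_ARGS` environment variable that PySpark reads when it launches the JVM. It only shows how the flag is passed; it is not a confirmed fix for Java 21.

```python
import shlex

# JVM system property named in this thread. It must reach the JVM command line,
# so it is passed as a -D option rather than through spark.conf.set(...).
NETTY_FLAG = "-Dio.netty.tryReflectionSetAccessible=true"


def with_netty_flag(existing: str = "") -> str:
    """Append the Netty flag to an existing Java options string, at most once."""
    parts = existing.split()
    if NETTY_FLAG not in parts:
        parts.append(NETTY_FLAG)
    return " ".join(parts)


def arrow_jvm_conf(driver_opts: str = "", executor_opts: str = "") -> dict:
    """Return Spark conf entries that carry the flag to driver and executors."""
    return {
        "spark.driver.extraJavaOptions": with_netty_flag(driver_opts),
        "spark.executor.extraJavaOptions": with_netty_flag(executor_opts),
    }


def build_submit_args(conf: dict) -> str:
    """Render the conf as a PYSPARK_SUBMIT_ARGS value (must end in pyspark-shell)."""
    pairs = []
    for key, value in conf.items():
        # Quote each key=value pair so that spaces inside the options survive.
        pairs.append("--conf " + shlex.quote(f"{key}={value}"))
    return " ".join(pairs + ["pyspark-shell"])
```

One way to use it is to set `os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(arrow_jvm_conf())` before the first `SparkSession` is created, or to pass each entry through `SparkSession.builder.config(key, value)` before calling `getOrCreate()`. Either way the options have to be in place before the JVM is launched.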

@Midhunpottammal

Midhunpottammal commented Mar 7, 2024

@iskandari @danepitkin @raulcd I managed to get Arrow working with a lower version of Java in Spark 3.5.0. Here's my stack:

pyarrow==15.0.0
pyspark==3.5.0
java == Java(TM) SE Runtime Environment (build 17.0.10+11-LTS-240)

When I try to move to Java version 21, I encounter the same error

@danepitkin
Member Author

It turns out Spark 3.x does not support Java 21, but Spark 4.0 does. The issue was resolved in #40287.
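For anyone wanting to catch this combination early, a small pre-flight check might look like the sketch below. The function names are hypothetical, and the rule encodes only the statement above (Java 21 needs Spark 4.0 or later). It does not attempt to model Spark's full Java support matrix.

```python
import re


def parse_java_major(version_output: str) -> int:
    """Extract the Java major version from `java -version` style output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError(f"unrecognized java -version output: {version_output!r}")
    first = int(match.group(1))
    # Legacy scheme: "1.8.0_292" means Java 8.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def spark_supports_java(spark_version: str, java_major: int) -> bool:
    """Apply the rule from this thread: Java 21+ requires Spark 4.0 or later."""
    spark_major = int(spark_version.split(".")[0])
    return java_major < 21 or spark_major >= 4
```

With the versions reported earlier in the thread, `parse_java_major('java version "21.0.2" 2024-01-16 LTS')` yields 21 and `spark_supports_java("3.5.0", 21)` is false, while the Java 17 setup passes. Note that `java -version` writes to stderr, so a caller capturing it with `subprocess` would need to read that stream.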
