Fix compatibility issues with the Databricks runtime #506

Closed
imarios opened this issue Mar 11, 2021 · 2 comments · Fixed by #699

Comments

@imarios
Contributor

imarios commented Mar 11, 2021

Fix is shown here: dsabanin@1ce8f0b
Thanks to @dsabanin for the fix!

@jckegelman

Hi. I am running into this issue, specifically

java.lang.VerifyError: class frameless.functions.Spark2_4_LambdaVariable overrides final method genCode

when running the example UDF code found on the "Using ScalaPB with Spark" page. This is similar to what is reported on Stack Overflow here. For reference, I am using:

  • Spark 3.2.1
  • ScalaPB 0.11.10
  • sparksql32-scalapb0_11 1.0.0
  • frameless 0.12.0
  • DBR 10.4 LTS

I reached out to Databricks internal support and here is their response:

The recommendation for resolving this issue is to use the TaggingExpression to express the semantics of the custom expressions in frameless. This expression type is in Apache Spark 2.4.0+ and fully supported in DBR as well.

The Catalyst expression API is considered a part of Spark’s internal development API so it was never intended to be supported as stable public API, as such there’s always a risk for 3rd party code to depend on its details.

There are some DBR-specific code changes that did indeed make Expression.genCode(CodegenContext) final, and the function equivalent to the original genCode(CodegenContext) is named genCodeInternal(CodegenContext). But there’s no way for 3rd party library to override this renamed version because we don’t release Databricks Runtime SDK for external users to compile against.

So this appears related to #300. They also provided an alternative:

In case frameless doesn’t want to use the TaggingExpression trait, they could also just change their code from overriding genCode to overriding doGenCode instead. Yes there is a bit of waste doing that, but it’s benign and it’d work with DBR just fine.

I tried it with @dsabanin's fix linked above and confirmed that it avoids the error, although I suspect that particular patch would limit the frameless library's usability with "vanilla" Spark.

Would frameless be open to refactoring the expressions to extend TaggingExpression instead of Expression?

chris-twiner added a commit to chris-twiner/frameless that referenced this issue Apr 11, 2023
@chris-twiner
Contributor

FYI: so far I've not found issues using doGenCode instead. Quality cross-compiles from 2.4 with many supported Databricks LTS versions. Lots of hacks are needed to work with DBRs at that level, often having to reproduce interfaces of the specific kind the DBRs use, but doGenCode works throughout. I'm not using typed datasets or UDFs in the project, so I've not been hit by this.

In OSS Spark, genCode wraps doGenCode, checking for dupes, adding null checks, etc. Frameless-wise, I don't see any reason doGenCode can't just be used instead; the dataset tests pass, etc.

I've raised pr 700 for that change.
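
For readers unfamiliar with the genCode/doGenCode split discussed in this thread, here is a minimal sketch of the pattern. It is a toy model, not Spark's code: Ctx, Code, Expr, Literal and Tagged are hypothetical stand-ins invented for illustration, and the wrapper only imitates the "check for dupes, then delegate" behaviour in the loosest sense. The point it tries to show is that a subclass which implements only doGenCode keeps working whether or not the base class declares genCode final.

```scala
// Toy model only. Ctx, Code, Expr, Literal and Tagged are hypothetical stand-ins,
// NOT Spark's real CodegenContext, ExprCode or Expression classes.
object GenCodeSketch {

  /** Stand-in for ExprCode: generated source plus the variables it assigns. */
  final case class Code(code: String, isNull: String, value: String)

  /** Stand-in for CodegenContext: fresh names plus a cache of generated code. */
  final class Ctx {
    private var counter = 0
    private val cache = scala.collection.mutable.Map.empty[Expr, Code]

    def freshName(prefix: String): String = {
      counter += 1
      s"${prefix}_$counter"
    }

    def lookup(e: Expr): Option[Code] = cache.get(e)
    def remember(e: Expr, c: Code): Unit = cache.update(e, c)
  }

  abstract class Expr {
    /** Wrapper: reuse code for a repeated expression, otherwise allocate
      * result variables and delegate to doGenCode. It is final here to mimic
      * the DBR behaviour described in the thread. */
    final def genCode(ctx: Ctx): Code =
      ctx.lookup(this) match {
        case Some(existing) => existing
        case None =>
          val ev = Code("", ctx.freshName("isNull"), ctx.freshName("value"))
          val generated = doGenCode(ctx, ev)
          ctx.remember(this, generated)
          generated
      }

    /** Extension point: the only method a subclass needs to implement. */
    protected def doGenCode(ctx: Ctx, ev: Code): Code
  }

  final case class Literal(n: Int) extends Expr {
    override protected def doGenCode(ctx: Ctx, ev: Code): Code =
      ev.copy(code = s"boolean ${ev.isNull} = false; int ${ev.value} = $n;")
  }

  /** A pass-through expression written the portable way: it implements
    * doGenCode and simply returns the child's code. The fresh names in `ev`
    * go unused, which is perhaps the "bit of waste" mentioned above. */
  final case class Tagged(child: Expr) extends Expr {
    override protected def doGenCode(ctx: Ctx, ev: Code): Code =
      child.genCode(ctx)
    // An `override def genCode` here would be rejected by the compiler,
    // because genCode is final in this model.
  }

  def main(args: Array[String]): Unit = {
    val ctx = new Ctx
    val expr = Tagged(Literal(42))
    println(expr.genCode(ctx).code)
    // Asking again returns the cached result instead of generating new code.
    println(expr.genCode(ctx) eq expr.genCode(ctx))
  }
}
```

In this model the final modifier is caught at compile time. A library compiled against OSS Spark (where genCode can be overridden) and then loaded on a runtime where it is final would instead fail at class-verification time, which matches the VerifyError quoted earlier in the thread.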
