Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rdd related helpers #132

Merged
merged 6 commits into from
Feb 24, 2022
Merged

Rdd related helpers #132

merged 6 commits into from
Feb 24, 2022

Conversation

Jolanrensen
Copy link
Collaborator

adds sc: JavaSparkSession to KSparkSession

adds (Java)RDD.toDS() functions

includes tests

@Jolanrensen
Copy link
Collaborator Author

So, can I merge it? :) @asm0dey

inline fun <reified T> List<T>.toDS() = toDS(spark)
inline fun <reified T> Array<T>.toDS() = spark.dsOf(*this)
inline fun <reified T> dsOf(vararg arg: T) = spark.dsOf(*arg)
inline fun <reified T> RDD<T>.toDS() = toDS(spark)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should T be Serializable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't hold for all right? For instance primitives, are those Serializable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there can't be generic with primitive inside

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, so Int and other primitves that do not implement Serializable are allowed when having inline fun <reified T : Serializable> RDD<T>.toDS() = toDS(spark)? That's news to me. But useful!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe they will be boxed to serializable Integer

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I'll add it to broadcast too!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no wait that doesn't hold. You can broadcast a non Serializable List for instance. And... you can also make an RDD of a non Serializable List

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List itself is not serializable, but its implementations are

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, but enforcing it to be Serializable requires users to for instance wrap their list like ArrayList(listOf(1, 2, 3)) which is not ideal...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nono, if "just" listOf doesn't work — let's not implement it.

@@ -593,10 +598,63 @@ class ApiTest : ShouldSpec({
it.nullable() shouldBe true
}
}
should("Easily convert a (Java)RDD to a Dataset") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this huge test should be split into smaller ones?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@asm0dey
Copy link
Contributor

asm0dey commented Feb 23, 2022

You will hate me, @Jolanrensen, but I've added more comments :)

@Jolanrensen
Copy link
Collaborator Author

Haha they're always welcome ;)

Added kotlin official code style prop for later
@Jolanrensen
Copy link
Collaborator Author

Alright, after this is done checking I'll merge it.

@Jolanrensen Jolanrensen merged commit 165c1b0 into spark-3.2 Feb 24, 2022
@Jolanrensen Jolanrensen added this to the 1.1.0 milestone May 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants