Add a typed col function for creating column references #187

Open: wants to merge 1 commit into base: master
10 changes: 10 additions & 0 deletions dataset/src/main/scala/frameless/functions/package.scala
@@ -2,6 +2,8 @@ package frameless

 import org.apache.spark.sql.catalyst.ScalaReflection
 import org.apache.spark.sql.catalyst.expressions.Literal
+import org.apache.spark.sql.functions.{ col => sparkCol }
+import shapeless.Witness

 package object functions extends Udf with UnaryFunctions {
   object aggregate extends AggregateFunctions
@@ -17,4 +19,12 @@ package object functions extends Udf with UnaryFunctions {
       new TypedColumn(expr)
     }
   }
+
+  def col[T, A](column: Witness.Lt[Symbol])(
+    implicit
+    exists: TypedColumn.Exists[T, column.T, A],
+    encoder: TypedEncoder[A]): TypedColumn[T, A] = {
+    val untypedExpr = sparkCol(column.value.name).as[A](TypedExpressionEncoder[A])
+    new TypedColumn[T, A](untypedExpr)
+  }
 }
3 changes: 2 additions & 1 deletion dataset/src/test/scala/frameless/SelectTests.scala
@@ -18,9 +18,10 @@ class SelectTests extends TypedDatasetSuite {
       val A = dataset.col[A]('a)

       val dataset2 = dataset.select(A).collect().run().toVector
+      val symDataset2 = dataset.select(functions.col('a)).collect().run().toVector
Contributor

Interesting, this works because inference fixes T from the expected type of select. But does it scale to complex expressions?
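
To illustrate the inference behaviour in question, here is a minimal sketch in plain Scala. It does not use frameless: `Person`, `Col`, `Dataset`, and this simplified `col` are hypothetical stand-ins, and the sketch assumes the only source of information about `T` is the expected type at the call site.

```scala
object InferenceDemo {
  // Hypothetical stand-ins; none of these are frameless types.
  final case class Person(name: String, age: Int)

  // A column reference tagged with the row type T it belongs to (invariant in T).
  final case class Col[T](name: String)

  final class Dataset[T](val rows: Vector[T]) {
    // The parameter type Col[T] becomes the expected type of the argument.
    def select(c: Col[T]): String = s"select ${c.name}"
  }

  // T appears only in the result type, so it can only be inferred
  // from the type the surrounding expression expects.
  def col[T](name: String): Col[T] = Col[T](name)

  val people = new Dataset(Vector(Person("Ann", 30)))

  // Used directly as an argument: the expected type is Col[Person],
  // so T is inferred as Person.
  val direct: String = people.select(col("name"))

  // No expected type here, so T is inferred as Nothing.
  val detached = col("name")
  // people.select(detached) // does not compile: Col[Nothing] is not a Col[Person]
}
```

In this sketch the call only type-checks when `col` sits directly in the argument position of `select`; once the column is built somewhere without an expected type, `T` has to be written out by hand.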

Contributor Author

Good point. Yes, this will probably only work when used directly in the args to select. I'm not sure whether it will work with selectMany; that needs to be tested. I plan on adding a similar assertion to all of the tests here.

One way to handle situations where T has to be specified explicitly is the technique described in https://tpolecat.github.io/2015/07/30/infer.html.
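
A rough sketch of what that could look like, assuming the linked post refers to the partially applied type parameter pattern (fix one type parameter explicitly, let the compiler infer the rest). Everything here is hypothetical and written in plain Scala: `Person`, `Col`, `ColPartiallyApplied`, and the getter argument are illustrative only and are not part of this PR, which resolves `A` through an implicit `TypedColumn.Exists` instead.

```scala
object PartialApplicationDemo {
  // Hypothetical row type for illustration.
  final case class Person(name: String, age: Int)

  // Hypothetical column type: T is the row type, A the column's value type.
  final case class Col[T, A](name: String, get: T => A)

  // Intermediate class that carries the explicitly supplied T.
  final class ColPartiallyApplied[T] {
    // With T already fixed, A can be inferred from the getter's result type.
    def apply[A](name: String)(get: T => A): Col[T, A] = Col(name, get)
  }

  // Callers write col[Person](...) and never have to spell out A.
  def col[T]: ColPartiallyApplied[T] = new ColPartiallyApplied[T]

  // Only T is given; A is inferred as Int from _.age.
  val age: Col[Person, Int] = col[Person]("age")(_.age)
}
```

The design choice is to split the two type parameters across two steps, so the caller supplies only the one the compiler cannot work out on its own.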

Contributor

Hmm.. but if T is Tuple3[Int, String, Double] it's not gonna look pretty :P

Contributor Author

True. I have another idea - will push in a few minutes.

       val data2 = data.map { case X4(a, _, _, _) => a }

-      dataset2 ?= data2
+      (dataset2 ?= data2) && (symDataset2 ?= data2)
     }

     check(forAll(prop[Int, Int, Int, Int] _))