Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a typed col function for creating column references #187

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

iravid
Copy link
Contributor

@iravid iravid commented Sep 22, 2017

Resolves #186.

Would be a good idea to wait for #110 before merging this to avoid conflict on SelectTests.scala.

@iravid
Copy link
Contributor Author

iravid commented Sep 22, 2017

Also refs #164

@codecov-io
Copy link

codecov-io commented Sep 22, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@048d06c). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #187   +/-   ##
=========================================
  Coverage          ?   94.95%           
=========================================
  Files             ?       35           
  Lines             ?      674           
  Branches          ?       11           
=========================================
  Hits              ?      640           
  Misses            ?       34           
  Partials          ?        0
Impacted Files Coverage Δ
...ss/functions/PartiallyConstructedTypedColumn.scala 100% <100%> (ø)
...t/src/main/scala/frameless/functions/package.scala 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 048d06c...897e499. Read the comment docs.

@@ -18,9 +18,10 @@ class SelectTests extends TypedDatasetSuite {
val A = dataset.col[A]('a)

val dataset2 = dataset.select(A).collect().run().toVector
val symDataset2 = dataset.select(functions.col('a)).collect().run().toVector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, this works because inference fixes T from the expected type of select. But does it scale to complex expressions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Yes, this will probably only work when used directly in the args to select. Not sure if it’ll work with selectMany, that needs to be tested. I plan on adding a similar assertion to all of the tests here.

An idea to deal with situations in which you have to specify T is to use https://tpolecat.github.io/2015/07/30/infer.html.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. but if T is Tuple3[Int, String, Double] it's not gonna look pretty :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I have another idea - will push in a few minutes.

@iravid
Copy link
Contributor Author

iravid commented Sep 22, 2017

@OlivierBlanvillain have another look; this approach should allow to keep the column reference in a value, for example, and then have the actual TypedColumn constructed when using select.

@iravid
Copy link
Contributor Author

iravid commented Sep 22, 2017

Ah, but this approach doesn't support expressions like val c = functions.col('a) + 1. Tradeoffs, tradeoffs...

@iravid
Copy link
Contributor Author

iravid commented Sep 22, 2017

What's the reasoning behind carrying around the dataset type in the typed columns?

@OlivierBlanvillain
Copy link
Contributor

If it was just expr: TypedColumn[Boolean] we could't statically check that ds.select(expr) does not uses some columns that are not defined on ds. #162 has an interesting discussion with an alternative designed based around an implicit that would be the perfect solution if we had implicit function types.

@iravid
Copy link
Contributor Author

iravid commented Sep 22, 2017

That's true, but how about moving the column existence evidence to the methods on TypedDataset?

The reasoning is that a TypedColumn can freely exist outside the context of a typed dataset, e.g. as a literal expression. The evidence is only needed when using it as a projection on a dataset.

@OlivierBlanvillain
Copy link
Contributor

OlivierBlanvillain commented Sep 22, 2017

We didn't consider that, but it sounds isomorphic given than we can already do something like def a(ds)(implicits) = ds('a)... Also I'm not sure it would work out with the requirements of #175.

@iravid
Copy link
Contributor Author

iravid commented Sep 22, 2017

It is isomorphic up to the requirement of having ds at hand ;)

I will open a separate PR to experiment with moving the evidence to ds.select et al and we can decide separately if this PR is good enough to be useful given its limited inference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide a way to construct a TypedColumn reference from a Symbol
4 participants