-
Notifications
You must be signed in to change notification settings - Fork 513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added new JdbcIO read/write params to Scio #4820
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4820 +/- ##
==========================================
+ Coverage 62.42% 62.53% +0.10%
==========================================
Files 280 281 +1
Lines 10406 10431 +25
Branches 773 781 +8
==========================================
+ Hits 6496 6523 +27
+ Misses 3910 3908 -2
|
case readOpts: JdbcReadOptions[_] => | ||
jdbcIoId(readOpts.connectionOptions, readOpts.query) | ||
case writeOpts: JdbcWriteOptions[_] => | ||
jdbcIoId(writeOpts.connectionOptions, writeOpts.statement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nicer like this!
outputParallelization: Boolean = JdbcIoOptions.DefaultOutputParallelization | ||
outputParallelization: Boolean = JdbcIoOptions.DefaultOutputParallelization, | ||
dataSourceProviderFn: () => DataSource = null, | ||
configOverride: Read[T] => Read[T] = identity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we agree to apply this convention on all IOs for 0.13 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RustedBones it would be good! We need to be consistent across all APIs
def getWriteOptions(opts: CloudSqlOptions): JdbcWriteOptions[String] = | ||
JdbcWriteOptions[String]( | ||
connectionOptions = getConnectionOptions(opts), | ||
statement = "INSERT INTO <this> VALUES( ?, ? ..?)" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks this s unused so far
var expectedTransform: BJdbcIO.Read[String] = null | ||
sc.jdbcSelect[String]( | ||
getDefaultReadOptions(opts).copy(configOverride = r => { | ||
expectedTransform = r.withQuery("overridden query") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we get back the query instead on memorizing the transform in a var ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can. This was the simplest code, otherwise we would need to match transform by type. Which is not difficult either :)
namespace: String = null, | ||
configOverride: beam.DatastoreV1.Read => beam.DatastoreV1.Read = null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually defined the default parameters in the companion object
namespace: String = ReadParam.DefaultNamespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think we need null
as a constant? I think nulls are different from other optional parameters. It's just simpler without constants if they are null
def datastore( | ||
projectId: String, | ||
query: Query, | ||
namespace: String = null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and reuse it there
namespace: String = DatastoreIO.ReadParam.DefaultNamespace
@@ -66,7 +64,7 @@ object JdbcIO { | |||
final case class JdbcSelect[T: Coder](readOptions: JdbcReadOptions[T]) extends JdbcIO[T] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO we've not defined the constructor for JdbcIOs
properly. Here we should only have data that allows to identify the target destination (required to distinguish the mocked IO basically), so the connection option and the query.
All the other param should be passed as ReadP
or WriteP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also noted that it stands out from other IOs. Will change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's do in another PR. I think we have to check some other IOs too (I recall CsvIO has the same issue)
outputParallelization: Boolean = JdbcIoOptions.DefaultOutputParallelization | ||
outputParallelization: Boolean = JdbcIoOptions.DefaultOutputParallelization, | ||
dataSourceProviderFn: () => DataSource = null, | ||
configOverride: Read[T] => Read[T] = null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we pass identity
instead ?
@@ -104,38 +144,42 @@ final case class JdbcSelect[T: Coder](readOptions: JdbcReadOptions[T]) extends J | |||
EmptyTap | |||
} | |||
|
|||
final case class JdbcWrite[T](writeOptions: JdbcWriteOptions[T]) extends JdbcIO[T] { | |||
final case class JdbcWrite[T](opts: JdbcConnectionOptions, statement: String) extends JdbcIO[T] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can think of merging the read/write into a single IO that now they have the same signature. Probably for a next step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a good thing
|
||
/** Enhanced version of [[com.spotify.scio.values.SCollection SCollection]] with JDBC methods. */ | ||
final class JdbcSCollectionOps[T](private val self: SCollection[T]) extends AnyVal { | ||
|
||
/** Save this SCollection as a JDBC database. */ | ||
@deprecated("Use another overload with multiple parameters") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the since="0.13.0" param
def saveAsJdbc( | ||
connectionOptions: JdbcConnectionOptions, | ||
statement: String, | ||
preparedStatementSetter: (T, PreparedStatement) => Unit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd curry this one as this is the element operation (like foreach)
def jdbcSelect[T: ClassTag: Coder]( | ||
connectionOptions: JdbcConnectionOptions, | ||
query: String, | ||
rowMapper: ResultSet => T, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd curry this one as this is a map
element operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit for the example code
scio-examples/src/main/scala/com/spotify/scio/examples/extra/CloudSqlExample.scala
Outdated
Show resolved
Hide resolved
scio-examples/src/main/scala/com/spotify/scio/examples/extra/CloudSqlExample.scala
Outdated
Show resolved
Hide resolved
…loudSqlExample.scala Co-authored-by: Michel Davit <micheld@spotify.com>
…loudSqlExample.scala Co-authored-by: Michel Davit <micheld@spotify.com>
Implementing #4751