Added new JdbcIO read/write params to Scio #4820

shnapz · 2023-05-24T18:04:36Z

Implementing #4751

codecov · 2023-05-24T18:27:10Z

Codecov Report

Merging #4820 (be06a30) into main (924282d) will increase coverage by 0.10%.
The diff coverage is 75.78%.

❗ Current head be06a30 differs from pull request most recent head e583a85. Consider uploading reports for the commit e583a85 to get more accurate results

@@            Coverage Diff             @@
##             main    #4820      +/-   ##
==========================================
+ Coverage   62.42%   62.53%   +0.10%     
==========================================
  Files         280      281       +1     
  Lines       10406    10431      +25     
  Branches      773      781       +8     
==========================================
+ Hits         6496     6523      +27     
+ Misses       3910     3908       -2

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...fy/scio/coders/instances/kryo/AvroSerializer.scala | `100.00% <ø> (ø)` | |
| ...c/main/scala/com/spotify/scio/io/FileStorage.scala | `97.72% <ø> (-0.06%)` | ⬇️ |
| ...main/scala/com/spotify/scio/bigquery/package.scala | `100.00% <ø> (ø)` | |
| ...scala/com/spotify/scio/datastore/DatastoreIO.scala | `9.09% <0.00%> (-7.58%)` | ⬇️ |
| ...m/spotify/scio/jdbc/syntax/SCollectionSyntax.scala | `33.33% <25.00%> (-66.67%)` | ⬇️ |
| ...m/spotify/scio/jdbc/syntax/ScioContextSyntax.scala | `80.00% <66.66%> (-20.00%)` | ⬇️ |
| .../src/main/scala/com/spotify/scio/jdbc/JdbcIO.scala | `72.72% <75.00%> (+45.22%)` | ⬆️ |
| ...cala/com/spotify/scio/coders/KryoAtomicCoder.scala | `69.60% <100.00%> (-0.72%)` | ⬇️ |
| ...otify/scio/coders/LowPriorityCoderDerivation.scala | `97.43% <100.00%> (ø)` | |
| ...com/spotify/scio/coders/instances/AvroCoders.scala | `84.61% <100.00%> (+1.75%)` | ⬆️ |
... and 9 more

... and 2 files with indirect coverage changes

RustedBones · 2023-05-25T08:08:38Z

scio-jdbc/src/main/scala/com/spotify/scio/jdbc/JdbcIO.scala

+    case readOpts: JdbcReadOptions[_] =>
+      jdbcIoId(readOpts.connectionOptions, readOpts.query)
+    case writeOpts: JdbcWriteOptions[_] =>
+      jdbcIoId(writeOpts.connectionOptions, writeOpts.statement)


nicer like this!
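A minimal, self-contained sketch of the dispatch above, using simplified stand-ins: `IoOptions`, `ioIdFor` and the id format produced by `jdbcIoId` are hypothetical and not Scio's actual definitions. Each branch extracts the connection options plus the SQL text that identifies the target.

```scala
// Hypothetical, simplified stand-ins for the JDBC option types (generics dropped).
final case class JdbcConnectionOptions(username: String, connectionUrl: String)

sealed trait IoOptions
final case class JdbcReadOptions(connectionOptions: JdbcConnectionOptions, query: String)
    extends IoOptions
final case class JdbcWriteOptions(connectionOptions: JdbcConnectionOptions, statement: String)
    extends IoOptions

// Hypothetical id format: user, target URL and the SQL text.
def jdbcIoId(opts: JdbcConnectionOptions, sql: String): String =
  s"${opts.username}@${opts.connectionUrl}:$sql"

// Dispatch on the options type; each case passes the SQL that identifies the IO.
def ioIdFor(options: IoOptions): String = options match {
  case readOpts: JdbcReadOptions   => jdbcIoId(readOpts.connectionOptions, readOpts.query)
  case writeOpts: JdbcWriteOptions => jdbcIoId(writeOpts.connectionOptions, writeOpts.statement)
}
```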

RustedBones · 2023-05-25T08:16:41Z

scio-jdbc/src/main/scala/com/spotify/scio/jdbc/JdbcOptions.scala

-  outputParallelization: Boolean = JdbcIoOptions.DefaultOutputParallelization
+  outputParallelization: Boolean = JdbcIoOptions.DefaultOutputParallelization,
+  dataSourceProviderFn: () => DataSource = null,
+  configOverride: Read[T] => Read[T] = identity


Do we agree to apply this convention to all IOs for 0.13?

@RustedBones It would be good! We need to be consistent across all APIs.

RustedBones · 2023-05-30T12:08:25Z

scio-jdbc/src/test/scala/com/spotify/scio/jdbc/JdbcIOTests.scala

+  def getWriteOptions(opts: CloudSqlOptions): JdbcWriteOptions[String] =
+    JdbcWriteOptions[String](
+      connectionOptions = getConnectionOptions(opts),
+      statement = "INSERT INTO <this> VALUES( ?, ? ..?)"
+    )


Looks like this is unused so far.

RustedBones · 2023-05-30T12:10:26Z

scio-jdbc/src/test/scala/com/spotify/scio/jdbc/JdbcIOTests.scala

+    var expectedTransform: BJdbcIO.Read[String] = null
+    sc.jdbcSelect[String](
+      getDefaultReadOptions(opts).copy(configOverride = r => {
+        expectedTransform = r.withQuery("overridden query")


Can we get the query back instead of memorizing the transform in a var?

Yeah, we can. This was the simplest code; otherwise we would need to match the transform by type, which is not difficult either :)
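For reference, the "match the transform by type" alternative can be sketched as follows. `AppliedTransform`, `Read`, `Write` and `appliedQuery` are hypothetical stand-ins; in a real Beam test the applied transforms would first have to be collected by walking the pipeline graph (for example with `Pipeline#traverseTopologically`), which is not shown here.

```scala
// Hypothetical stand-ins for transforms recorded while building a pipeline.
sealed trait AppliedTransform
final case class Read(query: String) extends AppliedTransform
final case class Write(statement: String) extends AppliedTransform

// Find the first Read by type and return its query, instead of capturing
// the whole transform in a mutable var from inside configOverride.
def appliedQuery(applied: Seq[AppliedTransform]): Option[String] =
  applied.collectFirst { case r: Read => r.query }
```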

RustedBones · 2023-05-31T07:35:02Z

scio-google-cloud-platform/src/main/scala/com/spotify/scio/datastore/DatastoreIO.scala

+    namespace: String = null,
+    configOverride: beam.DatastoreV1.Read => beam.DatastoreV1.Read = null


We usually define default parameters in the companion object:

namespace: String = ReadParam.DefaultNamespace

Do you think we need `null` as a constant? I think nulls are different from other optional parameters; it's simpler without constants when the default is null.

RustedBones · 2023-05-31T07:35:47Z

...ogle-cloud-platform/src/main/scala/com/spotify/scio/datastore/syntax/ScioContextSyntax.scala

+  def datastore(
+    projectId: String,
+    query: Query,
+    namespace: String = null,


and reuse it there

namespace: String = DatastoreIO.ReadParam.DefaultNamespace
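A minimal sketch of that convention, assuming a simplified `ReadParam` with a single `namespace` field (the real params class carries more); `datastore` here is a hypothetical stand-in for the syntax method that simply returns its inputs, so the shared default can be checked.

```scala
object DatastoreIO {
  // Simplified params class whose default comes from its companion object.
  final case class ReadParam(namespace: String = ReadParam.DefaultNamespace)

  object ReadParam {
    // Null default kept as a named constant so every entry point shares it.
    val DefaultNamespace: String = null
  }
}

// Hypothetical syntax method reusing the same constant as its default.
def datastore(
  projectId: String,
  namespace: String = DatastoreIO.ReadParam.DefaultNamespace
): (String, DatastoreIO.ReadParam) =
  (projectId, DatastoreIO.ReadParam(namespace))
```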

RustedBones · 2023-05-31T07:45:46Z

scio-jdbc/src/main/scala/com/spotify/scio/jdbc/JdbcIO.scala

@@ -66,7 +64,7 @@ object JdbcIO {
 final case class JdbcSelect[T: Coder](readOptions: JdbcReadOptions[T]) extends JdbcIO[T] {


IMHO we haven't defined the constructor for JdbcIOs properly. It should only hold the data that identifies the target destination (basically what's required to distinguish the mocked IO), i.e. the connection options and the query.

All the other params should be passed as ReadP or WriteP.

Yes, I also noticed that it stands out from the other IOs. Will change it.

Let's do it in another PR. I think we have to check some other IOs too (I recall CsvIO has the same issue).
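A rough sketch of the proposed split, under simplified assumptions: the IO case class carries only the connection options and the query, so two IOs pointing at the same target compare equal, while everything else lives in a read-parameter class. `ReadParam`, its `fetchSize` field and the `1000` default are hypothetical.

```scala
import java.sql.ResultSet

// Hypothetical, simplified connection options.
final case class JdbcConnectionOptions(username: String, connectionUrl: String)

// The IO only carries what identifies the target; equality (and hence
// matching a mocked IO in tests) depends on these two fields alone.
final case class JdbcSelect[T](opts: JdbcConnectionOptions, query: String)

object JdbcSelect {
  // Everything else travels as the read parameter (ReadP).
  final case class ReadParam[T](
    rowMapper: ResultSet => T,
    fetchSize: Int = ReadParam.DefaultFetchSize
  )

  object ReadParam {
    val DefaultFetchSize: Int = 1000 // hypothetical default
  }
}
```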

RustedBones · 2023-05-31T09:24:40Z

scio-jdbc/src/main/scala/com/spotify/scio/jdbc/JdbcOptions.scala

-  outputParallelization: Boolean = JdbcIoOptions.DefaultOutputParallelization
+  outputParallelization: Boolean = JdbcIoOptions.DefaultOutputParallelization,
+  dataSourceProviderFn: () => DataSource = null,
+  configOverride: Read[T] => Read[T] = null


Shouldn't we pass `identity` instead?
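A small sketch of why `identity` is the more convenient default, assuming a hypothetical immutable `Read` builder: a `null` default forces a guard at every call site, while `identity` lets the override be applied unconditionally.

```scala
// Hypothetical immutable builder standing in for a Beam Read transform.
final case class Read(query: String, fetchSize: Int) {
  def withFetchSize(n: Int): Read = copy(fetchSize = n)
}

// With a null default, every call site must check before applying.
def applyNullable(r: Read, configOverride: Read => Read): Read =
  if (configOverride == null) r else configOverride(r)

// With identity as the default, the override is always applied directly.
def applyIdentity(r: Read, configOverride: Read => Read = identity): Read =
  configOverride(r)
```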

RustedBones · 2023-06-02T09:44:30Z

scio-jdbc/src/main/scala/com/spotify/scio/jdbc/JdbcIO.scala

@@ -104,38 +144,42 @@ final case class JdbcSelect[T: Coder](readOptions: JdbcReadOptions[T]) extends J
    EmptyTap
 }

-final case class JdbcWrite[T](writeOptions: JdbcWriteOptions[T]) extends JdbcIO[T] {
+final case class JdbcWrite[T](opts: JdbcConnectionOptions, statement: String) extends JdbcIO[T] {


Now that read and write have the same signature, we could think about merging them into a single IO. Probably for a next step.

That would be a good thing.

RustedBones · 2023-06-02T09:45:20Z

scio-jdbc/src/main/scala/com/spotify/scio/jdbc/syntax/SCollectionSyntax.scala


 /** Enhanced version of [[com.spotify.scio.values.SCollection SCollection]] with JDBC methods. */
 final class JdbcSCollectionOps[T](private val self: SCollection[T]) extends AnyVal {

  /** Save this SCollection as a JDBC database. */
+  @deprecated("Use another overload with multiple parameters")


Put the `since = "0.13.0"` param.
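For illustration, a hypothetical pair of overloads showing the deprecated entry point with an explicit `since` forwarding to the new one; `WriteOptions` and `JdbcSyntaxSketch` are stand-ins, not Scio types.

```scala
// Hypothetical stand-in for the old options-object API.
final case class WriteOptions(connectionUrl: String, statement: String)

object JdbcSyntaxSketch {
  // Old entry point, kept for compatibility and deprecated with an explicit version.
  @deprecated("Use another overload with multiple parameters", since = "0.13.0")
  def saveAsJdbc(writeOptions: WriteOptions): String =
    saveAsJdbc(writeOptions.connectionUrl, writeOptions.statement)

  // New entry point taking the parameters directly.
  def saveAsJdbc(connectionUrl: String, statement: String): String =
    s"$connectionUrl:$statement"
}
```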

RustedBones · 2023-06-02T09:46:14Z

scio-jdbc/src/main/scala/com/spotify/scio/jdbc/syntax/SCollectionSyntax.scala

+  def saveAsJdbc(
+    connectionOptions: JdbcConnectionOptions,
+    statement: String,
+    preparedStatementSetter: (T, PreparedStatement) => Unit,


I'd curry this one, as it is the per-element operation (like `foreach`).
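A sketch of the curried shape, assuming a hypothetical `FakeStatement` in place of `java.sql.PreparedStatement` and a plain `Seq` in place of the `SCollection` (connection options are omitted for brevity). Putting the setter in its own parameter list lets call sites pass it as a block, like `foreach`, and lets Scala infer `T` from the first list before typing the lambda.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for PreparedStatement: records the bound parameters.
final class FakeStatement(val sql: String) {
  val params: ArrayBuffer[(Int, Any)] = ArrayBuffer.empty
  def setString(index: Int, value: String): Unit = params.append((index, value))
  def setInt(index: Int, value: Int): Unit = params.append((index, value))
}

// Configuration in the first list, the per-element setter in the last one.
def saveAsJdbc[T](elems: Seq[T], statement: String)(
  preparedStatementSetter: (T, FakeStatement) => Unit
): Seq[FakeStatement] =
  elems.map { elem =>
    val st = new FakeStatement(statement)
    preparedStatementSetter(elem, st)
    st
  }
```

At the call site, the setter reads like a `foreach` body: `saveAsJdbc(rows, "INSERT INTO t VALUES (?, ?)") { (e, st) => ... }`.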

RustedBones · 2023-06-02T09:47:07Z

scio-jdbc/src/main/scala/com/spotify/scio/jdbc/syntax/ScioContextSyntax.scala

+  def jdbcSelect[T: ClassTag: Coder](
+    connectionOptions: JdbcConnectionOptions,
+    query: String,
+    rowMapper: ResultSet => T,


I'd curry this one, as it is a per-element map operation.
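Similarly for the read side, a sketch assuming a hypothetical `Row` stand-in for `ResultSet` and a `Seq` of rows in place of actual query execution (the real method also has `ClassTag`/`Coder` context bounds, omitted here). The row mapper goes last so the call reads like `.map { row => ... }`.

```scala
// Hypothetical stand-in for a single ResultSet row.
final case class Row(values: Map[String, Any]) {
  def getString(col: String): String = values(col).asInstanceOf[String]
  def getInt(col: String): Int = values(col).asInstanceOf[Int]
}

// Query in the first list, the per-row mapping function in the last one.
def jdbcSelect[T](rows: Seq[Row], query: String)(rowMapper: Row => T): Seq[T] =
  rows.map(rowMapper)
```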

RustedBones

Nit for the example code

scio-examples/src/main/scala/com/spotify/scio/examples/extra/CloudSqlExample.scala

Added new JdbcIO read/write params to Scio

9aa40ed

shnapz requested a review from RustedBones May 24, 2023 18:04

shnapz marked this pull request as draft May 24, 2023 18:05

added configOverride

5e2ee53

RustedBones reviewed May 25, 2023

added unit test for JdbcIO

0b0bac8

shnapz marked this pull request as ready for review May 30, 2023 01:28

RustedBones reviewed May 30, 2023

shnapz added this to the 0.13.0 milestone May 30, 2023

shnapz self-assigned this May 30, 2023

shnapz added 5 commits May 30, 2023 12:49

Fixed, added Write JDBCIO tests

58ed32a

Fix ArrayBuffer calls

13c928c

Added header

83d9e1c

Fix configOverride

5315e90

Added configOverride to Datastore

05416d6

RustedBones reviewed May 31, 2023

shnapz added 7 commits May 31, 2023 19:42

Refactored JdbcIO Read/Write params

cc5cc82

Removed commented code

e513de6

fix warning

2382d06

git status

b0cec13

Returned previous APIs and marked as deprecated

d56b60b

one more deprecated

01e74cc

renamed file

0a7c741

shnapz requested a review from RustedBones June 1, 2023 15:50

RustedBones reviewed Jun 2, 2023

shnapz added 2 commits June 2, 2023 10:49

Address PR comments

8c46df0

small fix

8cb973b

RustedBones approved these changes Jun 5, 2023

scio-examples/src/main/scala/com/spotify/scio/examples/extra/CloudSqlExample.scala (outdated, resolved)

RustedBones reviewed Jun 5, 2023

scio-examples/src/main/scala/com/spotify/scio/examples/extra/CloudSqlExample.scala (outdated, resolved)

shnapz and others added 3 commits June 5, 2023 12:07

Update scio-examples/src/main/scala/com/spotify/scio/examples/extra/CloudSqlExample.scala

Co-authored-by: Michel Davit <micheld@spotify.com>

5ed97d2

Update scio-examples/src/main/scala/com/spotify/scio/examples/extra/CloudSqlExample.scala

Co-authored-by: Michel Davit <micheld@spotify.com>

281024c

Fix IT test

e583a85

RustedBones merged commit eee089e into main Jun 6, 2023

RustedBones deleted the akabas/jdbcio-extension branch June 6, 2023 14:54

laviandra mentioned this pull request Aug 17, 2023

Jdbc docs are outdated #4953

Closed
