Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe coder unwrap #4887

Merged
merged 4 commits into from
Jun 20, 2023
Merged

Safe coder unwrap #4887

merged 4 commits into from
Jun 20, 2023

Conversation

RustedBones
Copy link
Contributor

@RustedBones RustedBones commented Jun 15, 2023

Unwrapping was done on Ref and LazyCoder. This was required to extract values from tuple coders when RecordCoder was used (materialization wraps into a Ref/LazyCoder)

IMHO, we should expect TupleCoder in those cases and fail if a RecordCoder was given.

We could then only unwrap MaterlializedCoder and NullableCoder (double check the option is enabled). implicit scio coders will get stable.

Probably a better fix than #4886

Comment on lines -53 to -56
case c: RecordCoder[(K, V)] =>
val ac = coderElement[K](c)(0)
val bc = coderElement[V](c)(1)
(ac, bc)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll not support users who force non tuple coders for (K, V)

Copy link
Contributor Author

@RustedBones RustedBones Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to accept any StructuredCoder of size 2. even if wrapped in Ref or LazyCoder, this returns the element coders

Comment on lines -37 to -39
// Needed because scalac is an idiot
implicit def btCollCoder: Coder[BTRow] =
Coder.gen[(ByteString, Iterable[Mutation])]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like not. And Coder.tuple2Coder should have been used

@codecov
Copy link

codecov bot commented Jun 15, 2023

Codecov Report

Merging #4887 (66ee5eb) into main (69ad53a) will decrease coverage by 0.02%.
The diff coverage is 85.91%.

❗ Current head 66ee5eb differs from pull request most recent head 252a9e9. Consider uploading reports for the commit 252a9e9 to get more accurate results

@@            Coverage Diff             @@
##             main    #4887      +/-   ##
==========================================
- Coverage   62.84%   62.83%   -0.02%     
==========================================
  Files         282      282              
  Lines       10573    10595      +22     
  Branches      795      775      -20     
==========================================
+ Hits         6645     6657      +12     
- Misses       3928     3938      +10     
Impacted Files Coverage Δ
...fy/scio/coders/instances/kryo/AvroSerializer.scala 100.00% <ø> (ø)
...com/spotify/scio/coders/instances/AvroCoders.scala 84.21% <ø> (ø)
...-core/src/main/scala/com/spotify/scio/io/Tap.scala 79.31% <ø> (ø)
...ify/scio/io/dynamic/syntax/SCollectionSyntax.scala 90.38% <ø> (ø)
...spotify/scio/schemas/instances/AvroInstances.scala 0.00% <ø> (ø)
...in/scala/com/spotify/scio/values/SCollection.scala 94.36% <ø> (ø)
.../com/spotify/scio/parquet/avro/ParquetAvroIO.scala 89.43% <ø> (ø)
...la/com/spotify/scio/testing/BigtableMatchers.scala 100.00% <ø> (ø)
...ain/scala/com/spotify/scio/coders/BeamCoders.scala 65.11% <72.22%> (-11.08%) ⬇️
.../src/main/scala/com/spotify/scio/avro/AvroIO.scala 93.63% <100.00%> (ø)
... and 3 more

@RustedBones RustedBones added this to the 0.13.0 milestone Jun 16, 2023
case c: beam.NullableCoder[T] => c.getValueCoder
case _ => coder
case c: MaterializedCoder[T] => unwrap(options, c.bcoder)
case c: beam.NullableCoder[T] if options.nullableCoders => c.getValueCoder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will happen in the case that a user doesn't have the --nullableCoders option set, but still uses NullableCoder in their pipeline? (i.e. they have a step like .map(x => ...)(Coder.beam(NullableCoder.of(...))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The null coder will be propagated. This is required by countByValue for example, otherwise we would fail on null values.


/** Get key-value coders from a `SideInput[Map[K, Iterable[V]]]`. */
def getMultiMapKV[K, V](si: SideInput[Map[K, Iterable[V]]]): (Coder[K], Coder[V]) = {
val coder = si.view.getPCollection.getCoder
Copy link
Contributor Author

@RustedBones RustedBones Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was unsafe. The PCollection behind a view has no guarantee to be of the same type (int contains a transform function). Drop this helper and expect a coder when we pass a SideInput

Comment on lines +90 to +91
.asMultiMapSideInput
.map(_.map { case (k, vs) => k -> vs.map(_.toInt) })
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This transformation was failing, because we tried to use a Coder[String] to encode the produced Int value

@RustedBones RustedBones merged commit 97fbfea into main Jun 20, 2023
@RustedBones RustedBones deleted the safe-coder-unwrap branch June 20, 2023 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants