Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues with voyager #5034

Merged
merged 2 commits into from
Oct 19, 2023
Merged

Fix issues with voyager #5034

merged 2 commits into from
Oct 19, 2023

Conversation

RustedBones
Copy link
Contributor

No description provided.

@codecov
Copy link

codecov bot commented Oct 17, 2023

Codecov Report

Merging #5034 (e3e637a) into main (fb3c27f) will decrease coverage by 0.01%.
Report is 6 commits behind head on main.
The diff coverage is 78.57%.

❗ Current head e3e637a differs from pull request most recent head 388f1fa. Consider uploading reports for the commit 388f1fa to get more accurate results

@@            Coverage Diff             @@
##             main    #5034      +/-   ##
==========================================
- Coverage   63.12%   63.12%   -0.01%     
==========================================
  Files         286      286              
  Lines       10736    10743       +7     
  Branches      764      791      +27     
==========================================
+ Hits         6777     6781       +4     
- Misses       3959     3962       +3     
Files Coverage Δ
...scala/com/spotify/scio/extra/voyager/Voyager.scala 66.66% <100.00%> (ø)
.../scio/extra/voyager/syntax/SCollectionSyntax.scala 57.14% <100.00%> (+7.14%) ⬆️
...potify/scio/bigquery/types/ConverterProvider.scala 100.00% <ø> (ø)
...m/spotify/scio/bigquery/types/SchemaProvider.scala 95.91% <100.00%> (-0.17%) ⬇️
...com/spotify/scio/bigquery/types/TypeProvider.scala 47.94% <ø> (ø)
...la/com/spotify/scio/bigquery/types/MacroUtil.scala 90.47% <0.00%> (-0.44%) ⬇️
.../src/main/scala/com/spotify/scio/ScioContext.scala 82.52% <66.66%> (-0.40%) ⬇️

... and 1 file with indirect coverage changes

// Do not use reifyAsIterableInGlobalWindow
// as iterableCoder may materialize the full collection
self.context
.parallelize(Seq((): Unit))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, how does Dataflow render this job graph? I guess it adds another root-level transform for .parallelize() that feeds into the Voyager SCollection?

Copy link
Contributor Author

@RustedBones RustedBones Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes:
image
I'm just getting rid on the extra map from reifyAsIterableInGlobalWindow that can trigger the iterableCoder forcing the side input collection to fit in memory. Here we directly access the side input that should be able to stream records from the view

@RustedBones RustedBones marked this pull request as ready for review October 19, 2023 10:23
@RustedBones RustedBones merged commit ee2b08d into main Oct 19, 2023
11 checks passed
@RustedBones RustedBones deleted the voyager-fixes branch October 19, 2023 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants