Simplify BucketedInput serialization #5270
// Map distinct FileOperations/FileSuffixes to indices in a map, for efficient encoding of
// large BucketedInputs
final Map<KV<String, String>, Integer> fileOperationsMetadata = new HashMap<>();
This is just the logic from writeObject for efficiently encoding large numbers of inputs. (The logic from readObject is now in getInputs().)
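The index-based encoding described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: plain strings stand in for ResourceId and for the KV of file suffix and FileOperations, and the names IndexEncodingSketch, Encoded, encode, and decode are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: store each distinct metadata value once and refer to it by index. */
class IndexEncodingSketch {

  /** Encoded form: the distinct metadata values, plus one small index per directory. */
  static final class Encoded {
    final List<String> distinctMetadata = new ArrayList<>();
    final Map<String, Integer> directoryToIndex = new LinkedHashMap<>();
  }

  /** Assumption: keys are input directories, values are their (often repeated) metadata. */
  static Encoded encode(Map<String, String> directoryToMetadata) {
    Encoded encoded = new Encoded();
    Map<String, Integer> indexByMetadata = new HashMap<>();
    for (Map.Entry<String, String> entry : directoryToMetadata.entrySet()) {
      Integer index = indexByMetadata.get(entry.getValue());
      if (index == null) {
        // First time this metadata value is seen: assign it the next free index.
        index = encoded.distinctMetadata.size();
        indexByMetadata.put(entry.getValue(), index);
        encoded.distinctMetadata.add(entry.getValue());
      }
      encoded.directoryToIndex.put(entry.getKey(), index);
    }
    return encoded;
  }

  /** Rebuilds the original directory-to-metadata map from the encoded form. */
  static Map<String, String> decode(Encoded encoded) {
    Map<String, String> decoded = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> entry : encoded.directoryToIndex.entrySet()) {
      decoded.put(entry.getKey(), encoded.distinctMetadata.get(entry.getValue()));
    }
    return decoded;
  }

  public static void main(String[] args) {
    Map<String, String> inputs = new LinkedHashMap<>();
    inputs.put("dir-a", "avro|.avro");
    inputs.put("dir-b", "avro|.avro");
    inputs.put("dir-c", "json|.json");
    Encoded encoded = encode(inputs);
    System.out.println("distinct metadata values: " + encoded.distinctMetadata.size());
    System.out.println("round trip equal: " + decode(encoded).equals(inputs));
  }
}
```

With many input directories that share the same file operations, only the per-directory index is repeated, which is where the saving would come from.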
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5270   +/-   ##
=======================================
  Coverage   62.50%   62.50%
=======================================
  Files         301      301
  Lines       10860    10852    -8
  Branches      740      736    -4
=======================================
- Hits         6788     6783    -5
+ Misses       4072     4069    -3
There is still something I don't get: how does replacing
protected Map<ResourceId, KV<String, FileOperations<V>>> inputs;
with
private transient Map<ResourceId, KV<String, FileOperations<V>>> inputs;
private final Map<Integer, KV<String, FileOperations>> fileOperationsEncoding;
private final Map<ResourceId, Integer> directoriesEncoding;
now make the class serializable?
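For readers following the question, here is a rough sketch of the general pattern those three fields suggest: the decoded map is transient and skipped by default Java serialization, while the two encoding maps are ordinary serializable fields from which the decoded map is rebuilt lazily. It is an illustration under assumptions, not the PR's implementation: strings stand in for ResourceId and for the KV of suffix and FileOperations, and LazyDecodedInputs and roundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a transient decoded view backed by serializable encoding fields. */
class LazyDecodedInputs implements Serializable {
  private static final long serialVersionUID = 1L;

  // Skipped by default serialization; null after deserialization until first access.
  private transient Map<String, String> inputs;
  // Plain serializable fields, so no custom readObject/writeObject is needed.
  private final HashMap<Integer, String> metadataEncoding;
  private final HashMap<String, Integer> directoriesEncoding;

  LazyDecodedInputs(Map<Integer, String> metadataEncoding, Map<String, Integer> directoriesEncoding) {
    this.metadataEncoding = new HashMap<>(metadataEncoding);
    this.directoriesEncoding = new HashMap<>(directoriesEncoding);
  }

  /** Rebuilds the decoded map on first use. Not synchronized in this sketch. */
  Map<String, String> getInputs() {
    if (inputs == null) {
      Map<String, String> decoded = new HashMap<>();
      directoriesEncoding.forEach((dir, index) -> decoded.put(dir, metadataEncoding.get(index)));
      inputs = decoded;
    }
    return inputs;
  }

  /** Serializes and deserializes with standard Java serialization. */
  static LazyDecodedInputs roundTrip(LazyDecodedInputs value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (LazyDecodedInputs) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("serialization round trip failed", e);
    }
  }

  public static void main(String[] args) {
    Map<Integer, String> metadata = new HashMap<>();
    metadata.put(0, "avro|.avro");
    Map<String, Integer> directories = new HashMap<>();
    directories.put("dir-a", 0);
    directories.put("dir-b", 0);
    LazyDecodedInputs copy = roundTrip(new LazyDecodedInputs(metadata, directories));
    System.out.println(copy.getInputs());
  }
}
```

In this pattern the class only serializes cleanly if every non-transient field type is itself serializable, which is the property the question above is asking about for the real field types.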
scio-smb/src/main/java/org/apache/beam/sdk/extensions/smb/SortedBucketSource.java
Sorry for the confusion. To clarify, BucketedInput was
My suspicion is that we weren't properly flushing something in our complex
It's a bit frustrating because I haven't been able to repro this with a unit test (even using the Flink serializer class...), so it's all been manually tested via deploys to a locally running Flink cluster.
Removes overrides of readObject/writeObject by simply making all class members serializable.