Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add @yields_batches and @yields_elements #19268

Merged
merged 3 commits into from
Jun 10, 2022

Conversation

TheNeuralBit
Copy link
Member

@TheNeuralBit TheNeuralBit commented Jun 3, 2022

This PR adds two new decorators, @yields_batches and @yields_elements, which override the default interpretation for the output of DoFn.process and DoFn.process_batch. These decorators are handled via changes to OutputProcessor (now renamed to OutputHandler for clarity), which branches depending on whether batches or elements are expected in the results iterable.

Where possible, common logic in OutputHandler was extracted into private helper methods that are re-used. The .pxd definition for these methods indicates they should be inlined in cython-generated code so they don't affect performance.

Fixes #21656

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@github-actions github-actions bot added the python label Jun 3, 2022
@TheNeuralBit
Copy link
Member Author

TheNeuralBit commented Jun 3, 2022

Despite the inlining this PR seems to have a minor effect (0.1-0.2 0.01-0.02 microseconds/element) on map_fn_microbenchmark, presumably due to the new branch. Benchmarks on my desktop with Intel Xeon W-2135 CPU.

c779710:

Fixed cost   1.6063439090633391
Per-element  7.317467689514159e-07                                                                                  
R^2          0.9972847477152673

This PR:

Fixed cost   1.630842854329745
Per-element  7.487253376931855e-07
R^2          0.998137270424781

@codecov
Copy link

codecov bot commented Jun 3, 2022

Codecov Report

Merging #19268 (e6c3735) into master (c779710) will decrease coverage by 0.06%.
The diff coverage is 85.81%.

@@            Coverage Diff             @@
##           master   #19268      +/-   ##
==========================================
- Coverage   74.09%   74.02%   -0.07%     
==========================================
  Files         697      698       +1     
  Lines       91986    92209     +223     
==========================================
+ Hits        68154    68258     +104     
- Misses      22583    22702     +119     
  Partials     1249     1249              
Flag Coverage Δ
python 83.62% <85.81%> (-0.13%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...on/apache_beam/runners/direct/sdf_direct_runner.py 35.53% <75.00%> (ø)
sdks/python/apache_beam/runners/common.py 88.68% <83.33%> (+0.74%) ⬆️
sdks/python/apache_beam/transforms/core.py 92.38% <90.47%> (+0.08%) ⬆️
sdks/python/apache_beam/utils/windowed_value.py 92.52% <100.00%> (+0.17%) ⬆️
...ache_beam/runners/dataflow/dataflow_job_service.py 50.00% <0.00%> (-12.17%) ⬇️
sdks/python/apache_beam/dataframe/io.py 88.78% <0.00%> (-3.26%) ⬇️
...eam/runners/portability/fn_api_runner/fn_runner.py 87.51% <0.00%> (-2.50%) ⬇️
...nners/portability/fn_api_runner/worker_handlers.py 77.89% <0.00%> (-1.45%) ⬇️
sdks/python/apache_beam/internal/dill_pickler.py 84.89% <0.00%> (-1.44%) ⬇️
sdks/python/apache_beam/runners/direct/executor.py 96.46% <0.00%> (-0.55%) ⬇️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c779710...e6c3735. Read the comment docs.

@robertwb
Copy link
Contributor

robertwb commented Jun 7, 2022

The "per element" line seems to be missing from the benchmarks. What is the relative difference?

@TheNeuralBit
Copy link
Member Author

Apologies, somehow the formatting was such that the per-element line was hidden after a bunch of spaces. It's there now.

@robertwb
Copy link
Contributor

robertwb commented Jun 8, 2022

Thanks. The difference looks more like 0.01-0.02 microseconds, or a couple percent, which is more acceptable.

@TheNeuralBit TheNeuralBit changed the title [BEAM-14293] Add @yields_batches and @yields_elements Add @yields_batches and @yields_elements Jun 9, 2022
@TheNeuralBit
Copy link
Member Author

retest this please

@TheNeuralBit
Copy link
Member Author

Run Portable_Python PreCommit

@TheNeuralBit
Copy link
Member Author

Run PythonDocker PreCommit

1 similar comment
@TheNeuralBit
Copy link
Member Author

Run PythonDocker PreCommit

@TheNeuralBit
Copy link
Member Author

Run Portable_Python PreCommit

@TheNeuralBit TheNeuralBit merged commit 0de9821 into apache:master Jun 10, 2022
bullet03 pushed a commit to akvelon/beam that referenced this pull request Jun 20, 2022
* Add yields_batches and yields_elements decorators

* Handle timestamp propagation for element-to-batch and batch-to-element

* fixup! Handle timestamp propagation for element-to-batch and batch-to-element
@liferoad liferoad mentioned this pull request May 27, 2023
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add @DoFn.yields_batches and @DoFn.yields_elements decorators to override defaults
2 participants