Releases · apache/beam

24 Aug 17:16

lostluck

v2.59.0

c560075

Latest

We are happy to present the new 2.59.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.59.0, check out the detailed release notes.

Highlights

Added support for setting a configurable timeout when loading a model and performing inference in the RunInference transform using with_exception_handling (#32137)
Initial experimental support for using Prism with the Java and Python SDKs
- Prism is presently targeting local testing usage, or other small scale execution.
- For Java, use 'PrismRunner', or 'TestPrismRunner' as an argument to the --runner flag.
- For Python, use 'PrismRunner' as an argument to the --runner flag.
- Go already uses Prism as the default local runner.
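
As an illustration of the Python flag above, here is a minimal sketch of a small counting pipeline that selects Prism through --runner. It assumes the apache-beam package (2.59.0 or newer) is installed. The helper names prism_args and run are hypothetical, and the Beam import is deferred so the argument helper can be used on its own.

```python
from typing import Iterable, List


def prism_args(extra: Iterable[str] = ()) -> List[str]:
    """Build pipeline arguments that select Prism (hypothetical helper)."""
    return ["--runner=PrismRunner", *extra]


def run(argv: List[str]) -> None:
    """Count repeated words on whichever runner argv names."""
    # Imported lazily: requires the apache-beam package (assumed >= 2.59.0).
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(argv)) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(["beam", "prism", "beam"])
            | "Count" >> beam.combiners.Count.PerElement()
            | "Print" >> beam.Map(print)
        )
```

Calling run(prism_args()) should execute the pipeline locally on Prism. Because Prism support is experimental in this release, pipelines that rely on unsupported features may fail.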

I/Os

Improved the performance of BigQueryIO when using the withPropagateSuccessfulStorageApiWrites(true) method (Java) (#31840).
[Managed Iceberg] Added support for writing to partitioned tables (#32102)
Update ClickHouseIO to use the latest version of the ClickHouse JDBC driver (#32228).
Add ClickHouseIO dedicated User-Agent (#32252).

New Features / Improvements

The BigQuery endpoint can be overridden via PipelineOptions, which enables the use of BigQuery emulators (Java) (#28149).
Go SDK Minimum Go Version updated to 1.21 (#32092).
[BigQueryIO] Added support for withFormatRecordOnFailureFunction() for STORAGE_WRITE_API and STORAGE_API_AT_LEAST_ONCE methods (Java) (#31354).
Updated Go protobuf package to new version (Go) (#21515).
Added support for setting a configurable timeout when loading a model and performing inference in the RunInference transform using with_exception_handling (#32137)
Adds OrderedListState support for Java SDK via FnApi.
Initial support for using Prism from the Python and Java SDKs.

Bugfixes

Fixed incorrect service account impersonation flow for Python pipelines using BigQuery IOs (#32030).
Auto-disable broken and meaningless upload_graph feature when using Dataflow Runner V2 (#32159).
(Python) Upgraded google-cloud-storage to version 2.18.2 to fix a data corruption issue (#32135).
(Go) Fixed corruption on State API writes (#32245).

Known Issues

Prism is under active development and does not yet support all pipelines. See #29650 for progress.
- In the 2.59.0 release, Prism passes most runner validation tests, with the exception of pipelines using the following features:
  OrderedListState, OnWindowExpiry (e.g. GroupIntoBatches), CustomWindows, MergingWindowFns, Trigger and WindowingStrategy associated features, Bundle Finalization, Looping Timers, and some Coder-related issues such as with Python combiner packing, Java Schema transforms, and heterogeneous flatten coders. Processing Time timers do not yet have real time support.
- If your pipeline is having difficulty with the Python or Java direct runners, but runs well on Prism, please let us know.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.59.0 release. Thank you to all contributors!

Ahmed Abualsaud, Ahmet Altay, Andrew Crites, atask-g, Axel Magnuson, Ayush Pandey, Bartosz Zablocki, Chamikara Jayalath, cutiepie-10, Damon, Danny McCormick, dependabot[bot], Eddie Phillips, Francis O'Hara, Hyeonho Kim, Israel Herraiz, Jack McCluskey, Jaehyeon Kim, Jan Lukavský, Jeff Kinard, Jeffrey Kinard, jonathan-lemos, jrmccluskey, Kirill Berezin, Kiruphasankaran Nataraj, lahariguduru, liferoad, lostluck, Maciej Szwaja, Manit Gupta, Mark Zitnik, martin trieu, Naireen Hussain, Prerit Chandok, Radosław Stankiewicz, Rebecca Szper, Robert Bradshaw, Robert Burke, ron-gal, Sam Whittle, Sergei Lilichenko, Shunping Huang, Svetak Sundhar, Thiago Nunes, Timothy Itodo, tvalentyn, twosom, Vatsal, Vitaly Terentyev, Vlado Djerek, Yifan Ye, Yi Hu

16 Aug 18:44

damccorm

v2.58.1

414bc20

Beam 2.58.1 release

We are happy to present the new 2.58.1 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

New Features / Improvements

Fixed an issue where KafkaIO records read with ReadFromKafkaViaSDF are redistributed and may contain duplicates regardless of the configuration. This affects Java pipelines with Dataflow v2 runner and xlang pipelines reading from Kafka (#32196).

Known Issues

Large Dataflow graphs using runner v2, or pipelines explicitly enabling the upload_graph experiment, will fail at construction time (#32159).
Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue (#32169). The issue will be fixed in 2.59.0 (#32135). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
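
The workaround above depends on which google-cloud-storage version is installed. The following is a minimal sketch of such a check using only the standard library. The function names are hypothetical, and the version parsing is deliberately simple: it reads only the leading numeric components, so pre-release suffixes are ignored.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Minimum google-cloud-storage version named in the release note.
MINIMUM_FIXED = (2, 18, 2)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading dotted numeric part of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_gcs_fix(version: str) -> bool:
    """Return True if the version is 2.18.2 or newer."""
    return parse_version(version) >= MINIMUM_FIXED


def installed_gcs_version() -> Optional[str]:
    """Return the installed google-cloud-storage version, or None if absent."""
    try:
        return metadata.version("google-cloud-storage")
    except metadata.PackageNotFoundError:
        return None


installed = installed_gcs_version()
if installed is None:
    print("google-cloud-storage is not installed")
elif has_gcs_fix(installed):
    print(f"google-cloud-storage {installed} includes the fix")
else:
    print(f"google-cloud-storage {installed} should be upgraded to 2.18.2 or newer")
```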

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.58.1 release. Thank you to all contributors!

Danny McCormick

Sam Whittle

06 Aug 13:49

jrmccluskey

v2.58.0

d8315f6

Beam 2.58.0 release

We are happy to present the new 2.58.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information about changes in 2.58.0, check out the detailed release notes.

I/Os

Support for Solace source (SolaceIO.Read) added (Java) (#31440).

New Features / Improvements

Multiple RunInference instances can now share the same model instance by setting the model_identifier parameter (Python) (#31665).
Added options to control the number of Storage API multiplexing connections (#31721)
[BigQueryIO] Better handling for batch Storage Write API when it hits AppendRows throughput quota (#31837)
[IcebergIO] All specified catalog properties are passed through to the connector (#31726)
Removed a third-party LGPL dependency from the Go SDK (#31765).
Support for MapState and SetState when using Dataflow Runner v1 with Streaming Engine (Java) (#18200)

Breaking Changes

[IcebergIO] IcebergCatalogConfig was changed to support specifying catalog properties in a key-store fashion (#31726)
[SpannerIO] Added validation that query and table cannot be specified at the same time for SpannerIO.read(). Previously, withQuery overrode withTable if both were set (#24956).

Bug fixes

[BigQueryIO] Fixed a bug in batch Storage Write API that frequently exhausted concurrent connections quota (#31710)

List of Contributors

According to git shortlog, the following people contributed to the 2.58.0 release. Thank you to all contributors!

Ahmed Abualsaud

Ahmet Altay

Alexandre Moueddene

Alexey Romanenko

Andrew Crites

Bartosz Zablocki

Celeste Zeng

Chamikara Jayalath

Clay Johnson

Damon Douglass

Danny McCormick

Dilnaz Amanzholova

Florian Bernard

Francis O'Hara

George Ma

Israel Herraiz

Jack McCluskey

Jaehyeon Kim

James Roseman

Kenneth Knowles

Maciej Szwaja

Michel Davit

Minh Son Nguyen

Naireen

Niel Markwick

Oliver Cardoza

Robert Bradshaw

Robert Burke

Rohit Sinha

S. Veyrié

Sam Whittle

Shunping Huang

Svetak Sundhar

TongruiLi

Tony Tang

Valentyn Tymofieiev

Vitaly Terentyev

Yi Hu

26 Jun 20:00

kennknowles

v2.57.0

e3314d4

Beam 2.57.0 Release

We are happy to present the new 2.57.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.57.0, check out the detailed release notes.

Highlights

Apache Beam adds Python 3.12 support (#29149).
Added FlinkRunner for Flink 1.18 (#30789).

I/Os

Ensure that BigtableIO closes the reader streams (#31477).

New Features / Improvements

Added Feast feature store handler for enrichment transform (Python) (#30957).
BigQuery per-worker metrics are reported by default for Streaming Dataflow Jobs (Java) (#31015)
Adds inMemory() variant of Java List and Map side inputs for more efficient lookups when the entire side input fits into memory.
Beam YAML now supports the jinja templating syntax.
Template variables can be passed with the (json-formatted) --jinja_variables flag.
DataFrame API now supports pandas 2.1.x and adds 12 more string functions for Series (#31185).
Added BigQuery handler for enrichment transform (Python) (#31295)
Disable soft delete policy when creating the default bucket for a project (Java) (#31324).
Added DoFn.SetupContextParam and DoFn.BundleContextParam, which can be used as a Python DoFn.process, Map, or FlatMap parameter to invoke a context manager per DoFn setup or bundle (analogous to using setup/teardown or start_bundle/finish_bundle, respectively).
Go SDK Prism Runner
- Pre-built Prism binaries are now part of the release and are available via the GitHub release page (#29697).
- Some pipelines will work on Java and Python, but this is in part to prepare for real runner wrappers in 2.58.0
- ProcessingTime is now handled synthetically with TestStream pipelines and Non-TestStream pipelines, for fast test pipeline execution by default. (#30083).
  - Prism does NOT yet support "real time" execution for this release.
Improve processing for large elements to reduce the chances of exceeding 2GB protobuf limits (Python) (#31607).
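
The --jinja_variables flag mentioned above takes a JSON-formatted value. A minimal sketch of building such a command line follows. The pipeline file name, the variable names, and the `python -m apache_beam.yaml.main --yaml_pipeline_file=...` entry point are assumptions for illustration and are not taken from these notes.

```python
import json
import shlex
from typing import Any, Dict, List


def jinja_variables_flag(variables: Dict[str, Any]) -> str:
    """Serialize template variables as the JSON value of --jinja_variables."""
    return "--jinja_variables=" + json.dumps(variables, sort_keys=True)


def yaml_command(pipeline_file: str, variables: Dict[str, Any]) -> List[str]:
    """Assemble an argv list for running a templated Beam YAML pipeline.

    The module path and --yaml_pipeline_file flag are assumed here.
    """
    return [
        "python",
        "-m",
        "apache_beam.yaml.main",
        f"--yaml_pipeline_file={pipeline_file}",
        jinja_variables_flag(variables),
    ]


# Hypothetical variables referenced as {{ input_path }} and {{ limit }} in the YAML.
command = yaml_command(
    "pipeline.yaml", {"input_path": "gs://my-bucket/in.csv", "limit": 10}
)
# shlex.join quotes the JSON so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

Passing the list to subprocess.run avoids shell quoting entirely, which is usually the safer choice when the JSON contains quotes or spaces.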

Breaking Changes

Java's View.asList() side inputs are now optimized for iterating rather than
indexing when in the global window.
This new implementation still supports all (immutable) List methods as before,
but some of the random access methods like get() and size() will be slower.
To use the old implementation one can use View.asList().withRandomAccess().
SchemaTransforms implemented with TypedSchemaTransformProvider now produce a
configuration Schema with snake_case naming convention
(#31374). This will make the following
cases problematic:
- Running a pre-2.57.0 remote SDK pipeline containing a 2.57.0+ Java SchemaTransform,
  and vice versa:
- Running a 2.57.0+ remote SDK pipeline containing a pre-2.57.0 Java SchemaTransform
- All direct uses of Python's SchemaAwareExternalTransform
  should be updated to use new snake_case parameter names.
Upgraded Jackson Databind to 2.15.4 (Java) (#26743).
jackson-2.15 has known breaking changes. An important one is that it imposes a buffer limit on the parser.
If your custom PTransform/DoFn are affected, refer to #31580 for mitigation.
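
For the SchemaAwareExternalTransform change above, existing camelCase keyword arguments need snake_case equivalents. The helper below is a hypothetical sketch of a purely mechanical conversion. The parameter names in the example are made up, and the names Beam actually expects should be checked against the transform's configuration schema.

```python
import re
from typing import Any, Dict

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name: str) -> str:
    """Convert a camelCase identifier to snake_case."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def snake_case_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with every key converted to snake_case."""
    converted: Dict[str, Any] = {}
    for key, value in kwargs.items():
        new_key = to_snake_case(key)
        if new_key in converted:
            raise ValueError(f"duplicate parameter after conversion: {new_key}")
        converted[new_key] = value
    return converted


# Made-up parameter names, used only to show the conversion.
old_style = {"triggeringFrequencySeconds": 5, "autoSharding": True}
print(snake_case_kwargs(old_style))
```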

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.57.0 release. Thank you to all contributors!

Ahmed Abualsaud

Ahmet Altay

Alexey Romanenko

Andrey Devyatkin

Anody Zhang

Arvind Ram

Ben Konz

Bruno Volpato

Celeste Zeng

Chamikara Jayalath

Claire McGinty

Colm O hEigeartaigh

Damon

Danny McCormick

Evan Galpin

Ferran Fernández Garrido

Florent Biville

Jack Dingilian

Jack McCluskey

Jan Lukavský

JayajP

Jeff Kinard

Jeffrey Kinard

John Casey

Justin Uang

Kenneth Knowles

Kevin Zhou

Liam Miller-Cushon

Maarten Vercruysse

Maciej Szwaja

Maja Kontrec Rönn

Marc hurabielle

Martin Trieu

Mattie Fu

Min Zhu

Naireen Hussain

Nick Anikin

Pablo Rodriguez Defino

Paul King

Priyans Desai

Radosław Stankiewicz

Rebecca Szper

Ritesh Ghorse

Robert Bradshaw

Robert Burke

Rodrigo Bozzolo

RyuSA

Sam Rohde

Sam Whittle

Sergei Lilichenko

Shahar Epstein

Shunping Huang

Svetak Sundhar

Tomo Suzuki

Tony Tang

Valentyn Tymofieiev

Vincent Stollenwerk

Vineet Kumar

Vitaly Terentyev

Vlado Djerek

XQ Hu

Yi Hu

akashorabek

bzablocki

kberezin

02 May 01:14

damccorm

v2.56.0

b34cf54

Beam 2.56.0 release

We are happy to present the new 2.56.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.56.0, check out the detailed release notes.

Highlights

Added FlinkRunner for Flink 1.17, removed support for Flink 1.12 and 1.13. Pipelines running on Flink 1.16 and below can be upgraded to 1.17 if the pipeline is first updated to Beam 2.56.0 with the same Flink version. After the pipeline runs with Beam 2.56.0, it should be possible to upgrade to FlinkRunner with Flink 1.17. (#29939)
New Managed I/O Java API (#30830).
New Ordered Processing PTransform added for processing order-sensitive stateful data (#30735).

I/Os

Upgraded Avro version to 1.11.3, kafka-avro-serializer and kafka-schema-registry-client versions to 7.6.0 (Java) (#30638).
The newer Avro package is known to have breaking changes. If you are affected, you can keep pinned to older Avro versions which are also tested with Beam.
Iceberg read/write support is available through the new Managed I/O Java API (#30830).

New Features / Improvements

Profiling of Cythonized code has been disabled by default. This might improve performance for some Python pipelines (#30938).
Bigtable enrichment handler now accepts a custom function to build a composite row key. (Python) (#30974).

Breaking Changes

Default consumer polling timeout for KafkaIO.Read was increased from 1 second to 2 seconds. Use KafkaIO.read().withConsumerPollingTimeout(Duration duration) to configure this timeout value when necessary (#30870).
Python Dataflow users no longer need to manually specify --streaming for pipelines using unbounded sources such as ReadFromPubSub.

Bugfixes

Fixed locking issue when shutting down inactive bundle processors. Symptoms of this issue include slowness or stuckness in long-running jobs (Python) (#30679).
Fixed a logging issue that silenced the pip output when installing dependencies provided in --requirements_file (Python).

List of Contributors

According to git shortlog, the following people contributed to the 2.56.0 release. Thank you to all contributors!

Abacn

Ahmed Abualsaud

Andrei Gurau

Andrey Devyatkin

Aravind Pedapudi

Arun Pandian

Arvind Ram

Bartosz Zablocki

Brachi Packter

Byron Ellis

Chamikara Jayalath

Clement DAL PALU

Damon

Danny McCormick

Daria Bezkorovaina

Dip Patel

Evan Burrell

Hai Joey Tran

Jack McCluskey

Jan Lukavský

JayajP

Jeff Kinard

Julien Tournay

Kenneth Knowles

Luís Bianchin

Maciej Szwaja

Melody Shen

Oleh Borysevych

Pablo Estrada

Rebecca Szper

Ritesh Ghorse

Robert Bradshaw

Sam Whittle

Sergei Lilichenko

Shahar Epstein

Shunping Huang

Svetak Sundhar

Timothy Itodo

Veronica Wasson

Vitaly Terentyev

Vlado Djerek

Yi Hu

akashorabek

bzablocki

clmccart

damccorm

dependabot[bot]

dmitryor

github-actions[bot]

liferoad

martin trieu

tvalentyn

xianhualiu

08 Apr 13:09

damccorm

v2.55.1

62a3eb4

Beam 2.55.1 release

Bugfixes

Fixed issue that broke WriteToJson in languages other than Java (X-lang) (#30776).

25 Mar 19:54

Abacn

v2.55.0

dda0549

Beam 2.55.0 release

We are happy to present the new 2.55.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.55.0, check out the detailed release notes.

Highlights

The Python SDK will now include automatically generated wrappers for external Java transforms! (#29834)

I/Os

Added support for handling bad records to BigQueryIO (#30081).
- Full Support for Storage Read and Write APIs
- Partial Support for File Loads (Failures writing to files supported, failures loading files to BQ unsupported)
- No Support for Extract or Streaming Inserts
Added support for handling bad records to PubSubIO (#30372).
- Support is not available for handling schema mismatches, and enabling error handling for writing to Pub/Sub topics with schemas is not recommended
--enableBundling pipeline option for BigQueryIO DIRECT_READ is replaced by --enableStorageReadApiV2. Both were considered experimental and subject to change (Java) (#26354).

New Features / Improvements

Allow writing clustered and not time-partitioned BigQuery tables (Java) (#30094).
Redis cache support added to RequestResponseIO and Enrichment transform (Python) (#30307)
Merged sdks/java/fn-execution and runners/core-construction-java into the main SDK. These artifacts were never meant for users, but note that they no longer exist. These are steps to bring portability into the core SDK alongside all other core functionality.
Added Vertex AI Feature Store handler for Enrichment transform (Python) (#30388)

Breaking Changes

Arrow version was bumped to 15.0.0 from 5.0.0 (#30181).
Go SDK users who build custom worker containers may run into issues with the move to distroless containers as a base (see Security Fixes).
- The issue stems from distroless containers lacking additional tools, which current custom container processes may rely on.
- See https://beam.apache.org/documentation/runtime/environments/#from-scratch-go for instructions on building and using a custom container.
Python SDK has changed the default value for the --max_cache_memory_usage_mb pipeline option from 100 to 0. This option was first introduced in the 2.52.0 SDK version. This change restores the behavior of the 2.51.0 SDK, which does not use the state cache. If your pipeline uses iterable side inputs views, consider increasing the cache size by setting the option manually. (#30360).
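
The default described above can be summarized per SDK version. The following sketch is based only on the statements in these notes (no state cache in 2.51.0, a 100 MB default from 2.52.0, and 0 again from 2.55.0). The function names are hypothetical and the code is not part of Beam.

```python
from typing import List, Tuple

FLAG = "--max_cache_memory_usage_mb"


def _parse(version: str) -> Tuple[int, ...]:
    """Parse a plain 'major.minor.patch' version string."""
    return tuple(int(part) for part in version.split("."))


def default_cache_mb(sdk_version: str) -> int:
    """Default state cache size in MB for a Python SDK version, per the notes."""
    version = _parse(sdk_version)
    if version < (2, 52, 0):
        return 0  # the option did not exist yet, so no state cache is used
    if version < (2, 55, 0):
        return 100  # default while the option was newly introduced
    return 0  # 2.55.0 restored the cache-off default


def cache_args(sdk_version: str, desired_mb: int) -> List[str]:
    """Return the flag only when it exists and differs from the default."""
    if _parse(sdk_version) < (2, 52, 0):
        return []  # flag unavailable before 2.52.0
    if desired_mb == default_cache_mb(sdk_version):
        return []
    return [f"{FLAG}={desired_mb}"]


# A pipeline with iterable side inputs that wants the old 100 MB cache on 2.55.0.
print(cache_args("2.55.0", 100))
```

Setting the flag explicitly, regardless of the default, is a reasonable alternative when a pipeline must behave the same across SDK upgrades.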

Bug fixes

Fixed SpannerIO.readChangeStream to support propagating credentials from pipeline options
to the getDialect calls for authenticating with Spanner (Java) (#30361).
Reduced the number of HTTP requests in GCSIO function calls (Python) (#30205)

Security Fixes

Go SDK base container image moved to distroless/base-nossl-debian12, reducing vulnerable container surface to kernel and glibc (#30011).

Known Issues

In Python pipelines, when shutting down inactive bundle processors, shutdown logic can overaggressively hold the lock, blocking acceptance of new work. Symptoms of this issue include slowness or stuckness in long-running jobs. Fixed in 2.56.0 (#30679).

List of Contributors

According to git shortlog, the following people contributed to the 2.55.0 release. Thank you to all contributors!

Ahmed Abualsaud

Anand Inguva

Andrew Crites

Andrey Devyatkin

Arun Pandian

Arvind Ram

Chamikara Jayalath

Chris Gray

Claire McGinty

Damon Douglas

Dan Ellis

Danny McCormick

Daria Bezkorovaina

Dima I

Edward Cui

Ferran Fernández Garrido

GStravinsky

Jan Lukavský

Jason Mitchell

JayajP

Jeff Kinard

Jeffrey Kinard

Kenneth Knowles

Mattie Fu

Michel Davit

Oleh Borysevych

Ritesh Ghorse

Ritesh Tarway

Robert Bradshaw

Robert Burke

Sam Whittle

Scott Strong

Shunping Huang

Steven van Rossum

Svetak Sundhar

Talat UYARER

Ukjae Jeong (Jay)

Vitaly Terentyev

Vlado Djerek

Yi Hu

akashorabek

case-k

clmccart

dengwe1

dhruvdua

hardshah

johnjcasey

liferoad

martin trieu

tvalentyn

14 Feb 18:03

lostluck

v2.54.0

f660f49

Beam 2.54.0 release

We are happy to present the new 2.54.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.54.0, check out the detailed release notes.

Highlights

Enrichment Transform along with GCP BigTable handler added to Python SDK (#30001).
Beam Java Batch pipelines run on Google Cloud Dataflow will default to the Portable Runner V2 (https://cloud.google.com/dataflow/docs/runner-v2) starting with this version. (All other languages are already on Runner V2.)
- This change is still rolling out to the Dataflow service; see the Runner V2 documentation (https://cloud.google.com/dataflow/docs/runner-v2) for how to enable or disable it intentionally.

I/Os

Added support for writing to BigQuery dynamic destinations with Python's Storage Write API (#30045)
Adding support for Tuples DataType in ClickHouse (Java) (#29715).
Added support for handling bad records to FileIO, TextIO, AvroIO (#29670).
Added support for handling bad records to BigtableIO (#29885).

New Features / Improvements

Enrichment Transform along with GCP BigTable handler added to Python SDK (#30001).

Bugfixes

Fixed a memory leak affecting some Go SDK since 2.46.0. (#28142)

List of Contributors

According to git shortlog, the following people contributed to the 2.54.0 release. Thank you to all contributors!

Ahmed Abualsaud

Alexey Romanenko

Anand Inguva

Andrew Crites

Arun Pandian

Bruno Volpato

caneff

Chamikara Jayalath

Changyu Li

Cheskel Twersky

Claire McGinty

clmccart

Damon

Danny McCormick

dependabot[bot]

Edward Cheng

Ferran Fernández Garrido

Hai Joey Tran

hugo-syn

Issac

Jack McCluskey

Jan Lukavský

JayajP

Jeffrey Kinard

Jerry Wang

Jing

Joey Tran

johnjcasey

Kenneth Knowles

Knut Olav Løite

liferoad

Marc

Mark Zitnik

martin trieu

Mattie Fu

Naireen Hussain

Neeraj Bansal

Niel Markwick

Oleh Borysevych

pablo rodriguez defino

Rebecca Szper

Ritesh Ghorse

Robert Bradshaw

Robert Burke

Sam Whittle

Shunping Huang

Svetak Sundhar

S. Veyrié

Talat UYARER

tvalentyn

Vlado Djerek

Yi Hu

Zechen Jian

04 Jan 16:15

jrmccluskey

v2.53.0

260b554

Beam 2.53.0 release

We are happy to present the new 2.53.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.53.0, check out the detailed release notes.

Highlights

Python streaming users that use 2.47.0 and newer versions of Beam should update to version 2.53.0, which fixes a known issue: (#27330).

I/Os

TextIO now supports skipping multiple header lines (Java) (#17990).
Python GCSIO is now implemented with GCP GCS Client instead of apitools (#25676)
Adding support for LowCardinality DataType in ClickHouse (Java) (#29533).
Added support for handling bad records to KafkaIO (Java) (#29546)
Add support for generating text embeddings in MLTransform for Vertex AI and Hugging Face Hub models (#29564).
NATS IO connector added (Go) (#29000).

New Features / Improvements

The Python SDK now type checks collections.abc.Collections types properly. Some type hints that were erroneously allowed by the SDK may now fail. (#29272)
Running multi-language pipelines locally no longer requires Docker.
Instead, the same (generally auto-started) subprocess used to perform the
expansion can also be used as the cross-language worker.
Framework for adding Error Handlers to composite transforms added in Java (#29164).
Python 3.11 images now include google-cloud-profiler (#29561).

Breaking Changes

Upgraded to go 1.21.5 to build, fixing CVE-2023-45285 and CVE-2023-39326

Deprecations

Euphoria DSL is deprecated and will be removed in a future release (not before 2.56.0) (#29451)

Bugfixes

(Python) Fixed sporadic crashes in streaming pipelines that affected some users of 2.47.0 and newer SDKs (#27330).
(Python) Fixed a bug that caused MLTransform to drop identical elements in the output PCollection (#29600).

List of Contributors

According to git shortlog, the following people contributed to the 2.53.0 release. Thank you to all contributors!

Ahmed Abualsaud

Ahmet Altay

Alexey Romanenko

Anand Inguva

Arun Pandian

Balázs Németh

Bruno Volpato

Byron Ellis

Calvin Swenson Jr

Chamikara Jayalath

Clay Johnson

Damon

Danny McCormick

Ferran Fernández Garrido

Georgii Zemlianyi

Israel Herraiz

Jack McCluskey

Jacob Tomlinson

Jan Lukavský

JayajP

Jeffrey Kinard

Johanna Öjeling

Julian Braha

Julien Tournay

Kenneth Knowles

Lawrence Qiu

Mark Zitnik

Mattie Fu

Michel Davit

Mike Williamson

Naireen

Naireen Hussain

Niel Markwick

Pablo Estrada

Radosław Stankiewicz

Rebecca Szper

Reuven Lax

Ritesh Ghorse

Robert Bradshaw

Robert Burke

Sam Rohde

Sam Whittle

Shunping Huang

Svetak Sundhar

Talat UYARER

Tom Stepp

Tony Tang

Vlado Djerek

Yi Hu

Zechen Jiang

clmccart

damccorm

darshan-sj

gabry.wu

johnjcasey

liferoad

lrakla

martin trieu

tvalentyn

17 Nov 18:45

damccorm

v2.52.0

7c8a997

Beam 2.52.0 release

We are happy to present the new 2.52.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.52.0, check out the detailed release notes.

Highlights

Previously deprecated Avro-dependent code (Beam Release 2.46.0) has finally been removed from the Java SDK "core" package.
Please use beam-sdks-java-extensions-avro instead. This makes it easy to update the Avro version in user code without
potential breaking changes in Beam "core", since the Beam Avro extension already supports the latest Avro versions and
should handle this (#25252).
Publishing Java 21 SDK container images now supported as part of Apache Beam release process. (#28120)
- Direct Runner and Dataflow Runner support running pipelines on Java 21 (experimental until tests are fully set up). For other runners (Flink, Spark, Samza, etc.), support status depends on the runner projects.

New Features / Improvements

Add UseDataStreamForBatch pipeline option to the Flink runner. When it is set to true, Flink runner will run batch
jobs using the DataStream API. By default the option is set to false, so the batch jobs are still executed
using the DataSet API.
Setting upload_graph as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for Java SDK (PR #28621).
State and side input cache has been enabled with a default of 100 MB. Use --max_cache_memory_usage_mb=X to provide cache size for the user state API and side inputs (Python) (#28770).
Beam YAML stable release. Beam pipelines can now be written using YAML and leverage the Beam YAML framework which includes a preliminary set of IO's and turnkey transforms. More information can be found in the YAML root folder and in the README.

Breaking Changes

org.apache.beam.sdk.io.CountingSource.CounterMark uses a custom CounterMarkCoder as its default coder, since all Avro-dependent
classes have moved to extensions/avro. If AvroCoder is still required for CounterMark, then,
as a workaround, a copy of the "old" CountingSource class can be placed into the project code and used directly
(#25252).
Renamed host to firestoreHost in FirestoreOptions to avoid potential conflict of command line arguments (Java) (#29201).

Bugfixes

Fixed "Desired bundle size 0 bytes must be greater than 0" in Java SDK's BigtableIO.BigtableSource when you have more cores than bytes to read (Java) (#28793).
The watch_file_pattern argument of RunInference had no effect prior to 2.52.0. To use the behavior of watch_file_pattern from before 2.52.0, follow the documentation at https://beam.apache.org/documentation/ml/side-input-updates/ and use the WatchFilePattern PTransform as a side input (#28948).
MLTransform doesn't output artifacts such as min, max, and quantiles. Instead, MLTransform will add a feature to output these artifacts in a human-readable format (#29017). For now, to use artifacts such as min and max that were produced by an earlier MLTransform, use read_artifact_location of MLTransform, which reads artifacts that were produced earlier in a different MLTransform (#29016).
Fixed a memory leak, which affected some long-running Python pipelines (#28246).

Security Fixes

Fixed CVE-2023-39325 (Java/Python/Go) (#29118).
Mitigated CVE-2023-47248 (Python) (#29392).

List of Contributors

According to git shortlog, the following people contributed to the 2.52.0 release. Thank you to all contributors!

Ahmed Abualsaud
Ahmet Altay
Aleksandr Dudko
Alexey Romanenko
Anand Inguva
Andrei Gurau
Andrey Devyatkin
BjornPrime
Bruno Volpato
Bulat
Chamikara Jayalath
Damon
Danny McCormick
Devansh Modi
Dominik Dębowczyk
Ferran Fernández Garrido
Hai Joey Tran
Israel Herraiz
Jack McCluskey
Jan Lukavský
JayajP
Jeff Kinard
Jeffrey Kinard
Jiangjie Qin
Jing
Joar Wandborg
Johanna Öjeling
Julien Tournay
Kanishk Karanawat
Kenneth Knowles
Kerry Donny-Clark
Luís Bianchin
Minbo Bae
Pranav Bhandari
Rebecca Szper
Reuven Lax
Ritesh Ghorse
Robert Bradshaw
Robert Burke
RyuSA
Shunping Huang
Steven van Rossum
Svetak Sundhar
Tony Tang
Vitaly Terentyev
Vivek Sumanth
Vlado Djerek
Yi Hu
aku019
brucearctor
caneff
damccorm
ddebowczyk92
dependabot[bot]
dpcollins-google
edman124
gabry.wu
illoise
johnjcasey
jonathan-lemos
kennknowles
liferoad
magicgoody
martin trieu
nancyxu123
pablo rodriguez defino
tvalentyn
