Exploring the thread mode in PyFlink

May 6, 2022 - Xingbo Huang Dian Fu

PyFlink was introduced in Flink 1.9 to bring the power of Flink to Python users and let them develop Flink jobs in Python. Its functionality has matured steadily over the releases since. Before Flink 1.15, Python user-defined functions were executed in separate Python processes (based on the Apache Beam Portability Framework), which adds serialization/deserialization overhead as well as communication overhead. ...

Improvements to Flink operations: Snapshots Ownership and Savepoint Formats

May 6, 2022 - Dawid Wysakowicz (@dwysakowicz) Daisy Tsang

Flink has become a well-established data streaming engine, and a mature project requires some shifting of priorities from thinking purely about new features towards improving stability and operational simplicity. In the last couple of releases, the Flink community has tried to address some known friction points, including improvements to the snapshotting process. Snapshotting takes a global, consistent image of the state of a Flink job and is integral to fault tolerance and exactly-once processing. ...

Announcing the Release of Apache Flink 1.15

May 5, 2022 - Joe Moser (@JoemoeAT) Yun Gao (@YunGao16)

Thanks to our well-organized and open community, Apache Flink continues to grow as a technology and remain one of the most active projects in the Apache community. With the release of Flink 1.15, we are proud to announce a number of exciting changes. One of the main concepts that makes Apache Flink stand out is the unification of batch (aka bounded) and stream (aka unbounded) data processing, which helps reduce the complexity of development. ...

Apache Flink Kubernetes Operator 0.1.0 Release Announcement

April 3, 2022 - Gyula Fora (@GyulaFora)

The Apache Flink Community is pleased to announce the preview release of the Apache Flink Kubernetes Operator (0.1.0). The Flink Kubernetes Operator allows users to easily manage their Flink deployment lifecycle using native Kubernetes tooling. The operator takes care of submitting, savepointing, upgrading, and generally managing Flink jobs using the built-in Flink Kubernetes integration. This way, users do not have to use the Flink clients (e.g. the CLI) or interact with Flink jobs manually; they only have to declare the desired deployment specification and the operator will take care of the rest. ...

The Generic Asynchronous Base Sink

March 16, 2022 - Zichen Liu

Flink sinks share a lot of similar behavior. Most sinks batch records according to user-defined buffering hints, sign requests, write them to the destination, retry unsuccessful or throttled requests, and participate in checkpointing. This is why, for Flink 1.15, we have decided to create the AsyncSinkBase (FLIP-171), an abstract sink that extracts this common functionality. It is a base implementation for asynchronous sinks, which you should use whenever you need to implement a sink that doesn’t offer transactional capabilities. ...

Apache Flink 1.14.4 Release Announcement

March 11, 2022 - Konstantin Knauf (@snntrable)

The Apache Flink Community is pleased to announce another bug fix release for Flink 1.14. This release includes 51 bug and vulnerability fixes and minor improvements for Flink 1.14. Below you will find a list of all bug fixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes, see JIRA. We highly recommend that all users upgrade to Flink 1.14.4. Release Artifacts # Maven Dependencies # <dependency> <groupId>org. ...

Scala Free in One Fifteen

February 22, 2022 - Seth Wiesman (@sjwiesman)

Flink 1.15 is right around the corner, and among the many improvements is a Scala-free classpath. Users can now leverage the Java API from any Scala version, including Scala 3! [Fig. 1: Flink 1.15 Scala 3 example] This blog post will discuss what has historically made supporting multiple Scala versions so complex, how we achieved this milestone, and the future of Scala in Apache Flink. TL;DR: All Scala dependencies are now isolated to the flink-scala jar. ...

Apache Flink 1.13.6 Release Announcement

February 18, 2022 - Konstantin Knauf (@snntrable)

The Apache Flink Community is pleased to announce another bug fix release for Flink 1.13. This release includes 99 bug and vulnerability fixes and minor improvements for Flink 1.13, including another upgrade of Apache Log4j (to 2.17.1). Below you will find a list of all bug fixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes, see JIRA. We highly recommend that all users upgrade to Flink 1. ...

Stateful Functions 3.2.0 Release Announcement

January 31, 2022 - Till Rohrmann (@stsffap) Igal Shilman (@IgalShilman)

Stateful Functions is a cross-platform stack for building Stateful Serverless applications, making it radically simpler to develop scalable, consistent, and elastic distributed applications. This new release brings various improvements to the StateFun runtime, a leaner way to specify StateFun module components, and a brand new JavaScript SDK! The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent Java SDK, Python SDK, GoLang SDK, and JavaScript SDK distributions are available on Maven, PyPI, GitHub, and npm, respectively. ...

Pravega Flink Connector 101

January 20, 2022 - Yumin Zhou (Brian) (@crazy__zhou)

Pravega, which is now a CNCF sandbox project, is a cloud-native storage system based on abstractions for both batch and streaming data consumption. Pravega streams (a new storage abstraction) are durable, consistent, and elastic, while natively supporting long-term data retention. In comparison, Apache Flink is a popular real-time computing engine that provides unified batch and stream processing. Flink provides high-throughput, low-latency computation, as well as support for complex event processing and state management. ...
