January 17, 2022 -
Thomas Weise
(@thweise)
Martijn Visser
(@martijnvisser82)
The Apache Flink community released the second bugfix version of the Apache Flink 1.14 series. The first bugfix release was 1.14.2, being an emergency release due to an Apache Log4j Zero Day (CVE-2021-44228). Flink 1.14.1 was abandoned. That means that this Flink release is the first bugfix release of the Flink 1.14 series which contains bugfixes not related to the mentioned CVE.
This release includes 164 fixes and minor improvements for Flink 1.
...
Continue reading »
January 7, 2022 -
Dong Lin
Yun Gao
The Apache Flink community is excited to announce the release of Flink ML 2.0.0! Flink ML is a library that provides APIs and infrastructure for building stream-batch unified machine learning algorithms, that can be easy-to-use and performant with (near-) real-time latency.
This release involves a major refactor of the earlier Flink ML library and introduces major features that extend the Flink ML API and the iteration runtime, such as supporting stages with multi-input multi-output, graph-based stage composition, and a new stream-batch unified iteration library.
...
Continue reading »
January 4, 2022 -
Zhilong Hong
Zhu Zhu
Daisy Tsang
Till Rohrmann
(@stsffap)
Introduction # When scheduling large-scale jobs in Flink 1.12, a lot of time is required to initialize jobs and deploy tasks. The scheduler also requires a large amount of heap memory in order to store the execution topology and host temporary deployment descriptors. For example, for a job with a topology that contains two vertices connected with an all-to-all edge and a parallelism of 10k (which means there are 10k source tasks and 10k sink tasks and every source task is connected to all sink tasks), Flink’s JobManager would require 30 GiB of heap memory and more than 4 minutes to deploy all of the tasks.
...
Continue reading »
January 4, 2022 -
Zhilong Hong
Zhu Zhu
Daisy Tsang
Till Rohrmann
(@stsffap)
Part one of this blog post briefly introduced the optimizations we’ve made to improve the performance of the scheduler; compared to Flink 1.12, the time cost and memory usage of scheduling large-scale jobs in Flink 1.14 is significantly reduced. In part two, we will elaborate on the details of these optimizations.
Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks. Currently, there are two distribution patterns in Flink: pointwise and all-to-all.
...
Continue reading »
December 22, 2021 -
Igal Shilman
Seth Wiesman
The Apache Flink community has released an emergency bugfix version of Apache Flink Stateful Function 3.1.1.
This release include a version upgrade of Apache Flink to 1.13.5, for log4j to address CVE-2021-44228 and CVE-2021-45046.
We highly recommend all users to upgrade to the latest patch release.
You can find the source and binaries on the updated Downloads page, and Docker images in the apache/flink-statefun dockerhub repository.
Continue reading »
December 16, 2021 -
Chesnay Schepler
The Apache Flink community has released emergency bugfix versions of Apache Flink for the 1.11, 1.12, 1.13 and 1.14 series.
These releases only include a version upgrade for Log4j to address CVE-2021-44228 and CVE-2021-45046.
We highly recommend all users to upgrade to the respective patch release.
You can find the source and binaries on the updated Downloads page, and Docker images in the apache/flink dockerhub repository.
We are publishing this announcement earlier than usual to give users access to the updated source/binary releases as soon as possible.
...
Continue reading »
December 10, 2021 -
Konstantin Knauf
Please see [this](/news/2021/12/16/log4j-patch-releases) for our updated recommendation regarding this CVE. Yesterday, a new Zero Day for Apache Log4j was reported. It is by now tracked under CVE-2021-44228.
Apache Flink is bundling a version of Log4j that is affected by this vulnerability. We recommend users to follow the advisory of the Apache Log4j Community. For Apache Flink this currently translates to setting the following property in your flink-conf.yaml:
env.java.opts: -Dlog4j2.formatMsgNoLookups=true If you are already setting env.
...
Continue reading »
November 3, 2021 -
Johannes Moser
It has now been a month since the community released Apache Flink 1.14 into the wild. We had a comprehensive look at the enhancements, additions, and fixups in the release announcement blog post, and now we will look at the development cycle from a different angle. Based on feedback collected from contributors involved in this release, we will explore the experiences and processes behind it all.
A retrospective on the release cycle # From the team, we collected emotions that have been attributed to points in time of the 1.
...
Continue reading »
October 26, 2021 -
Yingjie Cao (Kevin)
Daisy Tsang
Part one of this blog post will explain the motivation behind introducing sort-based blocking shuffle, present benchmark results, and provide guidelines on how to use this new feature.
How data gets passed around between operators # Data shuffling is an important stage in batch processing applications and describes how data is sent from one operator to the next. In this phase, output data of the upstream operator will spill over to persistent storages like disk, then the downstream operator will read the corresponding data and process it.
...
Continue reading »
October 26, 2021 -
Yingjie Cao (Kevin)
Daisy Tsang
Part one of this blog post explained the motivation behind introducing sort-based blocking shuffle, presented benchmark results, and provided guidelines on how to use this new feature.
Like sort-merge shuffle implemented by other distributed data processing frameworks, the whole sort-based shuffle process in Flink consists of several important stages, including collecting data in memory, sorting the collected data in memory, spilling the sorted data to files, and reading the shuffle data from these spilled files.
...
Continue reading »