How We Improved Scheduler Performance for Large-scale Jobs - Part One

January 4, 2022 - Zhilong Hong Zhu Zhu Daisy Tsang Till Rohrmann (@stsffap)

Introduction # When scheduling large-scale jobs in Flink 1.12, a lot of time is required to initialize jobs and deploy tasks. The scheduler also requires a large amount of heap memory in order to store the execution topology and host temporary deployment descriptors. For example, for a job with a topology that contains two vertices connected with an all-to-all edge and a parallelism of 10k (which means there are 10k source tasks and 10k sink tasks and every source task is connected to all sink tasks), Flink’s JobManager would require 30 GiB of heap memory and more than 4 minutes to deploy all of the tasks. ...

Continue reading »

How We Improved Scheduler Performance for Large-scale Jobs - Part Two

January 4, 2022 - Zhilong Hong Zhu Zhu Daisy Tsang Till Rohrmann (@stsffap)

Part one of this blog post briefly introduced the optimizations we’ve made to improve the performance of the scheduler; compared to Flink 1.12, the time cost and memory usage of scheduling large-scale jobs in Flink 1.14 is significantly reduced. In part two, we will elaborate on the details of these optimizations. Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks. Currently, there are two distribution patterns in Flink: pointwise and all-to-all. ...

Continue reading »

Apache Flink StateFun Log4j emergency release

December 22, 2021 - Igal Shilman Seth Wiesman

The Apache Flink community has released an emergency bugfix version of Apache Flink Stateful Function 3.1.1. This release include a version upgrade of Apache Flink to 1.13.5, for log4j to address CVE-2021-44228 and CVE-2021-45046. We highly recommend all users to upgrade to the latest patch release. You can find the source and binaries on the updated Downloads page, and Docker images in the apache/flink-statefun dockerhub repository.

Continue reading »

Apache Flink Log4j emergency releases

December 16, 2021 - Chesnay Schepler

The Apache Flink community has released emergency bugfix versions of Apache Flink for the 1.11, 1.12, 1.13 and 1.14 series. These releases only include a version upgrade for Log4j to address CVE-2021-44228 and CVE-2021-45046. We highly recommend all users to upgrade to the respective patch release. You can find the source and binaries on the updated Downloads page, and Docker images in the apache/flink dockerhub repository. We are publishing this announcement earlier than usual to give users access to the updated source/binary releases as soon as possible. ...

Continue reading »

Advise on Apache Log4j Zero Day (CVE-2021-44228)

December 10, 2021 - Konstantin Knauf

Please see [this](/news/2021/12/16/log4j-patch-releases) for our updated recommendation regarding this CVE. Yesterday, a new Zero Day for Apache Log4j was reported. It is by now tracked under CVE-2021-44228. Apache Flink is bundling a version of Log4j that is affected by this vulnerability. We recommend users to follow the advisory of the Apache Log4j Community. For Apache Flink this currently translates to setting the following property in your flink-conf.yaml: env.java.opts: -Dlog4j2.formatMsgNoLookups=true If you are already setting env. ...

Continue reading »

Flink Backward - The Apache Flink Retrospective

November 3, 2021 - Johannes Moser

It has now been a month since the community released Apache Flink 1.14 into the wild. We had a comprehensive look at the enhancements, additions, and fixups in the release announcement blog post, and now we will look at the development cycle from a different angle. Based on feedback collected from contributors involved in this release, we will explore the experiences and processes behind it all. A retrospective on the release cycle # From the team, we collected emotions that have been attributed to points in time of the 1. ...

Continue reading »

Sort-Based Blocking Shuffle Implementation in Flink - Part One

October 26, 2021 - Yingjie Cao (Kevin) Daisy Tsang

Part one of this blog post will explain the motivation behind introducing sort-based blocking shuffle, present benchmark results, and provide guidelines on how to use this new feature. How data gets passed around between operators # Data shuffling is an important stage in batch processing applications and describes how data is sent from one operator to the next. In this phase, output data of the upstream operator will spill over to persistent storages like disk, then the downstream operator will read the corresponding data and process it. ...

Continue reading »

Sort-Based Blocking Shuffle Implementation in Flink - Part Two

October 26, 2021 - Yingjie Cao (Kevin) Daisy Tsang

Part one of this blog post explained the motivation behind introducing sort-based blocking shuffle, presented benchmark results, and provided guidelines on how to use this new feature. Like sort-merge shuffle implemented by other distributed data processing frameworks, the whole sort-based shuffle process in Flink consists of several important stages, including collecting data in memory, sorting the collected data in memory, spilling the sorted data to files, and reading the shuffle data from these spilled files. ...

Continue reading »

Apache Flink 1.13.3 Released

October 19, 2021 - Chesnay Schepler

The Apache Flink community released the third bugfix version of the Apache Flink 1.13 series. This release includes 136 fixes and minor improvements for Flink 1.13.2. The list below includes bugfixes and improvements. For a complete list of all changes see: JIRA. We highly recommend all users to upgrade to Flink 1.13.3. Updated Maven dependencies: <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.13.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.11</artifactId> <version>1.13.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients_2.11</artifactId> <version>1.13.3</version> </dependency> You can find the binaries on the updated Downloads page. ...

Continue reading »

Apache Flink 1.14.0 Release Announcement

September 29, 2021 - Stephan Ewen (@StephanEwen) Johannes Moser (@joemoeAT)

The Apache Software Foundation recently released its annual report and Apache Flink once again made it on the list of the top 5 most active projects! This remarkable activity also shows in the new 1.14.0 release. Once again, more than 200 contributors worked on over 1,000 issues. We are proud of how this community is consistently moving the project forward. This release brings many new features and improvements in areas such as the SQL API, more connector support, checkpointing, and PyFlink. ...

Continue reading »