April 7, 2020 -
Stephan Ewen
(@stephanewen)
Today, we are announcing the release of Stateful Functions (StateFun) 2.0 — the first release of Stateful Functions as part of the Apache Flink project. This release marks a big milestone: Stateful Functions 2.0 is not only an API update, but the first version of an event-driven database that is built on Apache Flink.
Stateful Functions 2.0 makes it possible to combine StateFun’s powerful approach to state and composition with the elasticity, rapid scaling/scale-to-zero and rolling upgrade capabilities of FaaS implementations like AWS Lambda and modern resource orchestration frameworks like Kubernetes.
...
Continue reading »
March 30, 2020 -
Marta Paes
(@morsapaes)
While things slow down around us, the Apache Flink community is privileged to remain as active as ever. This blog post combs through the past few months to give you an update on the state of things in Flink: from core releases to Stateful Functions, and from some good old community stats to a new development blog.
And since it’s now more important than ever to keep spirits up, we’d like to invite you to join the Flink Forward Virtual Conference on April 22-24 (see Upcoming Events).
...
Continue reading »
March 27, 2020 -
Bowen Li
(@Bowen__Li)
In this blog post, you will learn about our motivation for the Flink-Hive integration and how Flink 1.10 can help modernize your data warehouse.
Introduction: What are some of the latest requirements for your data warehouse and data infrastructure in 2020?
We’ve come up with some for you.
Firstly, today’s business is shifting toward real-time operations, and thus demands the ability to process online streaming data with low latency for near-real-time or even real-time analytics.
...
Continue reading »
March 24, 2020 -
Alexander Fedulov
(@alex_fedulov)
In the first article of the series, we gave a high-level description of the objectives and required functionality of a Fraud Detection engine. We also described how to make data partitioning in Apache Flink customizable based on modifiable rules instead of using a hardcoded KeysExtractor implementation.
We intentionally omitted details of how the applied rules are initialized and what possibilities exist for updating them at runtime. In this post, we will address exactly these details.
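To make the idea concrete, here is a minimal, hypothetical sketch (the GroupingRule and DynamicKeySelector names are illustrative, events are modeled as a plain Map, and this is not the article's actual implementation, which updates rules at runtime via broadcast state): a key selector that builds the partitioning key from fields defined by a rule rather than from hardcoded field names.

import org.apache.flink.api.java.functions.KeySelector;

import java.io.Serializable;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical rule: the fields to group by are data, not code.
class GroupingRule implements Serializable {
    private final List<String> groupingKeyNames;

    GroupingRule(List<String> groupingKeyNames) {
        this.groupingKeyNames = groupingKeyNames;
    }

    List<String> getGroupingKeyNames() {
        return groupingKeyNames;
    }
}

// Derives the partitioning key from the rule's field list instead of
// hardcoding the fields in the key extraction logic.
class DynamicKeySelector implements KeySelector<Map<String, String>, String> {
    private final GroupingRule rule;

    DynamicKeySelector(GroupingRule rule) {
        this.rule = rule;
    }

    @Override
    public String getKey(Map<String, String> event) {
        // Concatenate the values of the rule-defined fields into a composite key.
        return rule.getGroupingKeyNames().stream()
                .map(field -> field + "=" + event.get(field))
                .collect(Collectors.joining(";"));
    }
}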
...
Continue reading »
February 22, 2020 -
Maximilian Michels
(@stadtlegende)
Markos Sfikas
(@MarkSfik)
Note: This blog post is based on the talk “Beam on Flink: How Does It Actually Work?”.
Apache Flink and Apache Beam are open-source frameworks for parallel, distributed data processing at scale. Unlike Flink, Beam does not come with a full-blown execution engine of its own but plugs into other execution engines, such as Apache Flink, Apache Spark, or Google Cloud Dataflow. In this blog post, we discuss the reasons to use Flink together with Beam for your batch and stream processing needs.
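As a rough illustration (a generic sketch, not code from the post), the snippet below shows the key point: the runner is just a pipeline option, so the same Beam program can target Flink, Spark, or Dataflow by changing that option.

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;

public class BeamOnFlinkExample {
    public static void main(String[] args) {
        // Select the Flink runner; swapping this option retargets the
        // unchanged pipeline to another execution engine.
        FlinkPipelineOptions options =
                PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
        options.setRunner(FlinkRunner.class);

        // A toy pipeline: count occurrences of each element.
        Pipeline pipeline = Pipeline.create(options);
        pipeline
                .apply(Create.of("flink", "beam", "flink"))
                .apply(Count.perElement());

        pipeline.run().waitUntilFinish();
    }
}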
...
Continue reading »
February 20, 2020 -
Seth Wiesman
(@sjwiesman)
Introduction: The recent Apache Flink 1.10 release includes many exciting features. In particular, it marks the end of the community’s year-long effort to merge in the Blink SQL contribution from Alibaba. The reason the community chose to spend so much time on the contribution is that SQL works. It allows Flink to offer a truly unified interface over batch and streaming and makes stream processing accessible to a broad audience of developers and analysts.
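To give a flavor of that unified interface, here is a small illustrative sketch (the 'orders' table and its connector are assumptions, not part of the post): with the Blink planner, the same SQL runs as a streaming query or a bounded batch job depending only on the environment settings.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class UnifiedSqlExample {
    public static void main(String[] args) {
        // Switch inStreamingMode() to inBatchMode() and the same query below
        // executes as a bounded batch job instead of a continuous one.
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        TableEnvironment tEnv = TableEnvironment.create(settings);

        // 'orders' is a placeholder table assumed to be registered through a
        // connector (e.g. Kafka for streaming, filesystem for batch).
        Table revenuePerUser = tEnv.sqlQuery(
                "SELECT user_id, SUM(amount) AS revenue FROM orders GROUP BY user_id");
        // The result can then be emitted to any registered sink table.
    }
}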
...
Continue reading »
February 11, 2020 -
Marta Paes
(@morsapaes)
The Apache Flink community is excited to hit the double digits and announce the release of Flink 1.10.0! As a result of the biggest community effort to date, with over 1.2k issues implemented and more than 200 contributors, this release introduces significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and great advances in Python support (PyFlink).
Flink 1.10 also marks the completion of the Blink integration, hardening streaming SQL and bringing mature batch processing to Flink with production-ready Hive integration and TPC-DS coverage.
...
Continue reading »
February 3, 2020 -
Kartik Khare
(@khare_khote)
Writing unit tests is one of the essential tasks of designing a production-grade application. Without tests, a single change in code can result in cascades of failure in production. Thus unit tests should be written for all types of applications, be it a simple job cleaning data and training a model or a complex multi-tenant, real-time data processing system. In the following sections, we provide a guide for unit testing of Apache Flink applications.
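As a quick taste of what such a guide covers, a stateless user-defined function can be tested like any ordinary Java class, with no Flink cluster involved. The sketch below is illustrative; the class names are chosen for this example and need not match the post.

import static org.junit.Assert.assertEquals;

import org.apache.flink.api.common.functions.MapFunction;
import org.junit.Test;

public class IncrementMapFunctionTest {

    // A trivial user-defined function, defined here only for the test.
    public static class IncrementMapFunction implements MapFunction<Long, Long> {
        @Override
        public Long map(Long value) {
            return value + 1;
        }
    }

    @Test
    public void testIncrement() {
        // Stateless functions need no special harness: instantiate and call.
        IncrementMapFunction function = new IncrementMapFunction();
        assertEquals(Long.valueOf(3L), function.map(2L));
    }
}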
...
Continue reading »
January 30, 2020 -
Hequn Cheng
(@HequnC)
The Apache Flink community released the second bugfix version of the Apache Flink 1.9 series.
This release includes 117 fixes and minor improvements for Flink 1.9.1. The list below details all fixes and improvements.
We highly recommend that all users upgrade to Flink 1.9.2.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.9.2</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.9.2</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.9.2</version>
</dependency>

You can find the binaries on the updated Downloads page.
...
Continue reading »
January 29, 2020 -
Seth Wiesman
(@sjwiesman)
Introduction: With stateful stream processing becoming the norm for complex event-driven applications and real-time analytics, Apache Flink is often the backbone for running business logic and managing an organization’s most valuable asset, its data, as application state in Flink.
In order to provide a state-of-the-art experience to Flink developers, the Apache Flink community makes significant efforts to provide the safety and future-proof guarantees organizations need while managing state in Flink.
...
Continue reading »