Stateful Functions 2.0 - An Event-driven Database on Apache Flink

April 7, 2020 - Stephan Ewen (@stephanewen)

Today, we are announcing the release of Stateful Functions (StateFun) 2.0 — the first release of Stateful Functions as part of the Apache Flink project. This release marks a big milestone: Stateful Functions 2.0 is not only an API update, but the first version of an event-driven database that is built on Apache Flink. Stateful Functions 2.0 makes it possible to combine StateFun’s powerful approach to state and composition with the elasticity, rapid scaling/scale-to-zero and rolling upgrade capabilities of FaaS implementations like AWS Lambda and modern resource orchestration frameworks like Kubernetes. ...

Continue reading »

Flink Community Update - April'20

March 30, 2020 - Marta Paes (@morsapaes)

While things slow down around us, the Apache Flink community is privileged to remain as active as ever. This blog post combs through the past few months to give you an update on the state of things in Flink, from core releases to Stateful Functions and from some good old community stats to a new development blog. And since it’s now more important than ever to keep spirits up, we’d like to invite you to join the Flink Forward Virtual Conference on April 22-24 (see Upcoming Events). ...

Continue reading »

Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration

March 27, 2020 - Bowen Li (@Bowen__Li)

In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse. What are some of the latest requirements for your data warehouse and data infrastructure in 2020? We’ve come up with some for you. Firstly, today’s business is shifting to a more real-time fashion, and thus demands the ability to process online streaming data with low latency for near-real-time or even real-time analytics. ...

Continue reading »

Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic

March 24, 2020 - Alexander Fedulov (@alex_fedulov)

In the first article of the series, we gave a high-level description of the objectives and required functionality of a Fraud Detection engine. We also described how to make data partitioning in Apache Flink customizable based on modifiable rules instead of using a hardcoded KeysExtractor implementation. We intentionally omitted details of how the applied rules are initialized and what possibilities exist for updating them at runtime. In this post, we will address exactly these details. ...

Continue reading »

Apache Beam: How Beam Runs on Top of Flink

February 22, 2020 - Maximilian Michels (@stadtlegende) & Markos Sfikas (@MarkSfik)

Note: This blog post is based on the talk “Beam on Flink: How Does It Actually Work?”. Apache Flink and Apache Beam are open-source frameworks for parallel, distributed data processing at scale. Unlike Flink, Beam does not come with a full-blown execution engine of its own but plugs into other execution engines, such as Apache Flink, Apache Spark, or Google Cloud Dataflow. In this blog post we discuss the reasons to use Flink together with Beam for your batch and stream processing needs. ...

Continue reading »

No Java Required: Configuring Sources and Sinks in SQL

February 20, 2020 - Seth Wiesman (@sjwiesman)

The recent Apache Flink 1.10 release includes many exciting features. In particular, it marks the end of the community’s year-long effort to merge in the Blink SQL contribution from Alibaba. The reason the community chose to spend so much time on the contribution is that SQL works. It allows Flink to offer a truly unified interface over batch and streaming and makes stream processing accessible to a broad audience of developers and analysts. ...

Continue reading »

Apache Flink 1.10.0 Release Announcement

February 11, 2020 - Marta Paes (@morsapaes)

The Apache Flink community is excited to hit the double digits and announce the release of Flink 1.10.0! As a result of the biggest community effort to date, with over 1.2k issues implemented and more than 200 contributors, this release introduces significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and great advances in Python support (PyFlink). Flink 1.10 also marks the completion of the Blink integration, hardening streaming SQL and bringing mature batch processing to Flink with production-ready Hive integration and TPC-DS coverage. ...

Continue reading »

A Guide for Unit Testing in Apache Flink

February 3, 2020 - Kartik Khare (@khare_khote)

Writing unit tests is one of the essential tasks of designing a production-grade application. Without tests, a single change in code can result in cascades of failure in production. Thus unit tests should be written for all types of applications, be it a simple job cleaning data and training a model or a complex multi-tenant, real-time data processing system. In the following sections, we provide a guide for unit testing of Apache Flink applications. ...

Continue reading »

Apache Flink 1.9.2 Released

January 30, 2020 - Hequn Cheng (@HequnC)

The Apache Flink community released the second bugfix version of the Apache Flink 1.9 series. This release includes 117 fixes and minor improvements for Flink 1.9.1. The list below details all fixes and improvements. We highly recommend that all users upgrade to Flink 1.9.2. Updated Maven dependencies:

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>1.9.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_2.11</artifactId>
      <version>1.9.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_2.11</artifactId>
      <version>1.9.2</version>
    </dependency>

You can find the binaries on the updated Downloads page. ...

Continue reading »

State Unlocked: Interacting with State in Apache Flink

January 29, 2020 - Seth Wiesman (@sjwiesman)

With stateful stream processing becoming the norm for complex event-driven applications and real-time analytics, Apache Flink is often the backbone for running business logic and managing an organization’s most valuable asset (its data) as application state. To provide a state-of-the-art experience to Flink developers, the Apache Flink community makes significant efforts to provide the safety and future-proof guarantees organizations need while managing state in Flink. ...

Continue reading »