05 Jun 2022 Gyula Fora (@GyulaFora) & Yang Wang
In the two months since our initial preview release, the community has been hard at work stabilizing and improving the core Flink Kubernetes Operator logic. We are now proud to announce the first production-ready release of the operator project.
Flink Kubernetes Operator 1.0.0 brings numerous improvements and new features to almost every aspect of the operator:
- New v1beta1 API version & compatibility guarantees
- Session Job Management support
- Support for Flink 1.13, 1.14 and 1.15
- Deployment recovery and rollback
- New Operator metrics
- Improved configuration management
- Custom validators
- Savepoint history and cleanup
New API version and compatibility guarantees
The 1.0.0 release brings a new API version: v1beta1.
Don’t let the name confuse you: we consider v1beta1 the first production-ready API release, and we will maintain backward compatibility for your applications going forward.
Session Job Management
One of the most exciting new features of 1.0.0 is the introduction of the FlinkSessionJob resource. In contrast to the FlinkDeployment resource, which manages Application and Session clusters, FlinkSessionJob lets users manage Flink jobs on a running Session deployment.
This is extremely valuable in environments where users want to deploy Flink jobs quickly and iteratively, and it also allows cluster administrators to manage the session cluster independently of the running jobs.
```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
  name: basic-session-job-example
spec:
  deploymentName: basic-session-cluster
  job:
    jarURI: https://repo1.maven.org/maven2/org/apache/flink/flink-examples-streaming_2.12/1.15.0/flink-examples-streaming_2.12-1.15.0-TopSpeedWindowing.jar
    parallelism: 4
    upgradeMode: stateless
```
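The session job above targets a FlinkDeployment named basic-session-cluster. As a minimal sketch of such a session cluster, assuming the official flink:1.15 image and a pre-existing `flink` service account (both are assumptions, adjust them to your environment), a FlinkDeployment without a `job` section could look like this:

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  # Must match spec.deploymentName of the FlinkSessionJob
  name: basic-session-cluster
spec:
  # Assumed image; it should contain the Flink version declared below
  image: flink:1.15
  flinkVersion: v1_15
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  # Assumed service account allowed to manage the Flink pods
  serviceAccount: flink
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  # No "job" section: this deployment runs as a session cluster
```

Because the cluster and the job are separate resources, the FlinkSessionJob can be created, upgraded or deleted without touching this FlinkDeployment.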
Multi-version Flink support
The Flink Kubernetes Operator now supports the following Flink versions out of the box:
- Flink 1.15 (Recommended)
- Flink 1.14
- Flink 1.13
Flink 1.15 comes with a set of features that allow deeper integration with the operator. We recommend using Flink 1.15 to get the best possible operational experience.
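The Flink version is chosen per resource. As a sketch, assuming the official Flink images from Docker Hub, a FlinkDeployment pairs a container `image` with the matching `flinkVersion` value (v1_13, v1_14 or v1_15); only the relevant part of the spec is shown:

```yaml
spec:
  # Assumed official image tag; it must actually contain the version declared below
  image: flink:1.14
  # Tells the operator which Flink version it is managing
  flinkVersion: v1_14
```

Moving this deployment to the recommended Flink 1.15 would mean updating both fields together, for example to flink:1.15 and v1_15.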
Deployment Recovery and Rollbacks
We have added two new features to make Flink cluster operations smoother when using the operator.
The operator now tries to recover Flink JobManager deployments that have gone missing, for example because they were accidentally deleted by a user or by another service in the cluster. As long as HA was enabled and the job did not fatally fail, the operator will try to restore the job from the latest available checkpoint.
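Since recovery depends on HA being enabled, here is a minimal sketch of the relevant flinkConfiguration entries, assuming Flink's Kubernetes HA services and hypothetical file:///flink-data paths backed by a persistent volume (in production these would typically point at durable storage such as S3 or HDFS):

```yaml
spec:
  flinkConfiguration:
    # Kubernetes-based HA services, so job metadata outlives a lost JobManager deployment
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    # Hypothetical path; must be durable storage reachable by the recreated JobManager
    high-availability.storageDir: file:///flink-data/ha
    # Hypothetical location of the checkpoints the job is restored from
    state.checkpoints.dir: file:///flink-data/checkpoints
```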
We also added experimental support for application upgrade rollbacks. With this feature, the operator monitors new application upgrades, and if they don’t become stable (healthy and running) within a configurable period, it rolls them back to the last stable specification that was previously deployed.
While this feature will likely see improvements and new settings in coming versions, it already helps in setups with a large number of jobs with strong uptime requirements, where rolling back is better than being stuck in a failing state.
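Rollbacks are opt-in. A minimal sketch, assuming the `kubernetes.operator.deployment.rollback.enabled` and `kubernetes.operator.deployment.readiness.timeout` settings from the operator configuration reference can be set in the resource's flinkConfiguration (they can also be operator-wide defaults), and that HA is configured as for deployment recovery; the 5min timeout is only an illustrative value:

```yaml
spec:
  flinkConfiguration:
    # Opt in to the experimental upgrade rollback feature
    kubernetes.operator.deployment.rollback.enabled: "true"
    # Illustrative value: how long an upgrade may take to become healthy and running before rollback
    kubernetes.operator.deployment.readiness.timeout: "5min"
```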
Improved Operator Metrics
Beyond the existing JVM-based system metrics, this release adds operator-specific metrics:
| Scope | Metric | Description | Type |
|---|---|---|---|
| Namespace | `FlinkDeployment.Count` | Number of managed FlinkDeployment instances per namespace | Gauge |
| Namespace | `FlinkDeployment.<Status>.Count` | Number of managed FlinkDeployment resources per `<Status>` per namespace. `<Status>` can take values from: READY, DEPLOYED_NOT_READY, DEPLOYING, MISSING, ERROR | Gauge |
| Namespace | `FlinkSessionJob.Count` | Number of managed FlinkSessionJob instances per namespace | Gauge |
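To actually ship these gauges somewhere, a metric reporter has to be configured. A sketch, assuming the operator accepts Flink metric reporter settings under a `kubernetes.operator.metrics.reporter.` prefix in its default configuration (for example set through the Helm chart values) and that the SLF4J reporter is on the operator's classpath; with these assumptions the metrics above would be logged periodically:

```yaml
# Operator default configuration (the operator's flink-conf.yaml), not a FlinkDeployment spec
kubernetes.operator.metrics.reporter.slf4j.factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory
# Reporting interval, using Flink's duration syntax
kubernetes.operator.metrics.reporter.slf4j.interval: 60 SECONDS
```

A Prometheus or other Flink reporter could be configured the same way if it is available to the operator.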
What’s Next?
Our intention is to advance further on the Operator Maturity Model by adding more dynamic and automatic features, such as:
- Standalone deployment mode support (FLIP-225)
- Auto-scaling using Horizontal Pod Autoscaler
- Dynamic change of watched namespaces
- Pluggable Status and Event reporters (Making it easier to integrate with proprietary control planes)
- SQL jobs support
Release Resources
The source artifacts and Helm chart are now available on the updated Downloads page of the Flink website.
The official 1.0.0 release archive doubles as a Helm repository that you can easily register locally with `helm repo add`.
You can also find official Kubernetes Operator Docker images of the new version on Docker Hub.
List of Contributors
The Apache Flink community would like to thank each and every one of the contributors that have made this release possible:
Aitozi, Biao Geng, ConradJam, Fuyao Li, Gyula Fora, Jaganathan Asokan, James Busche, liuzhuo, Márton Balassi, Matyas Orhidi, Nicholas Jiang, Ted Chang, Thomas Weise, Xin Hao, Yang Wang, Zili Chen