Kubernetes Operators revolutionize the management and deployment of applications within the Kubernetes environment.

Kubernetes Operators: A Practical Guide

What Are Kubernetes Operators?

Kubernetes Operators are a way of packaging, deploying, and managing a Kubernetes application. They are essentially a Kubernetes API extension that allows you to create, configure, and manage applications through declarative configuration. This means that you can treat your applications in much the same way as you would treat built-in Kubernetes resources.

At their core, Kubernetes Operators are controllers that extend the Kubernetes API to create, configure, and manage instances of complex stateful applications. They do this on behalf of a Kubernetes user. Think of them as an SRE agent for your application, automating common tasks such as deployment, scaling, and upgrades.

The real power of Kubernetes Operators lies in their ability to encode operational knowledge. This means they can handle much of the manual work involved in running an application, such as managing backups and updates, handling failover and recovery, and so on. This frees up your time to focus on building and improving your applications.
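
To make this concrete, the snippet below is a minimal sketch of what a custom resource managed by an Operator might look like. The Database kind, its API group, and its fields are purely hypothetical and invented for illustration; a real Operator defines its own schema through a CustomResourceDefinition:

# Hypothetical custom resource; the "Database" kind and every field in it
# are illustrative only and not part of any real Operator's API.
apiVersion: example.com/v1
kind: Database
metadata:
  name: orders-db
spec:
  version: "14.2"           # desired database version
  replicas: 3               # desired cluster size
  backup:
    schedule: "0 2 * * *"   # nightly backups, performed by the Operator

The user only declares the desired state; the Operator's controller continuously reconciles the running application toward it.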

Kubernetes Operator Use Cases

Kubernetes Operators have a wide range of use cases, all aimed at making life easier for developers and operators. Here are some of the common use cases:

Automated Application Deployment and Management

Deploying an application can involve installing it, configuring it, setting up the necessary resources, and so on. With Kubernetes Operators, all of this can be done automatically, without manual intervention.

Kubernetes Operators also handle application updates. They can automatically roll out new versions of an application, ensuring that the update process is smooth and fault-tolerant. This significantly reduces the risk of downtime or issues during an update.
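
Continuing the hypothetical Database example from above, an upgrade is typically triggered by nothing more than a declarative change to the custom resource, which the Operator then rolls out:

# Bump the desired version on the hypothetical Database resource;
# the Operator notices the change and performs the rolling upgrade.
kubectl patch database orders-db --type=merge -p '{"spec":{"version":"14.3"}}'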

Resource Management

Resource management involves managing the resources that an application needs to run, such as CPU, memory, disk space, and network bandwidth. Kubernetes Operators can automatically adjust these resources based on the needs of the application, ensuring optimal performance.

Kubernetes Operators can also handle the allocation of persistent storage for an application, as well as manage database connections and other necessary services. This level of resource management is invaluable in a dynamic environment like Kubernetes, where resources can be scarce and must be managed efficiently.
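
As a rough sketch, the resource-related part of an Operator's custom resource spec might look like the fragment below; the field names are again hypothetical and differ from Operator to Operator:

# Hypothetical spec fragment: the Operator translates these declarations into
# container resource requests/limits and a PersistentVolumeClaim.
spec:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 2Gi
  storage:
    size: 20Gi
    storageClassName: fast-ssd

The Prometheus object used later in this guide follows the same pattern with its resources.requests.memory setting.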

Self-Healing and Resilience

Kubernetes Operators can automatically detect and recover from failures, ensuring that an application remains available and operational.

For instance, if a pod crashes, the Operator can automatically restart it. If a node goes down, the Operator can reschedule the pods running on that node to other nodes. If an application needs to be scaled up to handle increased load, the Operator can automatically add more pods.

Common Kubernetes Operators

There are many Kubernetes Operators available, each designed to manage a specific type of application. Here are a few Operators you might come across.

Prometheus Operator

The Prometheus Operator is designed to manage Prometheus, an open-source monitoring and alerting toolkit. It simplifies the deployment and configuration of Prometheus servers, and allows you to define how to scrape metrics from your applications using a Kubernetes-native API.

What's great about the Prometheus Operator is that it allows you to manage your monitoring configuration in the same way as you manage your applications. This means you can use the same tools and processes for both, which simplifies operations and reduces the risk of errors.

etcd Operator

The etcd Operator is designed to manage etcd, a distributed key-value store that provides a reliable way to store data across a cluster of machines. It handles tasks such as deploying, scaling, and upgrading etcd clusters, as well as recovering from failures.

The etcd Operator encapsulates the complex operational knowledge required to run etcd. This means you don't need to be an expert in etcd to use it effectively, which is a huge win for developers.
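
For illustration, an etcd cluster managed by the etcd Operator is declared as a small custom resource along these lines; the exact API version and fields depend on the Operator release:

apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  name: example-etcd-cluster
spec:
  size: 3            # number of etcd members the Operator maintains
  version: "3.2.13"  # etcd version to deploy and upgrade to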

Istio Operator

The Istio Operator is designed to manage Istio, an open-source service mesh that provides traffic management, policy enforcement, and telemetry collection. It simplifies the deployment and configuration of Istio, and allows you to manage Istio's complex configuration in a Kubernetes-native way.

What's nice about the Istio Operator is that it allows you to manage Istio using the same declarative approach as for your applications. This means you can manage your service mesh in the same way as you manage your apps, which provides a consistent and efficient operational model.
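
As a sketch, installing or reconfiguring Istio through its Operator typically comes down to applying a small IstioOperator resource; the profile used here is just an example:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: example-istiocontrolplane
spec:
  profile: demo   # built-in configuration profile; "default" is the usual production choice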

Tutorial: Using the Prometheus Operator

The code in this tutorial is adapted from the Prometheus Operator documentation.

Installing the Operator

The first step is to set up the operator’s Custom Resource Definitions (CRDs) alongside the operator itself, incorporating the required Role-Based Access Control (RBAC) resources. Execute the following shell commands, which require curl and jq, to deploy the CRDs and the Prometheus Operator in the default namespace:

LATEST=$(curl -s https://api.github.com/repos/prometheus-operator/prometheus-operator/releases/latest | jq -cr .tag_name)

curl -sL https://github.com/prometheus-operator/prometheus-operator/releases/download/${LATEST}/bundle.yaml | kubectl create -f -

After initiating the deployment, the operator might take a few minutes to become fully operational. Progress can be monitored and completion verified with the command:

kubectl wait --for=condition=Ready pods -l app.kubernetes.io/name=prometheus-operator -n default
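
As an optional sanity check, you can also confirm that the Operator's CRDs were registered; the exact list varies by release, but it should include prometheuses and servicemonitors:

kubectl get crds | grep monitoring.coreos.com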

Deploying a Sample Application

To demonstrate the capabilities of the Prometheus Operator, let's deploy a sample application. This application, packaged as a Kubernetes Deployment with 3 replicas, listens on port 8080 and exposes metrics. Here is the Deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: quay.io/brancz/prometheus-example-app:v0.5.0
        ports:
        - name: web
          containerPort: 8080
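
Assuming the manifest above is saved as example-app-deployment.yaml (the filename is arbitrary), it can be applied and checked as follows:

kubectl apply -f example-app-deployment.yaml
kubectl get pods -l app=example-app   # should eventually show 3 running replicas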

Following the Deployment, a Kubernetes Service is created to expose the application. This Service selects all pods carrying the app: example-app label and specifies the port (8080, named web) where metrics are available.

kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080

Subsequently, a ServiceMonitor object is created to select all Service objects with the app: example-app label. This object also carries a team: frontend label, signifying the team responsible for monitoring this application/service.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
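
The Service and ServiceMonitor manifests are applied in the same way; the filenames are again arbitrary:

kubectl apply -f example-app-service.yaml
kubectl apply -f example-app-servicemonitor.yaml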

Deploying Prometheus

Before deploying Prometheus, especially in clusters where RBAC authorization is enabled, it’s crucial to set up the appropriate RBAC rules for the Prometheus service account. This setup includes creating a service account along with the necessary ClusterRole and ClusterRoleBinding:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
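
Assuming the three RBAC manifests above are saved together in a file such as prometheus-rbac.yaml (the filename is arbitrary), they can be applied in one step:

kubectl apply -f prometheus-rbac.yaml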

With the RBAC setup complete, the next step is to define a Prometheus custom resource. This resource specifies the configuration for Prometheus, including the number of replicas, resource requests/limits, and which ServiceMonitors should be included. For our example, the Prometheus object is configured to select ServiceMonitors with the team: frontend label:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false

This setup ensures that Prometheus instances are dynamically configurable based on the defined ServiceMonitors, allowing for a flexible and scalable monitoring infrastructure.
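
Assuming the Prometheus manifest is saved as prometheus.yaml (again an arbitrary filename), applying it prompts the Operator to create the actual Prometheus server pods, which can be checked via the prometheus=prometheus label that the NodePort Service in the next step also relies on:

kubectl apply -f prometheus.yaml
kubectl get pods -l prometheus=prometheus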

Exposing the Prometheus Service

To access Prometheus's web interface, it’s necessary to expose the Prometheus service externally. A NodePort Service is one of the simplest methods to achieve this:

apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30900
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus

This configuration allows access to the Prometheus web server through a node's IP address on port 30900, facilitating direct interaction with the Prometheus interface and monitoring capabilities.
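
Besides browsing to http://<node-ip>:30900, the same Service can be reached from a workstation with a port-forward; the local port 9090 used below is just a convention:

kubectl port-forward svc/prometheus 9090:9090
# Prometheus is then available at http://localhost:9090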

Conclusion

The Prometheus Operator significantly simplifies the deployment and management of Prometheus monitoring in a Kubernetes environment. By leveraging Kubernetes-native constructs, it streamlines the entire process, from setting up the necessary RBAC resources to deploying Prometheus instances and their ServiceMonitors. This gives developers and operators high visibility into the performance and health of their applications and supports more proactive management and optimization of resources.


Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Samsung NEXT, NetApp and Imperva, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership.
