      How to Deploy Prometheus Operator and Grafana on Linode Kubernetes Engine


      Updated by Linode. Written by Ben Bigger.

      In this guide, you will deploy the Prometheus Operator to your Linode Kubernetes Engine (LKE) cluster using Helm, either as a minimal deployment accessed locally with kubectl port-forward, or as a publicly accessible deployment secured with HTTPS and basic authentication.

      The Prometheus Operator Monitoring Stack

      When administrating any system, effective monitoring tools can empower users to perform quick and effective issue diagnosis and resolution. This need for monitoring solutions has led to the development of several prominent open source tools designed to solve problems associated with monitoring diverse systems.

      Since its release in 2016, Prometheus has become a leading monitoring tool for containerized environments including Kubernetes. Alertmanager is often used with Prometheus to send and manage alerts with tools such as Slack. Grafana, an open source visualization tool with a robust web interface, is commonly deployed along with Prometheus to provide centralized visualization of system metrics.

      The community-supported Prometheus Operator Helm Chart provides a complete monitoring stack including each of these tools along with Node Exporter and kube-state-metrics, and is designed to provide robust Kubernetes monitoring in its default configuration.

      While there are several options for deploying the Prometheus Operator, using Helm, a Kubernetes “package manager,” to deploy the community-supported Prometheus Operator Helm chart enables you to:

      • Control the components of your monitoring stack with a single configuration file.
      • Easily manage and upgrade your deployments.
      • Utilize out-of-the-box Grafana interfaces built for Kubernetes monitoring.

      Before You Begin

      1. Deploy an LKE Cluster. This guide was written using an example node pool with three 2 GB Linodes. Depending on the workloads you will be deploying on your cluster, you may consider using Linodes with more available resources.

      2. Install Helm 3 to your local environment.

      3. Install kubectl to your local environment and connect to your cluster.

      4. Create the monitoring namespace on your LKE cluster:

        kubectl create namespace monitoring
        
      5. Create a directory named lke-monitor to store all of your Helm values and Kubernetes manifest files and move into the new directory:

        mkdir ~/lke-monitor && cd ~/lke-monitor
        
      6. Add the Google stable Helm charts repository to your Helm repos:

        helm repo add stable https://kubernetes-charts.storage.googleapis.com/
        
      7. Update your Helm repositories:

        helm repo update
        
      8. (Optional) For public access with HTTPS and basic auth configured for the web interfaces of your monitoring tools:

        • Purchase a domain name from a reliable domain registrar and configure your registrar to use Linode’s nameservers with your domain. Then, using Linode’s DNS Manager, create a new Domain entry for the domain you have purchased.

        • Ensure that htpasswd is installed on your local environment. On many systems, this tool is already installed. Debian and Ubuntu users will need to install the apache2-utils package with the following command:

          sudo apt install apache2-utils
          

      Prometheus Operator Minimal Deployment

      In this section, you will complete a minimal deployment of the Prometheus Operator for individual/local access with kubectl Port-Forward. If you require your monitoring interfaces to be publicly accessible over the internet, you can skip to the following section on completing a Prometheus Operator Deployment with HTTPS and Basic Auth.

      Deploy Prometheus Operator

      In this section, you will create a Helm chart values file and use it to deploy Prometheus Operator to your LKE cluster.

      1. Using the text editor of your choice, create a file named values.yaml in the ~/lke-monitor directory and save it with the configurations below. Since the control plane is Linode-managed, this configuration also disables metrics collection for control plane components:

        Caution

        The below configuration will establish persistent data storage with three separate 10GiB Block Storage Volumes for Prometheus, Alertmanager, and Grafana. Because the Prometheus Operator deploys as StatefulSets, these Volumes and their associated Persistent Volume resources must be deleted manually if you later decide to tear down this Helm release.
        ~/lke-monitor/values.yaml
        
        # Prometheus Operator Helm Chart values for Linode Kubernetes Engine minimal deployment
        prometheus:
          prometheusSpec:
            storageSpec:
              volumeClaimTemplate:
                spec:
                  storageClassName: linode-block-storage-retain
                  resources:
                    requests:
                      storage: 10Gi
        
        alertmanager:
          alertmanagerSpec:
            storage:
              volumeClaimTemplate:
                spec:
                  storageClassName: linode-block-storage-retain
                  resources:
                    requests:
                      storage: 10Gi
        
        grafana:
          persistence:
            enabled: true
            storageClassName: linode-block-storage-retain
            size: 10Gi
        
        # Disable metrics for Linode-managed Kubernetes control plane elements
        kubeEtcd:
          enabled: false
        
        kubeControllerManager:
          enabled: false
        
        kubeScheduler:
          enabled: false
            
      2. Export an environment variable to store your Grafana admin password:

        Note

        Replace prom-operator in the below command with a secure password and save the password for later reference.

        export GRAFANA_ADMINPASSWORD="prom-operator"
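
        If you prefer a randomly generated password, one option (a sketch that assumes openssl is available on your local machine) is:

        # Generate a random 24-byte, base64-encoded password and print it so you can record it
        export GRAFANA_ADMINPASSWORD="$(openssl rand -base64 24)"
        echo $GRAFANA_ADMINPASSWORD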
        
      3. Using Helm, deploy a Prometheus Operator release labeled lke-monitor in the monitoring namespace on your LKE cluster with the settings established in your values.yaml file:

        helm install \
        lke-monitor stable/prometheus-operator \
        -f ~/lke-monitor/values.yaml \
        --namespace monitoring \
        --set grafana.adminPassword=$GRAFANA_ADMINPASSWORD
        

        Note

        You can safely ignore messages similar to manifest_sorter.go:192: info: skipping unknown hook: "crd-install" as discussed in this GitHub issues thread.

        Alternatively, you can add --set prometheusOperator.createCustomResource=false to the above command to prevent the message from appearing.

      4. Verify that the Prometheus Operator has been deployed to your LKE cluster and its components are running and ready by checking the pods in the monitoring namespace:

        kubectl -n monitoring get pods
        

        You should see a similar output to the following:

          
        NAME                                                     READY   STATUS    RESTARTS   AGE
        alertmanager-lke-monitor-prometheus-ope-alertmanager-0   2/2     Running   0          45s
        lke-monitor-grafana-84cbb54f98-7gqtk                     2/2     Running   0          54s
        lke-monitor-kube-state-metrics-68c56d976f-n587d          1/1     Running   0          54s
        lke-monitor-prometheus-node-exporter-6xt8m               1/1     Running   0          53s
        lke-monitor-prometheus-node-exporter-dkc27               1/1     Running   0          53s
        lke-monitor-prometheus-node-exporter-pkc65               1/1     Running   0          53s
        lke-monitor-prometheus-ope-operator-f87bc9f7c-w56sw      2/2     Running   0          54s
        prometheus-lke-monitor-prometheus-ope-prometheus-0       3/3     Running   1          35s
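
        Because the values.yaml file establishes persistent data storage, the chart also creates Persistent Volume Claims backed by Block Storage Volumes. You can list them with:

        kubectl -n monitoring get pvc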
            
        

      Access Monitoring Interfaces with Port-Forward

      1. List the services running in the monitoring namespace and review their respective ports:

        kubectl -n monitoring get svc
        

        You should see an output similar to the following:

          
        NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                     AGE
        alertmanager-operated                     ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP  115s
        lke-monitor-grafana                       ClusterIP   10.128.140.155  <none>        80/TCP                      2m3s
        lke-monitor-kube-state-metrics            ClusterIP   10.128.165.34   <none>        8080/TCP                    2m3s
        lke-monitor-prometheus-node-exporter      ClusterIP   10.128.192.213  <none>        9100/TCP                    2m3s
        lke-monitor-prometheus-ope-alertmanager   ClusterIP   10.128.153.6    <none>        9093/TCP                    2m3s
        lke-monitor-prometheus-ope-operator       ClusterIP   10.128.198.160  <none>        8080/TCP,443/TCP            2m3s
        lke-monitor-prometheus-ope-prometheus     ClusterIP   10.128.121.47   <none>        9090/TCP                    2m3s
        prometheus-operated                       ClusterIP   None            <none>        9090/TCP                    105s
            
        

        From the above output, the resource services you will access have the corresponding ports:

        Resource       Service Name                                Port
        Prometheus     lke-monitor-prometheus-ope-prometheus       9090
        Alertmanager   lke-monitor-prometheus-ope-alertmanager     9093
        Grafana        lke-monitor-grafana                         80
      2. Use kubectl port-forward to open a connection to a service, then access the service’s interface by entering the corresponding address in your web browser:

        Note

        Press control+C on your keyboard to terminate a port-forward process after entering any of the following commands.

        • To provide access to the Prometheus interface at the address 127.0.0.1:9090 in your web browser, enter:

          kubectl -n monitoring \
          port-forward \
          svc/lke-monitor-prometheus-ope-prometheus \
          9090
          
        • To provide access to the Alertmanager interface at the address 127.0.0.1:9093 in your web browser, enter:

          kubectl -n monitoring \
          port-forward \
          svc/lke-monitor-prometheus-ope-alertmanager \
          9093
          
        • To provide access to the Grafana interface at the address 127.0.0.1:8081 in your web browser, enter:

          kubectl -n monitoring \
          port-forward \
          svc/lke-monitor-grafana \
          8081:80
          

          Log in with the username admin and the password you exported as $GRAFANA_ADMINPASSWORD. The Grafana dashboards are accessible at Dashboards > Manage from the left navigation bar.
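
          If you lose track of the Grafana password, it is also stored in a Kubernetes Secret created by the chart. As a sketch, assuming the chart's default Secret name of lke-monitor-grafana and key admin-password, you could recover it with:

          # Decode the Grafana admin password from the chart-managed Secret (names assumed)
          kubectl -n monitoring get secret lke-monitor-grafana \
          -o jsonpath="{.data.admin-password}" | base64 --decode; echo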

      Prometheus Operator Deployment with HTTPS and Basic Auth

      Note

      Before you start on this section, ensure that you have completed all of the steps in Before You Begin.

      This section will show you how to install and configure the necessary components for secure, path-based, public access to the Prometheus, Alertmanager, and Grafana interfaces using the domain you have configured for use with Linode.

      An Ingress is used to provide external routes, via HTTP or HTTPS, to your cluster’s services. An Ingress Controller, like the NGINX Ingress Controller, fulfills the requirements presented by the Ingress using a load balancer.

      To enable HTTPS on your monitoring interfaces, you will create a Transport Layer Security (TLS) certificate from the Let’s Encrypt certificate authority (CA) using the ACME protocol. This will be facilitated by cert-manager, the native Kubernetes certificate management controller.

      While the Grafana interface is natively password-protected, the Prometheus and Alertmanager interfaces must be secured by other means. This guide covers basic authentication configurations to secure the Prometheus and Alertmanager interfaces.

      If you are completing this section of the guide after completing a Prometheus Operator Minimal Deployment, you can use Helm to upgrade your release and maintain the persistent data storage for your monitoring stack.

      Install the NGINX Ingress Controller

      In this section, you will install the NGINX Ingress Controller using Helm, which will create a NodeBalancer to handle your cluster’s traffic.

      1. Install the Google stable NGINX Ingress Controller Helm chart:

        helm install nginx-ingress stable/nginx-ingress
        
      2. Retrieve your NodeBalancer’s assigned external IP address:

        kubectl -n default get svc -o wide nginx-ingress-controller
        

        The command will return a similar output to the following:

          
        NAME                       TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE   SELECTOR
        nginx-ingress-controller   LoadBalancer   10.128.41.200   192.0.2.0      80:30889/TCP,443:32300/TCP   59s   app.kubernetes.io/component=controller,app=nginx-ingress,release=nginx-ingress
            
        
      3. Copy the IP address from the EXTERNAL-IP field, then navigate to Linode’s DNS Manager and create an A record using this external IP address and a hostname value corresponding to the subdomain you plan to use with your domain.

      Now that your NGINX Ingress Controller has been deployed and your domain’s A record has been updated, you are ready to enable HTTPS on your monitoring interfaces.

      Install cert-manager

      Note

      Before performing the commands in this section, ensure that your DNS has had time to propagate across the internet. You can query the status of your DNS by using the following command, replacing example.com with your domain (including the subdomain if you have configured one).

      dig +short example.com
      

      If successful, the output should return the IP address of your NodeBalancer.

      1. Install cert-manager’s CRDs.

        kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.2/cert-manager.crds.yaml
        
      2. Add the Helm repository which contains the cert-manager Helm chart.

        helm repo add jetstack https://charts.jetstack.io
        
      3. Update your Helm repositories.

        helm repo update
        
      4. Install the cert-manager Helm chart. These basic configurations should be sufficient for many use cases, however, additional cert-manager configurable parameters can be found in cert-manager’s official documentation.

        helm install 
        cert-manager jetstack/cert-manager 
        --namespace cert-manager 
        --version v0.15.2
        
      5. Verify that the corresponding cert-manager pods are running and ready.

        kubectl -n cert-manager get pods
        

        You should see a similar output:

          
        NAME                                       READY   STATUS    RESTARTS   AGE
        cert-manager-749df5b4f8-mc9nj              1/1     Running   0          19s
        cert-manager-cainjector-67b7c65dff-4fkrw   1/1     Running   0          19s
        cert-manager-webhook-7d5d8f856b-4nw9z      1/1     Running   0          19s
            
        

      Create a ClusterIssuer Resource

      Now that cert-manager is installed and running on your cluster, you will need to create a ClusterIssuer resource which defines which CA can create signed certificates when a certificate request is received. A ClusterIssuer is not a namespaced resource, so it can be used by more than one namespace.

      1. Using the text editor of your choice, create a file named acme-issuer-prod.yaml with the example configurations, replacing the value of email with your own email address for the ACME challenge:

        ~/lke-monitor/acme-issuer-prod.yaml
        
        apiVersion: cert-manager.io/v1alpha2
        kind: ClusterIssuer
        metadata:
          name: letsencrypt-prod
        spec:
          acme:
            email: user@example.com
            server: https://acme-v02.api.letsencrypt.org/directory
            privateKeySecretRef:
              name: letsencrypt-secret-prod
            solvers:
            - http01:
                ingress:
                  class: nginx
            
        • This manifest file creates a ClusterIssuer resource that will register an account on an ACME server. The value of spec.acme.server designates Let’s Encrypt’s production ACME server, which should be trusted by most browsers.

          Note

          Let’s Encrypt provides a staging ACME server that can be used to test issuing trusted certificates, while not worrying about hitting Let’s Encrypt’s production rate limits. The staging URL is https://acme-staging-v02.api.letsencrypt.org/directory.
        • The value of privateKeySecretRef.name provides the name of a secret containing the private key for this user’s ACME server account (this is tied to the email address you provide in the manifest file). The ACME server will use this key to identify you.

        • To ensure that you own the domain for which you will create a certificate, the ACME server will issue a challenge to a client. cert-manager provides two options for solving challenges, http01 and DNS01. In this example, the http01 challenge solver will be used and it is configured in the solvers array. cert-manager will spin up challenge solver Pods to solve the issued challenges and use Ingress resources to route the challenge to the appropriate Pod.

      2. Create the ClusterIssuer resource:

        kubectl apply -f ~/lke-monitor/acme-issuer-prod.yaml
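
        You can confirm that the resource was created and that cert-manager registered your ACME account by describing it, for example:

        kubectl describe clusterissuer letsencrypt-prod

        When registration succeeds, the resource's status conditions should report it as Ready.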
        

      Create a Certificate Resource

      After you have a ClusterIssuer resource, you can create a Certificate resource. This will describe your x509 public key certificate and will be used to automatically generate a CertificateRequest which will be sent to your ClusterIssuer.

      1. Using the text editor of your choice, create a file named certificate-prod.yaml with the example configurations:

        Note

        Replace the value of spec.dnsNames with the domain, including subdomains, that you will use to host your monitoring interfaces.

        ~/lke-monitor/certificate-prod.yaml
        
        apiVersion: cert-manager.io/v1alpha2
        kind: Certificate
        metadata:
          name: prometheus-operator-prod
          namespace: monitoring
        spec:
          secretName: letsencrypt-secret-prod
          duration: 2160h # 90d
          renewBefore: 360h # 15d
          issuerRef:
            name: letsencrypt-prod
            kind: ClusterIssuer
          dnsNames:
          - example.com
            

        Note

        The configurations in this example create a Certificate in the monitoring namespace that is valid for 90 days and renews 15 days before expiry.

      2. Create the Certificate resource:

        kubectl apply -f ~/lke-monitor/certificate-prod.yaml
        
      3. Verify that the Certificate has been successfully issued:

        kubectl -n monitoring get certs
        

        When your certificate is ready, you should see a similar output:

          
        NAME                       READY   SECRET                    AGE
        prometheus-operator-prod   True    letsencrypt-secret-prod   33s
            
        

      Next, you will create the necessary resources for basic authentication of the Prometheus and Alertmanager interfaces.

      Configure Basic Auth Credentials

      In this section, you will use htpasswd to generate credentials for basic authentication and create a Kubernetes Secret, which will then be applied to your Ingress configuration to secure access to your monitoring interfaces.

      1. Create a basic authentication password file for the user admin:

        htpasswd -c ~/lke-monitor/auth admin
        

        Follow the prompts to create a secure password, then store your password securely for future reference.
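
        If you later want to grant access to additional users, you can append new credentials to the same file by running htpasswd without the -c flag, for example:

        htpasswd ~/lke-monitor/auth another-user

        Remember to recreate the basic-auth Secret (created in the next step) after changing the file so that the update takes effect.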

      2. Create a Kubernetes Secret for the monitoring namespace using the file you created above:

        kubectl -n monitoring create secret generic basic-auth --from-file=auth
        
      3. Verify that the basic-auth secret has been created on your LKE cluster:

        kubectl -n monitoring get secret basic-auth
        

        You should see a similar output to the following:

          
        NAME         TYPE     DATA   AGE
        basic-auth   Opaque   1      81s
            
        

      All the necessary components are now in place to enable HTTPS on your monitoring interfaces. In the next section, you will complete the steps needed to deploy Prometheus Operator.

      Deploy or Upgrade Prometheus Operator

      In this section, you will create a Helm chart values file and use it to deploy Prometheus Operator to your LKE cluster.

      1. Using the text editor of your choice, create a file named values-https-basic-auth.yaml in the ~/lke-monitor directory and save it with the configurations below. Since the control plane is Linode-managed, this configuration also disables metrics collection for control plane components:

        Caution

        The below configuration will establish persistent data storage with three separate 10GiB Block Storage Volumes for Prometheus, Alertmanager, and Grafana. Because the Prometheus Operator deploys as StatefulSets, these Volumes and their associated Persistent Volume resources must be deleted manually if you later decide to tear down this Helm release.
        ~/lke-monitor/values-https-basic-auth.yaml
        
        # Helm chart values for Prometheus Operator with HTTPS and basic auth
        prometheus:
          ingress:
            enabled: true
            annotations:
              kubernetes.io/ingress.class: nginx
              nginx.ingress.kubernetes.io/rewrite-target: /$2
              cert-manager.io/cluster-issuer: letsencrypt-prod
              nginx.ingress.kubernetes.io/auth-type: basic
              nginx.ingress.kubernetes.io/auth-secret: basic-auth
              nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
            hosts:
            - example.com
            paths:
            - /prometheus(/|$)(.*)
            tls:
            - secretName: lke-monitor-tls
              hosts:
              - example.com
          prometheusSpec:
            routePrefix: /
            externalUrl: https://example.com/prometheus
            storageSpec:
              volumeClaimTemplate:
                spec:
                  storageClassName: linode-block-storage-retain
                  resources:
                    requests:
                      storage: 10Gi
        
        alertmanager:
          ingress:
            enabled: true
            annotations:
              kubernetes.io/ingress.class: nginx
              nginx.ingress.kubernetes.io/rewrite-target: /$2
              cert-manager.io/cluster-issuer: letsencrypt-prod
              nginx.ingress.kubernetes.io/auth-type: basic
              nginx.ingress.kubernetes.io/auth-secret: basic-auth
              nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
            hosts:
            - example.com
            paths:
            - /alertmanager(/|$)(.*)
            tls:
            - secretName: lke-monitor-tls
              hosts:
              - example.com
          alertmanagerSpec:
            routePrefix: /
            externalUrl: https://example.com/alertmanager
            storage:
              volumeClaimTemplate:
                spec:
                  storageClassName: linode-block-storage-retain
                  resources:
                    requests:
                      storage: 10Gi
        
        grafana:
          persistence:
            enabled: true
            storageClassName: linode-block-storage-retain
            size: 10Gi
          ingress:
            enabled: true
            annotations:
              kubernetes.io/ingress.class: nginx
              nginx.ingress.kubernetes.io/rewrite-target: /$2
              nginx.ingress.kubernetes.io/auth-type: basic
              nginx.ingress.kubernetes.io/auth-secret: basic-auth
              nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
            hosts:
            - example.com
            path: /grafana(/|$)(.*)
            tls:
            - secretName: lke-monitor-tls
              hosts:
              - example.com
          grafana.ini:
            server:
              domain: example.com
              root_url: "%(protocol)s://%(domain)s/grafana/"
              enable_gzip: "true"
        
        # Disable control plane metrics
        kubeEtcd:
          enabled: false
        
        kubeControllerManager:
          enabled: false
        
        kubeScheduler:
          enabled: false
            
      2. Export an environment variable to store your Grafana admin password:

        Note

        Replace prom-operator in the below command with a secure password and save the password for later reference.

        export GRAFANA_ADMINPASSWORD="prom-operator"
        
      3. Using Helm, deploy a Prometheus Operator release labeled lke-monitor in the monitoring namespace on your LKE cluster with the settings established in your values-https-basic-auth.yaml file:

        Note

        If you have already deployed a Prometheus Operator release, you can upgrade it by replacing helm install with helm upgrade in the below command.

        helm install \
        lke-monitor stable/prometheus-operator \
        -f ~/lke-monitor/values-https-basic-auth.yaml \
        --namespace monitoring \
        --set grafana.adminPassword=$GRAFANA_ADMINPASSWORD
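
        If you previously completed the minimal deployment, the equivalent upgrade form of this command (a sketch following the note above) would be:

        helm upgrade \
        lke-monitor stable/prometheus-operator \
        -f ~/lke-monitor/values-https-basic-auth.yaml \
        --namespace monitoring \
        --set grafana.adminPassword=$GRAFANA_ADMINPASSWORD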
        

        Once completed, you will see output similar to the following:

          
        NAME: lke-monitor
        LAST DEPLOYED: Mon Jul 27 17:03:46 2020
        NAMESPACE: monitoring
        STATUS: deployed
        REVISION: 1
        NOTES:
        The Prometheus Operator has been installed. Check its status by running:
          kubectl --namespace monitoring get pods -l "release=lke-monitor"
        
        Visit https://github.com/coreos/prometheus-operator for instructions on how
        to create & configure Alertmanager and Prometheus instances using the Operator.
        
        
      4. Verify that the Prometheus Operator has been deployed to your LKE cluster and its components are running and ready by checking the pods in the monitoring namespace:

        kubectl -n monitoring get pods
        

        You should see a similar output to the following, confirming that you are ready to access your monitoring interfaces using your domain:

          
        NAME                                                     READY   STATUS    RESTARTS   AGE
        alertmanager-lke-monitor-prometheus-ope-alertmanager-0   2/2     Running   0          45s
        lke-monitor-grafana-84cbb54f98-7gqtk                     2/2     Running   0          54s
        lke-monitor-kube-state-metrics-68c56d976f-n587d          1/1     Running   0          54s
        lke-monitor-prometheus-node-exporter-6xt8m               1/1     Running   0          53s
        lke-monitor-prometheus-node-exporter-dkc27               1/1     Running   0          53s
        lke-monitor-prometheus-node-exporter-pkc65               1/1     Running   0          53s
        lke-monitor-prometheus-ope-operator-f87bc9f7c-w56sw      2/2     Running   0          54s
        prometheus-lke-monitor-prometheus-ope-prometheus-0       3/3     Running   1          35s
            
        

      Access Monitoring Interfaces from your Domain

      Your monitoring interfaces are now publicly accessible with HTTPS and basic auth from the domain you have configured for use with this guide at the following paths:

      Resource       Domain and path
      Prometheus     example.com/prometheus
      Alertmanager   example.com/alertmanager
      Grafana        example.com/grafana

      When accessing an interface for the first time, log in as admin with the password you configured for basic auth credentials.

      When accessing the Grafana interface, you will then log in again as admin with the password you exported as $GRAFANA_ADMINPASSWORD on your local environment. The Grafana dashboards are accessible at Dashboards > Manage from the left navigation bar.


      How to Set Up DigitalOcean Kubernetes Cluster Monitoring with Helm and Prometheus Operator


      Introduction

      Along with tracing and logging, monitoring and alerting are essential components of a Kubernetes observability stack. Setting up monitoring for your Kubernetes cluster allows you to track your resource usage and analyze and debug application errors.

      A monitoring system usually consists of a time-series database that houses metric data and a visualization layer. In addition, an alerting layer creates and manages alerts, handing them off to integrations and external services as necessary. Finally, one or more components generate or expose the metric data that will be stored, visualized, and processed for alerts by this monitoring stack.

      One popular monitoring solution is the open-source Prometheus, Grafana, and Alertmanager stack:

      • Prometheus is a time series database and monitoring tool that works by polling metrics endpoints and scraping and processing the data exposed by these endpoints. It allows you to query this data using PromQL, a time series data query language.
      • Grafana is a data visualization and analytics tool that allows you to build dashboards and graphs for your metrics data.
      • Alertmanager, usually deployed alongside Prometheus, forms the alerting layer of the stack, handling alerts generated by Prometheus and deduplicating, grouping, and routing them to integrations like email or PagerDuty.

      In addition, tools like kube-state-metrics and node_exporter expose cluster-level Kubernetes object metrics as well as machine-level metrics like CPU and memory usage.

      Implementing this monitoring stack on a Kubernetes cluster can be complicated, but luckily some of this complexity can be managed with the Helm package manager and CoreOS’s Prometheus Operator and kube-prometheus projects. These projects bake in standard configurations and dashboards for Prometheus and Grafana, and abstract away some of the lower-level Kubernetes object definitions. The Helm prometheus-operator chart allows you to get a full cluster monitoring solution up and running by installing Prometheus Operator and the rest of the components listed above, along with a default set of dashboards, rules, and alerts useful for monitoring Kubernetes clusters.

      In this tutorial, we will demonstrate how to install the prometheus-operator Helm chart on a DigitalOcean Kubernetes cluster. By the end of the tutorial, you will have installed a full monitoring stack into your cluster.

      Prerequisites

      To follow this tutorial, you will need:

      Step 1 — Creating a Custom Values File

      Before we install the prometheus-operator Helm chart, we’ll create a custom values file that will override some of the chart’s defaults with DigitalOcean-specific configuration parameters. To learn more about overriding default chart values, consult the Helm Install section of the Helm docs.

      To begin, create and open a file called custom-values.yaml on your local machine using nano or your favorite editor:

      Copy and paste in the following custom values, which enable persistent storage for the Prometheus, Grafana, and Alertmanager components, and disable monitoring for Kubernetes control plane components not exposed on DigitalOcean Kubernetes:

      custom-values.yaml

      # Define persistent storage for Prometheus (PVC)
      prometheus:
        prometheusSpec:
          storageSpec:
            volumeClaimTemplate:
              spec:
                accessModes: ["ReadWriteOnce"]
                storageClassName: do-block-storage
                resources:
                  requests:
                    storage: 5Gi
      
      # Define persistent storage for Grafana (PVC)
      grafana:
        # Set password for Grafana admin user
        adminPassword: your_admin_password
        persistence:
          enabled: true
          storageClassName: do-block-storage
          accessModes: ["ReadWriteOnce"]
          size: 5Gi
      
      # Define persistent storage for Alertmanager (PVC)
      alertmanager:
        alertmanagerSpec:
          storage:
            volumeClaimTemplate:
              spec:
                accessModes: ["ReadWriteOnce"]
                storageClassName: do-block-storage
                resources:
                  requests:
                    storage: 5Gi
      
      # Change default node-exporter port
      prometheus-node-exporter:
        service:
          port: 30206
          targetPort: 30206
      
      # Disable Etcd metrics
      kubeEtcd:
        enabled: false
      
      # Disable Controller metrics
      kubeControllerManager:
        enabled: false
      
      # Disable Scheduler metrics
      kubeScheduler:
        enabled: false
      

      In this file, we override some of the default values packaged with the chart in its values.yaml file.

      We first enable persistent storage for Prometheus, Grafana, and Alertmanager so that their data persists across Pod restarts. Behind the scenes, this defines a 5 Gi Persistent Volume Claim (PVC) for each component, using the DigitalOcean Block Storage storage class. You should modify the size of these PVCs to suit your monitoring storage needs. To learn more about PVCs, consult Persistent Volumes from the official Kubernetes docs.

      Next, replace your_admin_password with a secure password that you'll use to log in to the Grafana metrics dashboard with the admin user.

      We'll then configure a different port for node-exporter. Node-exporter runs on each Kubernetes node and provides OS and hardware metrics to Prometheus. We must change its default port to get around the DigitalOcean Kubernetes firewall defaults, which will block port 9100 but allow ports in the range 30000-32767. Alternatively, you can configure a custom firewall rule for node-exporter. To learn how, consult How to Configure Firewall Rules from the official DigitalOcean Cloud Firewalls docs.

      Finally, we'll disable metrics collection for three Kubernetes control plane components that do not expose metrics on DigitalOcean Kubernetes: the Kubernetes Scheduler, the Controller Manager, and the etcd cluster data store.

      To see the full list of configurable parameters for the prometheus-operator chart, consult the Configuration section from the chart repo README or the default values file.
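
      You can also save the chart's full set of default values locally for side-by-side comparison, for example:

      • helm inspect values stable/prometheus-operator > default-values.yaml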

      When you're done editing, save and close the file. We can now install the chart using Helm.

      Step 2 — Installing the prometheus-operator Chart

      The prometheus-operator Helm chart will install the following monitoring components into your DigitalOcean Kubernetes cluster:

      • Prometheus Operator, a Kubernetes Operator that allows you to configure and manage Prometheus clusters. Kubernetes Operators integrate domain-specific logic into the process of packaging, deploying, and managing applications with Kubernetes. To learn more about Kubernetes Operators, consult the CoreOS Operators Overview. To learn more about Prometheus Operator, consult this introductory post on the Prometheus Operator and the Prometheus Operator GitHub repo. Prometheus Operator will be installed as a Deployment.
      • Prometheus, installed as a StatefulSet.
      • Alertmanager, a service that handles alerts sent by the Prometheus server and routes them to integrations like PagerDuty or email. To learn more about Alertmanager, consult Alerting from the Prometheus docs. Alertmanager will be installed as a StatefulSet.
      • Grafana, a time series data visualization tool that allows you to visualize and create dashboards for your Prometheus metrics. Grafana will be installed as a Deployment.
      • node-exporter, a Prometheus exporter that runs on cluster nodes and provides OS and hardware metrics to Prometheus. Consult the node-exporter GitHub repo to learn more. node-exporter will be installed as a DaemonSet.
      • kube-state-metrics, an add-on agent that listens to the Kubernetes API server and generates metrics about the state of Kubernetes objects like Deployments and Pods. You can learn more by consulting the kube-state-metrics GitHub repo. kube-state-metrics will be installed as a Deployment.

      By default, along with scraping metrics generated by node-exporter, kube-state-metrics, and the other components listed above, Prometheus will be configured to scrape metrics from the following components:

      • kube-apiserver, the Kubernetes API server.
      • CoreDNS, the Kubernetes cluster DNS server.
      • kubelet, the primary node agent that interacts with kube-apiserver to manage Pods and containers on a node.
      • cAdvisor, a node agent that discovers running containers and collects their CPU, memory, filesystem, and network usage metrics.

      On your local machine, let's begin by installing the prometheus-operator Helm chart and passing in the custom values file we created above:

      • helm install --namespace monitoring --name doks-cluster-monitoring -f custom-values.yaml stable/prometheus-operator

      Here we run helm install and install all components into the monitoring namespace, which we create at the same time. This allows us to cleanly separate the monitoring stack from the rest of the Kubernetes cluster. We name the Helm release doks-cluster-monitoring and pass in the custom values file we created in Step 1. Finally, we specify that we'd like to install the prometheus-operator chart from the Helm stable directory.

      You should see the following output:

      Output

      NAME:   doks-cluster-monitoring
      LAST DEPLOYED: Mon Apr 22 10:30:42 2019
      NAMESPACE: monitoring
      STATUS: DEPLOYED

      RESOURCES:
      ==> v1/PersistentVolumeClaim
      NAME                              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS       AGE
      doks-cluster-monitoring-grafana   Pending                                      do-block-storage   10s

      ==> v1/ServiceAccount
      NAME                                         SECRETS   AGE
      doks-cluster-monitoring-grafana              1         10s
      doks-cluster-monitoring-kube-state-metrics   1         10s

      . . .

      ==> v1beta1/ClusterRoleBinding
      NAME                                                    AGE
      doks-cluster-monitoring-kube-state-metrics              9s
      psp-doks-cluster-monitoring-prometheus-node-exporter    9s

      NOTES:
      The Prometheus Operator has been installed. Check its status by running:
        kubectl --namespace monitoring get pods -l "release=doks-cluster-monitoring"

      Visit https://github.com/coreos/prometheus-operator for instructions on how
      to create & configure Alertmanager and Prometheus instances using the Operator.

      This indicates that Prometheus Operator, Prometheus, Grafana, and the other components listed above have successfully been installed into your DigitalOcean Kubernetes cluster.
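
      You can print this release information again at any time with Helm, for example:

      • helm status doks-cluster-monitoring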

      Following the note in the helm install output, check the status of the release's Pods using kubectl get pods:

      • kubectl --namespace monitoring get pods -l "release=doks-cluster-monitoring"

      You should see the following:

      Output

      NAME                                                         READY   STATUS    RESTARTS   AGE
      doks-cluster-monitoring-grafana-9d7f984c5-hxnw6              2/2     Running   0          3m36s
      doks-cluster-monitoring-kube-state-metrics-dd8557f6b-9rl7j   1/1     Running   0          3m36s
      doks-cluster-monitoring-pr-operator-9c5b76d78-9kj85          1/1     Running   0          3m36s
      doks-cluster-monitoring-prometheus-node-exporter-2qvxw       1/1     Running   0          3m36s
      doks-cluster-monitoring-prometheus-node-exporter-7brwv       1/1     Running   0          3m36s
      doks-cluster-monitoring-prometheus-node-exporter-jhdgz       1/1     Running   0          3m36s

      This indicates that all the monitoring components are up and running, and you can begin exploring Prometheus metrics using Grafana and its preconfigured dashboards.

      Step 3 — Accessing Grafana and Exploring Metrics Data

      The prometheus-operator Helm chart exposes Grafana as a ClusterIP Service, which means that it's only accessible via a cluster-internal IP address. To access Grafana outside of your Kubernetes cluster, you can either use kubectl patch to update the Service in place to a public-facing type like NodePort or LoadBalancer, or kubectl port-forward to forward a local port to a Grafana Pod port.

      In this tutorial we'll forward ports, but to learn more about kubectl patch and Kubernetes Service types, you can consult Update API Objects in Place Using kubectl patch and Services from the official Kubernetes docs.
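
      For reference, a patch along these lines (a sketch only, not used in this tutorial) would switch the Grafana Service to a cloud load balancer:

      • kubectl patch svc doks-cluster-monitoring-grafana -n monitoring -p '{"spec": {"type": "LoadBalancer"}}'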

      Begin by listing running Services in the monitoring namespace:

      • kubectl get svc -n monitoring

      You should see the following Services:

      Output

      NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
      alertmanager-operated                              ClusterIP   None             <none>        9093/TCP,6783/TCP   34m
      doks-cluster-monitoring-grafana                    ClusterIP   10.245.105.130   <none>        80/TCP              34m
      doks-cluster-monitoring-kube-state-metrics         ClusterIP   10.245.140.151   <none>        8080/TCP            34m
      doks-cluster-monitoring-pr-alertmanager            ClusterIP   10.245.197.254   <none>        9093/TCP            34m
      doks-cluster-monitoring-pr-operator                ClusterIP   10.245.14.163    <none>        8080/TCP            34m
      doks-cluster-monitoring-pr-prometheus              ClusterIP   10.245.201.173   <none>        9090/TCP            34m
      doks-cluster-monitoring-prometheus-node-exporter   ClusterIP   10.245.72.218    <none>        30206/TCP           34m
      prometheus-operated                                ClusterIP   None             <none>        9090/TCP            34m

      We are going to forward local port 8000 to port 80 of the doks-cluster-monitoring-grafana Service, which will in turn forward to port 3000 of a running Grafana Pod. These Service and Pod ports are configured in the stable/grafana Helm chart values file:

      • kubectl port-forward -n monitoring svc/doks-cluster-monitoring-grafana 8000:80

      You should see the following output:

      Output

      Forwarding from 127.0.0.1:8000 -> 3000
      Forwarding from [::1]:8000 -> 3000

      This indicates that local port 8000 is being forwarded successfully to a Grafana Pod.

      Visit http://localhost:8000 in your web browser. You should see the following Grafana login page:

      Grafana Login Page

      Enter admin as the username and the password you configured in custom-values.yaml. Then, hit Log In.

      You'll be brought to the following Home Dashboard:

      Grafana Home Page

      In the left-hand navigation bar, select the Dashboards button, then click on Manage:

      Grafana Dashboard Tab

      You'll be brought to the following dashboard management interface, which lists the dashboards installed by the prometheus-operator Helm chart:

      Grafana Dashboard List

      These dashboards are generated by kubernetes-mixin, an open-source project that allows you to create a standardized set of cluster monitoring Grafana dashboards and Prometheus alerts. To learn more, consult the Kubernetes Mixin GitHub repo.

      Click in to the Kubernetes / Nodes dashboard, which visualizes CPU, memory, disk, and network usage for a given node:

      Grafana Nodes Dashboard

      Describing each dashboard and how to use it to visualize your cluster's metrics data goes beyond the scope of this tutorial. To learn more about the USE method for analyzing a system's performance, you can consult Brendan Gregg's The Utilization Saturation and Errors (USE) Method page. Google's SRE Book is another helpful resource, in particular Chapter 6: Monitoring Distributed Systems. To learn how to build your own Grafana dashboards, check out Grafana's Getting Started page.

      In the next step, we'll follow a similar process to connect to and explore the Prometheus monitoring system.

      Step 4 — Accessing Prometheus and Alertmanager

      To connect to the Prometheus Pods, we once again have to use kubectl port-forward to forward a local port. If you’re done exploring Grafana, you can close the port-forward tunnel by hitting CTRL-C. Alternatively you can open a new shell and port-forward connection.

      Begin by listing running Services in the monitoring namespace:

      • kubectl get svc -n monitoring

      You should see the following Services:

      Output

      NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
      alertmanager-operated                              ClusterIP   None             <none>        9093/TCP,6783/TCP   34m
      doks-cluster-monitoring-grafana                    ClusterIP   10.245.105.130   <none>        80/TCP              34m
      doks-cluster-monitoring-kube-state-metrics         ClusterIP   10.245.140.151   <none>        8080/TCP            34m
      doks-cluster-monitoring-pr-alertmanager            ClusterIP   10.245.197.254   <none>        9093/TCP            34m
      doks-cluster-monitoring-pr-operator                ClusterIP   10.245.14.163    <none>        8080/TCP            34m
      doks-cluster-monitoring-pr-prometheus              ClusterIP   10.245.201.173   <none>        9090/TCP            34m
      doks-cluster-monitoring-prometheus-node-exporter   ClusterIP   10.245.72.218    <none>        30206/TCP           34m
      prometheus-operated                                ClusterIP   None             <none>        9090/TCP            34m

      We are going to forward local port 9090 to port 9090 of the doks-cluster-monitoring-pr-prometheus Service:

      • kubectl port-forward -n monitoring svc/doks-cluster-monitoring-pr-prometheus 9090:9090

      You should see the following output:

      Output

      Forwarding from 127.0.0.1:9090 -> 9090
      Forwarding from [::1]:9090 -> 9090

      This indicates that local port 9090 is being forwarded successfully to a Prometheus Pod.

      Visit http://localhost:9090 in your web browser. You should see the following Prometheus Graph page:

      Prometheus Graph Page

      From here you can use PromQL, the Prometheus query language, to select and aggregate time series metrics stored in its database. To learn more about PromQL, consult Querying Prometheus from the official Prometheus docs.

      In the Expression field, type machine_cpu_cores and hit Execute. You should see a list of time series with the metric machine_cpu_cores that reports the number of CPU cores on a given node. You can see which node generated the metric and which job scraped the metric in the metric labels.
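
      As another example, node-exporter metrics are also queryable here. Assuming the default node-exporter configuration installed by the chart, an expression like the following reports per-node CPU usage outside the idle state:

      sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)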

      Finally, in the top navigation bar, click on Status and then Targets to see the list of targets Prometheus has been configured to scrape. You should see a list of targets corresponding to the list of monitoring endpoints described at the beginning of Step 2.

      To learn more about Prometheus and how to query your cluster metrics, consult the official Prometheus docs.

      We'll follow a similar process to connect to AlertManager, which manages Alerts generated by Prometheus. You can explore these Alerts by clicking into Alerts in the Prometheus top navigation bar.

      To connect to the Alertmanager Pods, we will once again use kubectl port-forward to forward a local port. If you’re done exploring Prometheus, you can close the port-forward tunnel by hitting CTRL-C. Alternatively you can open a new shell and port-forward connection.

      Begin by listing running Services in the monitoring namespace:

      • kubectl get svc -n monitoring

      You should see the following Services:

      Output

      NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
      alertmanager-operated                              ClusterIP   None             <none>        9093/TCP,6783/TCP   34m
      doks-cluster-monitoring-grafana                    ClusterIP   10.245.105.130   <none>        80/TCP              34m
      doks-cluster-monitoring-kube-state-metrics         ClusterIP   10.245.140.151   <none>        8080/TCP            34m
      doks-cluster-monitoring-pr-alertmanager            ClusterIP   10.245.197.254   <none>        9093/TCP            34m
      doks-cluster-monitoring-pr-operator                ClusterIP   10.245.14.163    <none>        8080/TCP            34m
      doks-cluster-monitoring-pr-prometheus              ClusterIP   10.245.201.173   <none>        9090/TCP            34m
      doks-cluster-monitoring-prometheus-node-exporter   ClusterIP   10.245.72.218    <none>        30206/TCP           34m
      prometheus-operated                                ClusterIP   None             <none>        9090/TCP            34m

      We are going to forward local port 9093 to port 9093 of the doks-cluster-monitoring-pr-alertmanager Service.

      • kubectl port-forward -n monitoring svc/doks-cluster-monitoring-pr-alertmanager 9093:9093

      You should see the following output:

      Output

      Forwarding from 127.0.0.1:9093 -> 9093
      Forwarding from [::1]:9093 -> 9093

      This indicates that local port 9093 is being forwarded successfully to an Alertmanager Pod.

      Visit http://localhost:9093 in your web browser. You should see the following Alertmanager Alerts page:

      Alertmanager Alerts Page

      From here, you can explore firing alerts and optionally silencing them. To learn more about Alertmanager, consult the official Alertmanager documentation.

      Conclusion

      In this tutorial, you installed a Prometheus, Grafana, and Alertmanager monitoring stack into your DigitalOcean Kubernetes cluster with a standard set of dashboards, Prometheus rules, and alerts. Since this was done using Helm, you can use helm upgrade, helm rollback, and helm delete to upgrade, roll back, or delete the monitoring stack. To learn more about these functions, consult How To Install Software on Kubernetes Clusters with the Helm Package Manager.

      The prometheus-operator chart helps you get cluster monitoring up and running quickly using Helm. You may wish to build, deploy, and configure Prometheus Operator manually. To do so, consult the Prometheus Operator and kube-prometheus GitHub repos.
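
      For example, with Helm 2 (which this tutorial's commands assume), removing the entire release would look like:

      • helm delete --purge doks-cluster-monitoring

      Note that the Persistent Volume Claims created in Step 1 are not removed automatically and may need to be deleted separately.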
