
      Use Keepalived Health Checks with BGP-based Failover


      Keepalived is one of the most commonly used applications that implements VRRP, a networking protocol that manages IP address assignment and ARP-based failover. It can be configured with additional health checks, such as checking the status of a service or running a custom script. When one of these health checks detects an issue, the instance changes to a fault state and failover is triggered. During these state transitions, additional tasks can be performed through custom scripts.

      The Linode platform is currently undergoing network infrastructure upgrades, which affect IP address assignment and failover. Once this upgrade occurs for the data center and hardware that your Compute Instances reside on, VRRP software like Keepalived can no longer directly manage failover. However, other features of Keepalived can still be used. For instance, Keepalived can continue to run health checks or VRRP scripts. It can then be configured to interact with whichever BGP daemon your system is using to manage IP address assignment and failover.

      This guide covers how to configure Keepalived with a simple health check and enable it to control lelastic, a BGP daemon created for the Linode platform.

      Note

      If you are migrating to BGP-based failover and currently have health checks configured with Keepalived, you can modify the steps in this guide to include your own settings.

      Configure IP Sharing and BGP Failover

      Before continuing, IP Sharing and BGP failover must be properly configured on both Compute Instances. To do this, follow the Configuring Failover on a Compute Instance guide, which walks you through the process of configuring failover with lelastic. If you decide to use a tool other than lelastic, you will need to modify some of the commands and code examples in the following sections.

      Install and Configure Keepalived

      This section covers installing the keepalived software from your distribution’s repository. See Installing Keepalived in the official documentation if you prefer to install it from source.

      1. Log in to your Compute Instance over SSH. See Connecting to a Remote Server Over SSH for assistance.

      2. Install keepalived by following the instructions for your system’s distribution.

        Ubuntu and Debian:

        sudo apt update && sudo apt upgrade
        sudo apt install keepalived

        CentOS 8 Stream, CentOS/RHEL 8 (including derivatives such as AlmaLinux 8 and Rocky Linux 8), Fedora:

        sudo dnf upgrade
        sudo dnf install keepalived

        CentOS 7:

        sudo yum update
        sudo yum install keepalived
      3. Create and edit a new keepalived configuration file.

        sudo nano /etc/keepalived/keepalived.conf
      4. Enter the settings for your configuration into this file. Use the example below as a starting point, replacing each of the following placeholders with the appropriate values for your Compute Instance. For more configuration options, see Configuration Options.

        • $password: A secure password to use for this keepalived configuration instance. The same password must be used for each Compute Instance you configure.

        • $ip-a: The IP address of this Compute Instance.

        • $ip-b: The IP address of the other Compute Instance.

        • $ip-shared: The Shared IP address.

        File: /etc/keepalived/keepalived.conf
        
        vrrp_instance example_instance {
            state BACKUP
            nopreempt
            interface eth0
            virtual_router_id 10
            priority 100
            advert_int 1
            authentication {
                auth_type PASS
                auth_pass $password
            }
            unicast_src_ip $ip-a
            unicast_peer {
                $ip-b
            }
            virtual_ipaddress {
                $ip-shared/32
            }
        }

        In the above configuration file, the state is set to BACKUP and the parameter nopreempt is included. When each Compute Instance uses these settings, failover is sticky. This means the Shared IP address remains routed to a Compute Instance until it enters a FAULT state, even if it has a lower priority than the other Compute Instance. If you wish to prioritize one instance over the other, remove the nopreempt parameter, set the state of one of the Compute Instances to MASTER, and adjust the priority parameter as desired.

      5. Enable and start the keepalived service.

        sudo systemctl enable keepalived
        sudo systemctl start keepalived
      6. Perform these steps again on the other Compute Instance you would like to configure.
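
Because the same file is needed on both Compute Instances with only the addresses swapped, the placeholder substitution from step 4 can be scripted. The sketch below is one possible approach, not part of the original guide: `render_keepalived_conf` is a hypothetical helper name, and it assumes the values contain no `|`, `&`, or `\` characters, which would need escaping before being handed to `sed`.

```shell
#!/bin/bash
# Hypothetical helper: read a keepalived.conf template on stdin and replace the
# guide's placeholders ($password, $ip-a, $ip-b, $ip-shared) with real values.
# Assumption: values contain no "|", "&", or "\" characters.
render_keepalived_conf() {
    local password=$1 ip_a=$2 ip_b=$3 ip_shared=$4
    # "\\\$" inside double quotes reaches sed as "\$", a literal dollar sign.
    sed -e "s|\\\$password|${password}|g" \
        -e "s|\\\$ip-shared|${ip_shared}|g" \
        -e "s|\\\$ip-a|${ip_a}|g" \
        -e "s|\\\$ip-b|${ip_b}|g"
}

# Demonstration on a few template lines, using the documentation addresses
# that appear in the example configuration section of this guide.
printf '%s\n' 'auth_pass $password' 'unicast_src_ip $ip-a' '$ip-b' '$ip-shared/32' |
    render_keepalived_conf examplepass 192.0.2.173 198.51.100.49 203.0.113.57
```

On the second Compute Instance, the same template would be rendered with the second and third arguments swapped.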

      Create the Notify Script

      Keepalived can be configured to run notification scripts when the instance changes state (such as when entering a MASTER, BACKUP, or FAULT state). These scripts can perform any action and are commonly used to interact with a service or modify network configuration files. For this guide, the scripts are used to update a log file and start or stop the BGP daemon that controls BGP failover on your Compute Instance.

      1. Create and edit the notify script.

        sudo nano /etc/keepalived/notify.sh
        
      2. Copy and paste the following bash script into the newly created file. If you wish to control a BGP daemon other than lelastic, replace sudo systemctl restart lelastic and sudo systemctl stop lelastic with the appropriate commands for your service.

        File: /etc/keepalived/notify.sh
        
        #!/bin/bash
        
        keepalived_log='/tmp/keepalived.state'
        function check_state {
                local state=$1
                cat << EOF >> $keepalived_log
        ===================================
        Date:  $(date +'%d-%b-%Y %H:%M:%S')
        [INFO] Now $state
        
        EOF
                if [[ "$state" == "Master" ]]; then
                        sudo systemctl restart lelastic
                else
                        sudo systemctl stop lelastic
                fi
        }
        
        function main {
                local state=$1
                case $state in
                Master)
                        check_state Master;;
                Backup)
                        check_state Backup;;
                Fault)
                        check_state Fault;;
                *)
                        echo "[ERR] Provided argument is invalid"
                esac
        }
        main $1
      3. Make the file executable.

        sudo chmod +x /etc/keepalived/notify.sh
      4. Modify the keepalived configuration files so that the notify script is used for each state change.

        File: /etc/keepalived/keepalived.conf
        
        vrrp_instance example_instance {
            ...
            notify_master "/etc/keepalived/notify.sh Master"
            notify_backup "/etc/keepalived/notify.sh Backup"
            notify_fault "/etc/keepalived/notify.sh Fault"
        }
      5. Restart your BGP daemon and keepalived.

        sudo systemctl restart lelastic
        sudo systemctl restart keepalived
      6. View the log file to see if it was properly created and updated. If the notification script was successfully used, this log file should have an accurate timestamp and the current state of the instance.

        cat /tmp/keepalived.state
        ===================================
        Date:  14-Oct-2022 14:30:54
        [INFO] Now Master
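
Before wiring the notify script into Keepalived, it can help to see which action each state maps to without touching the BGP daemon. The following dry-run sketch mirrors the decision made in notify.sh above (restart lelastic when becoming Master, stop it otherwise). The `planned_action` function name is an illustrative assumption and does not exist in the guide's script.

```shell
#!/bin/bash
# Dry-run sketch: print the action notify.sh would take for a given state,
# without calling systemctl. Unknown states are reported and return non-zero.
planned_action() {
    case $1 in
        Master)       echo "restart lelastic" ;;
        Backup|Fault) echo "stop lelastic" ;;
        *)            echo "invalid state"; return 1 ;;
    esac
}

# Walk through the three states that keepalived.conf passes to the script.
for state in Master Backup Fault; do
    printf '%-6s -> %s\n' "$state" "$(planned_action "$state")"
done
```

Only the Master state keeps the BGP daemon running, so at most one Compute Instance should be announcing the Shared IP address at a time.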

      Configure the Health Check (VRRP Script)

      The next step is to configure Keepalived with a health check so that it can fail over when it detects an issue. This is the primary reason you may want to use Keepalived alongside a BGP daemon. Keepalived can be configured to track a file (track_file), track a process (track_process), or run a custom script so that you can perform more complex health checks. When using a script, as shown in this example, the script should return 0 to indicate success and any other value to indicate a failure. When a failure is detected, the state is changed to FAULT and the notify script runs.

      This guide helps you configure a custom script that detects if a file is present or not. If the file is present, the script returns a 1 to indicate a failure.
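
The exit-code contract described above can be tried out safely before involving Keepalived. The sketch below wraps the same file test in a function and runs it against a throwaway temporary file instead of anything under /etc/keepalived. The `check_trigger` function name and the use of `mktemp` are illustrative assumptions.

```shell
#!/bin/bash
# Returns 0 (healthy) when the trigger file is absent and 1 (failed) when it
# exists, which is the convention a health check script is expected to follow.
check_trigger() {
    local trigger=$1
    if [ -f "$trigger" ]; then
        return 1
    fi
    return 0
}

# Demonstration with a temporary file: first present, then removed.
demo_file=$(mktemp)
if check_trigger "$demo_file"; then echo "healthy"; else echo "failed"; fi
rm -f "$demo_file"
if check_trigger "$demo_file"; then echo "healthy"; else echo "failed"; fi
```

The first call reports a failure because the file exists, and the second reports a healthy state once it has been removed.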

      1. Create and edit the health check script.

        sudo nano /etc/keepalived/check.sh
      2. Copy the following script and paste it into the file.

        File: /etc/keepalived/check.sh
        
        #!/bin/bash
        
        trigger='/etc/keepalived/trigger.file'
        if [ -f $trigger ]; then
          exit 1
        else
          exit 0
        fi
      3. Make the file executable.

        sudo chmod +x /etc/keepalived/check.sh
      4. Update the keepalived configuration file to define the VRRP script and enable your VRRP instance to use the script. The interval determines how often the script is run, fall determines how many times the script must return a failure before the state is changed to FAULT, and rise determines how many times a success is returned before the instance goes back to a BACKUP or MASTER state.

        File: /etc/keepalived/keepalived.conf
        
        vrrp_script check_for_file {
            script "/etc/keepalived/check.sh"
            interval 5
            fall 2
            rise 2
        }
        vrrp_instance example_instance {
            ...
            track_script {
                check_for_file
            }
            ...
        }
      5. Restart your BGP daemon and keepalived.

        sudo systemctl restart lelastic
        sudo systemctl restart keepalived
      6. To test this health check, create the trigger file on whichever Compute Instance is in a MASTER state.

        touch /etc/keepalived/trigger.file
      7. Check the log file on that Compute Instance to make sure it enters a FAULT state. Once it does, check the log file on the other Compute Instance to verify that it enters a MASTER state.

        tail -F /tmp/keepalived.state
        ===================================
        Date:  14-Oct-2022 14:30:54
        [INFO] Now Master
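
When running this test, the FAULT entry does not appear immediately. With the interval and fall settings from step 4, the check has to fail several times in a row first. As a rough estimate that ignores the script's own run time, the delay falls between (fall - 1) × interval and fall × interval seconds after the trigger file is created. The `detection_window` helper below is a hypothetical name used only to show the arithmetic.

```shell
#!/bin/bash
# Rough bound on how long a vrrp_script takes to report a failure:
# the check runs every `interval` seconds and must fail `fall` times in a row.
# Prints "<min>-<max>" in seconds; script execution time is ignored.
detection_window() {
    local interval=$1 fall=$2
    echo "$(( (fall - 1) * interval ))-$(( fall * interval ))"
}

detection_window 5 2   # values from the example configuration: prints 5-10
```

Lowering interval or fall shortens the delay, at the cost of running the check more often or reacting to a single transient failure.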

      By default, Keepalived attempts to run the scripts using a keepalived_script user. If that doesn’t exist, it uses the root user. Since running these scripts as the root user introduces many security concerns, this section discusses creating the keepalived_script user.

      1. Create a limited user account called keepalived_script. Since it is never used to log in, that feature can be disabled.

        sudo useradd -r -s /sbin/nologin -M keepalived_script
      2. Edit the sudoers file with visudo, which checks the syntax before saving.

        sudo visudo

      3. Within this file, grant permission for the new user to restart and stop the BGP daemon. The example below uses lelastic.

        File: /etc/sudoers
        
        # User privilege specification
        root    ALL=(ALL:ALL) ALL
        keepalived_script ALL=(ALL:ALL) NOPASSWD: /usr/bin/systemctl restart lelastic, /usr/bin/systemctl stop lelastic
      4. Update the ownership of the /etc/keepalived directory (and all of the files within it).

        sudo chown -R keepalived_script:keepalived_script /etc/keepalived
      5. Once again, edit the Keepalived configuration file and paste the following snippet to the top of that file.

        File: /etc/keepalived/keepalived.conf
        
        global_defs {
            enable_script_security
        }
        ...
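
After changing ownership, it can be worth auditing who is able to modify the scripts, since anyone with write access could change what runs on a state transition. The sketch below is an optional addition rather than one of the original steps. It lists regular files under a directory that are writable by group or others. The `find_loosely_writable` name is hypothetical, and `-perm /022` is GNU find syntax, so other find implementations may need a different expression.

```shell
#!/bin/bash
# List regular files under the given directory that are writable by group or
# others. Requires GNU find for the "-perm /022" syntax.
find_loosely_writable() {
    find "$1" -type f -perm /022
}
```

Running `find_loosely_writable /etc/keepalived` should ideally print nothing. Any file it lists can be tightened with `chmod go-w`.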

      Example Configuration Files

      The links below contain complete working configuration files along with the specified example IP addresses. Please review them if you would like to see all of the recommended settings for each Compute Instance combined into a single file.

      • Shared IP: 203.0.113.57 (configured on the loopback interface)
      • Compute Instance A: 192.0.2.173 (keepalived.conf)
      • Compute Instance B: 198.51.100.49 (keepalived.conf)




      How To Monitor Server Health with Checkmk on Ubuntu 20.04


      The author selected the Open Internet/Free Speech Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      As a systems administrator, it’s best to know the current state of your infrastructure and services. Ideally, you want to notice failing disks or application downtimes before your users do. Monitoring tools like Checkmk can help administrators detect these issues and maintain healthy servers.

      Generally, monitoring software can track your servers’ hardware, uptime, and service statuses, and it can raise alerts when something goes wrong. In a basic scenario, a monitoring system would alert you if any services go down. In a more robust one, the notifications would come shortly after any suspicious signs arose, such as increased memory usage or an abnormal number of TCP connections.

      Many monitoring solutions offer varying degrees of complexity and feature sets, both free and commercial. In many cases, the installation, configuration, and management of these tools is difficult and time-consuming.

      Checkmk is a monitoring solution that is both robust and simpler to install. It is a self-contained software bundle that combines Nagios (a popular and open-source alerting service) with add-ons for gathering, monitoring, and graphing data. It also comes with Checkmk’s web interface — a comprehensive tool that addresses many of Nagios’s shortcomings. It offers a user-friendly dashboard, a full-featured notification system, and a repository of easy-to-install monitoring agents for many Linux distributions. If it weren’t for Checkmk’s web interface, we would have to use different views for various tasks. It wouldn’t be possible to configure all these features without resorting to extensive file modifications.

      In this guide, we will set up Checkmk on an Ubuntu 20.04 server and monitor two separate hosts. We will monitor the Ubuntu server itself and a separate CentOS 8 server, but we could use the same approach to add any number of additional hosts to our monitoring configuration.

      Prerequisites

      Step 1 — Installing Checkmk on Ubuntu

      In order to use our monitoring site, we first must install Checkmk on the Ubuntu server. This will give us all the tools we need. Checkmk provides official ready-to-use Ubuntu package files that we can use to install the software bundle.

      First, let’s update the package list so that we have the most recent version of the repository listings:

      • sudo apt update

      To browse the packages we can go to the package listing site. Ubuntu 20.04, among others, can be selected in the page menu.

      Now download the package:

      • wget https://download.checkmk.com/checkmk/1.6.0p20/check-mk-raw-1.6.0p20_0.focal_amd64.deb

      Then install the newly downloaded package:

      • sudo apt install -y ./check-mk-raw-1.6.0p20_0.focal_amd64.deb

      This command will install the Checkmk package along with all necessary dependencies, including the Apache web server that is used to provide web access to the monitoring interface.

      After the installation completes, we can access the omd command. Try it out:

      • sudo omd

      This omd command will output the following:

      Output

      Usage (called as root):

       omd help                        Show general help
      . . .

      General Options:
       -V <version>                    set specific version, useful in combination with update/create
       omd COMMAND -h, --help          show available options of COMMAND

      The omd command can manage all Checkmk instances on our server. It can start and stop all the monitoring services at once, and we will use it to create our Checkmk instance. First, however, we have to update our firewall settings to allow outside access to the default web ports.

      Step 2 — Adjusting the Firewall Settings

      Before we’ll be able to work with Checkmk, it’s necessary to allow outside access to the web server in the firewall configuration. Assuming that you followed the firewall configuration steps in the prerequisites, you’ll have a UFW firewall set up to restrict access to your server.

      During installation, Apache registers itself with UFW to provide an easy way to enable or disable access to Apache through the firewall.

      To allow access to Apache, use the following command:

      • sudo ufw allow Apache

      Now verify the changes:

      • sudo ufw status

      You’ll see that Apache is listed among the allowed services:

      Output

      Status: active

      To                         Action      From
      --                         ------      ----
      OpenSSH                    ALLOW       Anywhere
      Apache                     ALLOW       Anywhere
      OpenSSH (v6)               ALLOW       Anywhere (v6)
      Apache (v6)                ALLOW       Anywhere (v6)

      This will allow us to access the Checkmk web interface.

      In the next step, we’ll create the first Checkmk monitoring instance.

      Step 3 — Creating a Checkmk Monitoring Instance

      Checkmk uses the concept of instances, or individual installations, to isolate multiple Checkmk copies on a server. In most cases, only one copy of Checkmk is enough and that’s how we will configure the software in this guide.

      First we must give our new instance a name, and we will use monitoring throughout this text. To create the instance, type:

      • sudo omd create monitoring

      The omd tool will set up everything for us automatically. The command output will look similar to the following:

      Output

      Adding /opt/omd/sites/monitoring/tmp to /etc/fstab.
      Creating temporary filesystem /omd/sites/monitoring/tmp...OK
      Restarting Apache...OK
      Created new site monitoring with version 1.6.0p20.cre.

        The site can be started with omd start monitoring.
        The default web UI is available at http://your_ubuntu_server/monitoring/

        The admin user for the web applications is cmkadmin with password: your-default-password
        (It can be changed with 'htpasswd -m ~/etc/htpasswd cmkadmin' as site user.)

        Please do a su - monitoring for administration of this site.

      In this output the URL address, default username, and password for accessing our monitoring interface are highlighted. The instance is now created, but it still needs to be started. To start the instance, type:

      • sudo omd start monitoring

      Now all the necessary tools and services will be started at once. At the end we’ll see an output verifying that all our services have started successfully:

      Output

      Starting mkeventd...OK
      Starting rrdcached...OK
      Starting npcd...OK
      Starting nagios...OK
      Starting apache...OK
      Initializing Crontab...OK

      The instance is up and running.

      To access the Checkmk instance, open http://your_ubuntu_server_ip/monitoring/ in the web browser. You will be prompted for a password. Use the default credentials printed beforehand on the screen; we will change these defaults later on.

      The Checkmk screen opens with a dashboard, which shows all our services and server statuses in lists, and it uses practical graphs resembling the Earth. Straight after installation these are empty, but we will shortly make it display statuses for our services and systems.

      Blank Checkmk dashboard

      In the next step, we will change the default password to secure the site using this interface.

      Step 4 — Changing Your Administrative Password

      During installation, Checkmk generates a random password for the cmkadmin administrative user. This password is meant to be changed upon installation, and as such it is often short and not very secure. We can change this via the web interface.

      First, open the Users page from the WATO - Configuration menu on the left. The list will show all users that currently have access to the Checkmk site. On a fresh installation it will list only two users. The first one, automation, is intended for use with automated tools; the second is the cmkadmin user we used to log in to the site.

      List of Checkmk users

      Click on the pencil icon next to the cmkadmin user to change its details, including the password.

      Edit form for Checkmk admin user

      Update the password, add an admin email, and make any other desired changes.

      After saving the changes we will be asked to log in again using our new credentials. Do so and return to the dashboard, where there is one more thing we must do to fully apply our new configuration.

      Once again open the Users page from the WATO - Configuration menu on the left. The orange button in the top left corner labeled as 1 Change tells us that we have made some changes to the configuration of Checkmk, and that we need to save and activate them. This will happen every time we change the configuration of our monitoring system, not only after editing a user’s credentials. To save and activate pending changes we have to click on this button and agree to activate the listed changes using the Activate affected option on the following screen.

      List of Checkmk users after modifications
      Activate configuration changes confirmation screen
      Successfully activated configuration changes

      After activating the changes the new user’s data is written to the configuration files and it will be used by all the system’s components. Checkmk automatically takes care of notifying individual monitoring system components, reloading them when necessary, and managing all the needed configuration files.

      The Checkmk installation is now ready for use. In the next step, we will add the first host to our monitoring system.

      Step 5 — Monitoring the First Host

      We are now ready to monitor the first host. To accomplish this, we will first install check-mk-agent on the Ubuntu server. Then, we’ll restrict access to the monitoring data using xinetd.

      The components installed with Checkmk are responsible for receiving, storing, and presenting monitoring information. They do not provide the information itself.

      To gather the actual data, we will use Checkmk agent. Designed specifically for the job, Checkmk agent can monitor all vital system components at once and report that information back to the Checkmk instance.

      Installing the agent

      The first host we will monitor will be your_ubuntu_server—the server on which we have installed the Checkmk instance itself.

      To begin, we must install the Checkmk agent. Packages for all major distributions, including Ubuntu, are available directly from the web interface. Open the Monitoring Agents page from the WATO - Configuration menu on the left. You will see the available agent downloads with the most popular packages under the first section labeled Packaged agents.

      List of available packaged monitoring agents

      The package check-mk-agent_1.6.0p20-1_all.deb is the one suited for Debian based distributions, including Ubuntu. Copy the download link for that package from the web browser and use that address to download the package:

      • wget http://your_ubuntu_server_ip/monitoring/check_mk/agents/check-mk-agent_1.6.0p20-1_all.deb

      After downloading, install the package:

      • sudo apt install -y ./check-mk-agent_1.6.0p20-1_all.deb

      Now verify the agent installation:

      • check_mk_agent

      The command prints a very long block of text. It may look like gibberish, but it combines all vital information about the system in one place:

      Output

      <<<check_mk>>>
      Version: 1.6.0p20
      AgentOS: linux
      . . .
      <<<job>>>
      <<<local>>>

      It is the output from this command that Checkmk uses to gather status data from monitored hosts. Now, we’ll restrict access to the monitoring data with xinetd.
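
The agent output is easier to read once you notice that it is divided into sections, each introduced by a `<<<name>>>` header like the ones in the sample above. As a small sketch (the `list_agent_sections` helper name is an assumption, not part of Checkmk), the section headers can be extracted with `sed`. Anything between the angle brackets is printed verbatim.

```shell
#!/bin/bash
# Print the name of every "<<<section>>>" header found on stdin,
# one per line, in the order the sections appear.
list_agent_sections() {
    sed -n 's/^<<<\([^>]*\)>>>$/\1/p'
}

# Demonstration using the sample lines shown in the output above.
printf '%s\n' '<<<check_mk>>>' 'Version: 1.6.0p20' 'AgentOS: linux' '<<<job>>>' '<<<local>>>' |
    list_agent_sections
```

On a host with the agent installed, `check_mk_agent | list_agent_sections` gives a quick overview of what data the agent reports.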

      Restricting Access to Monitoring Data Using xinetd

      By default, the data from check_mk_agent is served using xinetd, a mechanism that outputs data on a certain network port upon accessing it. This means that we can access the check_mk_agent by using telnet to port 6556 (the default port for Checkmk) from any other computer on the internet unless our firewall configuration disallows it.

      It is not a good security policy to publish vital information about servers to anyone on the internet. We should allow only hosts that run Checkmk and are under our supervision to access this data, so that only our monitoring system can gather it.

      If you have followed the initial server setup tutorial including the steps about setting up a firewall, then access to Checkmk agent is by default blocked. It is, however, a good practice to enforce these access restrictions directly in the service configuration and not rely only on the firewall to guard it.

      To restrict access to the agent data, we have to edit the configuration file at /etc/xinetd.d/check_mk. Open the configuration file in your favorite editor. To use nano, type:

      • sudo nano /etc/xinetd.d/check_mk

      Locate this section:

      /etc/xinetd.d/check_mk

      . . .
      # configure the IP address(es) of your Nagios server here:
      #only_from      = 127.0.0.1 10.0.20.1 10.0.20.2
      . . .
      

      The only_from setting is responsible for restricting access to certain IP addresses. Because we are now working on monitoring the same server that Checkmk is running on, it is ok to allow only localhost to connect. Uncomment and update the configuration setting to:

      /etc/xinetd.d/check_mk

      . . .
      # configure the IP address(es) of your Nagios server here:
      only_from      = 127.0.0.1
      . . .
      

      Save and exit the file.

      The xinetd daemon has to be restarted for the changes to take effect. Do so now:

      • sudo systemctl restart xinetd

      Now our agent is up and running and restricted to accept only local connections. We can proceed to configure monitoring for that host using Checkmk.
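
To confirm that the restriction behaves as intended, you can test whether the agent port accepts TCP connections from a given machine. The sketch below is one way to do that without installing telnet. It relies on bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command, and the `agent_port_open` function name is an assumption made for this example. Note that it only checks whether the port accepts a connection, so a host rejected by only_from may still need to be verified by looking at the data returned.

```shell
#!/bin/bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# The port defaults to 6556, the Checkmk agent port used in this guide.
# Uses bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
agent_port_open() {
    local host=$1 port=${2:-6556}
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

if agent_port_open 127.0.0.1; then
    echo "agent port reachable"
else
    echo "agent port not reachable"
fi
```

Run from the Checkmk server itself, the check against 127.0.0.1 should succeed. Run from an unrelated machine against the server's public address, it should fail if the firewall is blocking the port.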

      Configuring Host in Checkmk Web Interface

      First, to add a new host to monitor we have to go to the Hosts menu in the WATO - Configuration menu on the left. From here click Create new host. We will be asked for some information about the host.

      Creating a new host in Checkmk

      The Hostname is the familiar name that Checkmk will use for the monitoring. It may be a fully-qualified domain name, but it is not necessary. In this example, we will name the host monitoring, just like the name of the Checkmk instance itself. Because monitoring is not resolvable to our IP address, we also have to provide the IP address of our server. And since we are monitoring the local host, the IP will simply be 127.0.0.1. Check the IPv4 Address box to enable the manual IP input and enter the value in the text field.

      The default configuration of the Data Sources section relies on Checkmk agent to provide monitoring data, which is fine. The Networking Segment setting is used to denote hosts on remote networks, which are characterized by a higher expected latency that is not a sign of malfunction. Since this is a local host, the default setting is fine as well.

      To save the host and configure which services will be monitored, click the Save & go to services button.

      List of available services to monitor

      Checkmk will do an automatic inventory. That means it will gather the output from the agent and decipher it to know what kinds of services it can monitor. All available services for monitoring will be on the list, including CPU load, memory usage, and free space on disks.

      To enable monitoring of all discovered services, we have to click the Monitor button under the Undecided services (currently not monitored) section. This will refresh the page, but now all services will be listed under the Monitored services section, informing us that they are indeed being monitored.

      As was the case when changing our user password, these new changes must be saved and activated before they go live. Press the 2 changes button and accept the changes using the Activate affected button. After that, the host monitoring will be up and running.

      Now you are ready to work with your server data.

      Working with Monitoring Data

      Now let’s take a look at the main dashboard using the Overview/Main Overview menu item on the left:

      Monitoring dashboard with all services healthy

      The Earth sphere is now fully green and the table says that one host is up with no problems. We can see the full host list, which now consists of a single host, in the Hosts/All hosts view (using the menu on the left).

      List of hosts with all services healthy

      There we will see how many services are in good health (shown in green), how many are failing, and how many are pending to be checked. After clicking on the hostname we will be able to see the list of all services with their full statuses and their Perf-O-Meters. Perf-O-Meter shows the performance of a single service relative to what Checkmk considers to be good health.

      Details of a host service status

      All services that return graphable data display a graph icon next to their name. We can use that icon to get access to graphs associated with the service. Since the host monitoring is fresh, there is almost nothing on the graphs—but after some time the graphs will provide valuable information on how our service performance changes over time.

      Graphs depicting CPU load on the server

      When any of these services fails or recovers, the information will be shown on the dashboard. For failing services a red error will be shown, and the problem will also be visible on the Earth graph.

      Dashboard with one host having problems

      After recovery, everything will be shown in green as working properly, but the event log on the right will contain information about past failures.

      Dashboard with one host recovered after problems

      Now that we have explored the dashboard a little, let’s add a second host to our monitoring instance.

      Step 6 — Monitoring a Second CentOS Host

      Monitoring gets really useful when you have multiple hosts. We will now add a second server to our Checkmk instance, this time running CentOS 8.

      As with our Ubuntu server, installing Checkmk agent is necessary to gather monitoring data on CentOS. This time, however, we will need an rpm package from the Monitoring Agents page in the web interface, called check-mk-agent-1.6.0p20-1.noarch.rpm.

      First, however, we must install xinetd, which is not installed by default on CentOS. As mentioned earlier, xinetd is the daemon responsible for making the monitoring data provided by check_mk_agent available over the network.

      On your CentOS server, first install xinetd:

      • sudo dnf install -y xinetd

      Now we can download and install the monitoring agent package needed for our CentOS server:

      • sudo dnf install -y http://your_ubuntu_server_ip/monitoring/check_mk/agents/check-mk-agent-1.6.0p20-1.noarch.rpm

      Just like before, we can verify that the agent is working properly by executing check_mk_agent:

      • sudo check_mk_agent

      The output will be similar to that from the Ubuntu server. Now we will restrict access to the agent.

      Restricting Access

      This time we will not be monitoring a local host, so xinetd must allow connections coming from the Ubuntu server, where Checkmk is installed, to gather the data. To allow that, first open your configuration file:

      • sudo vi /etc/xinetd.d/check_mk

      Here you will see the configuration for your check_mk service, specifying how Checkmk agent can be accessed through the xinetd daemon. Find the following two commented lines:

      /etc/xinetd.d/check_mk

      . . .
      # configure the IP address(es) of your Nagios server here:
      #only_from      = 127.0.0.1 10.0.20.1 10.0.20.2
      . . .
      

      Now uncomment the second line and replace the local IP addresses with your_ubuntu_server_ip:

      /etc/xinetd.d/check_mk

      . . .
      # configure the IP address(es) of your Nagios server here:
      only_from      = your_ubuntu_server_ip
      . . .
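
      The same edit can be made without opening an editor, which helps when several hosts need the change. The following is a minimal sketch, assuming GNU sed; set_only_from is a hypothetical helper name, and 203.0.113.10 is a documentation address standing in for your_ubuntu_server_ip. The demonstration runs on a scratch file, so nothing under /etc is modified.

```shell
#!/usr/bin/env bash
# Sketch: set the only_from line in an xinetd service file without an editor.
# set_only_from is a hypothetical helper name, not part of Checkmk or xinetd.
set_only_from() {
    local file="$1" ip="$2"
    # Match the line whether or not it is commented out, keep its indentation,
    # and replace the whole address list with the single address given.
    sed -i -E "s|^([[:space:]]*)#?[[:space:]]*only_from[[:space:]]*=.*|\1only_from      = ${ip}|" "$file"
}

# Demonstration on a scratch file holding the two lines shown above.
demo_file="$(mktemp)"
printf '%s\n' \
    '# configure the IP address(es) of your Nagios server here:' \
    '#only_from      = 127.0.0.1 10.0.20.1 10.0.20.2' > "$demo_file"

# 203.0.113.10 is a placeholder for the address of the Checkmk server.
set_only_from "$demo_file" "203.0.113.10"
cat "$demo_file"
```

      To apply it to the real file, the call would be run as root against /etc/xinetd.d/check_mk, followed by the xinetd restart described below.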
      

      Save and exit the file by typing :x and then ENTER. Restart the xinetd service using:

      • sudo systemctl restart xinetd

      If you have configured a local firewall following the initial server setup tutorial, it is also necessary to adjust the firewall settings. Without doing so, connections to Checkmk agent won’t be allowed. To do so, execute:

      • sudo firewall-cmd --add-port=6556/tcp --permanent

      This allows incoming TCP traffic to port 6556, which Checkmk agent uses. The rule takes effect after you reload the firewall:

      • sudo firewall-cmd --reload

      Note: You can learn how to fine tune the firewall settings by following the How To Set Up a Firewall Using firewalld on CentOS 8 guide.
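
      Before adding the host in the web interface, it can be useful to confirm from the Ubuntu server that port 6556 on the CentOS server accepts connections. The following is a minimal sketch, assuming bash and the timeout utility from GNU coreutils; port_open and the AGENT_HOST variable are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port accepts connections, using bash's built-in
# /dev/tcp redirection. port_open is a hypothetical helper name.
port_open() {
    local host="$1" port="$2"
    # timeout bounds the attempt in case a firewall silently drops packets.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# AGENT_HOST is a hypothetical variable; set it to the address of the
# monitored host. It defaults to the local machine.
target="${AGENT_HOST:-127.0.0.1}"

if port_open "$target" 6556; then
    echo "port 6556 on $target accepts connections"
else
    echo "port 6556 on $target is not reachable"
fi
```

      A successful connection shows only that the network path is open; whether the agent returns data still depends on the only_from setting configured earlier.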

      We can now proceed to configure Checkmk to monitor our CentOS 8 host.

      Configuring the New Host in Checkmk

      To add additional hosts to Checkmk, we use the Hosts menu just like before. This time we will name the host centos, configure its IP address, and choose WAN (high-latency) under the Networking Segment select box, since the host is on another network. If we skipped this and left it as local, Checkmk would soon alert us that the host is down, since it would expect it to respond to agent queries much quicker than is possible over the internet.

      Creating second host configuration screen

      Click Save & go to services, which will show services available for monitoring on the CentOS server. The list will be very similar to the one from the first host. As before, click Monitor and then activate the changes using the orange button in the top left corner.

      After activating the changes, we can verify that the host is monitored on the All hosts page. Go there. Two hosts, monitoring and centos, will now be visible.

      List of hosts with two hosts being monitored

      You are now monitoring an Ubuntu server and a CentOS server with Checkmk. It is possible to monitor even more hosts. In fact, there is no upper limit other than server performance, which should not be a problem until your hosts number in the hundreds. Moreover, the procedure is the same for any other host. Checkmk agents in deb and rpm packages work on Ubuntu, CentOS, and the majority of other Linux distributions.

      Conclusion

      In this guide we set up two servers with two different Linux distributions: Ubuntu and CentOS. We then installed and configured Checkmk to monitor both servers, and explored Checkmk’s powerful web interface.

      Checkmk allows for the easy setup of a complete and versatile monitoring system, which packs all the hard work of manual configuration into an easy-to-use web interface full of options and features. With these tools it is possible to monitor multiple hosts; set up email, SMS, or push notifications for problems; set up additional checks for more services; monitor accessibility and performance, and so on.

      To learn more about Checkmk, make sure to visit the official documentation.




      How To Monitor Server Health with Checkmk on Ubuntu 18.04


      The author selected the Open Internet/Free Speech Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      As a systems administrator, it’s a best practice to know the current state of your infrastructure and services. Ideally, you want to notice failing disks or application downtimes before your users do. Monitoring tools like Checkmk can help administrators detect these issues and maintain healthy servers.

      Generally, monitoring software can track your servers’ hardware, uptime, and service statuses, and it can raise alerts when something goes wrong. In a very basic scenario, a monitoring system would alert you if any services go down. In a more robust one, the notifications would come shortly after any suspicious signs arose, such as increased memory usage or an abnormal number of TCP connections.

      There are many monitoring solutions available offering varying degrees of complexity and feature sets, both free and commercial. In many cases, the installation, configuration, and management of these tools is difficult and time-consuming.

      Checkmk, however, is a monitoring solution that is both robust and simpler to install. It is a self-contained software bundle that combines Nagios (a popular and open-source alerting service) with add-ons for gathering, monitoring, and graphing data. It also comes with Checkmk’s web interface — a comprehensive tool that addresses many of Nagios’s shortcomings. It offers a user-friendly dashboard, a full-featured notification system, and a repository of easy-to-install monitoring agents for many Linux distributions. If it weren’t for Checkmk’s web interface, we would have to use different views for different tasks and it wouldn’t be possible to configure all these features without resorting to extensive file modifications.

      In this guide we will set up Checkmk on an Ubuntu 18.04 server and monitor two separate hosts. We will monitor the Ubuntu server itself as well as a separate CentOS 7 server, but we could use the same approach to add any number of additional hosts to our monitoring configuration.

      Prerequisites

      • One Ubuntu 18.04 server with a regular, non-root user with sudo privileges. You can learn how to prepare your server by following this initial server setup tutorial.
      • One CentOS 7 server with a regular, non-root user with sudo privileges. To prepare this server you can follow this initial server setup tutorial.

      Step 1 — Installing Checkmk on Ubuntu

      In order to use our monitoring site, we first must install Checkmk on the Ubuntu server. This will give us all the tools we need. Checkmk provides official ready-to-use Ubuntu package files that we can use to install the software bundle.

      First, let’s update the packages list so that we have the most recent version of the repository listings:

      • sudo apt update

      To browse the packages we can go to the package listing site. Ubuntu 18.04, among others, can be selected in the page menu.

      Now download the package:

      • wget https://checkmk.com/support/1.6.0p8/check-mk-raw-1.6.0p8_0.bionic_amd64.deb

      Then install the newly downloaded package:

      • sudo apt install -y ./check-mk-raw-1.6.0p8_0.bionic_amd64.deb

      This command will install the Checkmk package along with all necessary dependencies, including the Apache web server that is used to provide web access to the monitoring interface.

      After the installation completes, we can now access the omd command. Try it out:

      • sudo omd

      This omd command will output the following:

      Output

      Usage (called as root):

       omd help                        Show general help
      . . .
      General Options:
       -V <version>                    set specific version, useful in combination with update/create
       omd COMMAND -h, --help          show available options of COMMAND

      The omd command can manage all Checkmk instances on our server. It can start and stop all the monitoring services at once, and we will use it to create our Checkmk instance. First, however, we have to update our firewall settings to allow outside access to the default web ports.

      Step 2 — Adjusting the Firewall Settings

      Before we’ll be able to work with Checkmk, it’s necessary to allow outside access to the web server in the firewall configuration. Assuming that you followed the firewall configuration steps in the prerequisites, you’ll have a UFW firewall set up to restrict access to your server.

      During installation, Apache registers itself with UFW to provide an easy way to enable or disable access to Apache through the firewall.

      To allow access to Apache, use the following command:

      • sudo ufw allow Apache

      Now verify the changes:

      • sudo ufw status

      You’ll see that Apache is listed among the allowed services:

      Output

      Status: active

      To                         Action      From
      --                         ------      ----
      OpenSSH                    ALLOW       Anywhere
      Apache                     ALLOW       Anywhere
      OpenSSH (v6)               ALLOW       Anywhere (v6)
      Apache (v6)                ALLOW       Anywhere (v6)
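
      When this check is part of a provisioning script, the status listing can be tested instead of read by eye. The following is a minimal sketch that reads the listing from standard input; profile_allowed is a hypothetical helper name, and the sample text is copied from the output above. It compares only the first word of each rule, so profiles with spaces in their names would need a different match.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the given application profile has an ALLOW rule in
# `ufw status` output read from stdin. profile_allowed is a hypothetical name.
profile_allowed() {
    local profile="$1"
    awk -v p="$profile" '$1 == p && $2 == "ALLOW" { found = 1 } END { exit !found }'
}

# Sample text copied from the output above; on a real server, pipe in
# the output of `sudo ufw status` instead.
sample='Status: active

To                         Action      From
--                         ------      ----
OpenSSH                    ALLOW       Anywhere
Apache                     ALLOW       Anywhere'

if printf '%s\n' "$sample" | profile_allowed Apache; then
    echo "Apache is allowed through the firewall"
else
    echo "Apache is not allowed through the firewall"
fi
```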

      This will allow us to access the Checkmk web interface.

      In the next step, we’ll create the first Checkmk monitoring instance.

      Step 3 — Creating a Checkmk Monitoring Instance

      Checkmk uses the concept of instances, or individual installations, to isolate multiple Checkmk copies on a server. In most cases, only one copy of Checkmk is enough and that’s how we will configure the software in this guide.

      First we must give our new instance a name, and we will use monitoring throughout this text. To create the instance, type:

      • sudo omd create monitoring

      The omd tool will set up everything for us automatically. The command output will look similar to the following:

      Output

      Adding /opt/omd/sites/monitoring/tmp to /etc/fstab.
      Creating temporary filesystem /omd/sites/monitoring/tmp...OK
      Restarting Apache...OK
      Created new site monitoring with version 1.6.0p8.cre.

        The site can be started with omd start monitoring.
        The default web UI is available at http://your_ubuntu_server/monitoring/

        The admin user for the web applications is cmkadmin with password: your-default-password
        (It can be changed with 'htpasswd -m ~/etc/htpasswd cmkadmin' as site user.)

        Please do a su - monitoring for administration of this site.

      In this output the URL address, default username, and password for accessing our monitoring interface are highlighted. The instance is now created, but it still needs to be started. To start the instance, type:

      • sudo omd start monitoring

      Now all the necessary tools and services will be started at once. At the end we’ll see output verifying that all our services have started successfully:

      Output

      Starting mkeventd...OK
      Starting rrdcached...OK
      Starting npcd...OK
      Starting nagios...OK
      Starting apache...OK
      Initializing Crontab...OK

      The instance is up and running.

      To access the Checkmk instance, open http://your_ubuntu_server_ip/monitoring/ in the web browser. You will be prompted for a password. Use the default credentials printed beforehand on the screen; we will change these defaults later on.

      The Checkmk screen opens with a dashboard, which shows all our services and server statuses in lists, and it uses practical graphs resembling the Earth. Straight after installation these are empty, but we will shortly make it display statuses for our services and systems.

      Blank Checkmk dashboard

      In the next step, we will change the default password to secure the site using this interface.

      Step 4 — Changing Your Administrative Password

      During installation, Checkmk generates a random password for the cmkadmin administrative user. This password is meant to be changed upon installation, and as such it is often short and not very secure. We can change this via the web interface.

      First, open the Users page from the WATO – Configuration menu on the left. The list will show all users that currently have access to the Checkmk site. On a fresh installation it will list only two users. The first one, automation, is intended for use with automated tools; the second is the cmkadmin user we used to log in to the site.

      List of Checkmk users

      Click on the pencil icon next to the cmkadmin user to change its details, including the password.

      Edit form for Checkmk admin user

      Update the password, add an admin email, and make any other desired changes.

      After saving the changes we will be asked to log in again using our new credentials. Do so and return to the dashboard, where there is one more thing we must do to fully apply our new configuration.

      Once again open the Users page from the WATO – Configuration menu on the left. The orange button in the top left corner labeled as 1 Change tells us that we have made some changes to the configuration of Checkmk, and that we need to save and activate them. This will happen every time we change the configuration of our monitoring system, not only after editing a user’s credentials. To save and activate pending changes we have to click on this button and agree to activate the listed changes using the Activate affected option on the following screen.

      List of Checkmk users after modifications
      Activate configuration changes confirmation screen
      Successfully activated configuration changes

      After activating the changes the new user’s data is written to the configuration files and it will be used by all the system’s components. Checkmk automatically takes care of notifying individual monitoring system components, reloading them when necessary, and managing all the needed configuration files.

      The Checkmk installation is now ready for use. In the next step, we will add the first host to our monitoring system.

      Step 5 — Monitoring the First Host

      We are now ready to monitor the first host. To accomplish this, we will first install check-mk-agent on the Ubuntu server. Then, we’ll restrict access to the monitoring data using xinetd.

      The components installed with Checkmk are responsible for receiving, storing, and presenting monitoring information. They do not provide the information itself.

      To gather the actual data, we will use Checkmk agent. Designed specifically for the job, Checkmk agent is capable of monitoring all vital system components at once and reporting that information back to the Checkmk instance.

      Installing the agent

      The first host we will monitor will be your_ubuntu_server—the server on which we have installed the Checkmk instance itself.

      To begin, we must install the Checkmk agent. Packages for all major distributions, including Ubuntu, are available directly from the web interface. Open the Monitoring Agents page from the WATO – Configuration menu on the left. You will see the available agent downloads with the most popular packages under the first section labeled Packaged agents.

      List of available packaged monitoring agents

      The package check-mk-agent_1.6.0p8-1_all.deb is the one suited for Debian based distributions, including Ubuntu. Copy the download link for that package from the web browser and use that address to download the package.

      • wget http://your_ubuntu_server_ip/monitoring/check_mk/agents/check-mk-agent_1.6.0p8-1_all.deb

      After downloading, install the package:

      • sudo apt install -y ./check-mk-agent_1.6.0p8-1_all.deb

      Now verify that the agent has been successfully installed:

      • check_mk_agent

      The command will output a very long text that looks like gibberish but combines all vital information about the system in one place.

      Output

      <<<check_mk>>>
      Version: 1.6.0p8
      AgentOS: linux
      . . .
      ["monitoring"]
      <<<job>>>
      <<<local>>>

      It is the output from this command that Checkmk uses to gather status data from monitored hosts. Now, we’ll restrict access to the monitoring data with xinetd.
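
      The output is easier to scan once you notice that it is divided into sections, each introduced by a header line of the form <<<name>>>. The following is a minimal sketch that prints only those section names; list_sections is a hypothetical helper name, not a Checkmk tool, and the sample input is abbreviated from the output above.

```shell
#!/usr/bin/env bash
# Sketch: print the section names found in agent output read from stdin.
# list_sections is a hypothetical helper name, not a Checkmk tool.
list_sections() {
    # Section headers are whole lines of the form <<<name>>>.
    sed -n 's/^<<<\(.*\)>>>$/\1/p'
}

# Sample input abbreviated from the output above; on a monitored host,
# run `check_mk_agent | list_sections` instead.
printf '%s\n' \
    '<<<check_mk>>>' \
    'Version: 1.6.0p8' \
    'AgentOS: linux' \
    '<<<job>>>' \
    '<<<local>>>' | list_sections
```

      The resulting list gives a quick view of which kinds of data the agent reports on a given host.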

      Restricting Access to Monitoring Data Using xinetd

      By default, the data from check_mk_agent is served using xinetd, a mechanism that outputs data on a certain network port upon accessing it. This means that we can access the check_mk_agent by using telnet to port 6556 (the default port for Checkmk) from any other computer on the internet unless our firewall configuration disallows it.

      It is not a good security policy to publish vital information about servers to anyone on the internet. We should allow only hosts that run Checkmk and are under our supervision to access this data, so that only our monitoring system can gather it.

      If you have followed the initial server setup tutorial including the steps about setting up a firewall, then access to Checkmk agent is by default blocked. It is, however, a good practice to enforce these access restrictions directly in the service configuration and not rely only on the firewall to guard it.

      To restrict access to the agent data, we have to edit the configuration file at /etc/xinetd.d/check_mk. Open the configuration file in your favorite editor. To use nano, type:

      • sudo nano /etc/xinetd.d/check_mk

      Locate this section:

      /etc/xinetd.d/check_mk

      . . .
      # configure the IP address(es) of your Nagios server here:
      #only_from      = 127.0.0.1 10.0.20.1 10.0.20.2
      . . .
      

      The only_from setting is responsible for restricting access to certain IP addresses. Because we are now working on monitoring the same server that Checkmk is running on, it is ok to allow only localhost to connect. Uncomment and update the configuration setting to:

      /etc/xinetd.d/check_mk

      . . .
      # configure the IP address(es) of your Nagios server here:
      only_from      = 127.0.0.1
      . . .
      

      Save and exit the file.

      The xinetd daemon has to be restarted for the changes to take effect. Do so now:

      • sudo systemctl restart xinetd

      Now our agent is up and running and restricted to accept only local connections. We can proceed to configure monitoring for that host using Checkmk.

      Configuring Host in Checkmk Web Interface

      First, to add a new host to monitor we have to go to the Hosts menu in the WATO – Configuration menu on the left. From here click Create new host. We will be asked for some information about the host.

      Creating a new host in Checkmk

      The Hostname is the familiar name that Checkmk will use for the monitoring. It may be a fully-qualified domain name, but it is not necessary. In this example, we will name the host monitoring, just like the name of the Checkmk instance itself. Because monitoring is not resolvable to our IP address, we also have to provide the IP address of our server. And since we are monitoring the local host, the IP will simply be 127.0.0.1. Check the IPv4 Address box to enable the manual IP input and enter the value in the text field.

      The default configuration of the Data Sources section relies on Checkmk agent to provide monitoring data, which is fine. The Networking Segment setting is used to denote hosts on remote networks, which are characterized by a higher expected latency that is not a sign of malfunction. Since this is a local host, the default setting is fine as well.

      To save the host and configure which services will be monitored, click the Save & go to services button.

      List of available services to monitor

      Checkmk will do an automatic inventory. That means it will gather the output from the agent and decipher it to know what kinds of services it can monitor. All available services for monitoring will be on the list, including CPU load, memory usage, and free space on disks.

      To enable monitoring of all discovered services, we have to click the Monitor button under the Undecided services (currently not monitored) section. This will refresh the page, but now all services will be listed under the Monitored services section, informing us that they are indeed being monitored.

      As was the case when changing our user password, these new changes must be saved and activated before they go live. Press the 2 changes button and accept the changes using the Activate affected button. After that, the host monitoring will be up and running.

      Now you are ready to work with your server data.

      Working with Monitoring Data

      Now let’s take a look at the main dashboard using the Overview/Main Overview menu item on the left:

      Monitoring dashboard with all services healthy

      The Earth sphere is now fully green and the table says that one host is up with no problems. We can see the full host list, which now consists of a single host, in the Hosts/All hosts view (using the menu on the left).

      List of hosts with all services healthy

      There we will see how many services are in good health (shown in green), how many are failing, and how many are pending to be checked. After clicking on the hostname we will be able to see the list of all services with their full statuses and their Perf-O-Meters. Perf-O-Meter shows the performance of a single service relative to what Checkmk considers to be good health.

      Details of a host service status

      All services that return graphable data display a graph icon next to their name. We can use that icon to get access to graphs associated with the service. Since the host monitoring is fresh, there is almost nothing on the graphs—but after some time the graphs will provide valuable information on how our service performance changes over time.

      Graphs depicting CPU load on the server

      When any of these services fails or recovers, the information will be shown on the dashboard. For failing services a red error will be shown, and the problem will also be visible on the Earth graph.

      Dashboard with one host having problems

      After recovery, everything will be shown in green as working properly, but the event log on the right will contain information about past failures.

      Dashboard with one host recovered after problems

      Now that we have explored the dashboard a little, let’s add a second host to our monitoring instance.

      Step 6 — Monitoring a Second CentOS Host

      Monitoring gets really useful when you have multiple hosts. We will now add a second server to our Checkmk instance, this time running CentOS 7.

      As with our Ubuntu server, installing Checkmk agent is necessary to gather monitoring data on CentOS. This time, however, we will need an rpm package from the Monitoring Agents page in the web interface, called check-mk-agent-1.6.0p8-1.noarch.rpm.

      First, however, we must install xinetd, which is not installed by default on CentOS. Recall that xinetd is the daemon responsible for making the monitoring data provided by check_mk_agent available over the network.

      On your CentOS server, first install xinetd:

      • sudo yum install -y xinetd

      Now we can download and install the monitoring agent package needed for our CentOS server:

      • sudo yum install -y http://your_ubuntu_server_ip/monitoring/check_mk/agents/check-mk-agent-1.6.0p8-1.noarch.rpm

      Just like before, we can verify that the agent is working properly by executing check_mk_agent:

      • check_mk_agent

      The output will be similar to that from the Ubuntu server. Now we will restrict access to the agent.

      Restricting Access

      This time we will not be monitoring a local host, so xinetd must allow connections coming from the Ubuntu server, where Checkmk is installed, to gather the data. To allow that, first open your configuration file:

      • sudo vi /etc/xinetd.d/check_mk

      Here you will see the configuration for your check_mk service, specifying how Checkmk agent can be accessed through the xinetd daemon. Find the following two commented lines:

      /etc/xinetd.d/check_mk

      . . .
      # configure the IP address(es) of your Nagios server here:
      #only_from      = 127.0.0.1 10.0.20.1 10.0.20.2
      . . .
      

      Now uncomment the second line and replace the local IP addresses with your_ubuntu_server_ip:

      /etc/xinetd.d/check_mk

      . . .
      # configure the IP address(es) of your Nagios server here:
      only_from      = your_ubuntu_server_ip
      . . .
      

      Save and exit the file by typing :x and then ENTER. Restart the xinetd service using:

      • sudo systemctl restart xinetd

      We can now proceed to configure Checkmk to monitor our CentOS 7 host.

      Configuring the New Host in Checkmk

      To add additional hosts to Checkmk, we use the Hosts menu just like before. This time we will name the host centos, configure its IP address, and choose WAN (high-latency) under the Networking Segment select box, since the host is on another network. If we skipped this and left it as local, Checkmk would soon alert us that the host is down, since it would expect it to respond to agent queries much quicker than is possible over the internet.

      Creating second host configuration screen

      Click Save & go to services, which will show services available for monitoring on the CentOS server. The list will be very similar to the one from the first host. As before, click Monitor and then activate the changes using the orange button in the top left corner.

      After activating the changes, we can verify that the host is monitored on the All hosts page. Go there. Two hosts, monitoring and centos, will now be visible.

      List of hosts with two hosts being monitored

      You are now monitoring an Ubuntu server and a CentOS server with Checkmk. It is possible to monitor even more hosts. In fact, there is no upper limit other than server performance, which should not be a problem until your hosts number in the hundreds. Moreover, the procedure is the same for any other host. Checkmk agents in deb and rpm packages work on Ubuntu, CentOS, and the majority of other Linux distributions.

      Conclusion

      In this guide we set up two servers with two different Linux distributions: Ubuntu and CentOS. We then installed and configured Checkmk to monitor both servers, and explored Checkmk’s powerful web interface.

      Checkmk allows for the easy setup of a complete and versatile monitoring system, which packs all the hard work of manual configuration into an easy-to-use web interface full of options and features. With these tools it is possible to monitor multiple hosts; set up email, SMS, or push notifications for problems; set up additional checks for more services; monitor accessibility and performance, and so on.

      To learn more about Checkmk, make sure to visit the official documentation.


