
      How To Install and Configure Zabbix to Securely Monitor Remote Servers on Ubuntu 20.04



      The author selected the Computer History Museum to receive a donation as part of the Write for DOnations program.

      Introduction

      Zabbix is open-source monitoring software for networks and applications. It offers real-time monitoring of thousands of metrics collected from servers, virtual machines, network devices, and web applications. These metrics can help you determine the current health of your IT infrastructure and detect problems with hardware or software components before customers complain. Useful information is stored in a database so you can analyze data over time and improve the quality of provided services or plan upgrades of your equipment.

      Zabbix offers several ways to collect metrics, including agentless monitoring of user services and a client-server architecture. To collect server metrics, it uses a small agent on the monitored client to gather data and send it to the Zabbix server. Zabbix supports encrypted communication between the server and connected clients, so your data is protected while it travels over insecure networks.

      The Zabbix server stores its data in a relational database powered by MySQL or PostgreSQL. You can also store historical data in Elasticsearch, or use TimescaleDB, a PostgreSQL extension for time-series data. Zabbix provides a web interface so you can view data and configure system settings.

      In this tutorial, you will configure Zabbix on two Ubuntu 20.04 machines. One will be configured as the Zabbix server, and the other as a client that you’ll monitor. The Zabbix server will use a MySQL database to record monitoring data and use Nginx to serve the web interface.

      Prerequisites

      To follow this tutorial, you will need:

      • Two Ubuntu 20.04 servers set up by following the Initial Server Setup Guide for Ubuntu 20.04, including a non-root user with sudo privileges and a firewall configured with ufw. On one server, you will install Zabbix; this tutorial will refer to this as the Zabbix server. It will monitor your second server; this second server will be referred to as the second Ubuntu server.

      • The server that will run the Zabbix server needs Nginx, MySQL, and PHP installed. Follow Steps 1–3 of our Ubuntu 20.04 LEMP Stack guide to configure those on your Zabbix server.

      • A registered domain name. This tutorial will use your_domain throughout. You can purchase a domain name from Namecheap, get one for free with Freenom, or use the domain registrar of your choice.

      • Both of the following DNS records set up for your Zabbix server. If you are using DigitalOcean, please see our DNS documentation for details on how to add them.

        • An A record with your_domain pointing to your Zabbix server’s public IP address.
        • An A record with www.your_domain pointing to your Zabbix server’s public IP address.

      Additionally, because the Zabbix Server is used to access valuable information about your infrastructure that you would not want unauthorized users to access, it’s important that you keep your server secure by installing a TLS/SSL certificate. This is optional but strongly encouraged. If you would like to secure your server, follow the Let’s Encrypt on Ubuntu 20.04 guide after Step 3 of this tutorial.

      Step 1 — Installing the Zabbix Server

      First, you need to install Zabbix on the server where you installed MySQL, Nginx, and PHP. Log in to this machine as your non-root user:

      • ssh sammy@zabbix_server_ip_address

      Zabbix is available in Ubuntu’s package manager, but it’s outdated, so use the official Zabbix repository to install the latest stable version. Download and install the repository configuration package:

      • wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1+focal_all.deb
      • sudo dpkg -i zabbix-release_5.0-1+focal_all.deb

      You will see the following output:

      Output

      Selecting previously unselected package zabbix-release.
      (Reading database ... 64058 files and directories currently installed.)
      Preparing to unpack zabbix-release_5.0-1+focal_all.deb ...
      Unpacking zabbix-release (1:5.0-1+focal) ...
      Setting up zabbix-release (1:5.0-1+focal) ...

      Update the package index so the new repository is included:

      • sudo apt update

      Then install the Zabbix server and web frontend with MySQL database support:

      • sudo apt install zabbix-server-mysql zabbix-frontend-php

      Also, install the Zabbix agent, which will let you collect data about the Zabbix server status itself.

      • sudo apt install zabbix-agent

      Before you can use Zabbix, you have to set up a database to hold the data that the Zabbix server will collect from its agents. You can do this in the next step.

      Step 2 — Configuring the MySQL Database for Zabbix

      You need to create a new MySQL database and populate it with some basic information in order to make it suitable for Zabbix. You’ll also create a specific user for this database so Zabbix isn’t logging in to MySQL with the root account.

      Log in to MySQL as the root user:

      • sudo mysql -uroot -p

      Create the Zabbix database with UTF-8 character support:

      • create database zabbix character set utf8 collate utf8_bin;

      Then create a user that the Zabbix server will use, give it access to the new database, and set the password for the user:

      • create user zabbix@localhost identified by 'your_zabbix_mysql_password';
      • grant all privileges on zabbix.* to zabbix@localhost;

      That takes care of the user and the database. Exit out of the database console:

      • quit;

      Next you have to import the initial schema and data. The Zabbix installation provided you with a file that sets this up.

      Run the following command to set up the schema and import the data into the zabbix database. Use zcat since the data in the file is compressed:

      • zcat /usr/share/doc/zabbix-server-mysql*/create.sql.gz | mysql -uzabbix -p zabbix

      Enter the password for the zabbix MySQL user that you configured when prompted.

      This command may take a minute or two to execute. If you see the error ERROR 1045 (28000): Access denied for user 'zabbix'@'localhost' (using password: YES), make sure you used the right password for the zabbix user.

      In order for the Zabbix server to use this database, you need to set the database password in the Zabbix server configuration file. Open the configuration file in your preferred text editor. This tutorial will use nano:

      • sudo nano /etc/zabbix/zabbix_server.conf

      Look for the following section of the file:

      /etc/zabbix/zabbix_server.conf

      ...
      ### Option: DBPassword                           
      #       Database password. Ignored for SQLite.   
      #       Comment this line if no password is used.
      #                                                
      # Mandatory: no                                  
      # Default:                                       
      # DBPassword=
      ...
      

      These comments in the file explain how to connect to the database. You need to set the DBPassword value in the file to the password for your database user. Add this line after those comments to configure the database:

      /etc/zabbix/zabbix_server.conf

      ...
      DBPassword=your_zabbix_mysql_password
      ...
      

      Save and close zabbix_server.conf by pressing CTRL+X, followed by Y and then ENTER if you’re using nano.
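
      If you would rather script this edit than make it by hand, the following is a minimal sketch. The helper name set_conf_value is hypothetical and not part of Zabbix, and the example runs against a scratch file rather than the real /etc/zabbix/zabbix_server.conf. It drops any existing uncommented line for the key and appends the new value, leaving the explanatory comments untouched.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Zabbix): set KEY=VALUE in a
# simple key=value configuration file.
set_conf_value() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Keep every line except an existing uncommented "KEY=..." line.
    grep -v "^${key}=" "$file" > "$tmp"
    # Append the new setting; printf sidesteps any sed escaping issues.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstration on a scratch file that mimics the section shown above.
demo=$(mktemp)
printf '%s\n' '### Option: DBPassword' '# DBPassword=' > "$demo"
set_conf_value "$demo" DBPassword 'your_zabbix_mysql_password'
cat "$demo"
rm -f "$demo"
```

      Against the real file, the function would need to run with root privileges, and keeping a backup copy of the file first is a sensible precaution.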

      You’ve now configured the Zabbix server to connect to the database. Next, you will configure the Nginx web server to serve the Zabbix frontend.

      Step 3 — Configuring Nginx for Zabbix

      To configure Nginx automatically, install the automatic configuration package:

      • sudo apt install zabbix-nginx-conf

      As a result, you will get the configuration file /etc/zabbix/nginx.conf, as well as a link to it in the Nginx configuration directory /etc/nginx/conf.d/zabbix.conf.

      Next, you need to make changes to this file. Open the configuration file:

      • sudo nano /etc/zabbix/nginx.conf

      The file contains an automatically generated Nginx server block configuration. It contains two lines that determine the server name and what port it is listening on:

      /etc/zabbix/nginx.conf

      server {
      #        listen          80;
      #        server_name     example.com;
      ...
      

      Uncomment the two lines, and replace example.com with your domain name. Your settings will look like this:

      /etc/zabbix/nginx.conf

      server {
              listen          80;
              server_name     your_domain;
      ...
      

      Save and close the file. Next, test to make sure that there are no syntax errors in any of your Nginx files and reload the configuration:

      • sudo nginx -t
      • sudo nginx -s reload
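
      The manual edit in this step can also be expressed as a sed filter, which is handy if you want to preview the result before touching the real file. This is only a sketch: enable_server_block is a hypothetical helper name, and the demonstration works on a scratch file that imitates the first lines of /etc/zabbix/nginx.conf.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with the "listen" and "server_name"
# lines uncommented and the server name replaced by DOMAIN.
enable_server_block() {
    file=$1 domain=$2
    sed -e 's/^#\([[:space:]]*listen[[:space:]].*\)$/\1/' \
        -e "s/^#\([[:space:]]*server_name[[:space:]]*\).*\$/\1${domain};/" \
        "$file"
}

# Demonstration on a scratch file; the result goes to standard output.
demo=$(mktemp)
printf '%s\n' 'server {' '#        listen          80;' \
    '#        server_name     example.com;' > "$demo"
enable_server_block "$demo" your_domain
rm -f "$demo"
```

      Because the function writes to standard output and never modifies its input, you can compare the result with the original before deciding to replace the file.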

      Now that Nginx is set up to serve the Zabbix frontend, you will make some modifications to your PHP setup in order for the Zabbix web interface to work properly.

      Note: As mentioned in the Prerequisites section, it is recommended that you enable SSL/TLS on your server. If you would like to do this, follow our Ubuntu 20.04 Let’s Encrypt tutorial before you move on to Step 4 to obtain a free SSL certificate for Nginx. This process will automatically detect your Zabbix server block and configure it for HTTPS. After obtaining your SSL/TLS certificates, you can come back and complete this tutorial.

      Step 4 — Configuring PHP for Zabbix

      The Zabbix web interface is written in PHP and requires some special PHP server settings. The Zabbix installation process created a PHP-FPM configuration file that contains these settings. It is located in the directory /etc/zabbix and is loaded automatically by PHP-FPM. You need to make a small change to this file, so open it up with the following:

      • sudo nano /etc/zabbix/php-fpm.conf

      The file contains PHP settings that meet the necessary requirements for the Zabbix web interface. However, the timezone setting is commented out by default. To make sure that Zabbix uses the correct time, you need to set the appropriate timezone:

      /etc/zabbix/php-fpm.conf

      ...
      php_value[max_execution_time] = 300
      php_value[memory_limit] = 128M
      php_value[post_max_size] = 16M
      php_value[upload_max_filesize] = 2M
      php_value[max_input_time] = 300
      php_value[max_input_vars] = 10000
      ; php_value[date.timezone] = Europe/Riga
      

      Uncomment the timezone line (the last line in the preceding block) by removing the leading semicolon, and change it to your timezone. You can use this list of supported time zones to find the right one for you. Then save and close the file.

      Now restart PHP-FPM to apply these new settings:

      • sudo systemctl restart php7.4-fpm.service

      You can now start the Zabbix server:

      • sudo systemctl start zabbix-server

      Then check whether the Zabbix server is running properly:

      • sudo systemctl status zabbix-server

      You will see the following status:

      Output

      ● zabbix-server.service - Zabbix Server
           Loaded: loaded (/lib/systemd/system/zabbix-server.service; disabled; vendor preset: enabled)
           Active: active (running) since Fri 2020-06-12 05:59:32 UTC; 36s ago
          Process: 27026 ExecStart=/usr/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
        ...

      Finally, enable the server to start at boot time:

      • sudo systemctl enable zabbix-server

      The server is set up and connected to the database. Next, set up the web frontend.

      Step 5 — Configuring Settings for the Zabbix Web Interface

      The web interface lets you see reports and add hosts that you want to monitor, but it needs some initial setup before you can use it. Launch your browser and go to the address http://zabbix_server_name or https://zabbix_server_name if you set up Let’s Encrypt. On the first screen, you will see a welcome message. Click Next step to continue.

      On the next screen, you will see the table that lists all of the prerequisites to run Zabbix.

      Prerequisites

      All of the values in this table must be OK, so verify that they are. Be sure to scroll down and look at all of the prerequisites. Once you’ve verified that everything is ready to go, click Next step to proceed.

      The next screen asks for database connection information.

      DB Connection

      You told the Zabbix server about your database, but the Zabbix web interface also needs access to the database to manage hosts and read data. Therefore enter the MySQL credentials you configured in Step 2. Click Next step to proceed.

      On the next screen, you can leave the options at their default values.

      Zabbix Server Details

      The Name is optional; it is used in the web interface to distinguish one server from another in case you have several monitoring servers. Click Next step to proceed.

      The next screen will show the pre-installation summary so you can confirm everything is correct.

      Summary

      Click Next step to proceed to the final screen.

      The web interface setup is now complete. This process creates the configuration file /usr/share/zabbix/conf/zabbix.conf.php, which you could back up and use in the future. Click Finish to proceed to the login screen. The default user is Admin and the password is zabbix.

      Before you log in, set up the Zabbix agent on your second Ubuntu server.

      Step 6 — Installing and Configuring the Zabbix Agent

      Now you need to configure the agent software that will send monitoring data to the Zabbix server.

      Log in to the second Ubuntu server:

      • ssh sammy@second_ubuntu_server_ip_address

      Just like on the Zabbix server, run the following commands to install the repository configuration package:

      • wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1+focal_all.deb
      • sudo dpkg -i zabbix-release_5.0-1+focal_all.deb

      Next, update the package index:

      • sudo apt update

      Then install the Zabbix agent:

      • sudo apt install zabbix-agent

      While Zabbix supports certificate-based encryption, setting up a certificate authority is beyond the scope of this tutorial. But you can use pre-shared keys (PSK) to secure the connection between the server and agent.

      First, generate a PSK:

      • sudo sh -c "openssl rand -hex 32 > /etc/zabbix/zabbix_agentd.psk"

      Show the key by using cat so you can copy it somewhere:

      • cat /etc/zabbix/zabbix_agentd.psk

      The key will look something like this:

      Output

      75ad6cb5e17d244ac8c00c96a1b074d0550b8e7b15d0ab3cde60cd79af280fca

      Save this for later; you will need it to configure the host.
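
      Since the key is copied by hand into the web interface later, a quick format check can catch a truncated or mistyped value. The sketch below uses a hypothetical helper, is_valid_psk, which accepts only hexadecimal digits. The 32 to 512 digit range is an assumption based on the limits described in the Zabbix encryption documentation; verify it against the documentation for your version.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument looks like a usable PSK,
# that is, only hexadecimal digits and between 32 and 512 of them
# (the range is an assumption taken from the Zabbix documentation).
is_valid_psk() {
    key=$1
    case $key in
        '' | *[!0-9a-fA-F]*) return 1 ;;
    esac
    [ "${#key}" -ge 32 ] && [ "${#key}" -le 512 ]
}

# Example with the sample key shown above.
if is_valid_psk 75ad6cb5e17d244ac8c00c96a1b074d0550b8e7b15d0ab3cde60cd79af280fca; then
    echo "PSK format looks fine"
else
    echo "PSK format is not valid"
fi
```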

      Now edit the Zabbix agent settings to set up its secure connection to the Zabbix server. Open the agent configuration file in your text editor:

      • sudo nano /etc/zabbix/zabbix_agentd.conf

      Each setting within this file is documented via informative comments throughout the file, but you only need to edit some of them.

      First you have to edit the IP address of the Zabbix server. Find the following section:

      /etc/zabbix/zabbix_agentd.conf

      ...
      ### Option: Server
      #       List of comma delimited IP addresses, optionally in CIDR notation, or DNS names of Zabbix servers and Zabbix proxies.
      #       Incoming connections will be accepted only from the hosts listed here.
      #       If IPv6 support is enabled then '127.0.0.1', '::127.0.0.1', '::ffff:127.0.0.1' are treated equally
      #       and '::/0' will allow any IPv4 or IPv6 address.
      #       '0.0.0.0/0' can be used to allow any IPv4 address.
      #       Example: Server=127.0.0.1,192.168.1.0/24,::1,2001:db8::/32,zabbix.example.com
      #
      # Mandatory: yes, if StartAgents is not explicitly set to 0
      # Default:
      # Server=
      
      Server=127.0.0.1
      ...
      

      Change the default value to the IP of your Zabbix server:

      /etc/zabbix/zabbix_agentd.conf

      ...
      Server=zabbix_server_ip_address
      ...
      

      By default, the Zabbix server connects to the agent (passive checks). For some checks, such as log monitoring, the agent must instead connect to the server (active checks). For these to work correctly, you need to specify the Zabbix server address and a unique host name.

      Find the section that configures the active checks and change the default values:

      /etc/zabbix/zabbix_agentd.conf

      ...
      ##### Active checks related
      
      ### Option: ServerActive
      #       List of comma delimited IP:port (or DNS name:port) pairs of Zabbix servers and Zabbix proxies for active checks.
      #       If port is not specified, default port is used.
      #       IPv6 addresses must be enclosed in square brackets if port for that host is specified.
      #       If port is not specified, square brackets for IPv6 addresses are optional.
      #       If this parameter is not specified, active checks are disabled.
      #       Example: ServerActive=127.0.0.1:20051,zabbix.domain,[::1]:30051,::1,[12fc::1]
      #
      # Mandatory: no
      # Default:
      # ServerActive=
      
      ServerActive=zabbix_server_ip_address
      
      ### Option: Hostname
      #       Unique, case sensitive hostname.
      #       Required for active checks and must match hostname as configured on the server.
      #       Value is acquired from HostnameItem if undefined.
      #
      # Mandatory: no
      # Default:
      # Hostname=
      
      Hostname=Second Ubuntu Server
      ...
      

      Next, find the section that configures the secure connection to the Zabbix server and enable pre-shared key support. Find the TLSConnect section, which looks like this:

      /etc/zabbix/zabbix_agentd.conf

      ...
      ### Option: TLSConnect
      #       How the agent should connect to server or proxy. Used for active checks.
      #       Only one value can be specified:
      #               unencrypted - connect without encryption
      #               psk         - connect using TLS and a pre-shared key
      #               cert        - connect using TLS and a certificate
      #
      # Mandatory: yes, if TLS certificate or PSK parameters are defined (even for 'unencrypted' connection)
      # Default:
      # TLSConnect=unencrypted
      ...
      

      Then add this line to configure pre-shared key support:

      /etc/zabbix/zabbix_agentd.conf

      ...
      TLSConnect=psk
      ...
      

      Next, locate the TLSAccept section, which looks like this:

      /etc/zabbix/zabbix_agentd.conf

      ...
      ### Option: TLSAccept
      #       What incoming connections to accept.
      #       Multiple values can be specified, separated by comma:
      #               unencrypted - accept connections without encryption
      #               psk         - accept connections secured with TLS and a pre-shared key
      #               cert        - accept connections secured with TLS and a certificate
      #
      # Mandatory: yes, if TLS certificate or PSK parameters are defined (even for 'unencrypted' connection)
      # Default:
      # TLSAccept=unencrypted
      ...
      

      Configure incoming connections to support pre-shared keys by adding this line:

      /etc/zabbix/zabbix_agentd.conf

      ...
      TLSAccept=psk
      ...
      

      Next, find the TLSPSKIdentity section, which looks like this:

      /etc/zabbix/zabbix_agentd.conf

      ...
      ### Option: TLSPSKIdentity
      #       Unique, case sensitive string used to identify the pre-shared key.
      #
      # Mandatory: no
      # Default:
      # TLSPSKIdentity=
      ...
      

      Choose a unique name to identify your pre-shared key by adding this line:

      /etc/zabbix/zabbix_agentd.conf

      ...
      TLSPSKIdentity=PSK 001
      ...
      

      You’ll use this as the PSK ID when you add your host through the Zabbix web interface.

      Then set the option that points to your previously created pre-shared key. Locate the TLSPSKFile option:

      /etc/zabbix/zabbix_agentd.conf

      ...
      ### Option: TLSPSKFile
      #       Full pathname of a file containing the pre-shared key.
      #
      # Mandatory: no
      # Default:
      # TLSPSKFile=
      ...
      

      Add this line to point the Zabbix agent to your PSK file you created:

      /etc/zabbix/zabbix_agentd.conf

      ...
      TLSPSKFile=/etc/zabbix/zabbix_agentd.psk
      ...
      

      Save and close the file. Now you can restart the Zabbix agent and set it to start at boot time:

      • sudo systemctl restart zabbix-agent
      • sudo systemctl enable zabbix-agent
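
      This step touches seven separate options, so it is easy to miss one. As a rough sanity check, a small function can list which of them are still unset. The name missing_agent_settings is hypothetical, and the demonstration uses a scratch file rather than the real /etc/zabbix/zabbix_agentd.conf.

```shell
#!/bin/sh
# Hypothetical helper: list the settings from this step that are not
# set (uncommented, with a value) in the given agent configuration file.
missing_agent_settings() {
    file=$1
    for key in Server ServerActive Hostname TLSConnect TLSAccept \
               TLSPSKIdentity TLSPSKFile; do
        grep -q "^${key}=." "$file" || echo "$key"
    done
}

# Demonstration on a scratch file with two settings left out.
demo=$(mktemp)
printf '%s\n' 'Server=zabbix_server_ip_address' \
    'ServerActive=zabbix_server_ip_address' \
    'Hostname=Second Ubuntu Server' \
    'TLSConnect=psk' 'TLSAccept=psk' '# TLSPSKIdentity=' > "$demo"
missing_agent_settings "$demo"
rm -f "$demo"
```

      The demonstration prints TLSPSKIdentity and TLSPSKFile, the two options the scratch file leaves unset. An empty result means every option from this step has a value.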

      For good measure, check that the Zabbix agent is running properly:

      • sudo systemctl status zabbix-agent

      You will see the following status, indicating the agent is running:

      Output

      ● zabbix-agent.service - Zabbix Agent
           Loaded: loaded (/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: enabled)
           Active: active (running) since Fri 2020-06-12 08:19:54 UTC; 25s ago
        ...

      The agent will listen on port 10050 for connections from the server. Configure UFW to allow connections to this port:

      • sudo ufw allow 10050/tcp

      You can learn more about UFW in How To Set Up a Firewall with UFW on Ubuntu 20.04.
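
      Port 10050 only needs to be reachable from the Zabbix server, so a narrower firewall rule is possible. The sketch below prints such a rule instead of running it, so you can review it first. The function name agent_rule_for is hypothetical, and 203.0.113.10 is a documentation address standing in for your Zabbix server's IP.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a ufw rule that allows only the
# given address to reach the Zabbix agent port.
agent_rule_for() {
    server_ip=$1
    echo "sudo ufw allow from ${server_ip} to any port 10050 proto tcp"
}

# 203.0.113.10 is a placeholder for zabbix_server_ip_address.
agent_rule_for 203.0.113.10
```

      Active checks travel in the other direction: the agent opens the connection to the Zabbix server, by default on port 10051, so that port has to be reachable on the server side.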

      Your agent is now ready to send data to the Zabbix server. But in order to use it, you have to link to it from the server’s web console. In the next step, you will complete the configuration.

      Step 7 — Adding the New Host to the Zabbix Server

      Installing an agent on a server you want to monitor is only half of the process. Each host you want to monitor needs to be registered on the Zabbix server, which you can do through the web interface.

      Log in to the Zabbix Server web interface by navigating to the address http://zabbix_server_name or https://zabbix_server_name:

      The Zabbix login screen

      When you have logged in, click on Configuration and then Hosts in the left navigation bar. Then click the Create host button in the top right corner of the screen. This will open the host configuration page.

      Creating a host

      Adjust the Host name and IP address to reflect the host name and IP address of your second Ubuntu server, then add the host to a group. You can select an existing group, for example Linux servers, or create your own group. The host can be in multiple groups. To do this, enter the name of an existing or new group in the Groups field and select the desired value from the proposed list.

      Before adding the host, click the Templates tab.

      Adding a template to the host

      Type Template OS Linux by Zabbix agent in the Search field and then select it from the list to add this template to the host.

      Next, navigate to the Encryption tab. Select PSK for both Connections to host and Connections from host. Then set PSK identity to PSK 001, which is the value of the TLSPSKIdentity setting of the Zabbix agent you configured previously. Then set PSK value to the key you generated for the Zabbix agent. It’s the one stored in the file /etc/zabbix/zabbix_agentd.psk on the agent machine.

      Setting up the encryption

      Finally, click the Add button at the bottom of the form to create the host.

      You will see your new host in the list. Wait for a minute and reload the page to see green labels indicating that everything is working fine and the connection is encrypted.

      Zabbix shows your new host

      If you have additional servers you need to monitor, log in to each host, install the Zabbix agent, generate a PSK, configure the agent, and add the host to the web interface following the same steps you followed to add your first host.

      The Zabbix server is now monitoring your second Ubuntu server. Now, set up email notifications to be notified about problems.

      Step 8 — Configuring Email Notifications

      Zabbix automatically supports many types of notifications: email, OTRS, Slack, Telegram, SMS, etc. You can see the full list of integrations at the Zabbix website.

      As an example, this tutorial will configure notifications for the Email media type.

      Click on Administration, and then Media types in the left navigation bar. You will see the list of all media types. There are two preconfigured options for emails: for the plain text notification and for the HTML notifications. In this tutorial you will use plain text notification. Click on Email.

      Adjust the SMTP options according to the settings provided by your email service. This tutorial uses Gmail’s SMTP capabilities to set up email notifications; if you would like more information about setting this up, see How To Use Google’s SMTP Server.


      Note: If you use 2-Step Verification with Gmail, you need to generate an App Password for Zabbix. You’ll only have to enter an App password once during setup. You will find instructions on how to generate this password in the Google Help Center.

      If you are using Gmail, type in smtp.gmail.com for the SMTP server field, 465 for the SMTP server port field, gmail.com for SMTP helo, and your email for SMTP email. Then choose SSL/TLS for Connection security and Username and password for Authentication. Enter your Gmail address as the Username, and the App Password you generated from your Google account as the Password.

      Setting up email media type

      On the Message templates tab you can see the list of predefined messages for various types of notifications. Finally, click the Update button at the bottom of the form to update the email parameters.

      Now you can test sending notifications. To do this, click the underlined Test link in the corresponding line.

      You will see a pop-up window. Enter your email address in the Send to field and click the Test button. You will see a message about the successful sending and you will receive a test message.

      Testing email

      Close the pop-up by clicking the Cancel button.

      Now, create a new user. Click on Administration, and then Users in the left navigation bar. You will see the list of users. Then click the Create user button in the top right corner of the screen. This will open the user configuration page:

      Creating a user

      Enter the new username in the Alias field and set up a new password. Next, add the user to the administrator’s group. Type Zabbix administrators in the Groups field and select it from the proposed list.

      Once you’ve added the group, click the Media tab and click on the underlined Add link (not the Add button below it). You will see a pop-up window.

      Adding an email

      Select the Email option from the Type drop down. Enter your email address in the Send to field. You can leave the rest of the options at the default values. Click the Add button at the bottom to submit.

      Now navigate to the Permissions tab. Select Zabbix Super Admin from the User type drop-down menu.

      Finally, click the Add button at the bottom of the form to create the user.

      Note: Using the default password is not safe. To change the password of the built-in Admin user, click on its alias in the list of users. Then click Change password, enter a new password, and confirm the changes by clicking the Update button.

      Now you need to enable notifications. Click on the Configuration tab and then Actions in the left navigation bar. You will see a pre-configured action, which is responsible for sending notifications to all Zabbix administrators. You can review and change the settings by clicking on its name. For the purposes of this tutorial, use the default parameters. To enable the action, click on the red Disabled link in the Status column.

      Now you are ready to receive alerts. In the next step, you will generate one to test your notification setup.

      Step 9 — Generating a Test Alert

      In this step, you will generate a test alert to ensure everything is connected. By default, Zabbix keeps track of the amount of free disk space on your server. It automatically detects all disk mounts and adds the corresponding checks. This discovery is executed every hour, so you need to wait a while for the notification to be triggered.

      Create a temporary file that’s large enough to trigger Zabbix’s file system usage alert. To do this, log in to your second Ubuntu server if you’re not already connected:

      • ssh sammy@second_ubuntu_server_ip_address

      Next, determine how much free space you have on the server. You can use the df command to find out:

      • df -h

      The command df will report the disk space usage of your file system, and the -h will make the output human-readable. You’ll see output like the following:

      Output

      Filesystem      Size  Used Avail Use% Mounted on
      /dev/vda1        78G  1.4G   77G   2% /

      In this case, the free space is 77G. Your free space may differ.

      Use the fallocate command, which allows you to pre-allocate or de-allocate space to a file, to create a file that takes up more than 80% of the available disk space. This will be enough to trigger the alert:

      • fallocate -l 70G /tmp/temp.img
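
      The 70G figure follows from the sample df output: 80% of a 78G file system is about 62.4G, and 1.4G is already in use, so roughly 61G of new data is needed to cross the threshold. For a disk of a different size, the same arithmetic can be sketched as below. The helper name kb_to_exceed is hypothetical, and the result is only a rough estimate, because reserved blocks can make the reported percentage differ slightly from used divided by size.

```shell
#!/bin/sh
# Hypothetical helper: how many KiB must be added to a file system of
# TOTAL_KB with USED_KB in use so that usage exceeds PCT percent.
kb_to_exceed() {
    total_kb=$1 used_kb=$2 pct=$3
    need=$(( total_kb * pct / 100 - used_kb + 1 ))
    # Nothing to add if usage is already above the threshold.
    [ "$need" -gt 0 ] || need=0
    echo "$need"
}

# Read the size and used columns for / from POSIX-format df output.
set -- $(df -Pk / | awk 'NR == 2 { print $2, $3 }')
echo "Add at least $(kb_to_exceed "$1" "$2" 80) KiB to pass 80% on /"
```

      The value is in KiB, so it can be passed to fallocate with a K suffix, for example fallocate -l 64000000K /tmp/temp.img.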

      After around an hour, Zabbix will trigger an alert about the amount of free disk space and will run the action you configured, sending the notification message. You can check your inbox for the message from the Zabbix server. You will see a message like:

      Problem started at 09:49:08 on 2020.06.12
      Problem name: /: Disk space is low (used > 80%)
      Host: Second Ubuntu Server
      Severity: Warning
      Operational data: Space used: 71.34 GB of 77.36 GB (92.23 %)
      Original problem ID: 106
      

      You can also navigate to the Monitoring tab and then Dashboard to see the notification and its details.

      Main dashboard

      Now that you know the alerts are working, delete the temporary file you created so you can reclaim your disk space:

      • rm -f /tmp/temp.img

      After a minute Zabbix will send the recovery message and the alert will disappear from the main dashboard.

      Conclusion

      In this tutorial, you learned how to set up a simple and secure monitoring solution that will help you monitor the state of your servers. It can now warn you of problems, and you have the opportunity to analyze the processes occurring in your IT infrastructure.

      To learn more about setting up monitoring infrastructure, check out our Monitoring topic page.




      How To Monitor Server Health with Checkmk on Ubuntu 18.04


      The author selected the Open Internet/Free Speech Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      As a systems administrator, it’s a best practice to know the current state of your infrastructure and services. Ideally, you want to notice failing disks or application downtimes before your users do. Monitoring tools like Checkmk can help administrators detect these issues and maintain healthy servers.

      Generally, monitoring software can track your servers’ hardware, uptime, and service statuses, and it can raise alerts when something goes wrong. In a very basic scenario, a monitoring system would alert you if any services go down. In a more robust one, the notifications would come shortly after any suspicious signs arose, such as increased memory usage or an abnormal amount of TCP connections.

      There are many monitoring solutions available offering varying degrees of complexity and feature sets, both free and commercial. In many cases, the installation, configuration, and management of these tools is difficult and time-consuming.

      Checkmk, however, is a monitoring solution that is both robust and simpler to install. It is a self-contained software bundle that combines Nagios (a popular and open-source alerting service) with add-ons for gathering, monitoring, and graphing data. It also comes with Checkmk’s web interface — a comprehensive tool that addresses many of Nagios’s shortcomings. It offers a user-friendly dashboard, a full-featured notification system, and a repository of easy-to-install monitoring agents for many Linux distributions. If it weren’t for Checkmk’s web interface, we would have to use different views for different tasks and it wouldn’t be possible to configure all these features without resorting to extensive file modifications.

      In this guide we will set up Checkmk on an Ubuntu 18.04 server and monitor two separate hosts. We will monitor the Ubuntu server itself as well as a separate CentOS 7 server, but we could use the same approach to add any number of additional hosts to our monitoring configuration.

      Prerequisites

      • One Ubuntu 18.04 server with a regular, non-root user with sudo privileges. You can learn how to prepare your server by following this initial server setup tutorial.
      • One CentOS 7 server with a regular, non-root user with sudo privileges. To prepare this server you can follow this initial server setup tutorial.

      Step 1 — Installing Checkmk on Ubuntu

      In order to use our monitoring site, we first must install Checkmk on the Ubuntu server. This will give us all the tools we need. Checkmk provides official ready-to-use Ubuntu package files that we can use to install the software bundle.

      First, let’s update the packages list so that we have the most recent version of the repository listings:

      • sudo apt update

      To browse the packages we can go to the package listing site. Ubuntu 18.04, among others, can be selected in the page menu.

      Now download the package:

      • wget https://checkmk.com/support/1.6.0p8/check-mk-raw-1.6.0p8_0.bionic_amd64.deb

      Then install the newly downloaded package:

      • sudo apt install -y ./check-mk-raw-1.6.0p8_0.bionic_amd64.deb

      This command will install the Checkmk package along with all necessary dependencies, including the Apache web server that is used to provide web access to the monitoring interface.

      After the installation completes, we can access the omd command. Try it out:

      • sudo omd

      This omd command will output the following:

      Output

      Usage (called as root):

       omd help                        Show general help
      . . .
      General Options:
       -V <version>                    set specific version, useful in combination with update/create
       omd COMMAND -h, --help          show available options of COMMAND

      The omd command can manage all Checkmk instances on our server. It can start and stop all the monitoring services at once, and we will use it to create our Checkmk instance. First, however, we have to update our firewall settings to allow outside access to the default web ports.

      Step 2 — Adjusting the Firewall Settings

      Before we can work with Checkmk, we need to allow outside access to the web server in the firewall configuration. Assuming that you followed the firewall configuration steps in the prerequisites, you’ll have a UFW firewall set up to restrict access to your server.

      During installation, Apache registers itself with UFW to provide an easy way to enable or disable access to Apache through the firewall.

      To allow access to Apache, use the following command:

      • sudo ufw allow Apache

      Now verify the changes:

      • sudo ufw status

      You’ll see that Apache is listed among the allowed services:

      Output

      Status: active

      To                         Action      From
      --                         ------      ----
      OpenSSH                    ALLOW       Anywhere
      Apache                     ALLOW       Anywhere
      OpenSSH (v6)               ALLOW       Anywhere (v6)
      Apache (v6)                ALLOW       Anywhere (v6)

      This will allow us to access the Checkmk web interface.

      In the next step, we’ll create the first Checkmk monitoring instance.

      Step 3 — Creating a Checkmk Monitoring Instance

      Checkmk uses the concept of instances, or individual installations, to isolate multiple Checkmk copies on a server. In most cases, only one copy of Checkmk is enough and that’s how we will configure the software in this guide.

      First we must give our new instance a name, and we will use monitoring throughout this text. To create the instance, type:

      • sudo omd create monitoring

      The omd tool will set up everything for us automatically. The command output will look similar to the following:

      Output

      Adding /opt/omd/sites/monitoring/tmp to /etc/fstab.
      Creating temporary filesystem /omd/sites/monitoring/tmp...OK
      Restarting Apache...OK
      Created new site monitoring with version 1.6.0p8.cre.

        The site can be started with omd start monitoring.
        The default web UI is available at http://your_ubuntu_server/monitoring/

        The admin user for the web applications is cmkadmin with password: your-default-password
        (It can be changed with 'htpasswd -m ~/etc/htpasswd cmkadmin' as site user.)

        Please do a su - monitoring for administration of this site.

      In this output the URL address, default username, and password for accessing our monitoring interface are highlighted. The instance is now created, but it still needs to be started. To start the instance, type:

      • sudo omd start monitoring

      Now all the necessary tools and services will be started at once. At the end we’ll see output verifying that all our services have started successfully:

      Output

      Starting mkeventd...OK
      Starting rrdcached...OK
      Starting npcd...OK
      Starting nagios...OK
      Starting apache...OK
      Initializing Crontab...OK

      The instance is up and running.

      To access the Checkmk instance, open http://your_ubuntu_server_ip/monitoring/ in the web browser. You will be prompted for a password. Use the default credentials printed beforehand on the screen; we will change these defaults later on.

      The Checkmk screen opens with a dashboard, which shows all our services and server statuses in lists, and it uses practical graphs resembling the Earth. Straight after installation these are empty, but we will shortly make it display statuses for our services and systems.

      Blank Checkmk dashboard

      In the next step, we will change the default password to secure the site using this interface.

      Step 4 — Changing Your Administrative Password

      During installation, Checkmk generates a random password for the cmkadmin administrative user. This password is meant to be changed upon installation, and as such it is often short and not very secure. We can change this via the web interface.

      First, open the Users page from the WATO – Configuration menu on the left. The list will show all users that currently have access to the Checkmk site. On a fresh installation it will list only two users. The first one, automation, is intended for use with automated tools; the second is the cmkadmin user we used to log in to the site.

      List of Checkmk users

      Click on the pencil icon next to the cmkadmin user to change its details, including the password.

      Edit form for Checkmk admin user

      Update the password, add an admin email, and make any other desired changes.

      After saving the changes we will be asked to log in again using our new credentials. Do so and return to the dashboard, where there is one more thing we must do to fully apply our new configuration.

      Once again open the Users page from the WATO – Configuration menu on the left. The orange button in the top left corner labeled as 1 Change tells us that we have made some changes to the configuration of Checkmk, and that we need to save and activate them. This will happen every time we change the configuration of our monitoring system, not only after editing a user’s credentials. To save and activate pending changes we have to click on this button and agree to activate the listed changes using the Activate affected option on the following screen.

      List of Checkmk users after modifications
      Activate configuration changes confirmation screen
      Successfully activated configuration changes

      After activating the changes, the updated user data is written to the configuration files and used by all the system’s components. Checkmk automatically takes care of notifying individual monitoring system components, reloading them when necessary, and managing all the needed configuration files.

      The Checkmk installation is now ready for use. In the next step, we will add the first host to our monitoring system.

      Step 5 — Monitoring the First Host

      We are now ready to monitor the first host. To accomplish this, we will first install check-mk-agent on the Ubuntu server. Then, we’ll restrict access to the monitoring data using xinetd.

      The components installed with Checkmk are responsible for receiving, storing, and presenting monitoring information. They do not provide the information itself.

      To gather the actual data, we will use Checkmk agent. Designed specifically for the job, Checkmk agent is capable of monitoring all vital system components at once and reporting that information back to the Checkmk instance.

      Installing the agent

      The first host we will monitor will be your_ubuntu_server—the server on which we have installed the Checkmk instance itself.

      To begin, we must install the Checkmk agent. Packages for all major distributions, including Ubuntu, are available directly from the web interface. Open the Monitoring Agents page from the WATO – Configuration menu on the left. You will see the available agent downloads with the most popular packages under the first section labeled Packaged agents.

      List of available packaged monitoring agents

      The package check-mk-agent_1.6.0p8-1_all.deb is the one suited for Debian based distributions, including Ubuntu. Copy the download link for that package from the web browser and use that address to download the package.

      • wget http://your_ubuntu_server_ip/monitoring/check_mk/agents/check-mk-agent_1.6.0p8-1_all.deb

      After downloading, install the package:

      • sudo apt install -y ./check-mk-agent_1.6.0p8-1_all.deb

      Now verify that the agent has been successfully installed:

      • check_mk_agent

      The command will output a very long text that looks like gibberish but combines all vital information about the system in one place.

      Output

      <<<check_mk>>>
      Version: 1.6.0p8
      AgentOS: linux
      . . .
      ["monitoring"]
      <<<job>>>
      <<<local>>>

      It is the output from this command that Checkmk uses to gather status data from monitored hosts. Now, we’ll restrict access to the monitoring data with xinetd.
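      Each block of agent output starts with a section header in triple angle brackets, such as <<<check_mk>>> above. As a rough sketch for making that output easier to skim, the helper below prints only the section names. The function name list_agent_sections is hypothetical and not part of Checkmk, and it assumes headers follow the <<<name>>> or <<<name:options>>> pattern:

```shell
# Hypothetical helper (not shipped with Checkmk): reads agent output on stdin
# and prints the unique section names, one per line.
list_agent_sections() {
    # Keep only header lines, strip the angle brackets and any ":options" suffix.
    sed -n 's/^<<<\([^:>]*\).*>>>$/\1/p' | sort -u
}
```

      On the monitored host, this could be used as check_mk_agent | list_agent_sections to get a quick overview of what the agent reports.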

      Restricting Access to Monitoring Data Using xinetd

      By default, the data from check_mk_agent is served using xinetd, a mechanism that outputs data on a certain network port upon accessing it. This means that we can access the check_mk_agent by using telnet to port 6556 (the default port for Checkmk) from any other computer on the internet unless our firewall configuration disallows it.

      It is not a good security policy to publish vital information about servers to anyone on the internet. We should allow only hosts that run Checkmk and are under our supervision to access this data, so that only our monitoring system can gather it.

      If you have followed the initial server setup tutorial including the steps about setting up a firewall, then access to Checkmk agent is by default blocked. It is, however, a good practice to enforce these access restrictions directly in the service configuration and not rely only on the firewall to guard it.

      To restrict access to the agent data, we have to edit the configuration file at /etc/xinetd.d/check_mk. Open the configuration file in your favorite editor. To use nano, type:

      • sudo nano /etc/xinetd.d/check_mk

      Locate this section:

      /etc/xinetd.d/check_mk

      . . .
      # configure the IP address(es) of your Nagios server here:
      #only_from      = 127.0.0.1 10.0.20.1 10.0.20.2
      . . .
      

      The only_from setting is responsible for restricting access to certain IP addresses. Because we are now working on monitoring the same server that Checkmk is running on, it is ok to allow only localhost to connect. Uncomment and update the configuration setting to:

      /etc/xinetd.d/check_mk

      . . .
      # configure the IP address(es) of your Nagios server here:
      only_from      = 127.0.0.1
      . . .
      

      Save and exit the file.

      The xinetd daemon has to be restarted for the changes to take effect. Do so now:

      • sudo systemctl restart xinetd

      Now our agent is up and running and restricted to accept only local connections. We can proceed to configure monitoring for that host using Checkmk.
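      To confirm that the restriction behaves as intended, one option is to query port 6556 directly and check whether agent data comes back. The sketch below is not a Checkmk tool: agent_answers is a hypothetical name, it relies on bash's /dev/tcp feature and the coreutils timeout command being available, and it assumes the agent's first output line is <<<check_mk>>>, as in the sample output earlier in this step.

```shell
# Hypothetical helper: succeeds only if an agent on HOST:PORT returns data.
agent_answers() {
    host=$1
    port=${2:-6556}   # 6556 is the default Checkmk agent port
    # bash opens the TCP connection via /dev/tcp; timeout caps the wait at 5 seconds.
    first_line=$(timeout 5 bash -c \
        'exec 3<>"/dev/tcp/$0/$1" && head -n 1 <&3' "$host" "$port" 2>/dev/null)
    # A refused or filtered connection yields no data, so the comparison fails.
    [ "$first_line" = "<<<check_mk>>>" ]
}
```

      Run on the Ubuntu server itself, agent_answers 127.0.0.1 && echo reachable should print reachable. The same check from a machine that is not listed in only_from, pointed at your_ubuntu_server_ip, should fail.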

      Configuring Host in Checkmk Web Interface

      First, to add a new host to monitor we have to go to the Hosts menu in the WATO – Configuration menu on the left. From here click Create new host. We will be asked for some information about the host.

      Creating a new host in Checkmk

      The Hostname is the familiar name that Checkmk will use for the monitoring. It may be a fully-qualified domain name, but it is not necessary. In this example, we will name the host monitoring, just like the name of the Checkmk instance itself. Because monitoring is not resolvable to our IP address, we also have to provide the IP address of our server. And since we are monitoring the local host, the IP will simply be 127.0.0.1. Check the IPv4 Address box to enable the manual IP input and enter the value in the text field.

      The default configuration of the Data Sources section relies on Checkmk agent to provide monitoring data, which is fine. The Networking Segment setting is used to denote hosts on remote networks, which are characterized by a higher expected latency that is not a sign of malfunction. Since this is a local host, the default setting is fine as well.

      To save the host and configure which services will be monitored, click the Save & go to services button.

      List of available services to monitor

      Checkmk will do an automatic inventory. That means it will gather the output from the agent and decipher it to know what kinds of services it can monitor. All available services for monitoring will be on the list, including CPU load, memory usage, and free space on disks.

      To enable monitoring of all discovered services, we have to click the Monitor button under the Undecided services (currently not monitored) section. This will refresh the page, but now all services will be listed under the Monitored services section, informing us that they are indeed being monitored.

      As was the case when changing our user password, these new changes must be saved and activated before they go live. Press the 2 changes button and accept the changes using the Activate affected button. After that, the host monitoring will be up and running.

      Now you are ready to work with your server data.

      Working with Monitoring Data

      Now let’s take a look at the main dashboard using the Overview/Main Overview menu item on the left:

      Monitoring dashboard with all services healthy

      The Earth sphere is now fully green and the table says that one host is up with no problems. We can see the full host list, which now consists of a single host, in the Hosts/All hosts view (using the menu on the left).

      List of hosts with all services healthy

      There we will see how many services are in good health (shown in green), how many are failing, and how many are pending to be checked. After clicking on the hostname we will be able to see the list of all services with their full statuses and their Perf-O-Meters. Perf-O-Meter shows the performance of a single service relative to what Checkmk considers to be good health.

      Details of a host service status

      All services that return graphable data display a graph icon next to their name. We can use that icon to get access to graphs associated with the service. Since the host monitoring is fresh, there is almost nothing on the graphs—but after some time the graphs will provide valuable information on how our service performance changes over time.

      Graphs depicting CPU load on the server

      When any of these services fails or recovers, the information will be shown on the dashboard. For failing services a red error will be shown, and the problem will also be visible on the Earth graph.

      Dashboard with one host having problems

      After recovery, everything will be shown in green as working properly, but the event log on the right will contain information about past failures.

      Dashboard with one host recovered after problems

      Now that we have explored the dashboard a little, let’s add a second host to our monitoring instance.

      Step 6 — Monitoring a Second CentOS Host

      Monitoring gets really useful when you have multiple hosts. We will now add a second server to our Checkmk instance, this time running CentOS 7.

      As with our Ubuntu server, installing Checkmk agent is necessary to gather monitoring data on CentOS. This time, however, we will need an rpm package from the Monitoring Agents page in the web interface, called check-mk-agent-1.6.0p8-1.noarch.rpm.

      First, however, we must install xinetd, which by default is not available on the CentOS installation. Xinetd, we will remember, is a daemon that is responsible for making the monitoring data provided by check_mk_agent available over the network.

      On your CentOS server, first install xinetd:

      • sudo yum install -y xinetd

      Now we can download and install the monitoring agent package needed for our CentOS server:

      • sudo yum install -y http://your_ubuntu_server_ip/monitoring/check_mk/agents/check-mk-agent-1.6.0p8-1.noarch.rpm

      Just like before, we can verify that the agent is working properly by executing check_mk_agent:

      • check_mk_agent

      The output will be similar to that from the Ubuntu server. Now we will restrict access to the agent.

      Restricting Access

      This time we will not be monitoring a local host, so xinetd must allow connections coming from the Ubuntu server, where Checkmk is installed, to gather the data. To allow that, first open your configuration file:

      • sudo vi /etc/xinetd.d/check_mk

      Here you will see the configuration for your check_mk service, specifying how Checkmk agent can be accessed through the xinetd daemon. Find the following two commented lines:

      /etc/xinetd.d/check_mk

      . . .
      # configure the IP address(es) of your Nagios server here:
      #only_from      = 127.0.0.1 10.0.20.1 10.0.20.2
      . . .
      

      Now uncomment the second line and replace the local IP addresses with your_ubuntu_server_ip:

      /etc/xinetd.d/check_mk

      . . .
      # configure the IP address(es) of your Nagios server here:
      only_from      = your_ubuntu_server_ip
      . . .
      

      Save and exit the file by typing :x and then ENTER. Restart the xinetd service using:

      • sudo systemctl restart xinetd

      We can now proceed to configure Checkmk to monitor our CentOS 7 host.

      Configuring the New Host in Checkmk

      To add additional hosts to Checkmk, we use the Hosts menu just like before. This time we will name the host centos, configure its IP address, and choose WAN (high-latency) under the Networking Segment select box, since the host is on another network. If we skipped this and left it as local, Checkmk would soon alert us that the host is down, since it would expect it to respond to agent queries much quicker than is possible over the internet.

      Creating second host configuration screen

      Click Save & go to services, which will show the services available for monitoring on the CentOS server. The list will be very similar to the one from the first host. As before, we must click Monitor and then activate the changes using the orange button in the top left corner.

      After activating the changes, we can verify that the host is monitored on the All hosts page. Go there. Two hosts, monitoring and centos, will now be visible.

      List of hosts with two hosts being monitored

      You are now monitoring an Ubuntu server and a CentOS server with Checkmk. It is possible to monitor even more hosts. In fact, there is no upper limit other than server performance, which should not be a problem until your hosts number in the hundreds. Moreover, the procedure is the same for any other host. Checkmk agents in deb and rpm packages work on Ubuntu, CentOS, and the majority of other Linux distributions.

      Conclusion

      In this guide we set up two servers with two different Linux distributions: Ubuntu and CentOS. We then installed and configured Checkmk to monitor both servers, and explored Checkmk’s powerful web interface.

      Checkmk allows for the easy setup of a complete and versatile monitoring system, which packs all the hard work of manual configuration into an easy-to-use web interface full of options and features. With these tools it is possible to monitor multiple hosts; set up email, SMS, or push notifications for problems; set up additional checks for more services; monitor accessibility and performance, and so on.

      To learn more about Checkmk, make sure to visit the official documentation.




      How To Monitor Your Managed PostgreSQL Database Using Nagios Core on Ubuntu 18.04


      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Database monitoring is key to understanding how a database performs over time. It can help you uncover hidden usage problems and bottlenecks happening in your database. Implementing database monitoring systems can quickly turn out to be a long-term advantage, which will positively influence your infrastructure management process. You’ll be able to swiftly react to status changes of your database and will quickly be notified when monitored services return to normal functioning.

      Nagios Core is a popular monitoring system that you can use to monitor your managed database. The benefits of using Nagios for this task are its versatility—it’s easy to configure and use—a large repository of available plugins, and most importantly, integrated alerting.

      In this tutorial, you will set up PostgreSQL database monitoring in Nagios Core using the check_postgres Nagios plugin and set up Slack-based alerting. In the end, you’ll have a monitoring system in place for your managed PostgreSQL database, and will be notified of status changes of various functionality immediately.

      Prerequisites

      • An Ubuntu 18.04 server with root privileges, and a secondary, non-root account. You can set this up by following this initial server setup guide. For this tutorial the non-root user is sammy.

      • Nagios Core installed on your server. To achieve this, complete the first five steps of the How To Install Nagios 4 and Monitor Your Servers on Ubuntu 18.04 tutorial.

      • A DigitalOcean account and a PostgreSQL managed database provisioned from DigitalOcean with connection information available. Make sure that your server’s IP address is on the whitelist. To learn more about DigitalOcean Managed Databases, visit the product docs.

      • A Slack account with full access, added to a workspace where you’ll want to receive status updates.

      Step 1 — Installing check_postgres

      In this section, you’ll download the latest version of the check_postgres plugin from GitHub and make it available to Nagios Core. You’ll also install the PostgreSQL client (psql), so that check_postgres will be able to connect to your managed database.

      Start off by installing the PostgreSQL client by running the following command:

      • sudo apt install postgresql-client

      Next, you’ll download check_postgres to your home directory. First, navigate to it:

      • cd ~

      Head over to the GitHub releases page and copy the link of the latest version of the plugin. At the time of writing, the latest version of check_postgres was 2.24.0; keep in mind that newer versions will be released, and where possible it's best practice to use the latest one.

      Now download it using curl:

      • curl -LO https://github.com/bucardo/check_postgres/releases/download/2.24.0/check_postgres-2.24.0.tar.gz

      Extract it using the following command:

      • tar xvf check_postgres-*.tar.gz

      This will create a directory with the same name as the file you have downloaded. That folder contains the check_postgres executable, which you'll need to copy to the directory where Nagios stores its plugins (usually /usr/local/nagios/libexec/). Copy it by running the following command:

      • sudo cp check_postgres-*/check_postgres.pl /usr/local/nagios/libexec/

      Next, you'll need to give the nagios user ownership of it, so that it can be run from Nagios:

      • sudo chown nagios:nagios /usr/local/nagios/libexec/check_postgres.pl

      check_postgres is now available to Nagios and can be used from it. However, it provides a lot of commands pertaining to different aspects of PostgreSQL, and for better service maintainability, it's better to break them up so that they can be called separately. You'll achieve this by creating a symlink to every check_postgres command in the plugin directory.

      Navigate to the directory where Nagios stores plugins by running the following command:

      • cd /usr/local/nagios/libexec

      Then, create the symlinks with:

      • sudo perl check_postgres.pl --symlinks

      The output will look like this:

      Output

      Created "check_postgres_archive_ready"
      Created "check_postgres_autovac_freeze"
      Created "check_postgres_backends"
      Created "check_postgres_bloat"
      Created "check_postgres_checkpoint"
      Created "check_postgres_cluster_id"
      Created "check_postgres_commitratio"
      Created "check_postgres_connection"
      Created "check_postgres_custom_query"
      Created "check_postgres_database_size"
      Created "check_postgres_dbstats"
      Created "check_postgres_disabled_triggers"
      Created "check_postgres_disk_space"
      Created "check_postgres_fsm_pages"
      Created "check_postgres_fsm_relations"
      Created "check_postgres_hitratio"
      Created "check_postgres_hot_standby_delay"
      Created "check_postgres_index_size"
      Created "check_postgres_indexes_size"
      Created "check_postgres_last_analyze"
      Created "check_postgres_last_autoanalyze"
      Created "check_postgres_last_autovacuum"
      Created "check_postgres_last_vacuum"
      Created "check_postgres_listener"
      Created "check_postgres_locks"
      Created "check_postgres_logfile"
      Created "check_postgres_new_version_bc"
      Created "check_postgres_new_version_box"
      Created "check_postgres_new_version_cp"
      Created "check_postgres_new_version_pg"
      Created "check_postgres_new_version_tnm"
      Created "check_postgres_pgagent_jobs"
      Created "check_postgres_pgb_pool_cl_active"
      Created "check_postgres_pgb_pool_cl_waiting"
      Created "check_postgres_pgb_pool_maxwait"
      Created "check_postgres_pgb_pool_sv_active"
      Created "check_postgres_pgb_pool_sv_idle"
      Created "check_postgres_pgb_pool_sv_login"
      Created "check_postgres_pgb_pool_sv_tested"
      Created "check_postgres_pgb_pool_sv_used"
      Created "check_postgres_pgbouncer_backends"
      Created "check_postgres_pgbouncer_checksum"
      Created "check_postgres_prepared_txns"
      Created "check_postgres_query_runtime"
      Created "check_postgres_query_time"
      Created "check_postgres_relation_size"
      Created "check_postgres_replicate_row"
      Created "check_postgres_replication_slots"
      Created "check_postgres_same_schema"
      Created "check_postgres_sequence"
      Created "check_postgres_settings_checksum"
      Created "check_postgres_slony_status"
      Created "check_postgres_table_size"
      Created "check_postgres_timesync"
      Created "check_postgres_total_relation_size"
      Created "check_postgres_txn_idle"
      Created "check_postgres_txn_time"
      Created "check_postgres_txn_wraparound"
      Created "check_postgres_version"
      Created "check_postgres_wal_files"

      The script listed every action it created a symlink for. These can now be executed from the command line as usual.

      You've downloaded and installed the check_postgres plugin. You have also created symlinks to all the commands of the plugin, so that they can be used individually from Nagios. In the next step, you'll create a connection service file, which check_postgres will use to connect to your managed database.
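      If you want to double check the result, a short loop can print each symlink together with its target. This is only an illustrative sketch: list_check_postgres_links is a hypothetical helper name, and the default directory is the plugin path used in this tutorial.

```shell
# Hypothetical helper: prints "link -> target" for every check_postgres_* symlink.
list_check_postgres_links() {
    dir=${1:-/usr/local/nagios/libexec}   # plugin directory assumed from this tutorial
    for link in "$dir"/check_postgres_*; do
        # Skip regular files and an unmatched glob; report symlinks only.
        if [ -L "$link" ]; then
            printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
        fi
    done
}
```

      Calling list_check_postgres_links without arguments inspects /usr/local/nagios/libexec; every line it prints should point back at check_postgres.pl.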

      Step 2 — Configuring Your Database

      In this section, you will create a PostgreSQL connection service file containing the connection information of your database. Then, you will test the connection data by invoking check_postgres on it.

      The connection service file is by convention called pg_service.conf, and must be located under /etc/postgresql-common/. Create it for editing with your favorite editor (for example, nano):

      • sudo nano /etc/postgresql-common/pg_service.conf

      Add the following lines, replacing the highlighted placeholders with the actual values shown in your Managed Database Control Panel under the section Connection Details:

      /etc/postgresql-common/pg_service.conf

      [managed-db]
      host=host
      port=port
      user=username
      password=password
      dbname=defaultdb
      sslmode=require
      

      The connection service file can house multiple database connection info groups. The beginning of a group is signaled by putting its name in square brackets. After that come the connection parameters (host, port, user, password, and so on), separated by new lines, each of which must be given a value.

      Save and close the file when you are finished.
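      Because one file can hold several groups, it can be handy to list the group names it defines before passing one to check_postgres. The following is a minimal sketch rather than a PostgreSQL tool: list_pg_services is a hypothetical name, and it assumes each group header sits on its own line in square brackets, as in the file above.

```shell
# Hypothetical helper: prints the service (group) names found in a pg_service.conf file.
list_pg_services() {
    file=${1:-/etc/postgresql-common/pg_service.conf}
    # Match lines such as "[managed-db]" and print only the text inside the brackets.
    sed -n 's/^[[:space:]]*\[\([^]]*\)\][[:space:]]*$/\1/p' "$file"
}
```

      With the file from this step, list_pg_services would print managed-db. Depending on the file's permissions, reading it may require sudo.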

      You'll now test the validity of the configuration by connecting to the database via check_postgres by running the following command:

      • ./check_postgres.pl --dbservice=managed-db --action=connection

      Here, you tell check_postgres which database connection info group to use with the parameter --dbservice, and also specify that it should only try to connect to it by specifying connection as the action.

      Your output will look similar to this:

      Output

      POSTGRES_CONNECTION OK: service=managed-db version 11.4 | time=0.10s

      This means that check_postgres succeeded in connecting to the database, according to the parameters from pg_service.conf. If you get an error, double check what you have just entered in that config file.
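      Besides the text shown above, a Nagios plugin reports its result through its exit code, and by the standard plugin convention 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN. As a small sketch of how you might interpret that code when testing by hand, the hypothetical helper below (nagios_state_name is not part of Nagios or check_postgres) maps a code to its state name:

```shell
# Hypothetical helper: translate a plugin exit code into the Nagios state name.
nagios_state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3 is UNKNOWN; this sketch treats other codes the same way
    esac
}

nagios_state_name 0   # prints OK
nagios_state_name 2   # prints CRITICAL
```

      After running the connection check, nagios_state_name $? would then show the state Nagios would assign to that result.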

      You've created and filled out a PostgreSQL connection service file, which works as a connection string. You have also tested the connection data by running check_postgres on it and observing the output. In the next step, you will configure Nagios to monitor various parts of your database.

      Step 3 — Creating Monitoring Services in Nagios

      Now you will configure Nagios to watch over various metrics of your database by defining a host and multiple services, which will call the check_postgres plugin and its symlinks.

      Nagios stores your custom configuration files under /usr/local/nagios/etc/objects. New files you add there must be manually enabled in the central Nagios config file, located at /usr/local/nagios/etc/nagios.cfg. You'll now define commands, a host, and multiple services, which you'll use to monitor your managed database in Nagios.

      First, create a folder under /usr/local/nagios/etc/objects to store your PostgreSQL related configuration by running the following command:

      • sudo mkdir /usr/local/nagios/etc/objects/postgresql

      You'll store Nagios commands for check_postgres in a file named commands.cfg. Create it for editing:

      • sudo nano /usr/local/nagios/etc/objects/postgresql/commands.cfg

      Add the following lines:

      /usr/local/nagios/etc/objects/postgresql/commands.cfg

      define command {
          command_name           check_postgres_connection
          command_line           /usr/local/nagios/libexec/check_postgres_connection --dbservice=$ARG1$
      }
      
      define command {
          command_name           check_postgres_database_size
          command_line           /usr/local/nagios/libexec/check_postgres_database_size --dbservice=$ARG1$ --critical='$ARG2$'
      }
      
      define command {
          command_name           check_postgres_locks
          command_line           /usr/local/nagios/libexec/check_postgres_locks --dbservice=$ARG1$
      }
      
      define command {
          command_name           check_postgres_backends
          command_line           /usr/local/nagios/libexec/check_postgres_backends --dbservice=$ARG1$
      }
      

      Save and close the file.

      In this file, you define four Nagios commands that call different parts of the check_postgres plugin: checking connectivity, counting locks, counting connections (backends), and measuring the size of the whole database. Each command accepts an argument that is passed to the --dbservice parameter, which specifies which of the services defined in pg_service.conf to connect to.

      The check_postgres_database_size command accepts a second argument that gets passed to the --critical parameter, which specifies the point at which the database storage is becoming full. Accepted values include 1 KB for a kilobyte, 1 MB for a megabyte, and so on, up to exabytes (EB). A number without a capacity unit is treated as being expressed in bytes.
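
      Because a unitless number is read as bytes, a threshold can also be computed rather than typed by hand. The following is a minimal sketch, not part of check_postgres: threshold_bytes is a hypothetical helper, and it assumes your storage is measured in binary units (1 GiB = 1024^3 bytes).

```shell
#!/bin/sh
# Hypothetical helper (not part of check_postgres): print PERCENT percent
# of SIZE_GIB gibibytes as a plain byte count.
# Assumption: storage is measured in binary units (1 GiB = 1024^3 bytes).
threshold_bytes() {
    size_gib=$1    # allocated storage in GiB
    percent=$2     # share of that storage, for example 90
    echo $(( size_gib * 1024 * 1024 * 1024 * percent / 100 ))
}

# Example: 90 percent of a 25 GiB allocation.
threshold_bytes 25 90    # prints 24159191040
```

      The printed number could then be used wherever a value for the --critical parameter is expected.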

      Now that the necessary commands are defined, you'll define the host (essentially, the database) and its monitoring services in a file named services.cfg. Create it using your favorite editor:

      • sudo nano /usr/local/nagios/etc/objects/postgresql/services.cfg

      Add the following lines, replacing db_max_storage_size with a value pertaining to the available storage of your database. It is recommended to set it to 90 percent of the storage size you have allocated to it:

      /usr/local/nagios/etc/objects/postgresql/services.cfg

      define host {
            use                    linux-server
            host_name              postgres
            check_command          check_postgres_connection!managed-db
      }
      
      define service {
            use                    generic-service
            host_name              postgres
            service_description    PostgreSQL Connection
            check_command          check_postgres_connection!managed-db
            notification_options   w,u,c,r,f,s
      }
      
      define service {
            use                    generic-service
            host_name              postgres
            service_description    PostgreSQL Database Size
            check_command          check_postgres_database_size!managed-db!db_max_storage_size
            notification_options   w,u,c,r,f,s
      }
      
      define service {
            use                    generic-service
            host_name              postgres
            service_description    PostgreSQL Locks
            check_command          check_postgres_locks!managed-db
            notification_options   w,u,c,r,f,s
      }
      
      define service {
            use                    generic-service
            host_name              postgres
            service_description    PostgreSQL Backends
            check_command          check_postgres_backends!managed-db
            notification_options   w,u,c,r,f,s
      }
      

      You first define a host, so that Nagios knows which entity the services relate to. Then, you create four services, which call the commands you just defined. Each one passes managed-db as its first argument, which tells check_postgres to connect to the managed-db service you defined in Step 2.
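
      To make the argument passing concrete, the sketch below mimics how the parts of a check_command separated by ! end up in the $ARG1$ and $ARG2$ macros of a command_line. It is an illustration only, not how Nagios is implemented; expand_check_command is a hypothetical name and 9 GB is a made-up threshold.

```shell
#!/bin/sh
# Illustration only: this mimics the effect of Nagios argument macros and is
# not Nagios code. expand_check_command is a hypothetical name.
expand_check_command() {
    check_command=$1    # e.g. 'name!arg1!arg2'
    command_line=$2     # template containing $ARG1$ and $ARG2$

    # Split on "!" into positional parameters, then drop the command name.
    old_ifs=$IFS
    IFS='!'
    set -f
    set -- $check_command
    set +f
    IFS=$old_ifs
    shift

    # Replace the macros with the first and second argument (empty if absent).
    printf '%s\n' "$command_line" |
        sed -e 's|\$ARG1\$|'"${1:-}"'|g' -e 's|\$ARG2\$|'"${2:-}"'|g'
}

# "9 GB" is a made-up threshold used only for this example.
expand_check_command \
    'check_postgres_database_size!managed-db!9 GB' \
    "/usr/local/nagios/libexec/check_postgres_database_size --dbservice=\$ARG1\$ --critical='\$ARG2\$'"
```

      The printed line is the command with managed-db in place of $ARG1$ and the threshold in place of $ARG2$.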

      Regarding notification options, each service specifies that notifications should be sent out when the service state becomes WARNING, UNKNOWN, CRITICAL, OK (when it recovers from downtime), when the service starts flapping, or when scheduled downtime starts or ends. Without explicitly giving this option a value, no notifications would be sent out (to available contacts) at all, except if triggered manually.

      Save and close the file.

      Next, you'll need to explicitly tell Nagios to read config files from this new directory, by editing the general Nagios config file. Open it for editing by running the following command:

      • sudo nano /usr/local/nagios/etc/nagios.cfg

      Find the cfg_dir line for the servers directory in the file:

      /usr/local/nagios/etc/nagios.cfg

      ...
      # directive as shown below:
      
      cfg_dir=/usr/local/nagios/etc/servers
      #cfg_dir=/usr/local/nagios/etc/printers
      ...
      

      Above it, add a cfg_dir line for your new postgresql directory:

      /usr/local/nagios/etc/nagios.cfg

      ...
      cfg_dir=/usr/local/nagios/etc/objects/postgresql
      cfg_dir=/usr/local/nagios/etc/servers
      ...
      

      Save and close the file. This line tells Nagios to load all config files from the /usr/local/nagios/etc/objects/postgresql directory, where your configuration files are located.
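
      Nagios processes files with a .cfg extension in a cfg_dir directory, including those in its subdirectories. As a quick sanity check, the sketch below lists the files that match; list_cfg_files is a hypothetical helper name, and the path is the one used in this tutorial.

```shell
#!/bin/sh
# List the files a cfg_dir directive would pick up: regular files ending in
# .cfg, including those in subdirectories. list_cfg_files is a hypothetical
# helper name.
list_cfg_files() {
    find "$1" -type f -name '*.cfg' | sort
}

# Directory from this tutorial; skipped quietly if it does not exist.
cfg_dir=/usr/local/nagios/etc/objects/postgresql
if [ -d "$cfg_dir" ]; then
    list_cfg_files "$cfg_dir"
fi
```

      At this point in the tutorial, the listing should contain commands.cfg and services.cfg.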

      Before restarting Nagios, check the validity of the configuration by running the following command:

      • sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

      The end of the output will look similar to this:

      Output

      Total Warnings: 0
      Total Errors: 0
      Things look okay - No serious problems were detected during the pre-flight check

      This means that Nagios found no errors in the configuration. If it shows you an error, you'll also see a hint as to what went wrong, so you'll be able to fix the error more easily.

      To make Nagios reload its configuration, restart its service by running the following command:

      • sudo systemctl restart nagios
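
      The validate-then-restart sequence can be combined so that a broken configuration never reaches a restart. This is a minimal sketch: reload_nagios_safely is a hypothetical function name, it assumes that nagios -v exits with a non-zero status when the pre-flight check fails, and it needs to be run as root (or with RESTART_CMD adjusted to use sudo).

```shell
#!/bin/sh
# Paths from this tutorial; each can be overridden through the environment.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
RESTART_CMD=${RESTART_CMD:-"systemctl restart nagios"}

# Hypothetical helper: restart Nagios only if the pre-flight check passes.
# Assumption: "nagios -v" returns a non-zero exit status on errors.
reload_nagios_safely() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        $RESTART_CMD
    else
        echo "Configuration check failed; not restarting." >&2
        return 1
    fi
}
```

      Calling reload_nagios_safely after editing a config file then either restarts the service or leaves the running instance untouched and reports the failure.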

      You can now navigate to Nagios in your browser. Once it loads, press on the Services option from the left-hand menu. You'll see the postgres host and a list of services, along with their current statuses:

      PostgreSQL Monitoring Services - Pending

      They will all soon turn green and show an OK status. The command output appears under the Status Information column. You can click a service name to see detailed information about its status and availability.

      You've added check_postgres commands, a host, and multiple services to your Nagios installation to monitor your database. You've also checked that the services are working properly by examining them via the Nagios web interface. In the next step, you will configure Slack-based alerting.

      Step 4 — Configuring Slack Alerting

      In this section, you will configure Nagios to alert you about events via Slack, by posting them into desired channels in your workspace.

      Before you start, log in to your desired workspace on Slack and create two channels where you'll want to receive status messages from Nagios: one for host, and the other one for service notifications. If you wish, you can create only one channel where you'll receive both kinds of alerts.

      Then, head over to the Nagios app in the Slack App Directory and press on Add Configuration. You'll see a page for adding the Nagios Integration.

      Slack - Add Nagios Integration

      Press on Add Nagios Integration. When the page loads, scroll down and take note of the token, because you'll need it further on.

      Slack - Integration Token

      You'll now install and configure the Slack plugin (written in Perl) for Nagios on your server. First, install the required Perl prerequisites by running the following command:

      • sudo apt install libwww-perl libcrypt-ssleay-perl -y

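      Then, download the plugin to your Nagios plugin directory, /usr/local/nagios/libexec. The command below saves the file relative to your current directory, so run it from within that directory: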

      • sudo curl https://raw.githubusercontent.com/tinyspeck/services-examples/master/nagios.pl -o slack.pl

      Make it executable by running the following command:

      • sudo chmod +x slack.pl

      Now, you'll need to edit it to connect to your workspace using the token you got from Slack. Open it for editing:

      • sudo nano slack.pl

      Find the following lines in the file:

      /usr/local/nagios/libexec/slack.pl

      ...
      my $opt_domain = "foo.slack.com"; # Your team's domain
      my $opt_token = "your_token"; # The token from your Nagios services page
      ...
      

      Replace foo.slack.com with your workspace domain and your_token with your Nagios app integration token, then save and close the file. The script will now be able to send proper requests to Slack, which you'll now test by running the following command:

      • ./slack.pl -field slack_channel=#your_channel_name -field HOSTALIAS="Test Host" -field HOSTSTATE="UP" -field HOSTOUTPUT="Host is UP" -field NOTIFICATIONTYPE="RECOVERY"

      Replace your_channel_name with the name of the channel where you'll want to receive status alerts. The script will output information about the HTTP request it made to Slack, and if everything went through correctly, the last line of the output will be ok. If you get an error, double check if the Slack channel you specified exists in the workspace.

      You can now head over to your Slack workspace and select the channel you specified. You'll see a test message coming from Nagios.

      Slack - Nagios Test Message

      This confirms that you have properly configured the Slack script. You'll now move on to configuring Nagios to alert you via Slack using this script.

      You'll need to create a contact for Slack and two commands that will send messages to it. You'll store this config in a file named slack.cfg, in the same folder as the previous config files. Create it for editing by running the following command:

      • sudo nano /usr/local/nagios/etc/objects/postgresql/slack.cfg

      Add the following lines:

      /usr/local/nagios/etc/objects/postgresql/slack.cfg

      define contact {
            contact_name                             slack
            alias                                    Slack
            service_notification_period              24x7
            host_notification_period                 24x7
            service_notification_options             w,u,c,f,s,r
            host_notification_options                d,u,r,f,s
            service_notification_commands            notify-service-by-slack
            host_notification_commands               notify-host-by-slack
      }
      
      define command {
            command_name     notify-service-by-slack
            command_line     /usr/local/nagios/libexec/slack.pl -field slack_channel=#service_alerts_channel
      }
      
      define command {
            command_name     notify-host-by-slack
            command_line     /usr/local/nagios/libexec/slack.pl -field slack_channel=#host_alerts_channel
      }
      

      Here you define a contact named slack, state that it can be contacted at any time, and specify which commands to use for notifying it about service and host related events. Those two commands are defined after it and call the script you have just configured. You'll need to replace service_alerts_channel and host_alerts_channel with the names of the channels where you want to receive service and host messages, respectively. If preferred, you can use the same channel for both.

      Similarly to the service creation in the last step, setting service and host notification options on the contact is crucial, because it governs what kind of alerts the contact will receive. Omitting those options would result in sending out notifications only when manually triggered from the web interface.

      When you are done with editing, save and close the file.

      To enable alerting via the slack contact you just defined, you'll need to add it to the admins contact group, defined in the contacts.cfg config file, located under /usr/local/nagios/etc/objects/. Open it for editing by running the following command:

      • sudo nano /usr/local/nagios/etc/objects/contacts.cfg

      Find the config block that looks like this:

      /usr/local/nagios/etc/objects/contacts.cfg

      define contactgroup {
      
          contactgroup_name       admins
          alias                   Nagios Administrators
          members                 nagiosadmin
      }
      

      Add slack to the list of members, like so:

      /usr/local/nagios/etc/objects/contacts.cfg

      define contactgroup {
      
          contactgroup_name       admins
          alias                   Nagios Administrators
          members                 nagiosadmin,slack
      }
      

      Save and close the file.
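
      If you prefer to script this change instead of editing by hand, the same edit can be expressed with sed. The sketch below is one possible approach: add_slack_member is a hypothetical helper that prints the modified file to standard output and leaves the original untouched. It is not idempotent, so running it on an already modified file appends slack a second time.

```shell
#!/bin/sh
# Hypothetical helper: print a contacts.cfg with "slack" appended to the
# members of the admins contact group. The input file is not modified.
add_slack_member() {
    # Limit the substitution to the block that starts at the admins
    # contactgroup_name line and ends at the next closing brace.
    sed '/contactgroup_name[[:space:]]\{1,\}admins/,/}/ s/^\([[:space:]]*members[[:space:]]\{1,\}.*[^[:space:]]\)[[:space:]]*$/\1,slack/' "$1"
}

# Example (path from this tutorial), reviewing the result before applying it:
# add_slack_member /usr/local/nagios/etc/objects/contacts.cfg | less
```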

      By default when running scripts, Nagios does not make host and service information available via environment variables, which is what the Slack script requires in order to send meaningful messages. To remedy this, you'll need to set the enable_environment_macros setting in nagios.cfg to 1. Open it for editing by running the following command:

      • sudo nano /usr/local/nagios/etc/nagios.cfg

      Find the line that looks like this:

      /usr/local/nagios/etc/nagios.cfg

      enable_environment_macros=0
      

      Change the value to 1, like so:

      /usr/local/nagios/etc/nagios.cfg

      enable_environment_macros=1
      

      Save and close the file.
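
      With this setting enabled, Nagios exports its macros to notification commands as environment variables named after the macro with a NAGIOS_ prefix, such as NAGIOS_HOSTALIAS. The sketch below shows how a script might read them. format_alert is a hypothetical function for illustration and is not part of slack.pl.

```shell
#!/bin/sh
# Hypothetical handler, not part of slack.pl: build a one-line message from
# the NAGIOS_* environment variables that Nagios exports when
# enable_environment_macros=1. Fallback values are used when a variable is unset.
format_alert() {
    printf '%s: %s on %s is %s\n' \
        "${NAGIOS_NOTIFICATIONTYPE:-UNKNOWN}" \
        "${NAGIOS_SERVICEDESC:-service}" \
        "${NAGIOS_HOSTALIAS:-host}" \
        "${NAGIOS_SERVICESTATE:-UNKNOWN}"
}

format_alert
```

      Run outside of Nagios, the variables are unset and only the fallback values are printed, which is the situation a notification script would be in with enable_environment_macros left at 0.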

      Test the validity of the Nagios configuration by running the following command:

      • sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

      The end of the output will look like:

      Output

      Total Warnings: 0
      Total Errors: 0
      Things look okay - No serious problems were detected during the pre-flight check

      Proceed to restart Nagios by running the following command:

      • sudo systemctl restart nagios

      To test the Slack integration, you'll send out a custom notification via the web interface. Reload the Nagios Services status page in your browser. Press on the PostgreSQL Backends service and press on Send custom service notification on the right when the page loads.

      Nagios - Custom Service Notification

      Type in a comment of your choice and press on Commit, and then press on Done. You'll immediately receive a new message in Slack.

      Slack - Status Alert From Nagios

      You have now integrated Slack with Nagios, so you'll receive messages about critical events and status changes immediately. You've also tested the integration by manually triggering an event from within Nagios.

      Conclusion

      You now have Nagios Core configured to watch over your managed PostgreSQL database and report any status changes and events to Slack, so you'll always be in the loop of what is happening to your database. This will allow you to swiftly react in case of an emergency, because you'll be getting the status feed in real time.

      If you'd like to learn more about the features of check_postgres, check out its docs, which describe many more checks you can use.

      For more information about what you can do with your PostgreSQL Managed Database, visit the product docs.


