
      February 2019

      How To Set Filesystem Quotas on Ubuntu 18.04


      Introduction

      Quotas are used to limit the amount of disk space a user or group can use on a filesystem. Without such limits, a user could fill up the machine’s disk and cause problems for other users and services.

      In this tutorial we will install command line tools to create and inspect disk quotas, then set a quota for an example user.

      Prerequisites

      This tutorial assumes you are logged into an Ubuntu 18.04 server, with a non-root, sudo-enabled user, as described in Initial Server Setup with Ubuntu 18.04.

      The techniques in this tutorial should generally work on Linux distributions other than Ubuntu, but may require some adaptation.

      Step 1 – Installing the Quota Tools

      To set and check quotas, we first need to install the quota command line tools using apt. Let’s update our package list, then install the package:

      sudo apt update
      sudo apt install quota
      

      You can verify that the tools are installed by running the quota command and asking for its version information:

      quota --version
      

      Output

      Quota utilities version 4.04.
      . . .

      It’s fine if your output shows a slightly different version number.

      Next we’ll make sure we have the appropriate kernel modules for monitoring quotas.

      Step 2 – Installing the Quota Kernel Module

      If you are on a cloud-based virtual server, your default Ubuntu Linux installation may not have the kernel modules needed to support quota management. To check, we will use find to search for the quota_v1 and quota_v2 modules in the /lib/modules/... directory:

      find /lib/modules/`uname -r` -type f -name '*quota_v*.ko*'
      

      Output

      /lib/modules/4.15.0-45-generic/kernel/fs/quota/quota_v1.ko
      /lib/modules/4.15.0-45-generic/kernel/fs/quota/quota_v2.ko

      Your kernel version – shown in the file paths above – will likely be different, but as long as the two modules are listed, you’re all set and can skip the rest of this step.

      If you get no output from the above command, install the linux-image-extra-virtual package:

      sudo apt install linux-image-extra-virtual
      

      This will provide the kernel modules necessary for implementing quotas. Run the previous find command again to verify that the installation was successful.
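
      If the module files are present but you still see "Quota format not supported in kernel" errors later in this tutorial, you can try loading the modules by hand. This is an optional sketch; in most cases it is unnecessary, as the modules are loaded on demand:

      sudo modprobe quota_v2
      sudo modprobe quota_v1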

      Next we will update our filesystem’s mount options to enable quotas on our root filesystem.

      Step 3 – Updating Filesystem Mount Options

      To activate quotas on a particular filesystem, we need to mount it with a few quota-related options specified. We do this by updating the filesystem’s entry in the /etc/fstab configuration file. Open that file in your favorite text editor now:

      sudo nano /etc/fstab
      

      This file’s contents will be similar to the following:

      /etc/fstab

      LABEL=cloudimg-rootfs   /        ext4   defaults        0 0
      LABEL=UEFI      /boot/efi       vfat    defaults        0 0
      

      This fstab file is from a virtual server. A desktop or laptop computer will probably have a slightly different fstab, but in most cases you’ll have a / or root filesystem that represents all of your disk space.

      Update the line pointing to the root filesystem by replacing the defaults option with the usrquota and grpquota options shown below:

      /etc/fstab

      LABEL=cloudimg-rootfs   /        ext4   usrquota,grpquota        0 0
      . . .
      

      This change will allow us to enable both user- (usrquota) and group-based (grpquota) quotas on the filesystem. If you only need one or the other, you may leave out the unused option. If your fstab line already had some options listed instead of defaults, you should add the new options to the end of whatever is already there, being sure to separate all options with a comma and no spaces.
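
      For example, if your root filesystem line already used an option such as errors=remount-ro (a hypothetical example; your existing options may differ), the updated entry would look like this:

      /etc/fstab

      LABEL=cloudimg-rootfs   /        ext4   errors=remount-ro,usrquota,grpquota        0 0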

      Remount the filesystem to make the new options take effect:

      sudo mount -o remount /
      

      Note: Be certain there are no spaces between the options listed in your /etc/fstab file. If you put a space after a comma, you will see an error like the following:

      Output

      mount: /etc/fstab: parse error

      If you see this message after running the previous mount command, reopen the fstab file, correct any errors, and repeat the mount command before continuing.

      We can verify that the new options were used to mount the filesystem by looking at the /proc/mounts file. Here, we use grep to show only the root filesystem entry in that file:

      cat /proc/mounts | grep ' / '
      

      Output

      /dev/vda1 / ext4 rw,relatime,quota,usrquota,grpquota,data=ordered 0 0

      Note the two options that we specified. Now that we’ve installed our tools and updated our filesystem options, we can turn on the quota system.

      Step 4 – Enabling Quotas

      Before finally turning on the quota system, we need to manually run the quotacheck command once:

      sudo quotacheck -ugm /
      

      This command creates the files /aquota.user and /aquota.group. These files contain information about the limits and usage of the filesystem, and they need to exist before we turn on quota monitoring. The quotacheck parameters we’ve used are:

      • u: specifies that a user-based quota file should be created
      • g: indicates that a group-based quota file should be created
      • m: disables remounting the filesystem as read-only while performing the initial tallying of quotas. Remounting the filesystem as read-only will give more accurate results in case a user is actively saving files during the process, but is not necessary during this initial setup.

      If you don’t need to enable user- or group-based quotas, you can leave off the corresponding quotacheck option.
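
      For example, if you only wanted user-based quotas, a user-only check would drop the g option and look like this (a sketch, assuming the same root filesystem):

      sudo quotacheck -um /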

      We can verify that the appropriate files were created by listing the root directory:

      ls /
      

      Output

      aquota.group  bin   dev  home        initrd.img.old  lib64       media  opt   root  sbin  srv  tmp  var  vmlinuz.old
      aquota.user   boot  etc  initrd.img  lib             lost+found  mnt    proc  run   snap  sys  usr  vmlinuz

      If you didn’t include the u or g options in the quotacheck command, the corresponding file will be missing. Now we’re ready to turn on the quota system:

      sudo quotaon -v /
      

      Our server is now monitoring and enforcing quotas, but we’ve not set any yet! Next we’ll set a disk quota for a single user.

      Step 5 – Configuring Quotas for a User

      There are a few ways we can set quotas for users or groups. Here, we’ll go over how to set quotas with both the edquota and setquota commands.

      Using edquota to Set a User Quota

      We use the edquota command to edit quotas. Let’s edit our example sammy user’s quota:

      sudo edquota -u sammy
      

      The -u option specifies that this is a user quota we’ll be editing. If you’d like to edit a group’s quota instead, use the -g option in its place.
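
      For instance, editing the quota for a hypothetical group named developers would look like this:

      sudo edquota -g developers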

      This will open up a file in your default text editor, similar to how crontab -e opens a temporary file for you to edit. The file will look similar to this:

      Disk quotas for user sammy (uid 1000):
        Filesystem                   blocks       soft       hard     inodes     soft     hard
        /dev/vda1                        40          0          0         13        0        0
      

      This lists the username and uid, the filesystems that have quotas enabled on them, and the block- and inode-based usage and limits. Setting an inode-based quota would limit how many files and directories a user can create, regardless of the amount of disk space they use. Most people will want block-based quotas, which specifically limit disk space usage. This is what we will configure.

      Note: The concept of a block is poorly specified and can change depending on many factors, including which command line tool is reporting them. In the context of setting quotas on Ubuntu, it’s fairly safe to assume that 1 block equals 1 kilobyte of disk space.

      In the above listing, our user sammy is using 40 blocks, or 40KB of space on the /dev/vda1 drive. The soft and hard limits are both disabled with a 0 value.

      Each type of quota allows you to set both a soft limit and a hard limit. When a user exceeds the soft limit, they are over quota, but they are not immediately prevented from consuming more space or inodes. Instead, some leeway is given: the user has – by default – seven days to get their disk use back under the soft limit. At the end of the seven day grace period, if the user is still over the soft limit it will be treated as a hard limit. A hard limit is less forgiving: all creation of new blocks or inodes is immediately halted when you hit the specified hard limit. This behaves as if the disk is completely out of space: writes will fail, temporary files will fail to be created, and the user will start to see warnings and errors while performing common tasks.

      Let’s update our sammy user to have a block quota with a 100MB soft limit, and a 110MB hard limit:

      Disk quotas for user sammy (uid 1000):
        Filesystem                   blocks       soft       hard     inodes     soft     hard
        /dev/vda1                        40       100M       110M         13        0        0
      

      Save and close the file. To check the new quota we can use the quota command:

      sudo quota -vs sammy
      

      Output

      Disk quotas for user sammy (uid 1000):
           Filesystem   space   quota   limit   grace   files   quota   limit   grace
            /dev/vda1     40K    100M    110M              13       0       0

      The command outputs our current quota status, and shows that our quota is 100M while our limit is 110M. This corresponds to the soft and hard limits respectively.

      Note: If you want your users to be able to check their own quotas without having sudo access, you’ll need to give them permission to read the quota files we created in Step 4. One way to do this would be to make a users group, make those files readable by the users group, and then make sure all your users are also placed in the group.
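
      A minimal sketch of that approach, assuming a group named users and our example sammy user (adjust the group, user, and file names to your system), might look like this:

      sudo groupadd users                            # skip if the group already exists
      sudo chgrp users /aquota.user /aquota.group
      sudo chmod 640 /aquota.user /aquota.group      # allow group members to read the quota files
      sudo usermod -aG users sammy                   # repeat for each user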

      To learn more about Linux permissions, including user and group ownership, please read An Introduction to Linux Permissions.

      Using setquota to Set a User Quota

      Unlike edquota, setquota will update our user’s quota information in a single command, without an interactive editing step. We will specify the username and the soft and hard limits for both block- and inode-based quotas, and finally the filesystem to apply the quota to:

      sudo setquota -u sammy 200M 220M 0 0 /
      

      The above command will double sammy’s block-based quota limits to 200 megabytes and 220 megabytes. The 0 0 for inode-based soft and hard limits indicates that they remain unset. This is required even if we’re not setting any inode-based quotas.

      Once again, use the quota command to check our work:

      sudo quota -vs sammy
      

      Output

      Disk quotas for user sammy (uid 1000):
           Filesystem   space   quota   limit   grace   files   quota   limit   grace
            /dev/vda1     40K    200M    220M              13       0       0

      Now that we have set some quotas, let’s find out how to generate a quota report.

      Step 6 – Generating Quota Reports

      To generate a report on current quota usage for all users on a particular filesystem, use the repquota command:

      sudo repquota -s /
      

      Output

      *** Report for user quotas on device /dev/vda1
      Block grace time: 7days; Inode grace time: 7days
                              Space limits                File limits
      User            used    soft    hard  grace    used  soft  hard  grace
      ----------------------------------------------------------------------
      root      --   1696M      0K      0K          75018     0     0
      daemon    --     64K      0K      0K              4     0     0
      man       --   1048K      0K      0K             81     0     0
      nobody    --   7664K      0K      0K              3     0     0
      syslog    --   2376K      0K      0K             12     0     0
      sammy     --     40K    100M    110M             13     0     0

      In this instance we’re generating a report for the / root filesystem. The -s flag tells repquota to use human-readable numbers when possible. There are a few system users listed, which probably have no quotas set by default. Our user sammy is listed at the bottom, with the amounts used and soft and hard limits.

      Also note the Block grace time: 7days callout, and the grace column. If our user were over the soft limit, the grace column would show how much time they had left to get back under the limit.

      In the next step we’ll update the grace periods for our quota system.

      Step 7 – Configuring a Grace Period for Overages

      We can configure the period of time during which a user is allowed to exceed the soft limit. We use the setquota command to do so:

      sudo setquota -t 864000 864000 /
      

      The above command sets both the block and inode grace times to 864000 seconds, or 10 days. This setting applies to all users, and both values must be provided even if you don’t use both types of quota (block vs. inode).

      Note that the values must be specified in seconds.
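
      For example, restoring the default seven-day grace periods would mean passing 604800 seconds (7 days × 86400 seconds per day) for both values:

      sudo setquota -t 604800 604800 /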

      Run repquota again to check that the changes took effect:

      sudo repquota -s /
      

      Output

      Block grace time: 10days; Inode grace time: 10days
      . . .

      The changes should be reflected immediately in the repquota output.

      Conclusion

      In this tutorial we installed the quota command line tools, verified that our Linux kernel can handle monitoring quotas, set up a block-based quota for one user, and generated a report on our filesystem’s quota usage.

      Troubleshooting

      The following are some common errors you may see when setting up and manipulating filesystem quotas.

      quotaon Output

      quotaon: cannot find //aquota.group on /dev/vda1 [/]
      quotaon: cannot find //aquota.user on /dev/vda1 [/]

      This is an error you might see if you tried to turn on quotas (using quotaon) before running the initial quotacheck command. The quotacheck command creates the aquota or quota files needed to turn on the quota system. See Step 4 for more information.

      quotaon Output

      quotaon: using //aquota.group on /dev/vda1 [/]: No such process
      quotaon: Quota format not supported in kernel.
      quotaon: using //aquota.user on /dev/vda1 [/]: No such process
      quotaon: Quota format not supported in kernel.

      This quotaon error is telling us that our kernel does not support quotas, or at least doesn’t support the correct version (there is both a quota_v1 and quota_v2 version). This means the kernel modules we need are not installed or are not being loaded properly. On Ubuntu Server the most likely cause of this is using a pared-down installation image on a cloud-based virtual server.

      If this is the case, it can be fixed by installing the linux-image-extra-virtual package with apt. See Step 2 for more details.

      quota Output

      quota: Cannot open quotafile //aquota.user: Permission denied
      quota: Cannot open quotafile //aquota.user: Permission denied
      quota: Cannot open quotafile //quota.user: No such file or directory

      This is the error you’ll see if you run quota and your current user does not have permission to read the quota files for your filesystem. You (or your system administrator) will need to adjust the file permissions appropriately, or use sudo when running commands that require access to the quota file.

      To learn more about Linux permissions, including user and group ownership, please read An Introduction to Linux Permissions.




      Managed Databases Connection Pools and PostgreSQL Benchmarking Using pgbench


      Introduction

      DigitalOcean Managed Databases allows you to scale your PostgreSQL database using several methods. One such method is a built-in connection pooler that allows you to efficiently handle large numbers of client connections and reduce the CPU and memory footprint of these open connections. By using a connection pool and sharing a fixed set of recyclable connections, you can handle significantly more concurrent client connections, and squeeze extra performance out of your PostgreSQL database.

      In this tutorial we’ll use pgbench, PostgreSQL’s built-in benchmarking tool, to run load tests on a DigitalOcean Managed PostgreSQL Database. We’ll dive in to connection pools, describe how they work, and show how to create one using the Cloud Control panel. Finally, using results from the pgbench tests, we’ll demonstrate how using a connection pool can be an inexpensive method of increasing database throughput.

      Prerequisites

      To complete this tutorial, you’ll need:

      • A DigitalOcean Managed PostgreSQL Database cluster. To learn how to provision and configure a DigitalOcean PostgreSQL cluster, consult the Managed Database product documentation.
      • A client machine with PostgreSQL installed. By default, your PostgreSQL installation will contain the pgbench benchmarking utility and the psql client, both of which we’ll use in this guide. Consult How To Install and Use PostgreSQL on Ubuntu 18.04 to learn how to Install PostgreSQL. If you’re not running Ubuntu on your client machine, you can use the version finder to find the appropriate tutorial.

      Once you have a DigitalOcean PostgreSQL cluster up and running and a client machine with pgbench installed, you’re ready to begin with this guide.

      Step 1 — Creating and Initializing the benchmark Database

      Before we create a connection pool for our database, we’ll first create the benchmark database on our PostgreSQL cluster and populate it with some dummy data on which pgbench will run its tests. The pgbench utility repeatedly runs a series of five SQL commands (consisting of SELECT, UPDATE, and INSERT queries) in a transaction, using multiple threads and clients, and calculates a useful performance metric called Transactions per Second (TPS). TPS is a measure of database throughput, counting the number of atomic transactions processed by the database in one second. To learn more about the specific commands executed by pgbench, consult What is the “Transaction” Actually Performed in pgbench? from the official pgbench documentation.
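
      For reference, the default TPC-B-like transaction that pgbench runs looks roughly like the following (paraphrased from the pgbench documentation; the :aid, :bid, :tid, and :delta variables are chosen at random for each transaction):

      BEGIN;
      UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
      SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
      UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
      UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
      INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
      END;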

      Let’s begin by connecting to our PostgreSQL cluster and creating the benchmark database.

      First, retrieve your cluster’s Connection Details by navigating to Databases and locating your PostgreSQL cluster. Click into your cluster. You should see a cluster overview page containing the following Connection Details box:

      PostgreSQL Cluster Connection Details

      From this, we can parse the following config variables:

      • Admin user: doadmin
      • Admin password: your_password
      • Cluster endpoint: dbaas-test-do-user-3587522-0.db.ondigitalocean.com
      • Connection port: 25060
      • Database to connect to: defaultdb
      • SSL Mode: require (use an SSL-encrypted connection for increased security)

      Take note of these parameters, as you’ll need them when using both the psql client and pgbench tool.

      Click on the dropdown above this box and select Connection String. We’ll copy this string and pass it in to psql to connect to this PostgreSQL node.

      Connect to your cluster using psql and the connection string you just copied:

      psql postgresql://doadmin:your_password@your_cluster_endpoint:25060/defaultdb?sslmode=require

      You should see the following PostgreSQL client prompt, indicating that you’ve connected to your PostgreSQL cluster successfully:

      Output

      psql (10.6 (Ubuntu 10.6-0ubuntu0.18.04.1))
      SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
      Type "help" for help.

      defaultdb=>

      From here, create the benchmark database:

      CREATE DATABASE benchmark;

      You should see the following output:

      Output

      CREATE DATABASE

      Now, disconnect from the cluster with psql’s \q command:

      \q

      Before we run the pgbench tests, we need to populate this benchmark database with some tables and dummy data required to run the tests.

      To do this, we’ll run pgbench with the following flags:

      • -h: The PostgreSQL cluster endpoint
      • -p: The PostgreSQL cluster connection port
      • -U: The database username
      • -i: Indicates that we'd like to initialize the benchmark database with benchmarking tables and their dummy data.
      • -s: Sets a scale factor of 150, which will multiply table sizes by 150. The default scale factor of 1 results in tables of the following sizes:

        table                   # of rows
        ---------------------------------
        pgbench_branches        1
        pgbench_tellers         10
        pgbench_accounts        100000
        pgbench_history         0
        

        Using a scale factor of 150, the pgbench_accounts table will contain 15,000,000 rows.

        Note: To avoid excessive blocked transactions, be sure to set the scale factor to a value at least as large as the number of concurrent clients you intend to test with. In this tutorial we'll test with 150 clients at most, so we set -s to 150 here. To learn more, consult these recommended practices from the official pgbench documentation.

      Run the complete pgbench command:

      pgbench -h your_cluster_endpoint -p 25060 -U doadmin -i -s 150 benchmark

      After running this command, you will be prompted to enter the password for the database user you specified. Enter the password, and hit ENTER.

      You should see the following output:

      Output

      dropping old tables...
      NOTICE: table "pgbench_accounts" does not exist, skipping
      NOTICE: table "pgbench_branches" does not exist, skipping
      NOTICE: table "pgbench_history" does not exist, skipping
      NOTICE: table "pgbench_tellers" does not exist, skipping
      creating tables...
      generating data...
      100000 of 15000000 tuples (0%) done (elapsed 0.19 s, remaining 27.93 s)
      200000 of 15000000 tuples (1%) done (elapsed 0.85 s, remaining 62.62 s)
      300000 of 15000000 tuples (2%) done (elapsed 1.21 s, remaining 59.23 s)
      400000 of 15000000 tuples (2%) done (elapsed 1.63 s, remaining 59.44 s)
      500000 of 15000000 tuples (3%) done (elapsed 2.05 s, remaining 59.51 s)
      . . .
      14700000 of 15000000 tuples (98%) done (elapsed 70.87 s, remaining 1.45 s)
      14800000 of 15000000 tuples (98%) done (elapsed 71.39 s, remaining 0.96 s)
      14900000 of 15000000 tuples (99%) done (elapsed 71.91 s, remaining 0.48 s)
      15000000 of 15000000 tuples (100%) done (elapsed 72.42 s, remaining 0.00 s)
      vacuuming...
      creating primary keys...
      done.

      At this point, we've created a benchmarking database, populated with the tables and data required to run the pgbench tests. We can now move on to running a baseline test which we'll use to compare performance before and after connection pooling is enabled.

      Step 2 — Running a Baseline pgbench Test

      Before we run our first benchmark, it's worth diving into what we're trying to optimize with connection pools.

      Typically when a client connects to a PostgreSQL database, the main PostgreSQL OS process forks itself into a child process corresponding to this new connection. When there are only a few connections, this rarely presents an issue. However, as clients and connections scale, the CPU and memory overhead of creating and maintaining these connections begins to add up, especially if the application in question does not efficiently use database connections. In addition, the max_connections PostgreSQL setting may limit the number of client connections allowed, resulting in additional connections being refused or dropped.

      A connection pool keeps open a fixed number of database connections, the pool size, which it then uses to distribute and execute client requests. This means that you can accommodate far more simultaneous connections, efficiently deal with idle or stagnant clients, as well as queue up client requests during traffic spikes instead of rejecting them. By recycling connections, you can more efficiently use your machine's resources in an environment where there is a heavy connection volume, and squeeze extra performance out of your database.

      A connection pool can be implemented either on the application side or as middleware between the database and your application. The Managed Databases connection pooler is built on top of pgBouncer, a lightweight, open-source middleware connection pooler for PostgreSQL. Its interface is available via the Cloud Control Panel UI.

      Navigate to Databases in the Control Panel, and then click into your PostgreSQL cluster. From here, click into Connection Pools. Then, click on Create a Connection Pool. You should see the following configuration window:

      Connection Pools Config Window

      Here, you can configure the following fields:

      • Pool Name: A unique name for your connection pool
      • Database: The database for which you'd like to pool connections
      • User: The PostgreSQL user the connection pool will authenticate as
      • Mode: One of Session, Transaction, or Statement. This option controls how long the pool assigns a backend connection to a client.
        • Session: The client holds on to the connection until it explicitly disconnects.
        • Transaction: The client obtains the connection until it completes a transaction, after which the connection is returned to the pool.
        • Statement: The pool aggressively recycles connections after each client statement. In statement mode, multi-statement transactions are not allowed. To learn more, consult the Connection Pools product documentation.
      • Pool Size: The number of connections the connection pool will keep open between itself and the database.

      Before we create a connection pool, we'll run a baseline test to which we can compare database performance with connection pooling.

      In this tutorial, we'll use a 4 GB RAM, 2 vCPU, 80 GB Disk, primary node only Managed Database setup. You can scale the benchmark test parameters in this section according to your PostgreSQL cluster specs.

      DigitalOcean Managed Database clusters have the PostgreSQL max_connections parameter preset to 25 connections per 1 GB RAM. A 4 GB RAM PostgreSQL node therefore has max_connections set to 100. In addition, for all clusters, 3 connections are reserved for maintenance. So for this 4 GB RAM PostgreSQL cluster, 97 connections are available for connection pooling.
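
      If you'd like to confirm the connection limit on your own cluster, you can check it from a psql session (connected as in Step 1); the value reported will reflect your node's size:

      SHOW max_connections;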

      With this in mind, let's run our first baseline pgbench test.

      Log in to your client machine. We’ll run pgbench, specifying the database endpoint, port and user as usual. In addition, we’ll provide the following flags:

      • -c: The number of concurrent clients or database sessions to simulate. We set this to 50 so as to simulate a number of concurrent connections smaller than the max_connections parameter for our PostgreSQL cluster.
      • -j: The number of worker threads pgbench will use to run the benchmark. If you're using a multi-CPU machine, you can tune this upwards to distribute clients across threads. On a two-core machine, we set this to 2.
      • -P: Display progress and metrics every 60 seconds.
      • -T: Run the benchmark for 600 seconds (10 minutes). To produce consistent, reproducible results, it's important that you run the benchmark for several minutes, or through one checkpoint cycle.

      We’ll also specify that we'd like to run the benchmark against the benchmark database we created and populated earlier.

      Run the following complete pgbench command:

      pgbench -h your_db_endpoint -p 25060 -U doadmin -c 50 -j 2 -P 60 -T 600 benchmark

      Hit ENTER and then type in the password for the doadmin user to begin running the test. You should see output similar to the following (results will depend on the specs of your PostgreSQL cluster):

      Output

      starting vacuum...end.
      progress: 60.0 s, 157.4 tps, lat 282.988 ms stddev 40.261
      progress: 120.0 s, 176.2 tps, lat 283.726 ms stddev 38.722
      progress: 180.0 s, 167.4 tps, lat 298.663 ms stddev 238.124
      progress: 240.0 s, 178.9 tps, lat 279.564 ms stddev 43.619
      progress: 300.0 s, 178.5 tps, lat 280.016 ms stddev 43.235
      progress: 360.0 s, 178.8 tps, lat 279.737 ms stddev 43.307
      progress: 420.0 s, 179.3 tps, lat 278.837 ms stddev 43.783
      progress: 480.0 s, 178.5 tps, lat 280.203 ms stddev 43.921
      progress: 540.0 s, 180.0 tps, lat 277.816 ms stddev 43.742
      progress: 600.0 s, 178.5 tps, lat 280.044 ms stddev 43.705
      transaction type: <builtin: TPC-B (sort of)>
      scaling factor: 150
      query mode: simple
      number of clients: 50
      number of threads: 2
      duration: 600 s
      number of transactions actually processed: 105256
      latency average = 282.039 ms
      latency stddev = 84.244 ms
      tps = 175.329321 (including connections establishing)
      tps = 175.404174 (excluding connections establishing)

      Here, we observed that over a 10 minute run with 50 concurrent sessions, we processed 105,256 transactions with a throughput of roughly 175 transactions per second.
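
      The headline TPS figure is, to a close approximation, just the transaction count divided by the test duration (pgbench's reported values differ slightly because they also account for connection setup time):

      105256 transactions / 600 s ≈ 175.4 TPS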

      Now, let's run the same test, this time using 150 concurrent clients, a value that is higher than max_connections for this database, to synthetically simulate a mass influx of client connections:

      pgbench -h your_db_endpoint -p 25060 -U doadmin -c 150 -j 2 -P 60 -T 600 benchmark

      You should see output similar to the following:

      Output

      starting vacuum...end.
      connection to database "pgbench" failed:
      FATAL: remaining connection slots are reserved for non-replication superuser connections
      progress: 60.0 s, 182.6 tps, lat 280.069 ms stddev 42.009
      progress: 120.0 s, 253.8 tps, lat 295.612 ms stddev 237.448
      progress: 180.0 s, 271.3 tps, lat 276.411 ms stddev 40.643
      progress: 240.0 s, 273.0 tps, lat 274.653 ms stddev 40.942
      progress: 300.0 s, 272.8 tps, lat 274.977 ms stddev 41.660
      progress: 360.0 s, 250.0 tps, lat 300.033 ms stddev 282.712
      progress: 420.0 s, 272.1 tps, lat 275.614 ms stddev 42.901
      progress: 480.0 s, 261.1 tps, lat 287.226 ms stddev 112.499
      progress: 540.0 s, 272.5 tps, lat 275.309 ms stddev 41.740
      progress: 600.0 s, 271.2 tps, lat 276.585 ms stddev 41.221
      transaction type: <builtin: TPC-B (sort of)>
      scaling factor: 150
      query mode: simple
      number of clients: 150
      number of threads: 2
      duration: 600 s
      number of transactions actually processed: 154892
      latency average = 281.421 ms
      latency stddev = 125.929 ms
      tps = 257.994941 (including connections establishing)
      tps = 258.049251 (excluding connections establishing)

      Note the FATAL error, indicating that pgbench hit the 100 connection limit threshold set by max_connections, resulting in a refused connection. The test was still able to complete, with a TPS of roughly 257.

      At this point we can investigate how a connection pool could potentially improve our database's throughput.

      Step 3 — Creating and Testing a Connection Pool

      In this step we'll create a connection pool and rerun the previous pgbench test to see if we can improve our database's throughput.

      In general, the max_connections setting and connection pool parameters are tuned in tandem to max out the database's load. However, because max_connections is abstracted away from the user in DigitalOcean Managed Databases, our main levers here are the connection pool Mode and Size settings.

      To begin, let's create a connection pool in Transaction mode that keeps open all the available backend connections.

      Navigate to Databases in the Control Panel, and then click into your PostgreSQL cluster. From here, click into Connection Pools. Then, click on Create a Connection Pool.

      In the configuration window that appears, fill in the following values:

      Connection Pool Configuration Values

      Here we name our connection pool test-pool, and use it with the benchmark database. Our database user is doadmin and we set the connection pool to Transaction mode. Recall from earlier that for a managed database cluster with 4GB of RAM, there are 97 available database connections. Accordingly, configure the pool to keep open 97 database connections.

      When you're done, hit Create Pool.

      You should now see this pool in the Control Panel:

      Connection Pool in Control Panel

      Grab its URI by clicking Connection Details. It should look something like the following:

      postgres://doadmin:password@pool_endpoint:pool_port/test-pool?sslmode=require
      

      You should notice a different port here, and potentially a different endpoint and database name, corresponding to the pool name test-pool.
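
      As a quick sanity check, you can connect through the pool with psql using this URI, just as we connected to the cluster itself earlier (a sketch; substitute your own pool credentials from the Control Panel):

      psql postgres://doadmin:password@pool_endpoint:pool_port/test-pool?sslmode=require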

      Now that we've created the test-pool connection pool, we can rerun the pgbench test we ran above.

      Rerun pgbench

      From your client machine, run the following pgbench command (with 150 concurrent clients), making sure to substitute the highlighted values with those in your connection pool URI:

      pgbench -h pool_endpoint -p pool_port -U doadmin -c 150 -j 2 -P 60 -T 600 test-pool

      Here we once again use 150 concurrent clients, run the test across 2 threads, print progress every 60 seconds, and run the test for 600 seconds. We set the database name to test-pool, the name of the connection pool.

      Once the test completes, you should see output similar to the following (note that these results will vary depending on the specs of your database node):

      Output

      starting vacuum...end.
      progress: 60.0 s, 240.0 tps, lat 425.251 ms stddev 59.773
      progress: 120.0 s, 350.0 tps, lat 428.647 ms stddev 57.084
      progress: 180.0 s, 340.3 tps, lat 440.680 ms stddev 313.631
      progress: 240.0 s, 364.9 tps, lat 411.083 ms stddev 61.106
      progress: 300.0 s, 366.5 tps, lat 409.367 ms stddev 60.165
      progress: 360.0 s, 362.5 tps, lat 413.750 ms stddev 59.005
      progress: 420.0 s, 359.5 tps, lat 417.292 ms stddev 60.395
      progress: 480.0 s, 363.8 tps, lat 412.130 ms stddev 60.361
      progress: 540.0 s, 351.6 tps, lat 426.661 ms stddev 62.960
      progress: 600.0 s, 344.5 tps, lat 435.516 ms stddev 65.182
      transaction type: <builtin: TPC-B (sort of)>
      scaling factor: 150
      query mode: simple
      number of clients: 150
      number of threads: 2
      duration: 600 s
      number of transactions actually processed: 206768
      latency average = 421.719 ms
      latency stddev = 114.676 ms
      tps = 344.240797 (including connections establishing)
      tps = 344.385646 (excluding connections establishing)

      Notice here that we were able to increase our database’s throughput from 257 TPS to 344 TPS with 150 concurrent connections (an increase of roughly 34%), and did not run up against the max_connections limit we previously hit without a connection pool. By placing a connection pool in front of the database, we can avoid dropped connections and significantly increase database throughput in an environment with a large number of simultaneous connections.

      If you run this same test, but with a -c value of 50 (specifying a smaller number of clients), the gains from using a connection pool become much less evident:

      Output

      starting vacuum...end.
      progress: 60.0 s, 154.0 tps, lat 290.592 ms stddev 35.530
      progress: 120.0 s, 162.7 tps, lat 307.168 ms stddev 241.003
      progress: 180.0 s, 172.0 tps, lat 290.678 ms stddev 36.225
      progress: 240.0 s, 172.4 tps, lat 290.169 ms stddev 37.603
      progress: 300.0 s, 177.8 tps, lat 281.214 ms stddev 35.365
      progress: 360.0 s, 177.7 tps, lat 281.402 ms stddev 35.227
      progress: 420.0 s, 174.5 tps, lat 286.404 ms stddev 34.797
      progress: 480.0 s, 176.1 tps, lat 284.107 ms stddev 36.540
      progress: 540.0 s, 173.1 tps, lat 288.771 ms stddev 38.059
      progress: 600.0 s, 174.5 tps, lat 286.508 ms stddev 59.941
      transaction type: <builtin: TPC-B (sort of)>
      scaling factor: 150
      query mode: simple
      number of clients: 50
      number of threads: 2
      duration: 600 s
      number of transactions actually processed: 102938
      latency average = 288.509 ms
      latency stddev = 83.503 ms
      tps = 171.482966 (including connections establishing)
      tps = 171.553434 (excluding connections establishing)

      Here we see that we were not able to increase throughput by using a connection pool. Our throughput went down to 171 TPS from 175 TPS.

      Although in this guide we use pgbench with its built-in benchmark data set, the best test for determining whether or not to use a connection pool is a benchmark load that accurately represents production load on your database, against production data. Creating custom benchmarking scripts and data is beyond the scope of this guide, but to learn more, consult the official pgbench documentation.

      Note: The pool size setting is highly workload-specific. In this guide, we configured the connection pool to use all the available backend database connections. This was because throughout our benchmark, the database rarely reached full utilization (you can monitor database load from the Metrics tab in the Cloud Control Panel). Depending on your database's load, this may not be the optimal setting. If you notice that your database is constantly fully saturated, shrinking the connection pool may increase throughput and improve performance by queuing additional requests instead of trying to execute them all at the same time on an already loaded server.

      Conclusion

      DigitalOcean Managed Databases connection pooling is a powerful feature that can help you quickly squeeze extra performance out of your database. Along with other techniques like replication, caching, and sharding, connection pooling can help you scale your database layer to process an even greater volume of requests.

      In this guide we focused on a simplistic and synthetic testing scenario using PostgreSQL's built-in pgbench benchmarking tool and its default benchmark test. In any production scenario, you should run benchmarks against actual production data while simulating production load. This will allow you to tune your database for your particular usage pattern.

      Along with pgbench, other tools exist to benchmark and load your database. One such tool developed by Percona is sysbench-tpcc. Another is Apache's JMeter, which can load test databases as well as web applications.

      To learn more about DigitalOcean Managed Databases, consult the Managed Databases product documentation. To learn more about sharding, another useful scaling technique, consult Understanding Database Sharding.
