

      How To Create Queries in MongoDB


      The author selected the Open Internet/Free Speech Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Documents stored in a MongoDB database can vary widely. Some might be relatively small and contain only a few entries, like items in a shopping list. Others might be highly complex, containing dozens of fields of different types, arrays holding multiple values, and even other documents nested within the larger structure.

      Regardless of how complex your documents are or how many you have, most often you won’t need to review the data in all of them at once. Instead, you’ll more likely want to only retrieve documents that satisfy one or more particular conditions. Similar to how you would find your holiday destination by selecting a range of filters on a booking website, such as distance from the seaside, pet-friendliness, a pool, and nearby parking, you can precisely query MongoDB to find exactly the documents you need. MongoDB provides a robust query mechanism for defining filtering criteria when retrieving documents.

      In this tutorial, you’ll learn how to query MongoDB collections using a range of different filters and conditions. You will also learn what cursors are and how to use them within the MongoDB shell.

      Prerequisites

      To follow this tutorial, you will need:

      Note: The linked tutorials on how to configure your server, install, and then secure your MongoDB installation refer to Ubuntu 20.04. This tutorial concentrates on MongoDB itself, not the underlying operating system. It will generally work with any MongoDB installation regardless of the operating system as long as authentication has been enabled.

      Step 1 — Preparing the Sample Database

      To explain how to create queries in MongoDB — including how to filter documents with multiple fields, nested documents, and arrays — this guide uses an example database containing a collection of documents that describe the five highest mountains in the world.

      To create this sample collection, connect to the MongoDB shell as your administrative user. This tutorial follows the conventions of the prerequisite MongoDB security tutorial and assumes the name of this administrative user is AdminSammy and its authentication database is admin. Be sure to change these details in the following command to reflect your own setup, if different:

      • mongo -u AdminSammy -p --authenticationDatabase admin

      When prompted, enter the password you set when you created your administrative user. After providing the password, your prompt will change to a greater-than (>) sign.

      Note: On a fresh connection, the MongoDB shell will automatically connect to the test database by default. You can safely use this database to experiment with MongoDB and the MongoDB shell.

      Alternatively, you can switch to another database to run all of the example commands given in this tutorial. To switch to another database, run the use command followed by the name of your database:

      • use database_name

      To understand how MongoDB filters documents with multiple fields, nested documents and arrays, you’ll need sample data complex enough to allow exploring different types of queries. As mentioned previously, this guide uses a sample collection of the five highest mountains in the world.

      The documents in this collection will follow this format. This example document describes Mount Everest:

      Mount Everest document

      {
          "name": "Everest",
          "height": 8848,
          "location": ["Nepal", "China"],
          "ascents": {
              "first": {
                  "year": 1953,
              },
              "first_winter": {
                  "year": 1980,
              },
              "total": 5656,
          }
      }
      

      This document contains the following fields and values:

      • name: the peak’s name
      • height: the peak’s elevation, in meters
      • location: the countries in which the mountain is located. This field stores values as an array to allow for mountains located in more than one country
      • ascents: this field’s value is another document. When one document is stored within another document like this, it’s known as an embedded or nested document. Each ascents document describes successful ascents of the given mountain. Specifically, each ascents document contains a total field that lists the total number of successful ascents of the given peak. Additionally, each of these nested documents contains two fields whose values are also nested documents:
        • first: this field’s value is a nested document that contains one field, year, which describes the year of the first overall successful ascent
        • first_winter: this field’s value is a nested document that also contains a year field, the value of which represents the year of the first successful winter ascent of the given mountain

      The first ascents are represented as nested documents, even though only the year is included for now, to make it easier to expand the ascent details with more fields in the future, such as the summiters’ names or the expedition details.

      Run the following insertMany() method in the MongoDB shell to simultaneously create a collection named peaks and insert five sample documents into it. These documents describe the five tallest mountain peaks in the world:

      • db.peaks.insertMany([
      • {
      • "name": "Everest",
      • "height": 8848,
      • "location": ["Nepal", "China"],
      • "ascents": {
      • "first": {
      • "year": 1953
      • },
      • "first_winter": {
      • "year": 1980
      • },
      • "total": 5656
      • }
      • },
      • {
      • "name": "K2",
      • "height": 8611,
      • "location": ["Pakistan", "China"],
      • "ascents": {
      • "first": {
      • "year": 1954
      • },
      • "first_winter": {
      • "year": 1921
      • },
      • "total": 306
      • }
      • },
      • {
      • "name": "Kangchenjunga",
      • "height": 8586,
      • "location": ["Nepal", "India"],
      • "ascents": {
      • "first": {
      • "year": 1955
      • },
      • "first_winter": {
      • "year": 1986
      • },
      • "total": 283
      • }
      • },
      • {
      • "name": "Lhotse",
      • "height": 8516,
      • "location": ["Nepal", "China"],
      • "ascents": {
      • "first": {
      • "year": 1956
      • },
      • "first_winter": {
      • "year": 1988
      • },
      • "total": 461
      • }
      • },
      • {
      • "name": "Makalu",
      • "height": 8485,
      • "location": ["China", "Nepal"],
      • "ascents": {
      • "first": {
      • "year": 1955
      • },
      • "first_winter": {
      • "year": 2009
      • },
      • "total": 361
      • }
      • }
      • ])

      The output will contain a list of object identifiers assigned to the newly-inserted objects.

      Output

      { "acknowledged" : true, "insertedIds" : [ ObjectId("610c23828a94efbbf0cf6004"), ObjectId("610c23828a94efbbf0cf6005"), ObjectId("610c23828a94efbbf0cf6006"), ObjectId("610c23828a94efbbf0cf6007"), ObjectId("610c23828a94efbbf0cf6008") ] }

      You can verify that the documents were properly inserted by running the find() method with no arguments, which will retrieve all the documents you just added:

      • db.peaks.find()

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6004"), "name" : "Everest", "height" : 8848, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1953 }, "first_winter" : { "year" : 1980 }, "total" : 5656 } } . . .

      With that, you have successfully created the list of example documents of mountains that will serve as the test data for creating queries. Next, you’ll learn how to query with conditions referring to individual fields.

      Step 2 — Querying Individual Fields

      At the end of the previous step, you used MongoDB’s find() method to return every document from the peaks collection. A query like this won’t be very useful in practice, though, as it doesn’t filter any documents and always returns the same result set.

      You can filter query results in MongoDB by defining a specific condition that documents must adhere to in order to be included in a result set. If you have followed the How To Perform CRUD operations in MongoDB tutorial, you have already used the most basic filtering condition: the equality condition.

      As an example, run the following query which returns any documents whose name value is equal to Everest:

      • db.peaks.find(
      • { "name": "Everest" }
      • )

      The second line — { "name": "Everest" } — is the query filter document, a JSON object specifying the filters to apply when searching the collection in order to find documents that satisfy the condition. This example operation tells MongoDB to retrieve any documents in the peaks collection whose name value matches the string Everest:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6004"), "name" : "Everest", "height" : 8848, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1953 }, "first_winter" : { "year" : 1980 }, "total" : 5656 } }

      MongoDB returned a single document, as there is only one Mt. Everest in the peaks collection.

      The equality condition specifies a single value that MongoDB will attempt to match against documents in the collection. MongoDB provides comparison query operators that allow you to specify other conditions that also refer to a single field, but filter documents in ways that are more complex than searching for exact matches.

      A comparison operator consists of a single key preceded by a dollar sign ($), which is the operator itself, and the value the operator will use to filter documents.

      To illustrate, run the following query which searches for any documents whose name value does not equal Everest:

      • db.peaks.find(
      • { "name": { $ne: "Everest" } }
      • )

      This time, the query filter document includes { $ne: "Everest" }. $ne is the comparison operator in this example, and it stands for “not equal”. The peak name, Everest, appears again as the value for this operator. Because this query is searching for documents whose name value is not equal to Everest, it returns four documents:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6005"), "name" : "K2", "height" : 8611, "location" : [ "Pakistan", "China" ], "ascents" : { "first" : { "year" : 1954 }, "first_winter" : { "year" : 1921 }, "total" : 306 } } { "_id" : ObjectId("610c23828a94efbbf0cf6006"), "name" : "Kangchenjunga", "height" : 8586, "location" : [ "Nepal", "India" ], "ascents" : { "first" : { "year" : 1955 }, "first_winter" : { "year" : 1986 }, "total" : 283 } } . . .

      The $in operator allows you to write queries that will return documents with values matching one of multiple values held in an array.

      The following example query includes the $in operator, and will return documents whose name value matches either Everest or K2:

      • db.peaks.find(
      • { "name": { $in: ["Everest", "K2"] } }
      • )

      Instead of a single value, the value passed to the $in operator is an array of two peak names in square brackets. MongoDB returns two documents, just as expected:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6004"), "name" : "Everest", "height" : 8848, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1953 }, "first_winter" : { "year" : 1980 }, "total" : 5656 } } { "_id" : ObjectId("610c23828a94efbbf0cf6005"), "name" : "K2", "height" : 8611, "location" : [ "Pakistan", "China" ], "ascents" : { "first" : { "year" : 1954 }, "first_winter" : { "year" : 1921 }, "total" : 306 } }

      The examples so far have queried the name field with text values. You can also filter documents based on numerical values.

      The following example query searches for documents whose height value is greater than 8500:

      • db.peaks.find(
      • { "height": { $gt: 8500 } }
      • )

      This query includes the $gt operator, which stands for greater than. By passing it the value 8500, MongoDB will return documents whose height value is greater than 8500:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6004"), "name" : "Everest", "height" : 8848, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1953 }, "first_winter" : { "year" : 1980 }, "total" : 5656 } } { "_id" : ObjectId("610c23828a94efbbf0cf6005"), "name" : "K2", "height" : 8611, "location" : [ "Pakistan", "China" ], "ascents" : { "first" : { "year" : 1954 }, "first_winter" : { "year" : 1921 }, "total" : 306 } } { "_id" : ObjectId("610c23828a94efbbf0cf6006"), "name" : "Kangchenjunga", "height" : 8586, "location" : [ "Nepal", "India" ], "ascents" : { "first" : { "year" : 1955 }, "first_winter" : { "year" : 1986 }, "total" : 283 } } { "_id" : ObjectId("610c23828a94efbbf0cf6007"), "name" : "Lhotse", "height" : 8516, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1956 }, "first_winter" : { "year" : 1988 }, "total" : 461 } }

      MongoDB offers a number of comparison query operators in addition to the ones outlined in this section. For a full list of these operators, see the official documentation on the subject.
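      As an illustration only, and not MongoDB’s actual implementation, the matching semantics of these comparison operators can be sketched in plain JavaScript. The peaks array mirrors the sample data, and the matches() helper is hypothetical:

```javascript
// Sample data matching the tutorial's collection (name and height only).
const peaks = [
  { name: "Everest", height: 8848 },
  { name: "K2", height: 8611 },
  { name: "Kangchenjunga", height: 8586 },
  { name: "Lhotse", height: 8516 },
  { name: "Makalu", height: 8485 },
];

// Each operator maps to a simple predicate on a single field's value.
const operators = {
  $ne: (value, operand) => value !== operand,
  $in: (value, operand) => operand.includes(value),
  $gt: (value, operand) => value > operand,
};

// Apply a one-operator condition such as { $gt: 8500 } to one field.
function matches(doc, field, condition) {
  const [op, operand] = Object.entries(condition)[0];
  return operators[op](doc[field], operand);
}

console.log(peaks.filter((p) => matches(p, "height", { $gt: 8500 })).length); // 4
console.log(peaks.filter((p) => matches(p, "name", { $in: ["Everest", "K2"] })).length); // 2
```

      Each operator reduces to a predicate tested against a field’s value, which is why they all share the same { field: { $operator: value } } shape in query filter documents.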

      Now that you know how to use equality conditions and comparison operators on a single document field, you can move onto learning how to join multiple conditions together in a single query.

      Step 3 — Using Multiple Conditions

      Sometimes, filtering based on a single document field is not enough to precisely select documents of interest. In such cases, you might want to filter documents using multiple conditions at once.

      There are two ways to connect multiple conditions in MongoDB: a logical AND conjunction, which selects documents in the collection matching all the conditions, and a logical OR, which selects documents matching at least one condition from the list.

      In MongoDB, the AND conjunction is implicit when using more than one field in the query filter document. Try selecting a mountain that matches the name Everest and the exact height of 8848 meters:

      • db.peaks.find(
      • { "name": "Everest", "height": 8848 }
      • )

      Notice that the syntax is similar to the equality condition example from the previous step, but this time two fields appear in the query filter document. MongoDB checks for equality on both fields and requires both to match the requested values in order for a document to be selected:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6004"), "name" : "Everest", "height" : 8848, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1953 }, "first_winter" : { "year" : 1980 }, "total" : 5656 } }

      In this case, a single document is returned, but if you try changing the height to any other numerical value the result set will be empty since any returned documents must match both conditions. For instance, the following example will not return any output to the shell:

      • db.peaks.find(
      • { "name": "Everest", "height": 9000 }
      • )

      This implicit AND can be made explicit by including the $and logical query operator followed by a list of conditions that returned documents must satisfy. The following example is essentially the same query as the previous one, but includes the $and operator instead of an implicit AND conjunction:

      • db.peaks.find(
      • { $and: [{"name": "Everest"}, {"height": 8848}] }
      • )

      This time the JSON object containing the $and query operator is the query filter document itself. Here, the $and operator takes a list of two separate equality conditions: the first for name matches and the second for height matches.

      In order to select documents matching any of the chosen conditions rather than all of them, you can instead use the $or operator:

      • db.peaks.find(
      • { $or: [{"name": "Everest"}, {"name": "K2"}] }
      • )

      When using the $or operator, a document only needs to satisfy one of the two equality filters:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6004"), "name" : "Everest", "height" : 8848, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1953 }, "first_winter" : { "year" : 1980 }, "total" : 5656 } } { "_id" : ObjectId("610c23828a94efbbf0cf6005"), "name" : "K2", "height" : 8611, "location" : [ "Pakistan", "China" ], "ascents" : { "first" : { "year" : 1954 }, "first_winter" : { "year" : 1921 }, "total" : 306 } }

      Although each of this example’s conditions is a single-field equality condition, both the $and and $or operators can contain any valid query filter documents. They can even include nested AND/OR condition lists.
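      To make this concrete, here is a hypothetical plain-JavaScript sketch of how $and, $or, and the implicit AND conjunction evaluate; it models only the equality-condition subset covered in this step, not MongoDB’s full query engine:

```javascript
const peaks = [
  { name: "Everest", height: 8848 },
  { name: "K2", height: 8611 },
];

// A filter is either { $and: [...] }, { $or: [...] }, or a plain
// field-equality document such as { name: "Everest" }.
function matches(doc, filter) {
  if (filter.$and) return filter.$and.every((f) => matches(doc, f));
  if (filter.$or) return filter.$or.some((f) => matches(doc, f));
  // Implicit AND: every field in the filter must match its value.
  return Object.entries(filter).every(([field, value]) => doc[field] === value);
}

// Implicit AND: both fields must match the same document.
console.log(peaks.filter((p) => matches(p, { name: "Everest", height: 8848 })).length); // 1
// $or: either name is accepted.
console.log(peaks.filter((p) => matches(p, { $or: [{ name: "Everest" }, { name: "K2" }] })).length); // 2
```

      Because matches() calls itself on each condition in the list, arbitrarily nested AND/OR combinations evaluate naturally, which mirrors how MongoDB accepts any valid filter document inside $and and $or.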

      Joining multiple filters together using $and and $or operators as outlined in this step can be very helpful with retrieving fine-grained query results. However, the examples so far have all used query filter documents that filter based on individual values. The next step outlines how to query against values stored in an array field.

      Step 4 — Querying for Array Values

      Sometimes a single field may contain multiple values stored in an array. In our example with mountain peaks, location is such a field. Because mountains often span more than one country, like Kangchenjunga in Nepal and India, a single value may not always be enough for this field.

      In this step, you’ll learn how to construct query filters that match items in array fields.

      Let’s start by trying to select documents representing mountains that are in Nepal. For this example, though, it’s okay if the mountain has multiple locations listed, as long as one of them is Nepal:

      • db.peaks.find(
      • { "location": "Nepal" }
      • )

      This query uses an equality condition that tells MongoDB to return documents whose location value matches the given string value, Nepal, similar to the previous examples that used the name field. When the field holds an array, MongoDB will select any documents in which the requested value appears anywhere in the array:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6004"), "name" : "Everest", "height" : 8848, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1953 }, "first_winter" : { "year" : 1980 }, "total" : 5656 } } { "_id" : ObjectId("610c23828a94efbbf0cf6006"), "name" : "Kangchenjunga", "height" : 8586, "location" : [ "Nepal", "India" ], "ascents" : { "first" : { "year" : 1955 }, "first_winter" : { "year" : 1986 }, "total" : 283 } } { "_id" : ObjectId("610c23828a94efbbf0cf6007"), "name" : "Lhotse", "height" : 8516, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1956 }, "first_winter" : { "year" : 1988 }, "total" : 461 } } { "_id" : ObjectId("610c23828a94efbbf0cf6008"), "name" : "Makalu", "height" : 8485, "location" : [ "China", "Nepal" ], "ascents" : { "first" : { "year" : 1955 }, "first_winter" : { "year" : 2009 }, "total" : 361 } }

      For this query, MongoDB returned the four documents in which Nepal appears in the location field.

      However, what if you wanted to find mountains located in both China and Nepal? To do this, you could include an array in the filter document, rather than a single value:

      • db.peaks.find(
      • { "location": ["China", "Nepal"] }
      • )

      Even though there are four mountains in Nepal and China in the database, there is only one in which the countries are listed in the order given in this query, so this query returns a single document:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6008"), "name" : "Makalu", "height" : 8485, "location" : [ "China", "Nepal" ], "ascents" : { "first" : { "year" : 1955 }, "first_winter" : { "year" : 2009 }, "total" : 361 } }

      Notice that the value of the location field for Makalu is identical to the query’s filter document. When you supply an array as the value for the equality condition like this, MongoDB will retrieve documents where the location field matches the query filter exactly, including the order of elements inside the array. To illustrate, run the query again but swap China with Nepal:

      • db.peaks.find(
      • { "location": ["Nepal", "China"] }
      • )

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6004"), "name" : "Everest", "height" : 8848, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1953 }, "first_winter" : { "year" : 1980 }, "total" : 5656 } } { "_id" : ObjectId("610c23828a94efbbf0cf6007"), "name" : "Lhotse", "height" : 8516, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1956 }, "first_winter" : { "year" : 1988 }, "total" : 461 } }

      Now, two other mountains are returned, but Makalu is not.

      Using the equality condition like this is not helpful in cases where you care only about whether certain elements appear in an array, regardless of their order, rather than about an exact match. Fortunately, MongoDB allows you to retrieve documents whose array contains all of the specified elements, in any order, using the $all query operator.

      To illustrate, run the following query:

      • db.peaks.find(
      • { "location": { $all: ["China", "Nepal"] } }
      • )

      The $all operator checks whether each document’s location array contains both China and Nepal, in any order. MongoDB returns all three such mountains in a single query:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6004"), "name" : "Everest", "height" : 8848, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1953 }, "first_winter" : { "year" : 1980 }, "total" : 5656 } } { "_id" : ObjectId("610c23828a94efbbf0cf6007"), "name" : "Lhotse", "height" : 8516, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1956 }, "first_winter" : { "year" : 1988 }, "total" : 461 } } { "_id" : ObjectId("610c23828a94efbbf0cf6008"), "name" : "Makalu", "height" : 8485, "location" : [ "China", "Nepal" ], "ascents" : { "first" : { "year" : 1955 }, "first_winter" : { "year" : 2009 }, "total" : 361 } }
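      The difference between exact array equality and $all can be sketched in plain JavaScript. This is an illustrative model of the semantics only; the exactMatch and allMatch helpers are hypothetical names, not MongoDB functions:

```javascript
// The location arrays of Everest, Lhotse, and Makalu from the sample data.
const locations = [
  ["Nepal", "China"],   // Everest
  ["Nepal", "China"],   // Lhotse
  ["China", "Nepal"],   // Makalu
];

// Equality on an array compares both length and element order.
const exactMatch = (arr, query) =>
  arr.length === query.length && arr.every((v, i) => v === query[i]);

// $all only requires every queried element to appear somewhere in the array.
const allMatch = (arr, query) => query.every((v) => arr.includes(v));

console.log(locations.filter((l) => exactMatch(l, ["China", "Nepal"])).length); // 1
console.log(locations.filter((l) => allMatch(l, ["China", "Nepal"])).length);   // 3
```

      The exact-equality form matches only Makalu, whose array lists China first, while the $all form matches all three documents regardless of element order.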

      This step outlined how to use arrays in query filter documents to retrieve documents with more than one value in a single field. If you want to query data held within a nested document, you’ll need to use the special syntax required for such an operation. Continue onto the next step to learn how to do this.

      Step 5 — Querying Fields in Nested Documents

      Recall that the example database documents include an ascents field that holds various details about each mountain’s ascents in a nested document. This way, the data about the first ascent, the first winter ascent, and the total number of ascents is cleanly grouped inside a single structure. This step explains how you can access fields within a nested document when building queries.

      Review the sample Everest document once more:

      The Everest document

      {
          "name": "Everest",
          "height": 8848,
          "location": ["Nepal", "China"],
          "ascents": {
              "first": {
                  "year": 1953,
              },
              "first_winter": {
                  "year": 1980,
              },
              "total": 5656,
          }
      }
      

      Accessing the name and height fields was straightforward, as a single value resides under these keys. But say you wanted to find the total number of ascents for a given peak. The ascents field contains more data than just the total number of ascents. There is a total field inside it, but it’s not a top-level field of the main document, so there’s no way to access it with the query syntax used so far.

      To solve this issue, MongoDB provides a dot notation to access fields in nested documents.

      To illustrate how MongoDB’s dot notation works, run the following query. This will return all the mountains in the collection that have been ascended more than 1000 times, using the $gt operator highlighted previously:

      • db.peaks.find(
      • { "ascents.total": { $gt: 1000 } }
      • )

      Mt. Everest is the only mountain in the collection with more than 1000 ascents, so only its document will be returned:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6004"), "name" : "Everest", "height" : 8848, "location" : [ "Nepal", "China" ], "ascents" : { "first" : { "year" : 1953 }, "first_winter" : { "year" : 1980 }, "total" : 5656 } }

      While the { $gt: 1000 } query filter with the $gt operator is familiar, notice how this query accesses the total field held within the document stored in the ascents field. In nested documents, the access path to any given field is constructed with dots, each dot indicating a descent into the nested object.

      So, ascents.total means that MongoDB should first open the nested document that the ascents field points to and then find the total field within it.

      The notation works with multiple nested documents as well:

      • db.peaks.find(
      • { "ascents.first_winter.year": { $gt: 2000 } }
      • )

      This query will return any documents describing mountains that were first ascended in winter only after the year 2000:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6008"), "name" : "Makalu", "height" : 8485, "location" : [ "China", "Nepal" ], "ascents" : { "first" : { "year" : 1955 }, "first_winter" : { "year" : 2009 }, "total" : 361 } }

      As before, the ascents.first_winter.year notation means MongoDB first finds the ascents field and opens the nested document stored there. It then goes into another nested document, first_winter, and finally retrieves the year field from within it.

      The dot notation can be used to access any depth of nested documents in MongoDB.
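      The path-resolution idea behind dot notation can be sketched in a few lines of plain JavaScript. This resolvePath helper is a hypothetical illustration of the traversal, not MongoDB’s implementation:

```javascript
// Resolve a dot-notation path such as "ascents.first_winter.year"
// against a nested document by descending one level per path segment.
function resolvePath(doc, path) {
  return path.split(".").reduce(
    (current, key) => (current === undefined ? undefined : current[key]),
    doc
  );
}

const everest = {
  name: "Everest",
  ascents: { first: { year: 1953 }, first_winter: { year: 1980 }, total: 5656 },
};

console.log(resolvePath(everest, "ascents.total"));             // 5656
console.log(resolvePath(everest, "ascents.first_winter.year")); // 1980
```

      Once the value at the end of the path is located, a comparison operator such as $gt is applied to it just as it would be to a top-level field.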

      By now you will have a good understanding of how to access data from nested documents and how to filter query results. You can move on to learning how to limit the list of fields returned by your queries.

      Step 6 — Returning a Subset of Fields

      In all the examples so far, whenever you queried the peaks collection, MongoDB returned one or more full documents. Oftentimes, you’ll only need information from a handful of fields. As an example, you might only want to find the names of the mountains in the database.

      This isn’t just a matter of legibility, but also of performance. If only a small part of a document is needed, retrieving whole document objects would be an unnecessary performance burden on the database. This may not be a problem when working with small datasets like this tutorial’s examples, but it becomes an important consideration when working with many large, complex documents.

      As an example, say you’re only interested in mountain names stored in the peaks collection, but the ascent details or location are not important this time. You could limit the fields your query will return by following the query filter document with a projection.

      A projection document is a JSON object where keys correspond to the fields of the queried documents. Projections can be either constructed as inclusion projections or exclusion projections. When the projection document contains keys with 1 as their values, it describes the list of fields that will be included in the result. If, on the other hand, projection keys are set to 0, the projection document describes the list of fields that will be excluded from the result.

      Run the following query, which includes the by-now familiar find() method. This query’s find() method includes two arguments, instead of one. The first, {}, is the query filter document. Here it’s an empty JSON object, meaning it won’t apply any filtering. The second argument, { "name": 1 }, describes the projection and means that the query results will only include each document’s name field:

      • db.peaks.find(
      • {},
      • { "name": 1 }
      • )

      After running this example query, MongoDB returns the following results:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6004"), "name" : "Everest" } { "_id" : ObjectId("610c23828a94efbbf0cf6005"), "name" : "K2" } { "_id" : ObjectId("610c23828a94efbbf0cf6006"), "name" : "Kangchenjunga" } { "_id" : ObjectId("610c23828a94efbbf0cf6007"), "name" : "Lhotse" } { "_id" : ObjectId("610c23828a94efbbf0cf6008"), "name" : "Makalu" }

      Notice that the returned documents are simplified, and contain only the name and _id fields. MongoDB always includes the _id key, even if it’s not explicitly requested.

      To illustrate how to specify what fields to exclude, run the following query. It will return data from each document, but will exclude the ascents and location fields:

      • db.peaks.find(
      • {},
      • { "ascents": 0, "location": 0 }
      • )

      MongoDB returns all five mountains once again, but this time only the name, height, and _id fields are present:

      Output

      { "_id" : ObjectId("610c23828a94efbbf0cf6004"), "name" : "Everest", "height" : 8848 } { "_id" : ObjectId("610c23828a94efbbf0cf6005"), "name" : "K2", "height" : 8611 } { "_id" : ObjectId("610c23828a94efbbf0cf6006"), "name" : "Kangchenjunga", "height" : 8586 } { "_id" : ObjectId("610c23828a94efbbf0cf6007"), "name" : "Lhotse", "height" : 8516 } { "_id" : ObjectId("610c23828a94efbbf0cf6008"), "name" : "Makalu", "height" : 8485 }

      Note: When specifying projections, you cannot mix inclusions and exclusions. You either have to specify the list of fields to include, or a list of fields to exclude.

      There is, however, one exception to this rule. MongoDB allows you to exclude the _id field from a result set even when the query has an inclusion projection applied. To suppress the _id field, you can append "_id": 0 to the projection document. The following example is similar to the previous example query, but will exclude every field, including _id, except for the name field:

      • db.peaks.find(
      • {},
      • { "_id": 0, "name": 1 }
      • )

      Output

      { "name" : "Everest" } { "name" : "K2" } { "name" : "Kangchenjunga" } { "name" : "Lhotse" } { "name" : "Makalu" }
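      The inclusion and exclusion rules for top-level projections, including the special treatment of _id, can be sketched in plain JavaScript. The project helper below is a hypothetical model of the behavior described above, not MongoDB code:

```javascript
// Apply a top-level projection document to a single document.
// Inclusion mode keeps only the listed fields (plus _id unless "_id": 0);
// exclusion mode copies the document and removes the listed fields.
function project(doc, projection) {
  const isInclusion = Object.entries(projection).some(
    ([field, flag]) => flag === 1 && field !== "_id"
  );
  if (isInclusion) {
    const result = {};
    // _id is always kept unless explicitly suppressed with "_id": 0.
    if (projection._id !== 0 && "_id" in doc) result._id = doc._id;
    for (const [field, flag] of Object.entries(projection)) {
      if (flag === 1 && field in doc) result[field] = doc[field];
    }
    return result;
  }
  const result = { ...doc };
  for (const [field, flag] of Object.entries(projection)) {
    if (flag === 0) delete result[field];
  }
  return result;
}

const doc = { _id: 4, name: "Everest", height: 8848, location: ["Nepal", "China"] };
console.log(project(doc, { name: 1 }));         // { _id: 4, name: 'Everest' }
console.log(project(doc, { _id: 0, name: 1 })); // { name: 'Everest' }
```

      The isInclusion check also reflects why inclusions and exclusions cannot be mixed: the projection as a whole must operate in one mode or the other.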

      Projections can also be used to include or exclude fields in nested documents. Say, for example, that you want to know each mountain’s first winter ascent and the total number of ascents, both of which are nested within the ascents field. Additionally, you want to return each mountain’s name. To do this, you could run a query like this:

      • db.peaks.find(
      • {},
      • { "_id": 0, "name": 1, "ascents": { "first_winter": 1, "total": 1 } }
      • )

      Notice how the projection for the ascents field follows the structure of the nested document, being a nested projection itself. By using "first_winter": 1, "total": 1, this query tells the database to include only these two fields from the nested document and no others.

      The returned documents will contain only the requested fields:

      Output

      { "name" : "Everest", "ascents" : { "first_winter" : { "year" : 1980 }, "total" : 5656 } } { "name" : "K2", "ascents" : { "first_winter" : { "year" : 1921 }, "total" : 306 } } { "name" : "Kangchenjunga", "ascents" : { "first_winter" : { "year" : 1986 }, "total" : 283 } } { "name" : "Lhotse", "ascents" : { "first_winter" : { "year" : 1988 }, "total" : 461 } } { "name" : "Makalu", "ascents" : { "first_winter" : { "year" : 2009 }, "total" : 361 } }

      Limiting the size of returned documents to only a subset of fields can be helpful with making result sets more readable and can even improve performance. The next step outlines how to limit the number of documents returned by a query, and also details how to sort the data returned by a query.

      Step 7 — Using Cursors to Sort and Limit Query Results

      When retrieving objects from a large collection, there may be times when you want to limit the number of results or perhaps sort them in a particular order. For example, a popular approach for shopping sites is to sort products by their price. MongoDB uses cursors, which allow you both to limit the number of documents returned in a result set and to sort the results in ascending or descending order.

Recall this example query from Step 1:

      • db.peaks.find()

      The result set returned by this query includes all the data from each document in the peaks collection. While it may seem like MongoDB returns all the objects from the peaks collection at once, this is not the case. What MongoDB actually returns is a cursor object.

      A cursor is a pointer to the result set of a query but it is not the result set itself. It’s an object that can be iterated, meaning that you can request the cursor to return the next document in line, and only then will the full document be retrieved from the database. Until that happens, the cursor only points to the next document on the list.

      With cursors, MongoDB can ensure that the actual document retrieval happens only when it’s needed. This can have significant performance implications when the documents in question are large or many of them are requested at once.
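The lazy behavior of a cursor can be modeled with a Python generator. This is an illustrative analogy, not the driver's real mechanics: nothing is retrieved until the caller iterates.

```python
def find_lazy(collection):
    """Yield documents one at a time, like a MongoDB cursor.

    No document is 'fetched' until the caller iterates, so an
    unconsumed cursor costs almost nothing.
    """
    for doc in collection:
        # In a real driver, this is where the next batch would be
        # requested from the server.
        yield doc

peaks = [{"name": "Everest"}, {"name": "K2"}, {"name": "Lhotse"}]
cursor = find_lazy(peaks)   # no documents retrieved yet
first = next(cursor)        # only now is the first document fetched
print(first["name"])        # Everest
```

Creating the cursor is cheap; the cost is paid one document at a time, and only for documents you actually consume.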

To illustrate how cursors work, run the following operation, which includes both the find() and count() methods:

      • db.peaks.find().count()

      MongoDB will respond with 5:

      Output

      5

Under the hood, the find() method returns a cursor, and the count() method is then called on that cursor. This lets MongoDB know that you’re interested in the number of matching documents and not the documents themselves, so all the database returns is the count. By using cursor methods to modify the query before retrieving any documents, you ensure that only the database operations you ask for are performed on the collection.

Note: When executing queries, the MongoDB shell automatically iterates over the returned cursor 20 times in order to display the first 20 results on the screen. This behavior is specific to the MongoDB shell. When working with MongoDB programmatically, the cursor won’t retrieve any results until you explicitly iterate over it.

      Another MongoDB method that uses cursors to alter a result set is the limit() method. As its name implies, you can use limit() to limit the number of results a query will return.

      Run the following query which will retrieve only three mountain peaks from the collection:

      • db.peaks.find(
      • {},
      • { "_id": 0, "name": 1, "height": 1 }
      • ).limit(3)

The MongoDB shell will respond with three objects rather than five, even though the query isn’t filtering any data:

      Output

{ "name" : "Everest", "height" : 8848 }
{ "name" : "K2", "height" : 8611 }
{ "name" : "Kangchenjunga", "height" : 8586 }

      The limit(3) method applied on the cursor tells the cursor to stop returning further documents after reaching the first 3. Using the limit() cursor method like this with large collections will help to ensure that you only retrieve the results you need and no more.

      By default, MongoDB will return objects in the order of their insertion, but you might want to alter that behavior. Say you’re interested in finding the three lowest mountain peaks held in the database. You could run the following query:

      • db.peaks.find(
      • {},
      • { "_id": 0, "name": 1, "height": 1 }
      • ).limit(3).sort({ "height": 1 })

      The added sort({ "height": 1 }) causes the result set to differ from the previous example:

      Output

{ "name" : "Makalu", "height" : 8485 }
{ "name" : "Lhotse", "height" : 8516 }
{ "name" : "Kangchenjunga", "height" : 8586 }

      Again, only three mountain peaks are returned. However, this time they have been sorted ascending from the one with the lowest height value.

The sort() method on the cursor accepts a document as an argument, similar to the projection document. It lists the keys to sort against, and the accepted value for each key is either 1 for ascending or -1 for descending sort order.
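The combined effect of sort() and limit() can be sketched in plain Python. This is a simplified model: real MongoDB applies the sort before the limit regardless of the order in which you chain the methods, and can often use indexes to avoid sorting in memory.

```python
def sort_and_limit(docs, sort_spec, n):
    """Mimic cursor.sort({...}).limit(n): 1 = ascending, -1 = descending.

    Keys are applied right-to-left so the leftmost key has the highest
    priority, relying on Python's stable sort. The sort happens before
    the limit, matching MongoDB's semantics.
    """
    for key, direction in reversed(list(sort_spec.items())):
        docs = sorted(docs, key=lambda d: d[key], reverse=(direction == -1))
    return docs[:n]

peaks = [
    {"name": "Everest", "height": 8848},
    {"name": "K2", "height": 8611},
    {"name": "Kangchenjunga", "height": 8586},
    {"name": "Lhotse", "height": 8516},
    {"name": "Makalu", "height": 8485},
]
# The three lowest peaks, ascending by height:
print(sort_and_limit(peaks, {"height": 1}, 3))
```

Flipping the direction to -1 would return the three highest peaks instead, with no other change to the call.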

      Conclusion

By reading this article, you familiarized yourself with how MongoDB filters query results. You filtered collection documents against individual fields, multiple conditions, and complex structures such as arrays and nested documents. You also learned how to select only a subset of fields and sort the results using cursor methods. These techniques can be used to retrieve only the documents of interest from otherwise large collections.

The tutorial described only a handful of the query operators MongoDB provides for precise document querying. You can study the official MongoDB documentation to learn more about the different query operators.




      How To Refine Database Queries in Laravel with Eloquent Where()



      Part of the Series:
      A Practical Introduction to Laravel Eloquent ORM

      Eloquent is an object relational mapper (ORM) that is included by default within the Laravel framework. In this project-based series, you’ll learn how to make database queries and how to work with relationships in Laravel Eloquent. To practice the examples explained throughout the series, you’ll improve a demo application with new models and relationships.

      In a previous part of this series, you updated the demo application to organize links into lists in your database. The main application view now shows a menu with all lists that are currently registered within the database, but the menu has no active links yet.

      In this section, you’ll create a new route within the application to show links by list. You’ll also learn how to use the where() method in Eloquent to better filter results in a database query.

      To get started, open the routes/web.php file in your code editor:

      routes/web.php
      

      The file currently has the following content:

      routes/web.php

      <?php
      
use Illuminate\Support\Facades\Route;
use App\Models\Link;
use App\Models\LinkList;
      
      /*
      |--------------------------------------------------------------------------
      | Web Routes
      |--------------------------------------------------------------------------
      |
      | Here is where you can register web routes for your application. These
      | routes are loaded by the RouteServiceProvider within a group which
      | contains the "web" middleware group. Now create something great!
      |
      */
      
      Route::get('/', function () {
          $links = Link::all()->sortDesc();
          return view('index', [
              'links' => $links,
              'lists' => LinkList::all()
          ]);
      });
      
      

      The Route::get call defines an HTTP GET route for the application entry page. When a request to / is made to the application, this callback function will be triggered to return the index view.

      You’ll now create a second route to show lists of links based on a list slug. A slug is a short string that is typically used to build user-friendly URLs. The new route must query the database’s link_lists table for a list that has the provided URL parameter as its slug field. If a list with that slug cannot be found, the application should inform the user with an HTTP 404 or not found error.

      The following code creates a GET route using a dynamic parameter (defined by {slug}), named link-list. This will:

      • Use the LinkList Eloquent model to query the database with the where() method, using slug as search criteria. The first() method will make sure only one object is returned.
      • If a list with the specified slug can’t be found, a 404 error is thrown with the abort method.
      • If a valid list is found, the index view is then rendered, and the list is provided as a template parameter.

      The $lists parameter is provided to build the list menu, and the $links parameter is provided for compatibility with the current version of the index view, since it loops through a variable with that name.
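The route's lookup logic (filter by slug, take the first match, abort with a 404 when there is none) can be sketched outside Laravel in plain Python. The data and function names here are illustrative, not part of the application:

```python
def find_list_by_slug(lists, slug):
    """Mimic LinkList::where('slug', $slug)->first(): return the first
    record matching the slug, or None when there is no match."""
    for record in lists:
        if record["slug"] == slug:
            return record
    return None

lists = [
    {"slug": "default", "title": "Default List"},
    {"slug": "news", "title": "News"},
]

match = find_list_by_slug(lists, "default")
if match is None:
    # Stands in for Laravel's abort(404).
    raise LookupError("404: list not found")
print(match["title"])  # Default List
```

The where() clause narrows the candidates and first() stops at the first hit, which is why the route can safely treat a None result as "list does not exist".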

      Include the following code at the bottom of your routes/web.php file:

      routes/web.php

      Route::get('/{slug}', function ($slug) {
          $list = LinkList::where('slug', $slug)->first();
          if (!$list) {
              abort(404);
          }
      
          return view('index', [
              'list' => $list,
              'links' => $list->links,
              'lists' => LinkList::all()
          ]);
      })->name('link-list');
      
      

      Save the file when you’re done.

Although there are shortcuts for implementing routes that reference Eloquent models, this tutorial focuses on using the where() method for learning purposes.

      To test that your new route works as expected, you can go to your browser and access the link for the default list page. If you’ve followed all steps in this series so far and your database is not empty, the default list page should be available at the following local address:

      http://localhost:8000/default
      

You’ll see the same page as before, but links are now limited to those from the default list. If you have additional lists, you can access their pages by replacing the default slug in the URL with the slug of your list.

      With your new route configured, you can now use the route Blade method to dynamically generate URLs for your link lists, from your template view. You can also customize the page header to show information about a list in case one is available.

      Open the resources/views/index.blade.php file in your code editor:

      resources/views/index.blade.php
      

This file has two lines that need updating. First, locate the subtitle paragraph containing the menu you created in another part of this series. This is how it looks now:

      resources/views/index.blade.php

              <p class="subtitle">
        @foreach ($lists as $list)<a href="#" title="{{ $list->title }}" class="tag is-info is-light">{{ $list->title }}</a> @endforeach
              </p>
      

You’ll update the href hyperlink to include the current URL for the list page, using the route Blade method. This method expects the name of the route as the first argument, with URL parameters provided as additional arguments to the method call. Replace the # character with the following content:

      resources/views/index.blade.php

              <p class="subtitle">
        @foreach ($lists as $list)<a href="{{ route('link-list', $list->slug) }}" title="{{ $list->title }}" class="tag is-info is-light">{{ $list->title }}</a> @endforeach
              </p>
      

Next, locate the links section and the foreach loop within. You’ll need to include another call to the route() method where the list name is printed for each link. This is similar to the previous example; however, the list object is accessed differently here, through the $link variable:

      resources/views/index.blade.php

                  <p>{{$link->url}}</p>
            <p class="mt-2"><a href="{{ route('link-list', $link->link_list->slug) }}" title="{{ $link->link_list->title }}" class="tag is-info">{{ $link->link_list->title }}</a></p>
      

Next, you may want to include information about a list when one is being displayed. You can check for the existence of a $list variable and print the list title only when that variable is available.

      Replace your title section with the following highlighted code:

              <h1 class="title">
                  @if (isset($list))
                      {{ $list->title }}
                  @else
                      Check out my awesome links
                  @endif
              </h1>
      

This is how the index.blade.php file will look once you’re finished:

      resources/views/index.blade.php

      <!DOCTYPE html>
      <html>
      <head>
          <meta charset="utf-8">
          <meta name="viewport" content="width=device-width, initial-scale=1">
          <title>My Awesome Links</title>
          <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/css/bulma.min.css">
      
          <style>
              html {
                  background: url("https://i.imgur.com/BWIdYTM.jpeg") no-repeat center center fixed;
                  -webkit-background-size: cover;
                  -moz-background-size: cover;
                  -o-background-size: cover;
                  background-size: cover;
              }
      
              div.link h3 {
                  font-size: large;
              }
      
              div.link p {
                  font-size: small;
                  color: #718096;
              }
          </style>
      </head>
      <body>
      <section class="section">
          <div class="container">
              <h1 class="title">
                  @if (isset($list))
                      {{ $list->title }}
                  @else
                      Check out my awesome links
                  @endif
              </h1>
              <p class="subtitle">
            @foreach ($lists as $list)<a href="{{ route('link-list', $list->slug) }}" title="{{ $list->title }}" class="tag is-info is-light">{{ $list->title }}</a> @endforeach
              </p>
      
              <section class="links">
                  @foreach ($links as $link)
                      <div class="box link">
                    <h3><a href="{{ $link->url }}" target="_blank" title="Visit Link: {{ $link->url }}">{{ $link->description }}</a></h3>
                          <p>{{$link->url}}</p>
                    <p class="mt-2"><a href="{{ route('link-list', $link->link_list->slug) }}" title="{{ $link->link_list->title }}" class="tag is-info">{{ $link->link_list->title }}</a></p>
                      </div>
                  @endforeach
              </section>
          </div>
      </section>
      </body>
      </html>
      

      Save the file when you’re done updating its contents.

      You can now access the application main page through your browser. If you are using the included Docker Compose setup, the application should be available through the following local address:

      http://localhost:8000
      

      You’ll get a page similar to the following screenshot:

      The screenshot shows the updated Landing Laravel application with working links to individual list pages

      In the next part of this series, you’ll learn how to order query results in Laravel Eloquent.

      This tutorial is part of an ongoing weekly series about Laravel Eloquent. You can subscribe to the Laravel tag if you want to be notified when new tutorials are published.




      How To Optimize MySQL Queries with ProxySQL Caching on Ubuntu 16.04


      The author selected the Free Software Foundation to receive a donation as part of the Write for DOnations program.

      Introduction

      ProxySQL is a SQL-aware proxy server that can be positioned between your application and your database. It offers many features, such as load-balancing between multiple MySQL servers and serving as a caching layer for queries. This tutorial will focus on ProxySQL’s caching feature, and how it can optimize queries for your MySQL database.

MySQL caching occurs when the result of a query is stored so that, when the query is repeated, the result can be returned without the database having to execute it again. This can significantly increase the speed of common queries. But with many caching methods, developers must modify the code of their application, which could introduce bugs into the codebase. To avoid this error-prone practice, ProxySQL allows you to set up transparent caching.

      In transparent caching, only database administrators need to change the ProxySQL configuration to enable caching for the most common queries, and these changes can be done through the ProxySQL admin interface. All the developer needs to do is connect to the protocol-aware proxy, and the proxy will decide if the query can be served from the cache without hitting the back-end server.

      In this tutorial, you will use ProxySQL to set up transparent caching for a MySQL server on Ubuntu 16.04. You will then test its performance using mysqlslap with and without caching to demonstrate the effect of caching and how much time it can save when executing many similar queries.

      Prerequisites

      Before you begin this guide you’ll need the following:

      Step 1 — Installing and Setting Up the MySQL Server

      First, you will install MySQL server and configure it to be used by ProxySQL as a back-end server for serving client queries.

      On Ubuntu 16.04, mysql-server can be installed using this command:

      • sudo apt-get install mysql-server

      Press Y to confirm the installation.

      You will then be prompted for your MySQL root user password. Enter a strong password and save it for later use.

      Now that you have your MySQL server ready, you will configure it for ProxySQL to work correctly. You need to add a monitor user for ProxySQL to monitor the MySQL server, since ProxySQL listens to the back-end server via the SQL protocol, rather than using a TCP connection or HTTP GET requests to make sure that the backend is running. monitor will use a dummy SQL connection to determine if the server is alive or not.

First, log in to the MySQL shell:

      • mysql -uroot -p

      -uroot logs you in using the MySQL root user, and -p prompts for the root user’s password. This root user is different from your server’s root user, and the password is the one you entered when installing the mysql-server package.

      Enter the root password and press ENTER.

      Now you will create two users, one named monitor for ProxySQL and another that you will use to execute client queries and grant them the right privileges. This tutorial will name this user sammy.

      Create the monitor user:

      • CREATE USER 'monitor'@'%' IDENTIFIED BY 'monitor_password';

      The CREATE USER query is used to create a new user that can connect from specific IPs. Using % denotes that the user can connect from any IP address. IDENTIFIED BY sets the password for the new user; enter whatever password you like, but make sure to remember it for later use.

      With the user monitor created, next make the sammy user:

      • CREATE USER 'sammy'@'%' IDENTIFIED BY 'sammy_password';

      Next, grant privileges to your new users. Run the following command to configure monitor:

      • GRANT SELECT ON sys.* TO 'monitor'@'%';

      The GRANT query is used to give privileges to users. Here you granted only SELECT on all tables in the sys database to the monitor user; it only needs this privilege to listen to the back-end server.

      Now grant all privileges to all databases to the user sammy:

      • GRANT ALL PRIVILEGES on *.* TO 'sammy'@'%';

      This will allow sammy to make the necessary queries to test your database later.

Apply the privilege changes by running the following:

      • FLUSH PRIVILEGES;

      Finally, exit the mysql shell:

      • exit

      You’ve now installed mysql-server and created a user to be used by ProxySQL to monitor your MySQL server, and another one to execute client queries. Next you will install and configure ProxySQL.

      Step 2 — Installing and Configuring ProxySQL Server

      Now you can install ProxySQL server, which will be used as a caching layer for your queries. A caching layer exists as a stop between your application servers and database back-end servers; it is used to connect to the database and to save the results of some queries in its memory for fast access later.

      The ProxySQL releases Github page offers installation files for common Linux distributions. For this tutorial, you will use wget to download the ProxySQL version 2.0.4 Debian installation file:

      • wget https://github.com/sysown/proxysql/releases/download/v2.0.4/proxysql_2.0.4-ubuntu16_amd64.deb

      Next, install the package using dpkg:

      • sudo dpkg -i proxysql_2.0.4-ubuntu16_amd64.deb

      Once it is installed, start ProxySQL with this command:

      • sudo systemctl start proxysql

      You can check if ProxySQL started correctly with this command:

      • sudo systemctl status proxysql

      You will get an output similar to this:

      Output

● proxysql.service - LSB: High Performance Advanced Proxy for MySQL
   Loaded: loaded (/etc/init.d/proxysql; bad; vendor preset: enabled)
   Active: active (exited) since Wed 2019-06-12 21:32:50 UTC; 6 months 7 days ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 0
   Memory: 0B
      CPU: 0

      Now it is time to connect your ProxySQL server to the MySQL server. For this purpose, use the ProxySQL admin SQL interface, which by default listens to port 6032 on localhost and has admin as its username and password.

      Connect to the interface by running the following:

      • mysql -uadmin -p -h 127.0.0.1 -P6032

      Enter admin when prompted for the password.

      -uadmin sets the username as admin, and the -h flag specifies the host as localhost. The port is 6032, specified using the -P flag.

      Here you had to specify the host and port explicitly because, by default, the MySQL client connects using a local sockets file and port 3306.

      Now that you are logged into the mysql shell as admin, configure the monitor user so that ProxySQL can use it. First, use standard SQL queries to set the values of two global variables:

      • UPDATE global_variables SET variable_value='monitor' WHERE variable_name='mysql-monitor_username';
      • UPDATE global_variables SET variable_value='monitor_password' WHERE variable_name='mysql-monitor_password';

      The variable mysql-monitor_username specifies the MySQL username that will be used to check if the back-end server is alive or not. The variable mysql-monitor_password points to the password that will be used when connecting to the back-end server. Use the password you created for the monitor username.

Every time you make a change in the ProxySQL admin interface, you need to run the appropriate LOAD command to apply it to the running ProxySQL instance. You changed MySQL global variables, so load them to RUNTIME to apply the changes:

      • LOAD MYSQL VARIABLES TO RUNTIME;

      Next, SAVE the changes to the on-disk database to persist changes between restarts. ProxySQL uses its own SQLite local database to store its own tables and variables:

      • SAVE MYSQL VARIABLES TO DISK;

      Now, you will tell ProxySQL about the back-end server. The table mysql_servers holds information about each back-end server where ProxySQL can connect and execute queries, so add a new record using a standard SQL INSERT statement with the following values for hostgroup_id, hostname, and port:

      • INSERT INTO mysql_servers(hostgroup_id, hostname, port) VALUES (1, '127.0.0.1', 3306);

      To apply the changes, run LOAD and SAVE again:

      • LOAD MYSQL SERVERS TO RUNTIME;
      • SAVE MYSQL SERVERS TO DISK;

      Finally, you will tell ProxySQL which user will connect to the back-end server; set sammy as the user, and replace sammy_password with the password you created earlier:

      • INSERT INTO mysql_users(username, password, default_hostgroup) VALUES ('sammy', 'sammy_password', 1);

      The table mysql_users holds information about users used to connect to the back-end servers; you specified the username, password, and default_hostgroup.

      LOAD and SAVE the changes:

      • LOAD MYSQL USERS TO RUNTIME;
      • SAVE MYSQL USERS TO DISK;

Then exit the mysql shell:

      • exit

      To test that you can connect to your back-end server using ProxySQL, execute the following test query:

      • mysql -usammy -h127.0.0.1 -p -P6033 -e "SELECT @@HOSTNAME as hostname"

      In this command, you used the -e flag to execute a query and close the connection. The query prints the hostname of the back-end server.

      Note: ProxySQL uses port 6033 by default for listening to incoming connections.

      The output will look like this, with your_hostname replaced by your hostname:

      Output

+----------------------------+
| hostname                   |
+----------------------------+
| your_hostname              |
+----------------------------+

      To learn more about ProxySQL configuration, see Step 3 of How To Use ProxySQL as a Load Balancer for MySQL on Ubuntu 16.04.

      So far, you configured ProxySQL to use your MySQL server as a backend and connected to the backend using ProxySQL. Now, you are ready to use mysqlslap to benchmark the query performance without caching.

      Step 3 — Testing Using mysqlslap Without Caching

      In this step, you will download a test database so you can execute queries against it with mysqlslap to test the latency without caching, setting a benchmark for the speed of your queries. You will also explore how ProxySQL keeps records of queries in the stats_mysql_query_digest table.

      mysqlslap is a load emulation client that is used as a load testing tool for MySQL. It can test a MySQL server with auto-generated queries or with some custom queries executed on a database. It comes installed with the MySQL client package, so you do not need to install it; instead, you will download a database for testing purposes only, on which you can use mysqlslap.

In this tutorial, you will use a sample employee database. This database features a large data set that can illustrate differences in query optimization: it has six tables and contains more than 300,000 employee records, which will help you emulate a large-scale production workload.

      To download the database, first clone the Github repository using this command:

      • git clone https://github.com/datacharmer/test_db.git

      Then enter the test_db directory and load the database into the MySQL server using these commands:

      • cd test_db
      • mysql -uroot -p < employees.sql

      This command uses shell redirection to read the SQL queries in employees.sql file and execute them on the MySQL server to create the database structure.

      You will see output like this:

      Output

INFO CREATING DATABASE STRUCTURE
INFO storage engine: InnoDB
INFO LOADING departments
INFO LOADING employees
INFO LOADING dept_emp
INFO LOADING dept_manager
INFO LOADING titles
INFO LOADING salaries
data_load_time_diff 00:00:32

      Once the database is loaded into your MySQL server, test that mysqlslap is working with the following query:

      • mysqlslap -usammy -p -P6033 -h127.0.0.1 --auto-generate-sql --verbose

      mysqlslap has similar flags to the mysql client; here are the ones used in this command:

      • -u specifies the user used to connect to the server.
      • -p prompts for the user’s password.
      • -P connects using the specified port.
      • -h connects to the specified host.
      • --auto-generate-sql lets MySQL perform load testing using its own generated queries.
      • --verbose makes the output show more information.

      You will get output similar to the following:

      Output

Benchmark
    Average number of seconds to run all queries: 0.015 seconds
    Minimum number of seconds to run all queries: 0.015 seconds
    Maximum number of seconds to run all queries: 0.015 seconds
    Number of clients running queries: 1
    Average number of queries per client: 0

      In this output, you can see the average, minimum, and maximum number of seconds spent to execute all queries. This gives you an indication about the amount of time needed to execute the queries by a number of clients. In this output, only one client was used to execute queries.

Next, find out what queries mysqlslap executed in the last command by looking at ProxySQL’s stats_mysql_query_digest table. This will give you information like the digest of each query, which is a normalized form of the SQL statement that can be referenced later to enable caching.

      Enter the ProxySQL admin interface with this command:

      • mysql -uadmin -p -h 127.0.0.1 -P6032

      Then execute this query to find information in the stats_mysql_query_digest table:

      • SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC;

      You will see output similar to the following:

      +------------+----------+-----------+--------------------+----------------------------------+
      | count_star | sum_time | hostgroup | digest             | digest_text                      |
      +------------+----------+-----------+--------------------+----------------------------------+
      | 1          | 598      | 1         | 0xF8F780C47A8D1D82 | SELECT @@HOSTNAME as hostname    |
      | 1          | 0        | 1         | 0x226CD90D52A2BA0B | select @@version_comment limit ? |
      +------------+----------+-----------+--------------------+----------------------------------+
      2 rows in set (0.01 sec)
      

      The previous query selects data from the stats_mysql_query_digest table, which contains information about all executed queries in ProxySQL. Here you have five columns selected:

      • count_star: The number of times this query was executed.
• sum_time: Total time in microseconds that this query took to execute.
      • hostgroup: The hostgroup used to execute the query.
      • digest: A digest of the executed query.
      • digest_text: The actual query. In this tutorial’s example, the second query is parameterized using ? marks in place of variable parameters. select @@version_comment limit 1 and select @@version_comment limit 2, therefore, are grouped together as the same query with the same digest.
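The grouping described above can be approximated with a small Python function. This is an illustrative simplification; ProxySQL's real fingerprinting handles far more SQL syntax than the two literal forms modeled here.

```python
import re

def digest_text(sql):
    """Roughly how ProxySQL normalizes a query for its digest: literal
    values are replaced with '?' so structurally identical queries group
    together. Simplified: handles only single-quoted strings and bare
    integers."""
    sql = re.sub(r"'[^']*'", "?", sql)  # string literals -> ?
    sql = re.sub(r"\b\d+\b", "?", sql)  # numeric literals -> ?
    return sql

a = digest_text("select @@version_comment limit 1")
b = digest_text("select @@version_comment limit 2")
print(a)  # select @@version_comment limit ?  (same digest text as b)
```

Because both inputs normalize to the same text, they share one row in stats_mysql_query_digest, with count_star accumulating across all of their executions.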

Now that you know how to check query data in the stats_mysql_query_digest table, exit the mysql shell:

      • exit

      The database you downloaded contains some tables with demo data. You will now test queries on the dept_emp table by selecting any records whose from_date is greater than 2000-04-20 and recording the average execution time.

      Use this command to run the test:

      • mysqlslap -usammy -P6033 -p -h127.0.0.1 --concurrency=100 --iterations=20 --create-schema=employees --query="SELECT * from dept_emp WHERE from_date>'2000-04-20'" --verbose

      Here you are using some new flags:

      • --concurrency=100: This sets the number of users to simulate, in this case 100.
      • --iterations=20: This causes the test to run 20 times and calculate results from all of them.
      • --create-schema=employees: Here you selected the employees database.
      • --query="SELECT * from dept_emp WHERE from_date>'2000-04-20'": Here you specified the query executed in the test.

      The test will take a few minutes. After it is done, you will get results similar to the following:

      Output

Benchmark
    Average number of seconds to run all queries: 18.117 seconds
    Minimum number of seconds to run all queries: 8.726 seconds
    Maximum number of seconds to run all queries: 22.697 seconds
    Number of clients running queries: 100
    Average number of queries per client: 1

      Your numbers could be a little different. Keep these numbers somewhere in order to compare them with the results from after you enable caching.

      After testing ProxySQL without caching, it is time to run the same test again, but this time with caching enabled.

      Step 4 — Testing Using mysqlslap With Caching

In this step, you will use caching to decrease latency when executing similar queries. You will identify the queries executed, take their digests from ProxySQL’s stats_mysql_query_digest table, and use them to enable caching. Then, you will run the test again to check the difference.

      To enable caching, you need to know the digests of the queries that will be cached. Log in to the ProxySQL admin interface using this command:

      • mysql -uadmin -p -h127.0.0.1 -P6032

      Then execute this query again to get a list of queries executed and their digests:

      • SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest ORDER BY sum_time DESC;

      You will get a result similar to this:

      Output

      +------------+-------------+-----------+--------------------+------------------------------------------+
      | count_star | sum_time    | hostgroup | digest             | digest_text                              |
      +------------+-------------+-----------+--------------------+------------------------------------------+
      | 2000       | 33727110501 | 1         | 0xC5DDECD7E966A6C4 | SELECT * from dept_emp WHERE from_date>? |
      | 1          | 601         | 1         | 0xF8F780C47A8D1D82 | SELECT @@HOSTNAME as hostname            |
      | 1          | 0           | 1         | 0x226CD90D52A2BA0B | select @@version_comment limit ?         |
      +------------+-------------+-----------+--------------------+------------------------------------------+
      3 rows in set (0.00 sec)

      Look at the first row. It shows a query that was executed 2000 times: this is the query you benchmarked previously. Copy its digest and save it; you will use it to add a query rule for caching.

      The next query adds a new query rule to ProxySQL that matches the digest of the benchmarked query and sets a cache_ttl value for it. cache_ttl is the number of milliseconds that the result will be cached in memory:

      • INSERT INTO mysql_query_rules(active, digest, cache_ttl, apply) VALUES(1,'0xC5DDECD7E966A6C4',2000,1);

      In this command you are adding a new record to the mysql_query_rules table; this table holds all the rules applied before executing a query. The cache_ttl value causes any query matching the given digest to be cached for the number of milliseconds specified in this column. You set the apply column to 1 to make sure the rule is applied to matching queries.
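      Once the rule is live, ProxySQL keeps running counters for its query cache. As a sketch of how you might verify that cache hits are actually happening, you can query the stats_mysql_global table from the admin interface (port 6032). This assumes your ProxySQL version exposes these counters; exact names can vary slightly between releases:

      ```sql
      -- Run against the ProxySQL admin interface (port 6032), not the data port.
      -- A growing Query_Cache_count_GET_OK after a benchmark run indicates hits.
      SELECT Variable_Name, Variable_Value
      FROM stats_mysql_global
      WHERE Variable_Name LIKE 'Query_Cache%';
      ```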

      Load these changes to the runtime and save them to disk, then exit the mysql shell:

      • LOAD MYSQL QUERY RULES TO RUNTIME;
      • SAVE MYSQL QUERY RULES TO DISK;
      • exit;

      Now that caching is enabled, run the test again to check the result:

      • mysqlslap -usammy -P6033 -p -h127.0.0.1 --concurrency=100 --iterations=20 --create-schema=employees --query="SELECT * from dept_emp WHERE from_date>'2000-04-20'" --verbose

      This will give output similar to the following:

      Output

      Benchmark
      	Average number of seconds to run all queries: 7.020 seconds
      	Minimum number of seconds to run all queries: 0.274 seconds
      	Maximum number of seconds to run all queries: 23.014 seconds
      	Number of clients running queries: 100
      	Average number of queries per client: 1

      Here you can see the big difference in average execution time: it dropped from 18.117 seconds to 7.020 seconds. Notice also the minimum time, which fell from 8.726 to 0.274 seconds; iterations served from the cache finish almost instantly.
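      As a quick sanity check, you can compute the relative improvement from the two averages. The figures below are the ones from this run; substitute your own measurements:

      ```shell
      # Average benchmark times from the two runs above; replace with your own.
      before=18.117
      after=7.020
      awk -v b="$before" -v a="$after" \
        'BEGIN { printf "%.1f%% faster on average\n", (b - a) / b * 100 }'
      # → 61.3% faster on average
      ```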

      Conclusion

      In this article, you set up transparent caching with ProxySQL to cache database query results. You also tested the query speed with and without caching to see the difference that caching can make.

      You’ve used one level of caching in this tutorial. You could also try web caching, which sits in front of a web server and caches responses to similar requests, sending the response back to the client without hitting the back-end servers. This is very similar to ProxySQL caching but at a different level. To learn more about web caching, check out our Web Caching Basics: Terminology, HTTP Headers, and Caching Strategies primer.

      MySQL server also has its own query cache; you can learn more about it in our How To Optimize MySQL with Query Cache on Ubuntu 18.04 tutorial.
