

      Introduction to Relational Databases and RDBMSs


      Many programs need to store data for later use and to read data that has already been recorded. Although there are many ways to do this, the most common approach is to use a Relational Database Management System (RDBMS). MySQL, PostgreSQL, and SQLite are a few industry-standard open-source RDBMSs that have been widely adopted by software development projects. This guide provides an overview of relational databases and RDBMS concepts.

      What is a Relational Database?

      A database is an application for storing and retrieving data. Although the mechanisms differ, most databases provide an API allowing users to add, delete, access, search, and manage their data. As an alternative to using a database, data can be stored in text files or hash tables. However, this technique is not as fast or as convenient as using a database and is rarely used in modern systems.

      Early database applications evolved into the modern relational database, which allows users to store massive amounts of data. A relational database management system (RDBMS) is a software application that creates and maintains relational databases. An RDBMS no longer forces users to store data in one big table. It provides more structured ways of partitioning the data and is designed for more efficient access. RDBMS applications are optimized for fast reads and writes and bulk transfer of information.

      Database designers conceptualize and organize the data in terms of tables, columns, and rows. A row is also referred to as a record, or tuple. Contemporary relational databases structure the data using the following concepts:

      • Each database contains one or more tables.
      • When the user creates a table, they specify the columns within the table at the same time.
      • Each column represents a specific attribute, or field, within the record. A column is designed to hold data of a particular data type, for example, VARCHAR, which stands for a variable-length string.
      • A table contains a cluster of rows.
      • Each row within a table represents a unique database entry. Each column within the row contains an individual field in that entry.
      • A database table is like a two-dimensional matrix. Each cell inside the matrix contains a piece of data.

      An RDBMS is considered to be relational because it allows users to define relationships within and between the various tables using keys and indices. A relational database permits a user to provide or generate a primary key for each row. The RDBMS can guarantee that this key is unique within the table. Rows in different tables can then be related to one another through their primary and foreign keys. These relationships help structure and organize the database and limit the amount of data duplication.
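      The role of primary and foreign keys can be illustrated with a minimal sketch using Python's built-in sqlite3 module and an in-memory SQLite database. The Families table and its surname column are assumptions made up for this illustration; only the idea of a student row carrying a family_id comes from this guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table, invented for this sketch.
    CREATE TABLE Families (
        family_id INTEGER PRIMARY KEY,
        surname   TEXT NOT NULL
    );
    CREATE TABLE Students (
        student_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        family_id  INTEGER REFERENCES Families(family_id)
    );
    INSERT INTO Families VALUES (1116, 'Doe');
    INSERT INTO Students VALUES (5005, 'John', 1116);
""")

# The primary key must be unique: reusing student_id 5005 is rejected.
try:
    conn.execute("INSERT INTO Students VALUES (5005, 'Jane', 1116)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The foreign key lets a query follow the relationship between the tables.
row = conn.execute("""
    SELECT s.first_name, f.surname
    FROM Students AS s
    JOIN Families AS f ON f.family_id = s.family_id
""").fetchone()
print(duplicate_rejected, row)
```

      Because the surname is stored once in Families and referenced by key, it does not need to be repeated in every student row.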

      An RDBMS application always provides the capabilities listed below. Individual applications might offer more options.

      • It allows for the creation, definition, modification, and removal of database tables, columns, rows, primary keys, and indices.
      • It accepts SQL queries and stores or retrieves the relevant data, combining information from different database tables as necessary.
      • It guarantees the integrity of the data and the references between the tables. For example, a foreign key always points to a valid row in another table.
      • It automatically updates indices, timestamps, and other internally-generated attributes as required.
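      Two of the capabilities listed above, referential integrity and internally generated attributes, can be sketched as follows. This is an illustration under stated assumptions rather than a description of any particular production setup: the Teachers and Courses tables are hypothetical, and SQLite (used here through Python's sqlite3 module) only enforces foreign keys after the PRAGMA shown in the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables, invented for this sketch.
conn.execute("CREATE TABLE Teachers (teacher_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE Courses (
        course_id  INTEGER PRIMARY KEY,
        title      TEXT,
        teacher_id INTEGER NOT NULL REFERENCES Teachers(teacher_id),
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute("INSERT INTO Teachers VALUES (1, 'Ms. Example')")
conn.execute(
    "INSERT INTO Courses (course_id, title, teacher_id) VALUES (10, 'Algebra', 1)"
)

# Teacher 99 does not exist, so the database refuses the dangling reference.
try:
    conn.execute(
        "INSERT INTO Courses (course_id, title, teacher_id) VALUES (11, 'Biology', 99)"
    )
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# created_at was never supplied; the database filled it in on insert.
created_at = conn.execute(
    "SELECT created_at FROM Courses WHERE course_id = 10"
).fetchone()[0]
print(dangling_rejected, created_at)
```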

      Relational databases use the Structured Query Language (SQL) to query and update the database. For example, an RDBMS client uses the SQL INSERT command to add a new row to one of the database tables. When a user adds a new row, they supply the values for its columns in the same statement. Additional SQL commands are used to modify and delete rows, manage database items, and retrieve a list of records meeting specific criteria.

      For example, consider a database for a school. This database has several tables, for teachers, students, courses, classrooms, and so forth. The definition of the Students table might contain columns for the student’s first and last name, ID, grade, family, and more. Each row in this table symbolizes an individual student and serves to represent and collect all relevant information about that student. If the student’s name is “John”, the first_name column in this row contains John. The student ID can serve as the index and primary key and could be used to cross-reference the student in other tables.

      For instance, a simplified Students table can be defined using the structure displayed below. The top row represents the names of the columns in the table. The table below currently has two rows of data, one for each student.

      first_name   last_name   grade   family_id   student_id
      John         Doe         4       1116        5005
      Jane         Student     5       1224        5350
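      As a rough sketch, the Students table above could be created and filled as follows using Python's built-in sqlite3 module. The column types are assumptions chosen for this illustration, since the guide does not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column types are assumed; the guide only names the columns.
conn.execute("""
    CREATE TABLE Students (
        first_name TEXT,
        last_name  TEXT,
        grade      INTEGER,
        family_id  INTEGER,
        student_id INTEGER PRIMARY KEY
    )
""")

# One INSERT per row, with a value supplied for every column.
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("John", "Doe", 4, 1116, 5005),
        ("Jane", "Student", 5, 1224, 5350),
    ],
)

# The primary key locates exactly one student.
student = conn.execute(
    "SELECT first_name, last_name, grade FROM Students WHERE student_id = ?",
    (5350,),
).fetchone()
print(student)
```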

      What are Some Common RDBMS Terms?

      The following terms are frequently used in relation to databases:

      • Column: A set of values of the same data type, representing one attribute within a table. Columns are defined when a table is created.
      • Compound Key: A key consisting of multiple columns. A compound key is used when a single column cannot reliably identify a row.
      • Database: An organized group of data that is stored electronically. A database is usually organized into smaller clusters of information.
      • Foreign Key: A column, or set of columns, used to cross-link a table entry to a row in another table, usually by holding that row’s primary key.
      • Index: A method of more quickly accessing database entries. An index can be created using any combination of attributes, but implementation is application-specific. A database index is similar to an index in a book.
      • Primary Key: A column serving as an index to uniquely identify a row inside a table. A primary key can either be auto-generated or defined in the table definition. A primary key can be used to locate a specific row within a table.
      • Referential Integrity: An internal database property to ensure a foreign key always references a valid row in another table.
      • Relational Database Management System (RDBMS): A type of database system based on relationships between tables and entries.
      • Row: A structured entry within a table consisting of a set of related data. Each row in a table has the same structure, which corresponds to the column specifications in the table definition. A row is also referred to as a record or a tuple.
      • Structured Query Language (SQL): A simplified domain-specific programming language used to manage data in an RDBMS.
      • Table: A collection of database records, consisting of a series of rows and columns. A table can be thought of as a two-dimensional matrix of information.
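      To make the compound key entry above more concrete, here is a small sketch using Python's sqlite3 module. The Enrollments table is a hypothetical example: neither student_id nor class_id identifies an enrollment on its own, but the pair does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table whose primary key spans two columns.
conn.execute("""
    CREATE TABLE Enrollments (
        student_id INTEGER NOT NULL,
        class_id   INTEGER NOT NULL,
        PRIMARY KEY (student_id, class_id)
    )
""")
conn.execute("INSERT INTO Enrollments VALUES (5005, 1)")
conn.execute("INSERT INTO Enrollments VALUES (5005, 2)")  # same student, other class
conn.execute("INSERT INTO Enrollments VALUES (5350, 1)")  # same class, other student

# Repeating an existing (student_id, class_id) pair violates the compound key.
try:
    conn.execute("INSERT INTO Enrollments VALUES (5005, 1)")
    repeated_pair_rejected = False
except sqlite3.IntegrityError:
    repeated_pair_rejected = True

enrollment_count = conn.execute("SELECT COUNT(*) FROM Enrollments").fetchone()[0]
print(repeated_pair_rejected, enrollment_count)
```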

      SQL vs. MySQL

      The terms SQL and MySQL are often mixed up or used interchangeably, but they are not the same. SQL is the standard programming language for querying RDBMS applications. It is used to write database queries and can be used with any database system that supports it. MySQL is a specific instance of an RDBMS that uses SQL. Database users send SQL commands to an RDBMS such as MySQL to read and write data, and to administer the database. There is no application named SQL, so it does not make sense to make a “SQL vs MySQL” comparison. However, the term SQL database is often used informally as a shorthand term for any relational database.

      The SQL Language

      The SQL language is specified as a series of statements. It is not considered a general-purpose imperative programming language like Python, because it lacks a full range of data structures and control statements. It is instead a domain-specific language intended for a single purpose. SQL is designed for the querying, definition, and manipulation of data. It’s also designed to provide data access control. One advantage of SQL is that it can access multiple records using only one command. It does not specify how the database should access an entry.

      The SQL language consists of designated keywords, expressions, queries, statements, operators, and optional clauses. Object identifiers are used to refer to database entities, including tables and columns. SQL supports a large number of predefined data types, such as CHAR for fixed-length character strings and INTEGER for whole numbers. Some of the most important SQL operators include =, <>, >, <, IN, LIKE, and NOT, and the Boolean literals TRUE and FALSE can appear in conditions. SQL also supports a CASE expression for simple conditional logic. The MySQL documentation contains more information about the SQL language structure, data types, and statements.

      Some of the most widely-used SQL statements and clauses include the following:

      • ALTER: Modifies the structure of a database object.
      • CREATE: Creates a database object, such as a table or database.
      • DELETE: Removes one or more existing rows from the database.
      • DROP: Permanently deletes an object from the database.
      • FROM: Indicates which table to use for the query.
      • GRANT: Authorizes a database user to perform a particular action.
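      • GROUP BY: A clause that groups the rows returned by a SELECT statement by shared column values, typically so that aggregate functions can be applied to each group.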
      • INSERT: Adds rows to the database.
      • JOIN: A clause specifying how to combine and assemble data from multiple tables.
      • MERGE: Inserts, updates, or deletes rows in a target table based on the rows of a source table. Not every RDBMS supports this statement.
      • ORDER BY: A clause for sorting the output from a query.
      • SELECT: Retrieves data from one or more tables. This command does not alter the database or change any data.
      • UPDATE: Modifies one or more existing rows.
      • WHERE: A clause to identify the rows a query should operate on. It is typically used with a comparison operator.

      The wildcard * operator is often used in conjunction with the SELECT command. It instructs the database to return all columns in the output.

      Below are a couple of examples of SQL statements. The following SQL query displays the name of each class in the Class table for each row where the value of the subject column is math.

      SELECT name
          FROM Class
          WHERE subject='math';
      

      The next SQL statement creates the Class table. The CREATE statement defines each column in the table, along with its data type, in sequential order. The VARCHAR data type is used to hold a variable-length string. The SMALLINT data type is used for small integer values from the signed range of -32768 to 32767.

      CREATE TABLE Class (
          classID smallint,
          name varchar(255),
          subject varchar(255),
          level smallint
      );
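      The statements described in this section can be tried out together without installing a database server. The following is a minimal sketch using Python's built-in sqlite3 module; the sample rows are invented for the illustration, and SQLite accepts the smallint and varchar declarations but does not enforce their range or length the way some other systems do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: the table definition from the text above.
conn.execute("""
    CREATE TABLE Class (
        classID smallint,
        name varchar(255),
        subject varchar(255),
        level smallint
    )
""")

# INSERT: sample rows invented for this sketch.
conn.executemany(
    "INSERT INTO Class (classID, name, subject, level) VALUES (?, ?, ?, ?)",
    [
        (1, "Algebra", "math", 1),
        (2, "Geometry", "math", 2),
        (3, "Biology", "science", 1),
        (4, "Chemistry", "science", 2),
        (5, "Poetry", "english", 1),
    ],
)

# SELECT ... WHERE: the query from the text, plus ORDER BY for a stable result.
math_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM Class WHERE subject = 'math' ORDER BY name"
    )
]

# UPDATE and DELETE change or remove the rows matched by WHERE.
conn.execute("UPDATE Class SET level = 3 WHERE name = 'Geometry'")
conn.execute("DELETE FROM Class WHERE subject = 'english'")

# GROUP BY collapses rows per subject so COUNT(*) applies to each group.
per_subject = conn.execute("""
    SELECT subject, COUNT(*) AS classes
    FROM Class
    GROUP BY subject
    ORDER BY subject
""").fetchall()

# The * wildcard returns every column of the matching rows.
advanced = conn.execute(
    "SELECT * FROM Class WHERE level > 1 ORDER BY name"
).fetchall()
print(math_names, per_subject, advanced)
```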
      

      SQL vs. NoSQL

      NoSQL systems are an alternative to traditional SQL-based RDBMS applications. As the name implies, they use a non-relational model to handle data. They are typically less structured and more flexible than an RDBMS. NoSQL systems are not standardized and can take a variety of formats. However, they are typically key-value, graph, or document-based, not table-based. Some NoSQL applications can use structured domain-specific languages or even accept SQL queries in parallel. A few examples of NoSQL applications include Redis and MongoDB. For more information on NoSQL systems, consult the Linode guide for a comparison between SQL and NoSQL databases.




      An Introduction to Document-Oriented Databases


      Introduction

      Although they were first invented decades ago, computer-based databases have become ubiquitous on today’s internet. More and more commonly, websites and applications involve collecting, storing, and retrieving data from a database. For many years the database landscape was dominated by relational databases, which organize data in tables made up of rows. To break free from the rigid structure imposed by the relational model, though, a number of different database types have emerged in recent years.

      These new database models are jointly referred to as NoSQL databases, as they usually do not use Structured Query Language — also known as SQL — which relational databases typically employ to manage and query data. NoSQL databases offer a high level of scalability as well as flexibility in terms of data structure. These features make NoSQL databases useful for handling large volumes of data and fast-paced, agile development.

      This conceptual article outlines the key concepts related to document databases as well as the benefits of using them. Examples used in this article reference MongoDB, a widely-used document-oriented database, but most of the concepts highlighted here are applicable for most other document databases as well.

      What is a Document Database?

      Rather than organizing data into the rows and columns of a relational table, document databases store data as documents. You might think of a document as a self-contained data entry containing everything needed to understand its meaning, similar to documents used in the real world.

      The following is an example of a document that might appear in a document database like MongoDB. This sample document represents a company contact card, describing an employee called Sammy:

      Sammy’s contact card document

      {
          "_id": "sammyshark",
          "firstName": "Sammy",
          "lastName": "Shark",
          "email": "sammy.shark@digitalocean.com",
          "department": "Finance"
      }
      

      Notice that the document is written as a JSON object. JSON is a human-readable data format that has become quite popular in recent years. While many different formats can be used to represent data within a document database, such as XML or YAML, JSON is one of the most common choices. For example, MongoDB adopted JSON as the primary data format to define and manage data.

      All data in JSON documents are represented as field-and-value pairs that take the form of field: value. In the previous example, the first line shows an _id field with the value sammyshark. The example also includes fields for the employee’s first and last names, their email address, as well as what department they work in.
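      To see the field-and-value structure from a program's point of view, the following sketch parses Sammy's contact card with Python's standard json module. It only illustrates how a JSON document maps onto an in-memory structure; it does not talk to a database.

```python
import json

raw_document = """
{
    "_id": "sammyshark",
    "firstName": "Sammy",
    "lastName": "Shark",
    "email": "sammy.shark@digitalocean.com",
    "department": "Finance"
}
"""

# A JSON object becomes a Python dict of field names mapped to values.
contact = json.loads(raw_document)

# Field names arrive together with the values they describe.
field_names = list(contact)
print(field_names)
print(contact["_id"], contact["department"])
```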

      Field names allow you to understand what kind of data is held within a document with just a glance. Documents in document databases are self-describing, which means they contain both the data values as well as the information on what kind of data is being stored. When retrieving a document from the database, you always get the whole picture.

      The following is another sample document representing a colleague of Sammy’s named Tom, who works in multiple departments and also uses a middle name:

      Tom’s contact card document

      {
          "_id": "tomjohnson",
          "firstName": "Tom",
          "middleName": "William",
          "lastName": "Johnson",
          "email": "tom.johnson@digitalocean.com",
          "department": ["Finance", "Accounting"]
      }
      

      This second document has a few differences from the first example. For instance, it adds a new field called middleName. Also, this document’s department field stores not a single value, but an array of two values: "Finance" and "Accounting".

      Because these documents hold different fields of data, they can be said to have different schemas. A database’s schema is its formal structure, which outlines what kind of data it can hold. In the case of documents, their schemas are reflected in their field names and what kinds of values those fields represent.

      In a relational database, you’d be unable to store both of these example contact cards in the same table, as they differ in structure. You would have to adapt the database schema both to allow storing multiple departments as well as middle names, and you would have to provide a middle name for Sammy or else fill the column for that row with a NULL value. This is not the case with document databases, which offer you the freedom to save multiple documents with different schemas together with no changes to the database itself.

      In document databases, documents are not only self-describing; their schema is also dynamic, which means that you don’t have to define it before you start saving data. Fields can differ between documents in the same database, and you can modify a document’s structure at will, adding or removing fields as you go. Documents can also be nested, meaning that a field within one document can have a value consisting of another document, which makes it possible to store complex data within a single document entry.
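      The practical consequence of a dynamic schema is that application code has to cope with documents of different shapes. The sketch below uses a plain Python list as a stand-in for a database, which is an assumption made purely for illustration, and the departments_of helper is hypothetical.

```python
sammy = {
    "_id": "sammyshark",
    "firstName": "Sammy",
    "lastName": "Shark",
    "email": "sammy.shark@digitalocean.com",
    "department": "Finance",
}
tom = {
    "_id": "tomjohnson",
    "firstName": "Tom",
    "middleName": "William",
    "lastName": "Johnson",
    "email": "tom.johnson@digitalocean.com",
    "department": ["Finance", "Accounting"],
}

# Stand-in for a database: both shapes are stored side by side,
# and no schema was declared beforehand.
contacts = [sammy, tom]

# Fields present in Tom's document but absent from Sammy's.
only_in_tom = set(tom) - set(sammy)


def departments_of(contact):
    """Hypothetical helper: return departments as a list for either shape."""
    value = contact.get("department", [])
    return value if isinstance(value, list) else [value]


all_departments = [departments_of(c) for c in contacts]
print(only_in_tom, all_departments)
```

      The flexibility moves some responsibility into the application: here, the helper normalizes a field that holds a single value in one document and an array in another.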

      Let’s imagine the contact card must store information about social media accounts the employee uses and add them as nested objects to the document:

      Tom’s contact card document with social media accounts information attached

      {
          "_id": "tomjohnson",
          "firstName": "Tom",
          "middleName": "William",
          "lastName": "Johnson",
          "email": "tom.johnson@digitalocean.com",
          "department": ["Finance", "Accounting"],
          "socialMediaAccounts": [
              {
                  "type": "facebook",
                  "username": "tom_william_johnson_23"
              },
              {
                  "type": "twitter",
                  "username": "@tomwilliamjohnson23"
              }
          ]
      }
      

      A new field called socialMediaAccounts appears in the document, but instead of a single value, it refers to an array of nested objects describing individual social media accounts. Each of these accounts could be a document on its own, but here they’re stored directly within the contact card. Once again, there is no need to change the database structure to accommodate this requirement. You can immediately save the new document to the database.
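      Reading nested data back out works the same way as reading any other field. As a short sketch, again using plain Python structures rather than a real database connection:

```python
tom = {
    "_id": "tomjohnson",
    "firstName": "Tom",
    "middleName": "William",
    "lastName": "Johnson",
    "email": "tom.johnson@digitalocean.com",
    "department": ["Finance", "Accounting"],
    "socialMediaAccounts": [
        {"type": "facebook", "username": "tom_william_johnson_23"},
        {"type": "twitter", "username": "@tomwilliamjohnson23"},
    ],
}

# Each element of the array is itself a small document with its own fields.
usernames_by_type = {
    account["type"]: account["username"]
    for account in tom["socialMediaAccounts"]
}
print(usernames_by_type["twitter"])
```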

      Note: In MongoDB, it’s customary to name fields and collections using a camelCase notation, with no spaces between words, the first word written entirely in lowercase, and any additional words having their first letters capitalized. That said, you can also use different notations such as snake_case, in which words are all written in lowercase and separated with underscores. Whichever notation you choose, it’s considered best practice to use it consistently across the whole database.

      All these attributes make it intuitive to work with document databases from the developer’s perspective. The database facilitates storing actual objects describing data within the application, encouraging experimentation and allowing great flexibility when reshaping data as the software grows and evolves.

      Benefits of Document Databases

      While document-oriented databases may not be the right choice for every use case, there are many benefits of choosing one over a relational database. A few of the most important benefits are:

      • Flexibility and adaptability: with a high level of control over the data structure, document databases enable experimentation and adaptation to new emerging requirements. New fields can be added right away and existing ones can be changed any time. It’s up to the developer to decide whether old documents must be amended or the change can be implemented only going forward.

      • Ability to manage structured and unstructured data: as mentioned previously, relational databases are well suited for storing data that conforms to a rigid structure. Document databases can be used to handle structured data as well, but they’re also quite useful for storing unstructured data where necessary. You can imagine structured data as the kind of information you would easily represent in a spreadsheet with rows and columns, whereas unstructured data is everything not as straightforward to frame. Examples of unstructured data are rich social media posts with human-generated texts and multimedia, server logs that don’t follow a unified format, or data coming from a multitude of different sensors in smart homes.

      • Scalability by design: relational databases are often write-constrained, and increasing their performance requires you to scale vertically (meaning you must migrate their data to more powerful and performant database servers). Conversely, document databases are designed as distributed systems that instead allow you to scale horizontally (meaning that you split a single database up across multiple servers). Because documents are independent units containing both data and schema, it’s relatively trivial to distribute them across server nodes. This makes it possible to store large amounts of data with less operational complexity.
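      The idea of horizontal scaling mentioned in the last point can be sketched in a few lines. The following is a simplified illustration that assumes documents are placed on servers by hashing their _id; real document databases choose their own distribution strategies, and the shard_for function, the shard count, and the third document are all made up for this example.

```python
import hashlib


def shard_for(document_id: str, shard_count: int) -> int:
    """Map a document ID to a shard number in a stable, repeatable way."""
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


SHARD_COUNT = 3  # assumed number of servers
documents = [
    {"_id": "sammyshark", "department": "Finance"},
    {"_id": "tomjohnson", "department": ["Finance", "Accounting"]},
    {"_id": "exampleuser", "department": "Support"},  # invented document
]

# Each document is self-contained, so it can live on any shard by itself.
shards = {number: [] for number in range(SHARD_COUNT)}
for document in documents:
    shards[shard_for(document["_id"], SHARD_COUNT)].append(document)

for number, stored in shards.items():
    print(number, [document["_id"] for document in stored])
```

      Because the shard is derived from the ID alone, a later lookup by ID can go straight to the server that holds the document.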

      In real-world applications, both document databases and other NoSQL and relational databases are often used together, each responsible for what it’s best suited for. This paradigm of mixing various types of databases is known as polyglot persistence.

      Grouping Documents Into Collections

      While document databases allow great flexibility in how the documents are structured, having some means of organizing data into categories sharing similar characteristics is crucial for ensuring that a database is healthy and manageable.

      Imagine a database as an individual cabinet in a company archive with many drawers. For example, one drawer might keep records of employment contracts, with another keeping agreements with business partners. While it is technically possible to put both kinds of documents into a single drawer, it would be difficult to browse the documents later on.

      In a document database, such drawers are often called collections, logically similar to tables in relational databases. The role of a collection is to group together documents that share a similar logical function, even if individual documents may slightly differ in their schema. For instance, say you have one employment contract for a fixed term and another that describes a contractor’s additional benefits. Both documents are employment contracts and, as such, it could make sense to group them into a single collection:

      [Figure: Document collection]

      Note: While it’s a popular approach, not all document databases use the concept of collections to organize documents together. Some database systems use tags or tree-like hierarchies, others store documents directly within a database with no further subdivisions. MongoDB is one of the popular document-oriented databases that use collections for document organization.

      Because documents within a collection share similar characteristics, you can also build indexes that speed up the retrieval of documents for queries on certain fields. Indexes are special data structures that store a portion of a collection’s data in a way that’s faster to traverse and filter.

      As an example, you might have a collection of documents in a database that all share a similar field. Because each document shares the same field, it’s likely you would often use that field when running queries. Without indexes, any query asking the database to retrieve a particular document requires a collection scan — browsing all documents within a collection one by one to find the requested match. By creating an index, however, the database only needs to browse through indexed fields, thereby improving query performance.
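      The difference between a collection scan and an indexed lookup can be sketched with plain Python structures. This is a deliberately simplified model: the dictionary below stands in for an index, whereas real databases use more elaborate structures, and the function names and the third document are assumptions made for the illustration.

```python
from collections import defaultdict

contacts = [
    {"_id": "sammyshark", "lastName": "Shark"},
    {"_id": "tomjohnson", "lastName": "Johnson"},
    {"_id": "annjohnson", "lastName": "Johnson"},  # invented document
]


def collection_scan(documents, field, value):
    """Inspect every document one by one."""
    return [doc for doc in documents if doc.get(field) == value]


def build_index(documents, field):
    """Map each value of the field to the positions of matching documents."""
    index = defaultdict(list)
    for position, doc in enumerate(documents):
        if field in doc:
            index[doc[field]].append(position)
    return index


last_name_index = build_index(contacts, "lastName")

via_scan = collection_scan(contacts, "lastName", "Johnson")
# With the index, only the matching positions are visited.
via_index = [contacts[i] for i in last_name_index.get("Johnson", [])]
print(via_scan == via_index, [doc["_id"] for doc in via_index])
```

      Both approaches return the same documents; the index trades extra storage and upkeep on every write for fewer documents examined on every read.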

      Data Types and Schema Validation

      While we mentioned that document-oriented databases can store documents in different formats, such as XML, YAML or JSON, these are often further extended with additional traits that are specific to a given database system, such as additional data types or structure validation features.

      For example, MongoDB internally uses a binary format called BSON (short for Binary JSON) instead of pure JSON. This not only allows for better performance, but it also extends the format with data types that JSON does not support natively. Thanks to this, we can reliably store different kinds of data in MongoDB documents without being restricted to standard JSON types and use filtering, sorting, and aggregation features specific to individual data types.

      The following sample document uses several different data types supported by MongoDB:

      {
          "_id": ObjectId("5a934e000102030405000000"),
          "code": NumberLong(2090845886852),
          "image": BinData(0, "TGVhcm5pbmcgTW9uZ29EQg=="),
          "lastPurchased": ISODate("2021-01-19T06:01:17.171Z"),
          "name": "Document database sticker",
          "price": NumberDecimal("13.23"),
          "quantity": 317,
          "tags": [
              "stickers",
              "accessories"
          ]
      }
      

      Notice that some of these data types are not typical of JSON, such as decimal numbers with exact precision or dates, which are represented as objects such as NumberDecimal or ISODate. This ensures that these fields will always be interpreted properly and not mistakenly cast to another similar data type, like a decimal number being cast to a regular double.
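      Why an exact decimal type matters can be shown with a short analogy in Python. The sketch uses Python's own decimal.Decimal type, not MongoDB's NumberDecimal, so it only illustrates the general difference between binary doubles and exact decimals.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2
float_is_exact = float_total == 0.3

# An exact decimal type keeps the digits as written.
exact_total = Decimal("0.1") + Decimal("0.2")
exact_is_exact = exact_total == Decimal("0.3")

# The sticker price from the sample document, multiplied without drift.
three_stickers = Decimal("13.23") * 3
print(float_is_exact, exact_is_exact, three_stickers)
```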

      This variety of supported data types, combined with schema validation features, makes it possible to implement a set of rules and validity requirements that give your document database structure. This allows you to model not only unstructured data, but to also create collections of documents following more rigid and precise requirements.
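      The general idea behind schema validation can be sketched as a set of per-field type rules checked before a document is accepted. The following is an application-side analogy written in plain Python; it is not the validation mechanism of MongoDB or any other database, and the RULES table and validation_errors function are hypothetical names.

```python
from decimal import Decimal

# Assumed rules: each required field and the type its value must have.
RULES = {
    "name": str,
    "price": Decimal,
    "quantity": int,
    "tags": list,
}


def validation_errors(document, rules=RULES):
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    for field, expected_type in rules.items():
        if field not in document:
            errors.append(f"missing field: {field}")
        elif not isinstance(document[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    return errors


valid_sticker = {
    "name": "Document database sticker",
    "price": Decimal("13.23"),
    "quantity": 317,
    "tags": ["stickers", "accessories"],
}
# The price is a float and the tags are missing, so two rules fail.
invalid_sticker = {"name": "Mystery sticker", "price": 13.23, "quantity": 317}

print(validation_errors(valid_sticker))
print(validation_errors(invalid_sticker))
```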

      Conclusion

      Thanks to their flexibility, scalability, and ease of use, document databases are becoming an increasingly popular choice of database for application developers. They are well suited to different applications and work well on their own or as a part of bigger, multi-database ecosystems. The wide array of document-oriented databases has distinct advantages and use cases, making it possible to choose the best database for any given task.

      You can learn more about document-oriented databases and other NoSQL databases from DigitalOcean’s community articles on that topic.

      To learn more about MongoDB in particular, we encourage you to follow this tutorial series covering many topics on using and administering MongoDB and to check the official MongoDB documentation, a vast source of knowledge about MongoDB as well as document databases in general.




      What’s New In DigitalOcean Managed Databases


      How to Join

      This Tech Talk is free and open to everyone. Register below to get a link to join the live stream or receive the video recording after it airs.

      Date             Time
      March 16, 2021   11:00–12:00 p.m. ET / 4:00–5:00 p.m. GMT

      About the Talk

      A managed database-as-a-service (DbaaS) is an easy way to eliminate the overhead of database management and maintenance, giving developers more time to focus on building great products.

      What You’ll Learn

      • Benefits of utilizing a managed database
      • New and upcoming features from DigitalOcean Managed Databases that make it easier for you to get started and scale

      This Talk Is Designed For

      Any developer or organization that wants to free themselves from managing their own database.

      Prerequisites

      Basic knowledge of databases.

      Resources

      DigitalOcean Managed Databases
      DigitalOcean Managed Databases Docs

      To join the live Tech Talk, register here.


