      How To Design a Document Schema in MongoDB


      The author selected the Open Internet/Free Speech Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      If you have a lot of experience working with relational databases, it can be difficult to move past the principles of the relational model, such as thinking in terms of tables and relationships. Document-oriented databases like MongoDB make it possible to break free from the rigidity and limitations of the relational model. However, the flexibility and freedom that come with being able to store self-descriptive documents in the database can lead to other pitfalls and difficulties.

      This conceptual article outlines five common guidelines related to schema design in a document-oriented database and highlights various considerations one should make when modeling relationships between data. It will also walk through several strategies one can employ to model such relationships, including embedding documents within arrays and using child and parent references, as well as when these strategies would be most appropriate to use.

      Guideline 1 — Storing Together What Needs to be Accessed Together

      In a typical relational database, data is kept in tables, and each table is constructed with a fixed list of columns representing various attributes that make up an entity, object, or event. For example, in a table representing students at a university, you might find columns holding each student’s first name, last name, date of birth, and a unique identification number.

      Typically, each table represents a single subject. If you wanted to store information about a student’s current studies, scholarships, or prior education, it could make sense to keep that data in a separate table from the one holding their personal information. You could then connect these tables to signify that there is a relationship between the data in each one, indicating that the information they contain has a meaningful connection.

      For instance, a table describing each student’s scholarship status could refer to students by their student ID number, but it would not store the student’s name or address directly, avoiding data duplication. In such a case, to retrieve information about any student with all information on the student’s social media accounts, prior education, and scholarships, a query would need to access more than one table at a time and then compile the results from different tables into one.

      This method of describing relationships through references is known as a normalized data model. Storing data this way — using multiple separate, concise objects related to each other — is also possible in document-oriented databases. However, the flexibility of the document model and the freedom it gives to store embedded documents and arrays within a single document means that you can model data differently than you might in a relational database.

      The underlying concept for modeling data in a document-oriented database is to “store together what will be accessed together.” Digging further into the student example, say that most students at this school have more than one email address. Because of this, the university wants the ability to store multiple email addresses with each student’s contact information.

      In a case like this, an example document could have a structure like the following:

      {
          "_id": ObjectId("612d1e835ebee16872a109a4"),
          "first_name": "Sammy",
          "last_name": "Shark",
          "emails": [
              {
                  "email": "sammy@digitalocean.com",
                  "type": "work"
              },
              {
                  "email": "sammy@example.com",
                  "type": "home"
              }
          ]
      }
      

      Notice that this example document contains an embedded list of email addresses.

      Representing more than a single subject inside a single document characterizes a denormalized data model. It allows applications to retrieve and manipulate all the relevant data for a given object (here, a student) in one go without a need to access multiple separate objects and collections. Doing so also guarantees the atomicity of operations on such a document without having to use multi-document transactions to guarantee integrity.
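
As a rough illustration of the difference, the following sketch uses plain JavaScript arrays as stand-ins for tables and collections. No database is involved, and the names `studentsTable`, `emailsTable`, `studentsCollection`, and both loader functions are hypothetical. It contrasts assembling a student from two normalized structures with reading a single denormalized document.

```javascript
// Normalized: two separate "tables" linked by student_id.
const studentsTable = [
  { student_id: 1, first_name: "Sammy", last_name: "Shark" },
];
const emailsTable = [
  { student_id: 1, email: "sammy@digitalocean.com", type: "work" },
  { student_id: 1, email: "sammy@example.com", type: "home" },
];

// Reading a full student means touching both structures and merging the results.
function loadNormalizedStudent(studentId) {
  const student = studentsTable.find((row) => row.student_id === studentId);
  const emails = emailsTable
    .filter((row) => row.student_id === studentId)
    .map(({ email, type }) => ({ email, type }));
  return { ...student, emails };
}

// Denormalized: one document already holds everything that is accessed together.
const studentsCollection = [
  {
    _id: "612d1e835ebee16872a109a4", // string stand-in for an ObjectId
    first_name: "Sammy",
    last_name: "Shark",
    emails: [
      { email: "sammy@digitalocean.com", type: "work" },
      { email: "sammy@example.com", type: "home" },
    ],
  },
];

// A single lookup returns the student together with the embedded emails.
function loadDenormalizedStudent(id) {
  return studentsCollection.find((doc) => doc._id === id);
}

const fromTables = loadNormalizedStudent(1);
const fromDocument = loadDenormalizedStudent("612d1e835ebee16872a109a4");
console.log(fromTables.emails.length, fromDocument.emails.length);
```

Both calls end up with the same two email addresses, but the denormalized read needs only one lookup.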

      Storing together what needs to be accessed together using embedded documents is often the optimal way to represent data in a document-oriented database. In the following guidelines, you’ll learn how different relationships between objects, such as one-to-one or one-to-many relationships, can be best modeled in a document-oriented database.

      Guideline 2 — Modeling One-to-One Relationships with Embedded Documents

      A one-to-one relationship represents an association between two distinct objects where one object is connected with exactly one of another kind.

      Continuing with the student example from the previous section, each student has only one valid student ID card at any given point in time. One card never belongs to multiple students, and no student can have multiple identification cards. If you were to store all this data in a relational database, it would likely make sense to model the relationship between students and their ID cards by storing the student records and the ID card records in separate tables that are tied together through references.

      One common method for representing such relationships in a document database is by using embedded documents. As an example, the following document describes a student named Sammy and their student ID card:

      {
          "_id": ObjectId("612d1e835ebee16872a109a4"),
          "first_name": "Sammy",
          "last_name": "Shark",
          "id_card": {
              "number": "123-1234-123",
              "issued_on": ISODate("2020-01-23"),
              "expires_on": ISODate("2024-01-23")
          }
      }
      

      Notice that instead of a single value, this example document’s id_card field holds an embedded document representing the student’s identification card, described by an ID number, the card’s date of issue, and the card’s expiration date. The identity card essentially becomes a part of the document describing the student Sammy, even though it’s a separate object in real life. Usually, structuring the document schema like this so that you can retrieve all related information through a single query is a sound choice.
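
To make the benefit of the single read concrete, here is a minimal sketch in plain JavaScript, with no database involved. The `isCardValidOn` helper and the expiry date used here are hypothetical and only serve the illustration. Because the card is embedded, application code can check it as soon as the student document is in hand.

```javascript
// A student document with the ID card embedded as a sub-document.
const student = {
  _id: "612d1e835ebee16872a109a4", // string stand-in for an ObjectId
  first_name: "Sammy",
  last_name: "Shark",
  id_card: {
    number: "123-1234-123",
    issued_on: new Date("2020-01-23T00:00:00Z"),
    expires_on: new Date("2024-01-23T00:00:00Z"), // hypothetical expiry date
  },
};

// Hypothetical helper: the card is valid from its issue date up to,
// but not including, its expiration date.
function isCardValidOn(studentDoc, date) {
  const card = studentDoc.id_card;
  return card.issued_on <= date && date < card.expires_on;
}

const validDuringStudies = isCardValidOn(student, new Date("2022-06-01T00:00:00Z"));
const validAfterExpiry = isCardValidOn(student, new Date("2025-01-01T00:00:00Z"));
console.log(validDuringStudies, validAfterExpiry);
```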

      Things become less straightforward if you encounter relationships connecting one object of a kind with many objects of another type, such as a student’s email addresses, the courses they attend, or the messages they post on the student council’s message board. In the next few guidelines, you’ll use these data examples to learn different approaches for working with one-to-many and many-to-many relationships.

      Guideline 3 — Modeling One-to-Few Relationships with Embedded Documents

      When an object of one type is related to multiple objects of another type, it can be described as a one-to-many relationship. A student can have multiple email addresses, a car can have numerous parts, or a shopping order can consist of multiple items. Each of these examples represents a one-to-many relationship.

      While the most common way to represent a one-to-one relationship in a document database is through an embedded document, there are several ways to model one-to-many relationships in a document schema. When considering your options for how to best model these, though, there are three properties of the given relationship you should consider:

      • Cardinality: Cardinality is the measure of the number of individual elements in a given set. For example, if a class has 30 students, you could say that class has a cardinality of 30. In a one-to-many relationship, the cardinality can be different in each case. A student could have one email address or multiple. They could be registered for just a few classes or they could have a completely full schedule. In a one-to-many relationship, the size of “many” will affect how you might model the data.
      • Independent access: Some related data will rarely, if ever, be accessed separately from the main object. For example, it might be uncommon to retrieve a single student’s email address without other student details. On the other hand, a university’s courses might need to be accessed and updated individually, regardless of the student or students that are registered to attend them. Whether or not you will ever access a related document alone will also affect how you might model the data.
      • Whether the relationship between data is strictly a one-to-many relationship: Consider the courses an example student attends at a university. From the student’s perspective, they can participate in multiple courses. On the surface, this may seem like a one-to-many relationship. However, university courses are rarely attended by a single student; more often, multiple students will attend the same class. In cases like this, the relationship in question is not really a one-to-many relationship, but a many-to-many relationship, and thus you’d take a different approach to model this relationship than you would a one-to-many relationship.

      Imagine you’re deciding how to store student email addresses. Each student can have multiple email addresses, such as one for work, one for personal use, and one provided by the university. A document representing a single email address might take a form like this:

      {
          "email": "sammy@digitalocean.com",
          "type": "work"
      }
      

      In terms of cardinality, there will be only a few email addresses for each student, since it’s unlikely that a student will have dozens — let alone hundreds — of email addresses. Thus, this relationship can be characterized as a one-to-few relationship, which is a compelling reason to embed email addresses directly into the student document and store them together. You don’t run any risk that the list of email addresses will grow indefinitely, which would make the document big and inefficient to use.

      Note: Be aware that there are certain pitfalls associated with storing data in arrays. For instance, a single MongoDB document cannot exceed 16 MB in size. While it is possible and common to embed multiple documents using array fields, if the list of objects grows uncontrollably, the document could quickly reach this size limit. Additionally, storing a large amount of data inside embedded arrays can have a significant impact on query performance.

      Embedding multiple documents in an array field will likely be suitable in many situations, but know that it also may not always be the best solution.
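
For a back-of-the-envelope feel for the size limit, the sketch below estimates how many embedded items would fit into one document. It assumes Node.js (for `Buffer`) and uses the JSON length as a crude proxy for the real BSON size, which will differ. The helper names and the sample message are hypothetical.

```javascript
// The 16 MB limit, expressed in bytes.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Rough estimate only: JSON text length is not the same as BSON size.
function approximateSizeInBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// How many copies of sampleItem could be embedded before the estimate hits the limit?
function estimateMaxEmbeddedItems(baseDoc, sampleItem) {
  const baseSize = approximateSizeInBytes(baseDoc);
  const itemSize = approximateSizeInBytes(sampleItem) + 1; // +1 for the separating comma
  return Math.floor((MAX_DOCUMENT_BYTES - baseSize) / itemSize);
}

const baseStudent = { first_name: "Sammy", last_name: "Shark", items: [] };
const sampleEmail = { email: "sammy@example.com", type: "home" };
const sampleMessage = {
  subject: "Books on kinematics and dynamics",
  message: "Hello! Could you recommend good introductory books? Thanks!",
  posted_on: "2021-07-23T16:03:21Z",
};

const emailCapacity = estimateMaxEmbeddedItems(baseStudent, sampleEmail);
const messageCapacity = estimateMaxEmbeddedItems(baseStudent, sampleMessage);
console.log({ emailCapacity, messageCapacity });
```

The absolute numbers matter less than the trend: the larger each embedded item is, the sooner an ever-growing array runs into the limit. Query performance usually degrades well before that point.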

      Regarding independent access, email addresses will likely not be accessed separately from the student. As such, there is no clear incentive to store them as separate documents in a separate collection. This is another compelling reason to embed them inside the student’s document.

      The last thing to consider is whether this relationship is really a one-to-many relationship instead of a many-to-many relationship. Because an email address belongs to a single person, it’s reasonable to describe this relationship as a one-to-many relationship (or, perhaps more accurately, a one-to-few relationship) instead of a many-to-many relationship.

      These three assumptions suggest that embedding students’ various email addresses within the same documents that describe students themselves would be a good choice for storing this kind of data. A sample student’s document with email addresses embedded might take this shape:

      {
          "_id": ObjectId("612d1e835ebee16872a109a4"),
          "first_name": "Sammy",
          "last_name": "Shark",
          "emails": [
              {
                  "email": "sammy@digitalocean.com",
                  "type": "work"
              },
              {
                  "email": "sammy@example.com",
                  "type": "home"
              }
          ]
      }
      

      Using this structure, every time you retrieve a student’s document you will also retrieve the embedded email addresses in the same read operation.

      If you model a relationship of the one-to-few variety, where the related documents do not need to be accessed independently, embedding documents directly like this is usually desirable, as this can reduce the complexity of the schema.
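
One consequence of embedding is that changing the related data means changing a single document. The following sketch shows this with plain JavaScript objects, with no database involved. The `addEmail` helper and the third email address are hypothetical. In MongoDB itself, a change like this would typically be a single update to one student document.

```javascript
const student = {
  _id: "612d1e835ebee16872a109a4", // string stand-in for an ObjectId
  first_name: "Sammy",
  last_name: "Shark",
  emails: [
    { email: "sammy@digitalocean.com", type: "work" },
    { email: "sammy@example.com", type: "home" },
  ],
};

// Hypothetical helper: returns a new student document with the email appended,
// or the unchanged document if that address is already stored.
function addEmail(studentDoc, email, type) {
  const alreadyStored = studentDoc.emails.some((entry) => entry.email === email);
  if (alreadyStored) return studentDoc;
  return { ...studentDoc, emails: [...studentDoc.emails, { email, type }] };
}

const updated = addEmail(student, "sammy@university.example", "university");
console.log(updated.emails.map((entry) => entry.type));
```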

      As mentioned previously, though, embedding documents like this isn’t always the optimal solution. The next section provides more details on why this might be the case in some scenarios, and outlines how to use child references as an alternative way to represent relationships in a document database.

      Guideline 4 — Modeling One-to-Many and Many-to-Many Relationships with Child References

      The nature of the relationship between students and their email addresses informed how that relationship could best be modeled in a document database. There are some differences between this and the relationship between students and the courses they attend, so the way you model the relationships between students and their courses will be different as well.

      A document describing a single course that a student attends could follow a structure like this:

      {
          "name": "Physics 101",
          "department": "Department of Physics",
          "points": 7
      }
      

      Say that you decided from the outset to use embedded documents to store information about each student’s courses, as in this example:

      {
          "_id": ObjectId("612d1e835ebee16872a109a4"),
          "first_name": "Sammy",
          "last_name": "Shark",
          "emails": [
              {
                  "email": "sammy@digitalocean.com",
                  "type": "work"
              },
              {
                  "email": "sammy@example.com",
                  "type": "home"
              }
          ],
          "courses": [
              {
                  "name": "Physics 101",
                  "department": "Department of Physics",
                  "points": 7
              },
              {
                  "name": "Introduction to Cloud Computing",
                  "department": "Department of Computer Science",
                  "points": 4
              }
          ]
      }
      

      This would be a perfectly valid MongoDB document and could well serve the purpose, but consider the three relationship properties you learned about in the previous guideline.

      The first one is cardinality. A student will likely only maintain a few email addresses, but they can attend multiple courses during their study. After several years of attendance, there could be dozens of courses the student took part in. Plus, they’d attend these courses along with many other students who are likewise attending their own set of courses over their years of attendance.

      If you decided to embed each course like the previous example, the student’s document would quickly get unwieldy. With a higher cardinality, the embedded document approach becomes less compelling.

      The second consideration is independent access. Unlike email addresses, it’s sound to assume there would be cases in which information about university courses would need to be retrieved on their own. For instance, say someone needs information about available courses to prepare a marketing brochure. Additionally, courses will likely need to be updated over time: the professor teaching the course might change, its schedule may fluctuate, or its prerequisites might need to be updated.

      If you were to store the courses as documents embedded within student documents, retrieving the list of all the courses offered by the university would become troublesome. Also, each time a course needs an update, you would need to go through all student records and update the course information everywhere. Both are good reasons to store courses separately and not embed them fully.
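
The update problem can be sketched in a few lines of plain JavaScript, using arrays as stand-ins for collections. The second student, Jesse, and both helper functions are hypothetical. With embedded courses, one course change touches every student who attends it. With a separate courses collection, it touches exactly one document.

```javascript
// Courses embedded in each student document.
const studentsWithEmbeddedCourses = [
  {
    first_name: "Sammy",
    courses: [{ name: "Physics 101", department: "Department of Physics", points: 7 }],
  },
  {
    first_name: "Jesse", // hypothetical second student
    courses: [
      { name: "Physics 101", department: "Department of Physics", points: 7 },
      { name: "Introduction to Cloud Computing", department: "Department of Computer Science", points: 4 },
    ],
  },
];

// Updates every embedded copy and returns how many student documents were modified.
function updatePointsEmbedded(students, courseName, points) {
  let touched = 0;
  for (const student of students) {
    let changed = false;
    for (const course of student.courses) {
      if (course.name === courseName) {
        course.points = points;
        changed = true;
      }
    }
    if (changed) touched += 1;
  }
  return touched;
}

// Courses stored once, in their own collection.
const coursesCollection = [
  { name: "Physics 101", department: "Department of Physics", points: 7 },
  { name: "Introduction to Cloud Computing", department: "Department of Computer Science", points: 4 },
];

// Updates the single course document and returns how many documents were modified.
function updatePointsSeparate(courses, courseName, points) {
  const course = courses.find((candidate) => candidate.name === courseName);
  if (!course) return 0;
  course.points = points;
  return 1;
}

const embeddedWrites = updatePointsEmbedded(studentsWithEmbeddedCourses, "Physics 101", 8);
const separateWrites = updatePointsSeparate(coursesCollection, "Physics 101", 8);
console.log({ embeddedWrites, separateWrites });
```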

      The third thing to consider is whether the relationship between student and a university course is actually one-to-many or instead many-to-many. In this case, it’s the latter, as more than one student can attend each course. This relationship’s cardinality and independent access aspects suggest against embedding each course document, primarily for practical reasons like ease of access and update. Considering the many-to-many nature of the relationship between courses and students, it might make sense to store course documents in a separate collection with unique identifiers of their own.

      The documents representing classes in this separate collection might have a structure like these examples:

      {
          "_id": ObjectId("61741c9cbc9ec583c836170a"),
          "name": "Physics 101",
          "department": "Department of Physics",
          "points": 7
      },
      {
          "_id": ObjectId("61741c9cbc9ec583c836170b"),
          "name": "Introduction to Cloud Computing",
          "department": "Department of Computer Science",
          "points": 4
      }
      

      If you decide to store course information like this, you’ll need to find a way to connect students with these courses so that you will know which students attend which courses. In cases like this where the number of related objects isn’t excessively large, especially with many-to-many relationships, one common way of doing this is to use child references.

      With child references, a student’s document will reference the object identifiers of the courses that the student attends in an embedded array, as in this example:

      {
          "_id": ObjectId("612d1e835ebee16872a109a4"),
          "first_name": "Sammy",
          "last_name": "Shark",
          "emails": [
              {
                  "email": "sammy@digitalocean.com",
                  "type": "work"
              },
              {
                  "email": "sammy@example.com",
                  "type": "home"
              }
          ],
          "courses": [
              ObjectId("61741c9cbc9ec583c836170a"),
              ObjectId("61741c9cbc9ec583c836170b")
          ]
      }
      

      Notice that this example document still has a courses field which also is an array, but instead of embedding full course documents like in the earlier example, only the identifiers referencing the course documents in the separate collection are embedded. Now, when retrieving a student document, courses will not be immediately available and will need to be queried separately. On the other hand, it’s immediately known which courses to retrieve. Also, in case any course’s details need to be updated, only the course document itself needs to be altered. All references between students and their courses will remain valid.
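
The two-step read described above can be sketched with plain JavaScript arrays standing in for the two collections. The `resolveCourses` helper is hypothetical, and string values stand in for ObjectIds. Against a real database, the second step would be a separate query that matches course `_id` values against the referenced list, for example with the `$in` operator.

```javascript
const courses = [
  { _id: "61741c9cbc9ec583c836170a", name: "Physics 101", department: "Department of Physics", points: 7 },
  { _id: "61741c9cbc9ec583c836170b", name: "Introduction to Cloud Computing", department: "Department of Computer Science", points: 4 },
];

const student = {
  _id: "612d1e835ebee16872a109a4",
  first_name: "Sammy",
  last_name: "Shark",
  courses: ["61741c9cbc9ec583c836170a", "61741c9cbc9ec583c836170b"], // child references
};

// Step 2 of the read: turn the stored identifiers into full course documents.
// References that no longer match a course are skipped.
function resolveCourses(studentDoc, courseCollection) {
  const byId = new Map(courseCollection.map((course) => [course._id, course]));
  return studentDoc.courses
    .map((id) => byId.get(id))
    .filter((course) => course !== undefined);
}

const before = resolveCourses(student, courses).map((course) => course.points);

// Updating the course document is enough; the student's references stay valid.
courses[0].points = 8;
const after = resolveCourses(student, courses).map((course) => course.points);

console.log(before, after);
```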

      Note: There is no firm rule for when the cardinality of a relation is too great to embed child references in this manner. You might choose a different approach at either a lower or higher cardinality if it’s what best suits the application in question. After all, you will always want to structure your data to suit the manner in which your application queries and updates it.

      If you model a one-to-many relationship where the number of related documents is within reasonable bounds and related documents need to be accessed independently, favor storing the related documents separately and embedding child references to connect to them.

      Now that you’ve learned how to use child references to signify relationships between different types of data, this guide will outline an inverse concept: parent references.

      Guideline 5 — Modeling Unbounded One-to-Many Relationships with Parent References

      Using child references works well when there are too many related objects to embed them directly inside the parent document, but the amount is still within known bounds. However, there are cases when the number of associated documents might be unbounded and will continue to grow with time.

      As an example, imagine that the university’s student council has a message board where any student can post whatever messages they want, including questions about courses, travel stories, job postings, study materials, or just a free chat. A sample message in this example consists of a subject and a message body:

      {
          "_id": ObjectId("61741c9cbc9ec583c836174c"),
          "subject": "Books on kinematics and dynamics",
          "message": "Hello! Could you recommend good introductory books covering the topics of kinematics and dynamics? Thanks!",
          "posted_on": ISODate("2021-07-23T16:03:21Z")
      }
      

      You could use either of the two approaches discussed previously — embedding and child references — to model this relationship. If you were to decide on embedding, the student’s document might take a shape like this:

      {
          "_id": ObjectId("612d1e835ebee16872a109a4"),
          "first_name": "Sammy",
          "last_name": "Shark",
          "emails": [
              {
                  "email": "sammy@digitalocean.com",
                  "type": "work"
              },
              {
                  "email": "sammy@example.com",
                  "type": "home"
              }
          ],
          "courses": [
              ObjectId("61741c9cbc9ec583c836170a"),
              ObjectId("61741c9cbc9ec583c836170b")
          ],
          "message_board_messages": [
              {
                  "subject": "Books on kinematics and dynamics",
                  "message": "Hello! Could you recommend good introductory books covering the topics of kinematics and dynamics? Thanks!",
                  "posted_on": ISODate("2021-07-23T16:03:21Z")
              },
              . . .
          ]
      }
      

      However, if a student is prolific with writing messages, their document will quickly become incredibly long and could easily exceed the 16 MB size limit, so the cardinality of this relation suggests against embedding. Additionally, the messages might need to be accessed separately from the student, as could be the case if the message board page is designed to show the latest messages posted by students. This also suggests that embedding is not the best choice for this scenario.

      Note: You should also consider whether the message board messages are frequently accessed when retrieving the student’s document. If not, having them all embedded inside that document would incur a performance penalty when retrieving and manipulating this document, even when the list of messages would not be used often. Infrequent access of related data is often another clue that you shouldn’t embed documents.

      Now consider using child references instead of embedding full documents as in the previous example. The individual messages would be stored in a separate collection, and the student’s document could then have the following structure:

      {
          "_id": ObjectId("612d1e835ebee16872a109a4"),
          "first_name": "Sammy",
          "last_name": "Shark",
          "emails": [
              {
                  "email": "sammy@digitalocean.com",
                  "type": "work"
              },
              {
                  "email": "sammy@example.com",
                  "type": "home"
              }
          ],
          "courses": [
              ObjectId("61741c9cbc9ec583c836170a"),
              ObjectId("61741c9cbc9ec583c836170b")
          ],
          "message_board_messages": [
              ObjectId("61741c9cbc9ec583c836174c"),
              . . .
          ]
      }
      

      In this example, the message_board_messages field now stores the child references to all messages written by Sammy. However, changing the approach solves only one of the issues mentioned before: it would now be possible to access the messages independently. Although the student’s document size would grow more slowly using the child references approach, the array of object identifiers could still become unwieldy given the unbounded cardinality of this relation. A student could easily write thousands of messages during their four years of study, after all.

      In such scenarios, a common way to connect one object to another is through parent references. Unlike the child references described previously, it’s now not the student document referring to individual messages, but rather a reference in the message’s document pointing towards the student that wrote it.

      To use parent references, you would need to modify the message document schema to contain a reference to the student who authored the message:

      {
          "_id": ObjectId("61741c9cbc9ec583c836174c"),
          "subject": "Books on kinematics and dynamics",
          "message": "Hello! Could you recommend good introductory books covering the topics of kinematics and dynamics? Thanks!",
          "posted_on": ISODate("2021-07-23T16:03:21Z"),
          "posted_by": ObjectId("612d1e835ebee16872a109a4")
      }
      

      Notice the new posted_by field contains the object identifier of the student’s document. Now, the student’s document won’t contain any information about the messages they’ve posted:

      {
          "_id": ObjectId("612d1e835ebee16872a109a4"),
          "first_name": "Sammy",
          "last_name": "Shark",
          "emails": [
              {
                  "email": "sammy@digitalocean.com",
                  "type": "work"
              },
              {
                  "email": "sammy@example.com",
                  "type": "home"
              }
          ],
          "courses": [
              ObjectId("61741c9cbc9ec583c836170a"),
              ObjectId("61741c9cbc9ec583c836170b")
          ]
      }
      

      To retrieve the list of messages written by a student, you would use a query on the messages collection and filter against the posted_by field. Having them in a separate collection makes it safe to let the list of messages grow without affecting any of the student’s documents.

      Note: When using parent references, creating an index on the field referencing the parent document can significantly increase the query performance each time you filter against the parent document identifier.
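
The effect of such an index can be approximated in plain JavaScript, with an array standing in for the messages collection and a `Map` standing in for the index. The extra messages, the second author identifier, and the helper names are hypothetical. In MongoDB, the index itself would be created with `createIndex` on the `posted_by` field.

```javascript
const sammyId = "612d1e835ebee16872a109a4"; // string stand-in for an ObjectId

const messages = [
  { _id: "61741c9cbc9ec583c836174c", subject: "Books on kinematics and dynamics", posted_by: sammyId },
  { _id: "hypothetical-message-2", subject: "Study group for Physics 101", posted_by: sammyId },
  { _id: "hypothetical-message-3", subject: "Room for rent", posted_by: "hypothetical-other-student" },
];

// Without an index: every message has to be inspected.
function findMessagesByScan(allMessages, studentId) {
  return allMessages.filter((message) => message.posted_by === studentId);
}

// With an index: messages are grouped by posted_by once, then looked up directly.
function buildPostedByIndex(allMessages) {
  const index = new Map();
  for (const message of allMessages) {
    if (!index.has(message.posted_by)) index.set(message.posted_by, []);
    index.get(message.posted_by).push(message);
  }
  return index;
}

const postedByIndex = buildPostedByIndex(messages);
const viaScan = findMessagesByScan(messages, sammyId);
const viaIndex = postedByIndex.get(sammyId) ?? [];
console.log(viaScan.length, viaIndex.length);
```

Both lookups return the same messages. The indexed version avoids walking the whole collection on every query, which matters once the message list grows without bound.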

      If you model a one-to-many relationship where the number of related documents is unbounded, regardless of whether the documents need to be accessed independently, it’s generally advised that you store related documents separately and use parent references to connect them to the parent document.
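
Taken together, the rules of thumb from the last three guidelines can be condensed into a small decision helper. This is only a sketch: the `suggestStrategy` function, its parameter names, and especially the `FEW_THRESHOLD` value are hypothetical, and there is no firm cutoff for what counts as "few". Treat the result as a starting point rather than a verdict.

```javascript
// Hypothetical cutoff for a "one-to-few" relationship; pick one that suits your application.
const FEW_THRESHOLD = 20;

// Suggests a modeling strategy from the relationship properties discussed in this article.
function suggestStrategy({ expectedCount, bounded, independentAccess }) {
  // An unbounded list should not live inside the parent document at all.
  if (!bounded) return "parent references";
  // Related documents that are read or updated on their own get their own collection.
  if (independentAccess) return "child references";
  // A small, bounded list that is always read with its parent can be embedded.
  if (expectedCount <= FEW_THRESHOLD) return "embed";
  // Bounded but too large to embed comfortably.
  return "child references";
}

const forEmails = suggestStrategy({ expectedCount: 3, bounded: true, independentAccess: false });
const forCourses = suggestStrategy({ expectedCount: 40, bounded: true, independentAccess: true });
const forMessages = suggestStrategy({ expectedCount: 5000, bounded: false, independentAccess: true });
console.log({ forEmails, forCourses, forMessages });
```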

      Conclusion

      Thanks to the flexibility of document-oriented databases, determining the best way to model relationships in a document database is less of a strict science than it is in a relational database. By reading this article, you’ve acquainted yourself with embedding documents and using child and parent references to store related data. You’ve learned about considering the relationship cardinality and avoiding unbounded arrays, as well as taking into account whether the document will be accessed separately or frequently.

      These are just a few guidelines that can help you model typical relationships in MongoDB, but database schema design is not one-size-fits-all. Always take into account your application and how it uses and updates the data when designing the schema.

      To learn more about schema design and common patterns for storing different kinds of data in MongoDB, we encourage you to check the official MongoDB documentation on that topic.




      SOLID: The First 5 Principles of Object-Oriented Design


      Introduction

      SOLID is an acronym for the first five object-oriented design (OOD) principles by Robert C. Martin (also known as Uncle Bob).

      Note: While these principles can apply to various programming languages, the sample code contained in this article will use PHP.

      These principles establish practices that lend themselves to developing software with maintenance and extension in mind as the project grows. Adopting these practices can also help with avoiding code smells, refactoring code, and Agile or Adaptive software development.

      SOLID stands for:

      • S: Single-responsibility principle
      • O: Open-closed principle
      • L: Liskov substitution principle
      • I: Interface segregation principle
      • D: Dependency inversion principle

      In this article, each principle is introduced individually so that you can understand how SOLID can help make you a better developer.

      Single-Responsibility Principle

      The single-responsibility principle (SRP) states:

      A class should have one and only one reason to change, meaning that a class should have only one job.

      For example, consider an application that takes a collection of shapes (circles and squares) and calculates the sum of the area of all the shapes in the collection.

      First, create the shape classes and have the constructors set up the required parameters.

      For squares, you will need to know the length of a side:

      class Square
      {
          public $length;
      
          public function __construct($length)
          {
              $this->length = $length;
          }
      }
      

      For circles, you will need to know the radius:

      class Circle
      {
          public $radius;
      
          public function __construct($radius)
          {
              $this->radius = $radius;
          }
      }
      

      Next, create the AreaCalculator class and then write the logic to sum up the areas of all the provided shapes. The area of a square is its side length squared. The area of a circle is pi times its radius squared.

      class AreaCalculator
      {
          protected $shapes;
      
          public function __construct($shapes = [])
          {
              $this->shapes = $shapes;
          }
      
          public function sum()
          {
              // Start from an empty list so array_sum() also works when no shapes are given.
              $area = [];
              foreach ($this->shapes as $shape) {
                  if (is_a($shape, 'Square')) {
                      $area[] = pow($shape->length, 2);
                  } elseif (is_a($shape, 'Circle')) {
                      $area[] = pi() * pow($shape->radius, 2);
                  }
              }
      
              return array_sum($area);
          }
      
          public function output()
          {
              return implode('', [
                  '',
                  'Sum of the areas of provided shapes: ',
                  $this->sum(),
                  '',
              ]);
          }
      }
      

      To use the AreaCalculator class, you will need to instantiate the class, pass in an array of shapes, and display the output at the bottom of the page.

      Here is an example with a collection of three shapes:

      • a circle with a radius of 2
      • a square with a length of 5
      • a second square with a length of 6

      $shapes = [
        new Circle(2),
        new Square(5),
        new Square(6),
      ];
      
      $areas = new AreaCalculator($shapes);
      
      echo $areas->output();
      

      The problem with the output method is that the AreaCalculator class also handles the logic for outputting the data.

      Consider a scenario where the output should be converted to another format, such as JSON.

      All of that logic would be handled by the AreaCalculator class. This would violate the single-responsibility principle. The AreaCalculator class should only be concerned with the sum of the areas of the provided shapes. It should not care whether the user wants JSON or HTML.

      To address this, create a separate class called SumCalculatorOutputter and use that new class to handle the logic needed to output the data to the user:

      class SumCalculatorOutputter
      {
          protected $calculator;
      
          // PHP's constructor method is named __construct, not __constructor.
          public function __construct(AreaCalculator $calculator)
          {
              $this->calculator = $calculator;
          }
      
          public function JSON()
          {
              $data = [
                  'sum' => $this->calculator->sum(),
              ];
      
              return json_encode($data);
          }
      
          public function HTML()
          {
              return implode('', [
                  '',
                      'Sum of the areas of provided shapes: ',
                      $this->calculator->sum(),
                  '',
              ]);
          }
      }
      

      The SumCalculatorOutputter class would work like this:

      $shapes = [
        new Circle(2),
        new Square(5),
        new Square(6),
      ];
      
      $areas = new AreaCalculator($shapes);
      $output = new SumCalculatorOutputter($areas);
      
      echo $output->JSON();
      echo $output->HTML();
      

      Now the logic needed to output the data to the user is handled by the SumCalculatorOutputter class.

      This satisfies the single-responsibility principle.
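
      As a rough end-to-end check, the pieces above can be combined into one script. This is a minimal sketch rather than the article's definitive code: it assumes the Circle and Square classes with public $radius and $length properties, and it keeps only the JSON formatter. For a circle of radius 2 and squares of length 5 and 6, the sum should be 4π + 25 + 36, roughly 73.57.

```php
<?php
// Plain data holders for the two shapes used in the article.
class Circle
{
    public $radius;

    public function __construct($radius)
    {
        $this->radius = $radius;
    }
}

class Square
{
    public $length;

    public function __construct($length)
    {
        $this->length = $length;
    }
}

// Single responsibility: summing the areas of the provided shapes.
class AreaCalculator
{
    protected $shapes;

    public function __construct($shapes = [])
    {
        $this->shapes = $shapes;
    }

    public function sum()
    {
        $area = [];
        foreach ($this->shapes as $shape) {
            if ($shape instanceof Square) {
                $area[] = pow($shape->length, 2);
            } elseif ($shape instanceof Circle) {
                $area[] = pi() * pow($shape->radius, 2);
            }
        }

        return array_sum($area);
    }
}

// Single responsibility: formatting the calculator's result.
class SumCalculatorOutputter
{
    protected $calculator;

    public function __construct(AreaCalculator $calculator)
    {
        $this->calculator = $calculator;
    }

    public function JSON()
    {
        return json_encode(['sum' => $this->calculator->sum()]);
    }
}

$areas = new AreaCalculator([new Circle(2), new Square(5), new Square(6)]);
$output = new SumCalculatorOutputter($areas);

// pi * 2^2 + 5^2 + 6^2 = 4 * pi + 61
echo $output->JSON(), PHP_EOL;
```

      Supporting another output format would then mean adding a method to SumCalculatorOutputter, leaving AreaCalculator untouched.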

      Open-Closed Principle

      The open-closed principle (O.C.P.) states:

      Objects or entities should be open for extension but closed for modification.

      This means that a class should be extendable without modifying the class itself.

      Let's revisit the AreaCalculator class and focus on the sum method:

      class AreaCalculator
      {
          protected $shapes;
      
          public function __construct($shapes = [])
          {
              $this->shapes = $shapes;
          }
      
          public function sum()
          {
              // Start from an empty list so an empty $shapes array sums to 0.
              $area = [];
              foreach ($this->shapes as $shape) {
                  if (is_a($shape, 'Square')) {
                      $area[] = pow($shape->length, 2);
                  } elseif (is_a($shape, 'Circle')) {
                      $area[] = pi() * pow($shape->radius, 2);
                  }
              }
      
              return array_sum($area);
          }
      }
      

      Consider a scenario where the user wants the sum of additional shapes such as triangles, pentagons, and hexagons. You would have to keep editing this file and adding more if/else blocks. That would violate the open-closed principle.

      One way to improve this sum method is to move the logic that calculates the area of each shape out of the AreaCalculator class and attach it to each shape's class.

      Here is the area method defined in Square:

      class Square
      {
          public $length;
      
          public function __construct($length)
          {
              $this->length = $length;
          }
      
          public function area()
          {
              return pow($this->length, 2);
          }
      }
      

      And here is the area method defined in Circle:

      class Circle
      {
          public $radius;
      
          public function __construct($radius)
          {
              $this->radius = $radius;
          }
      
          public function area()
          {
              return pi() * pow($this->radius, 2);
          }
      }
      

      The sum method for AreaCalculator can then be rewritten as:

      class AreaCalculator
      {
          // ...
      
          public function sum()
          {
              // Start from an empty list so an empty $shapes array sums to 0.
              $area = [];
              foreach ($this->shapes as $shape) {
                  $area[] = $shape->area();
              }
      
              return array_sum($area);
          }
      }
      

      Now you can create another shape class and pass it in when calculating the sum without breaking the code.

      However, another problem arises. How do you know that the object passed into AreaCalculator is actually a shape, or that the shape has a method named area?

      Coding to an interface is an integral part of SOLID.

      Create a ShapeInterface that supports area:

      interface ShapeInterface
      {
          public function area();
      }
      

      Modify your shape classes to implement the ShapeInterface.

      Here is the update to Square:

      class Square implements ShapeInterface
      {
          // ...
      }
      

      And here is the update to Circle:

      class Circle implements ShapeInterface
      {
          // ...
      }
      

      In the sum method for AreaCalculator, check whether the provided shapes are actually instances of ShapeInterface; otherwise, throw an exception:

      class AreaCalculator
      {
          // ...
      
          public function sum()
          {
              // Start from an empty list so an empty $shapes array sums to 0.
              $area = [];
              foreach ($this->shapes as $shape) {
                  if (is_a($shape, 'ShapeInterface')) {
                      $area[] = $shape->area();
                      continue;
                  }
      
                  throw new AreaCalculatorInvalidShapeException();
              }
      
              return array_sum($area);
          }
      }
      

      This satisfies the open-closed principle.
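
      To illustrate the extension point, the sketch below adds a new shape without touching AreaCalculator. The Triangle class and its $base and $height properties are hypothetical and not part of the original example, and the instance check is left out here to keep the focus on extension.

```php
<?php
interface ShapeInterface
{
    public function area();
}

class Square implements ShapeInterface
{
    public $length;

    public function __construct($length)
    {
        $this->length = $length;
    }

    public function area()
    {
        return pow($this->length, 2);
    }
}

// Hypothetical new shape: added later, with no change to AreaCalculator.
class Triangle implements ShapeInterface
{
    public $base;
    public $height;

    public function __construct($base, $height)
    {
        $this->base = $base;
        $this->height = $height;
    }

    public function area()
    {
        return 0.5 * $this->base * $this->height;
    }
}

class AreaCalculator
{
    protected $shapes;

    public function __construct($shapes = [])
    {
        $this->shapes = $shapes;
    }

    public function sum()
    {
        $area = [];
        foreach ($this->shapes as $shape) {
            // Each shape knows how to compute its own area.
            $area[] = $shape->area();
        }

        return array_sum($area);
    }
}

// 5^2 + (0.5 * 4 * 3) = 25 + 6
$calculator = new AreaCalculator([new Square(5), new Triangle(4, 3)]);
echo $calculator->sum(), PHP_EOL;
```

      AreaCalculator is closed for modification here: supporting Triangle required only a new class that implements ShapeInterface.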

      Liskov Substitution Principle

      The Liskov substitution principle states:

      Let q(x) be a property provable about objects x of type T. Then q(y) should be provable for objects y of type S, where S is a subtype of T.

      This means that every subclass or derived class should be substitutable for its base or parent class.

      Building on the example AreaCalculator class, consider a new VolumeCalculator class that extends the AreaCalculator class:

      class VolumeCalculator extends AreaCalculator
      {
          public function __construct($shapes = [])
          {
              parent::__construct($shapes);
          }
      
          public function sum()
          {
              // logic to calculate the volumes and then return an array of output
              return [$summedData];
          }
      }
      

      Recall that the SumCalculatorOutputter class looks like this:

      class SumCalculatorOutputter {
          protected $calculator;
      
          public function __construct(AreaCalculator $calculator) {
              $this->calculator = $calculator;
          }
      
          public function JSON() {
              $data = array(
                  'sum' => $this->calculator->sum(),
              );
      
              return json_encode($data);
          }
      
          public function HTML() {
              return implode('', array(
                  '',
                      'Sum of the areas of provided shapes: ',
                      $this->calculator->sum(),
                  ''
              ));
          }
      }
      

      If you tried to run an example like this:

      $areas = new AreaCalculator($shapes);
      $volumes = new VolumeCalculator($solidShapes);
      
      $output = new SumCalculatorOutputter($areas);
      $output2 = new SumCalculatorOutputter($volumes);
      

      When you call the HTML method on the $output2 object, you will get an E_NOTICE error informing you of an array-to-string conversion.

      To fix this, instead of returning an array from the sum method of the VolumeCalculator class, return $summedData:

      class VolumeCalculator extends AreaCalculator
      {
          public function __construct($shapes = [])
          {
              parent::__construct($shapes);
          }
      
          public function sum()
          {
              // logic to calculate the volumes and then return a value of output
              return $summedData;
          }
      }
      

      The $summedData value can be a float, double, or integer.

      This satisfies the Liskov substitution principle.
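
      One way to make this contract explicit is to declare a return type on sum (available since PHP 7.0), so that a subclass cannot silently return an array. The following is a sketch under assumptions: the Cube class, its methods, and the describe helper are hypothetical and only serve to make the example runnable.

```php
<?php
class AreaCalculator
{
    protected $shapes;

    public function __construct(array $shapes = [])
    {
        $this->shapes = $shapes;
    }

    // The declared return type is the contract that callers rely on.
    public function sum(): float
    {
        $total = 0.0;
        foreach ($this->shapes as $shape) {
            $total += $shape->area();
        }

        return $total;
    }
}

class VolumeCalculator extends AreaCalculator
{
    // Same signature as the parent, so it can be used wherever the parent is.
    public function sum(): float
    {
        $total = 0.0;
        foreach ($this->shapes as $shape) {
            $total += $shape->volume();
        }

        return $total;
    }
}

// Hypothetical solid shape, used only for this illustration.
class Cube
{
    private $edge;

    public function __construct(float $edge)
    {
        $this->edge = $edge;
    }

    public function area(): float
    {
        return 6 * $this->edge ** 2;
    }

    public function volume(): float
    {
        return $this->edge ** 3;
    }
}

// Hypothetical caller that only knows about the base type.
function describe(AreaCalculator $calculator): string
{
    return 'Sum: ' . $calculator->sum();
}

$cubes = [new Cube(2.0), new Cube(3.0)];
echo describe(new AreaCalculator($cubes)), PHP_EOL;   // surface areas: 24 + 54
echo describe(new VolumeCalculator($cubes)), PHP_EOL; // volumes: 8 + 27
```

      With the return type in place, a VolumeCalculator::sum that tried to return an array would fail with a TypeError instead of producing a broken string later in the outputter.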

      Interface Segregation Principle

      The interface segregation principle states:

      A client should never be forced to implement an interface that it doesn't use, and clients shouldn't be forced to depend on methods they do not use.

      Still building on the previous ShapeInterface example, you will need to support the new three-dimensional shapes Cuboid and Spheroid, and these shapes will also need to have their volume calculated.

      Let's consider what would happen if you modified the ShapeInterface to add another contract:

      interface ShapeInterface
      {
          public function area();
      
          public function volume();
      }
      

      Now any shape you create must implement the volume method, but you know that squares are flat shapes with no volume, so this interface would force the Square class to implement a method that it has no use for.

      This would violate the interface segregation principle. Instead, you could create another interface called ThreeDimensionalShapeInterface that has the volume contract, and three-dimensional shapes can implement this interface:

      interface ShapeInterface
      {
          public function area();
      }
      
      interface ThreeDimensionalShapeInterface
      {
          public function volume();
      }
      
      class Cuboid implements ShapeInterface, ThreeDimensionalShapeInterface
      {
          public function area()
          {
              // calculate the surface area of the cuboid
          }
      
          public function volume()
          {
              // calculate the volume of the cuboid
          }
      }
      

      This is a much better approach, but a pitfall to watch out for is how you type-hint these interfaces. Instead of type-hinting a ShapeInterface or a ThreeDimensionalShapeInterface, you can create another interface, perhaps ManageShapeInterface, and implement it on both the flat and three-dimensional shapes.

      This way, you can have a single API for managing all the shapes:

      interface ManageShapeInterface
      {
          public function calculate();
      }
      
      class Square implements ShapeInterface, ManageShapeInterface
      {
          public function area()
          {
              // calculate the area of the square
          }
      
          public function calculate()
          {
              return $this->area();
          }
      }
      
      class Cuboid implements ShapeInterface, ThreeDimensionalShapeInterface, ManageShapeInterface
      {
          public function area()
          {
              // calculate the surface area of the cuboid
          }
      
          public function volume()
          {
              // calculate the volume of the cuboid
          }
      
          public function calculate()
          {
              return $this->area();
          }
      }
      

      Now, in the AreaCalculator class, replace the call to the area method with calculate, and check that the object is an instance of ManageShapeInterface rather than ShapeInterface.

      This satisfies the interface segregation principle.
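
      The change to AreaCalculator described above might look like the following sketch. The constructor parameters and area formulas for Square and Cuboid are assumptions filled in to make the example runnable, and the built-in InvalidArgumentException stands in for a dedicated exception class.

```php
<?php
interface ShapeInterface
{
    public function area();
}

interface ThreeDimensionalShapeInterface
{
    public function volume();
}

interface ManageShapeInterface
{
    public function calculate();
}

class Square implements ShapeInterface, ManageShapeInterface
{
    private $length;

    public function __construct($length)
    {
        $this->length = $length;
    }

    public function area()
    {
        return pow($this->length, 2);
    }

    public function calculate()
    {
        return $this->area();
    }
}

class Cuboid implements ShapeInterface, ThreeDimensionalShapeInterface, ManageShapeInterface
{
    private $length;
    private $width;
    private $height;

    public function __construct($length, $width, $height)
    {
        $this->length = $length;
        $this->width = $width;
        $this->height = $height;
    }

    public function area()
    {
        // Surface area of a cuboid: 2(lw + lh + wh).
        return 2 * ($this->length * $this->width
            + $this->length * $this->height
            + $this->width * $this->height);
    }

    public function volume()
    {
        return $this->length * $this->width * $this->height;
    }

    public function calculate()
    {
        return $this->area();
    }
}

class AreaCalculator
{
    protected $shapes;

    public function __construct($shapes = [])
    {
        $this->shapes = $shapes;
    }

    public function sum()
    {
        $area = [];
        foreach ($this->shapes as $shape) {
            // Depend on the single management API, not on each shape interface.
            if (!($shape instanceof ManageShapeInterface)) {
                throw new InvalidArgumentException('Expected a ManageShapeInterface instance.');
            }
            $area[] = $shape->calculate();
        }

        return array_sum($area);
    }
}

// 5^2 + 2 * (1*2 + 1*3 + 2*3) = 25 + 22
$calculator = new AreaCalculator([new Square(5), new Cuboid(1, 2, 3)]);
echo $calculator->sum(), PHP_EOL;
```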

      Dependency Inversion Principle

      The dependency inversion principle states:

      Entities must depend on abstractions, not on concretions. It states that the high-level module must not depend on the low-level module, but they should both depend on abstractions.

      This principle allows for decoupling.

      Here is an example of a PasswordReminder that connects to a MySQL database:

      class MySQLConnection
      {
          public function connect()
          {
              // handle the database connection
              return 'Database connection';
          }
      }
      
      class PasswordReminder
      {
          private $dbConnection;
      
          public function __construct(MySQLConnection $dbConnection)
          {
              $this->dbConnection = $dbConnection;
          }
      }
      

      First, MySQLConnection is the low-level module while PasswordReminder is the high-level one. However, according to the definition of D in SOLID, which says to depend on abstractions and not on concretions, the snippet above violates this principle because the PasswordReminder class is forced to depend on the MySQLConnection class.

      Later, if you were to change the database engine, you would also have to edit the PasswordReminder class, and this would violate the open-closed principle.

      The PasswordReminder class should not care which database your application uses. To address these issues, code to an interface, since high-level and low-level modules should depend on abstractions:

      interface DBConnectionInterface
      {
          public function connect();
      }
      

      The interface has a connect method, and the MySQLConnection class implements this interface. Also, instead of type-hinting the MySQLConnection class directly in the constructor of PasswordReminder, you type-hint DBConnectionInterface. As a result, no matter which type of database your application uses, the PasswordReminder class can connect to it without changes to its own code, and the open-closed principle is not violated.

      class MySQLConnection implements DBConnectionInterface
      {
          public function connect()
          {
              // handle the database connection
              return 'Database connection';
          }
      }
      
      class PasswordReminder
      {
          private $dbConnection;
      
          public function __construct(DBConnectionInterface $dbConnection)
          {
              $this->dbConnection = $dbConnection;
          }
      }
      

      This code establishes that both the high-level and low-level modules depend on abstractions.
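
      To see the decoupling in practice, the sketch below swaps in a second connection class without editing PasswordReminder. SQLiteConnection and the connectionStatus method are hypothetical additions for illustration only; they are not part of the original example.

```php
<?php
interface DBConnectionInterface
{
    public function connect();
}

class MySQLConnection implements DBConnectionInterface
{
    public function connect()
    {
        // handle the database connection
        return 'Database connection';
    }
}

// Hypothetical second implementation, added without touching PasswordReminder.
class SQLiteConnection implements DBConnectionInterface
{
    public function connect()
    {
        return 'SQLite connection (hypothetical)';
    }
}

class PasswordReminder
{
    private $dbConnection;

    public function __construct(DBConnectionInterface $dbConnection)
    {
        $this->dbConnection = $dbConnection;
    }

    // Hypothetical helper that exposes what the injected connection returns.
    public function connectionStatus()
    {
        return $this->dbConnection->connect();
    }
}

$mysqlReminder = new PasswordReminder(new MySQLConnection());
$sqliteReminder = new PasswordReminder(new SQLiteConnection());

echo $mysqlReminder->connectionStatus(), PHP_EOL;
echo $sqliteReminder->connectionStatus(), PHP_EOL;
```

      Because PasswordReminder only type-hints the interface, the choice of database is made by whoever constructs it.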

      Conclusion

      In this article, you were introduced to the five principles of SOLID code. Projects that adhere to SOLID principles can be shared with collaborators, extended, modified, tested, and refactored with fewer complications.

      Continue your learning by reading about other practices for agile and adaptive software development.




      S.O.L.I.D: The First 5 Principles of Object Oriented Design


      S.O.L.I.D is an acronym for the first five object-oriented design (OOD) principles by Robert C. Martin, popularly known as Uncle Bob.

      These principles, when combined, make it easier for a programmer to develop software that is easy to maintain and extend. They also help developers avoid code smells and refactor code more easily, and they are part of agile or adaptive software development.

      S.O.L.I.D stands for:

      • S: Single-responsibility principle
      • O: Open-closed principle
      • L: Liskov substitution principle
      • I: Interface segregation principle
      • D: Dependency inversion principle

      Let’s look at each principle individually to understand why S.O.L.I.D can help make us better developers.

      Single-responsibility Principle

      S.R.P for short – this principle states that:

      A class should have one and only one reason to change, meaning that a class should have only one job.

      For example, say we have some shapes and we want to sum the areas of all of them. This sounds pretty simple, right?

      class Circle {
          public $radius;
      
          public function __construct($radius) {
              $this->radius = $radius;
          }
      }
      
      class Square {
          public $length;
      
          public function __construct($length) {
              $this->length = $length;
          }
      }
      

      First, we create our shape classes and have the constructors set up the required parameters. Next, we create the AreaCalculator class and write the logic to sum the areas of all provided shapes.

      class AreaCalculator {
      
          protected $shapes;
      
          public function __construct($shapes = array()) {
              $this->shapes = $shapes;
          }
      
          public function sum() {
              // logic to sum the areas
          }
      
          public function output() {
              return implode('', array(
                  "",
                      "Sum of the areas of provided shapes: ",
                      $this->sum(),
                  ""
              ));
          }
      }
      

      To use the AreaCalculator class, we simply instantiate the class and pass in an array of shapes, and display the output at the bottom of the page.

      $shapes = array(
          new Circle(2),
          new Square(5),
          new Square(6)
      );
      
      $areas = new AreaCalculator($shapes);
      
      echo $areas->output();
      

      The problem with the output method is that the AreaCalculator handles the logic to output the data. What if the user wanted to output the data as JSON or something else?

      All of that logic would be handled by the AreaCalculator class, which is what SRP frowns upon. The AreaCalculator class should only sum the areas of the provided shapes; it should not care whether the user wants JSON or HTML.

      To fix this, you can create a SumCalculatorOutputter class and use it to handle whatever logic you need to display the sum of the areas of all provided shapes.

      The SumCalculatorOutputter class would work like this:

      $shapes = array(
          new Circle(2),
          new Square(5),
          new Square(6)
      );
      
      $areas = new AreaCalculator($shapes);
      $output = new SumCalculatorOutputter($areas);
      
      echo $output->JSON();
      echo $output->HAML();
      echo $output->HTML();
      echo $output->JADE();
      

      Now, whatever logic you need to output the data to the user is handled by the SumCalculatorOutputter class.

      Open-closed Principle

      Objects or entities should be open for extension, but closed for modification.

      This simply means that a class should be easily extendable without modifying the class itself. Let’s take a look at the AreaCalculator class, especially its sum method.

      public function sum() {
          // Start from an empty list so an empty $shapes array sums to 0.
          $area = array();
          foreach($this->shapes as $shape) {
              if(is_a($shape, 'Square')) {
                  $area[] = pow($shape->length, 2);
              } else if(is_a($shape, 'Circle')) {
                  $area[] = pi() * pow($shape->radius, 2);
              }
          }
      
          return array_sum($area);
      }
      

      If we wanted the sum method to be able to sum the areas of more shapes, we would have to add more if/else blocks and that goes against the Open-closed principle.

      A way we can make this sum method better is to remove the logic to calculate the area of each shape out of the sum method and attach it to the shape’s class.

      class Square {
          public $length;
      
          public function __construct($length) {
              $this->length = $length;
          }
      
          public function area() {
              return pow($this->length, 2);
          }
      }
      

      The same should be done for the Circle class: an area method should be added. Now, calculating the sum of any shapes provided should be as simple as:

      public function sum() {
          // Start from an empty list so an empty $shapes array sums to 0.
          $area = array();
          foreach($this->shapes as $shape) {
              $area[] = $shape->area();
          }
      
          return array_sum($area);
      }
      

      Now we can create another shape class and pass it in when calculating the sum without breaking our code. However, another problem arises: how do we know that the object passed into the AreaCalculator is actually a shape, or that the shape has a method named area?

      Coding to an interface is an integral part of S.O.L.I.D. As a quick example, we create an interface that every shape implements:

      interface ShapeInterface {
          public function area();
      }
      
      class Circle implements ShapeInterface {
          public $radius;
      
          public function __construct($radius) {
              $this->radius = $radius;
          }
      
          public function area() {
              return pi() * pow($this->radius, 2);
          }
      }
      

      In our AreaCalculator sum method we can check if the shapes provided are actually instances of the ShapeInterface, otherwise we throw an exception:

      public function sum() {
          // Start from an empty list so an empty $shapes array sums to 0.
          $area = array();
          foreach($this->shapes as $shape) {
              if(is_a($shape, 'ShapeInterface')) {
                  $area[] = $shape->area();
                  continue;
              }
      
              throw new AreaCalculatorInvalidShapeException;
          }
      
          return array_sum($area);
      }
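
      The snippet above throws AreaCalculatorInvalidShapeException without defining it. A minimal sketch of how that might be wired up follows; it assumes the exception simply extends PHP's built-in InvalidArgumentException, which the original does not specify.

```php
<?php
interface ShapeInterface
{
    public function area();
}

// Assumption: the article never defines this class, so it is declared here
// as a thin subclass of the built-in InvalidArgumentException.
class AreaCalculatorInvalidShapeException extends InvalidArgumentException
{
}

class Square implements ShapeInterface
{
    public $length;

    public function __construct($length)
    {
        $this->length = $length;
    }

    public function area()
    {
        return pow($this->length, 2);
    }
}

class AreaCalculator
{
    protected $shapes;

    public function __construct($shapes = array())
    {
        $this->shapes = $shapes;
    }

    public function sum()
    {
        $area = array();
        foreach ($this->shapes as $shape) {
            if ($shape instanceof ShapeInterface) {
                $area[] = $shape->area();
                continue;
            }

            throw new AreaCalculatorInvalidShapeException('Not a shape.');
        }

        return array_sum($area);
    }
}

// The second element is deliberately not a shape.
$calculator = new AreaCalculator(array(new Square(5), 'not a shape'));

try {
    $result = $calculator->sum();
} catch (AreaCalculatorInvalidShapeException $e) {
    $result = null;
    echo 'Invalid shape rejected: ', $e->getMessage(), PHP_EOL;
}
```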
      

      Liskov substitution principle

      Let q(x) be a property provable about objects of x of type T. Then q(y) should be provable for objects y of type S where S is a subtype of T.

      All this is stating is that every subclass or derived class should be substitutable for its base or parent class.

      Still making use of our AreaCalculator class, say we have a VolumeCalculator class that extends the AreaCalculator class:

      class VolumeCalculator extends AreaCalculator {
          public function __construct($shapes = array()) {
              parent::__construct($shapes);
          }
      
          public function sum() {
              // logic to calculate the volumes and then return an array of output
              return array($summedData);
          }
      }
      

      In the SumCalculatorOutputter class:

      class SumCalculatorOutputter {
          protected $calculator;
      
          public function __construct(AreaCalculator $calculator) {
              $this->calculator = $calculator;
          }
      
          public function JSON() {
              $data = array(
                  'sum' => $this->calculator->sum(),
              );
      
              return json_encode($data);
          }
      
          public function HTML() {
              return implode('', array(
                  '',
                      'Sum of the areas of provided shapes: ',
                      $this->calculator->sum(),
                  ''
              ));
          }
      }
      

      If we tried to run an example like this:

      $areas = new AreaCalculator($shapes);
      $volumes = new VolumeCalculator($solidShapes);
      
      $output = new SumCalculatorOutputter($areas);
      $output2 = new SumCalculatorOutputter($volumes);
      

      The program does not squawk, but when we call the HTML method on the $output2 object we get an E_NOTICE error informing us of an array to string conversion.

      To fix this, instead of returning an array from the VolumeCalculator class sum method, you should simply return the summed value:

      public function sum() {
          // logic to calculate the volumes and then return a single value
          return $summedData;
      }
      

      The summed data can be a float, double, or integer.

      Interface segregation principle

      A client should never be forced to implement an interface that it doesn’t use or clients shouldn’t be forced to depend on methods they do not use.

      Still using our shapes example, we know that we also have solid shapes, so since we would also want to calculate the volume of the shape, we can add another contract to the ShapeInterface:

      interface ShapeInterface {
          public function area();
          public function volume();
      }
      

      Any shape we create must implement the volume method, but we know that squares are flat shapes and do not have volumes, so this interface would force the Square class to implement a method that it has no use for.

      ISP says no to this. Instead, you could create another interface called SolidShapeInterface that has the volume contract, and solid shapes like cubes can implement this interface:

      interface ShapeInterface {
          public function area();
      }
      
      interface SolidShapeInterface {
          public function volume();
      }
      
      class Cuboid implements ShapeInterface, SolidShapeInterface {
          public function area() {
              // calculate the surface area of the cuboid
          }
      
          public function volume() {
              // calculate the volume of the cuboid
          }
      }
      

      This is a much better approach, but a pitfall to watch out for is how you type-hint these interfaces. Instead of type-hinting a ShapeInterface or a SolidShapeInterface, you can create another interface, maybe ManageShapeInterface, and implement it on both the flat and solid shapes. This gives you a single API for managing the shapes. For example:

      interface ManageShapeInterface {
          public function calculate();
      }
      
      class Square implements ShapeInterface, ManageShapeInterface {
          public function area() { /* Do stuff here */ }
      
          public function calculate() {
              return $this->area();
          }
      }
      
      class Cuboid implements ShapeInterface, SolidShapeInterface, ManageShapeInterface {
          public function area() { /* Do stuff here */ }
          public function volume() { /* Do stuff here */ }
      
          public function calculate() {
              return $this->area() + $this->volume();
          }
      }
      

      Now in AreaCalculator class, we can easily replace the call to the area method with calculate and also check if the object is an instance of the ManageShapeInterface and not the ShapeInterface.

      Dependency Inversion principle

      Last, but definitely not least, this principle states that:

      Entities must depend on abstractions not on concretions. It states that the high level module must not depend on the low level module, but they should depend on abstractions.

      This might sound complicated, but it is really easy to understand. This principle allows for decoupling, and an example is the best way to explain it:

      class PasswordReminder {
          private $dbConnection;
      
          public function __construct(MySQLConnection $dbConnection) {
              $this->dbConnection = $dbConnection;
          }
      }
      

      First, the MySQLConnection is the low-level module while the PasswordReminder is high level. According to the definition of D in S.O.L.I.D., which says to depend on abstractions and not on concretions, the snippet above violates this principle because the PasswordReminder class is forced to depend on the MySQLConnection class.

      Later, if you were to change the database engine, you would also have to edit the PasswordReminder class, which violates the open-closed principle.

      The PasswordReminder class should not care what database your application uses. To fix this, we again “code to an interface”. Since high-level and low-level modules should depend on abstractions, we can create an interface:

      interface DBConnectionInterface {
          public function connect();
      }
      

      The interface has a connect method, and the MySQLConnection class implements this interface. Also, instead of directly type-hinting the MySQLConnection class in the constructor of the PasswordReminder, we type-hint the interface. No matter which type of database your application uses, the PasswordReminder class can connect to it without being modified, and OCP is not violated.

      class MySQLConnection implements DBConnectionInterface {
          public function connect() {
              return "Database connection";
          }
      }
      
      class PasswordReminder {
          private $dbConnection;
      
          public function __construct(DBConnectionInterface $dbConnection) {
              $this->dbConnection = $dbConnection;
          }
      }
      

      According to the little snippet above, you can now see that both the high level and low level modules depend on abstraction.

      Conclusion

      S.O.L.I.D might seem a bit too abstract at first, but with each real-world application of S.O.L.I.D. principles, the benefits of adhering to its guidelines will become more apparent. Code that follows S.O.L.I.D. principles can more easily be shared with collaborators, extended, modified, tested, and refactored with fewer complications.


