ERD diagrams, pt. I: many-to-many relationships
Author: Alexey Makhotkin squadette@gmail.com.
I started writing a long post on how to design correct ERD diagrams based on the approach from the “Database Design Book”, but the text got a bit unwieldy. So I’m going to regroup and focus on one part: many-to-many relationships (“M:N links” in book terms).
Suppose that you need to build an ERD diagram based on some sort of real-world or teaching task. How do you make sure that your ERD diagram is correct?
One approach that you can try is:
- Step 1. Build a logical model using the approach from “Database Design Book”;
- Step 1b. Add a “ERD notation” column to each list: anchors, attributes, and links;
- Step 2. Fill in the “ERD notation” column, using text, such as “rectangle with ‘User’ inside”;
- Step 3. Draw the ERD diagram, placing each fragment of notation on the canvas.
You can update or fix the logical model if needed (e.g. if you forgot or misunderstood something in the business requirements). After that you update the “ERD notation” column, and make corresponding changes to the ERD diagram itself.
As a bonus, the logical model can also be used to design the physical database schema.
In this text we’ll discuss only one specific part: many-to-many relationships.
Table of contents
Business requirements
Let’s begin with business requirements. A classic example of many-to-many relationships is projects and developers. Suppose that we implement a project-management tool. We can create some projects, we can register some people who are going to work on those projects (call them “developers”). One project can have multiple developers assigned, and one developer can be assigned to multiple projects.
There would be a lot of complexity in any real-world project management tool, but we’re going to use an extremely limited subset of the functionality:
- we track projects and developers;
- projects can have name (and nothing else for now);
- developers have a name (and nothing else for now);
- developers can be assigned to multiple projects, and projects can have many developers working on them.
Adding additional attributes, entities and relationships is not very hard if you just follow the process.
Step 1. Logical model
First, clarification is needed. In the world of ER-diagrams (“entity-relationship diagrams”) we have:
- entities;
- attributes;
- relationships.
In the world of Minimal Modeling (that the “Database Design Book” is based on) we have:
- anchors;
- attributes;
- links.
The terms were carefully chosen, and there are reasons why we don’t use the commonly-accepted “entities” and “relationships” terms. We’ll discuss some differences later in the text.
Reading through the business requirements, we find two obvious anchors: projects and developers. They are nouns, and they could be counted (e.g., “We have 10 projects”, “We’ve just added two more developers”, etc.). Anchors roughly correspond to entities.
Anchor | ID example | ERD notation |
---|---|---|
Developer | 1, 2, 3, … | <TBD> |
Project | 1, 2, 3, … | <TBD> |
The next step is attributes. Here we actually don’t really need to discuss attributes, but we add them so that the diagram looks a bit more plausible. Using the “where is it stored?” approach, we find attributes. We describe attributes using formalized questions, and we select the corresponding logical type for them.
Anchor | Question | Logical type | Example value | ERD notation |
---|---|---|---|---|
Developer | What is the name of the Developer? | string | “B. Meyer” | <TBD> |
Project | What is the name of the Project? | string | <TBD> |
Finally, the topic of this text: links (links more or less correspond to relationships). There are only two anchors, so finding all the links between them is easy. Only one link could be found in our business requirements.
Anchor1 : Anchor2 | Cardi- nality |
Sentences | ERD notation |
---|---|---|---|
Developer : Project | M:N | A Developer is assigned to several Projects A Project could have several Developers assigned |
<TBD> |
Step 2. “ERD Notation” column
Our logical model consists of two anchors, two attributes and one link. Our ERD diagram is going to have roughly five graphical elements (but some graphical elements would be composite, we’ll see that in a moment).
There are many accepted ERD notations, both formalized and informal. Let’s focus on one specific notation for “conceptual ER diagrams”: https://www.lucidchart.com/pages/ER-diagram-symbols-and-meaning. Here, entities are rectangles, attributes are ellipses, and the relationships are diamonds and the lines. We also add another common variation of notation: mark cardinality of the relationships with labels such as “1”, “M”, “N”, etc.
You can easily use any other ERD notation, the method is notation independent.
Let’s go over each entry in our logical model, and fill in the “ERD notation” columns.
Anchors / Entities
Anchors usually directly correspond to entities and a special ID attribute. Because of that, every anchor is represented the same: a rectangle with the name of the anchor, plus an ellipse labeled “ID”, underlined. Also, there is a line connecting that rectangle and the ellipse.
Anchor | ID example | ERD notation |
---|---|---|
Developer | 1, 2, 3, … | a rectangle labelled “Developer”, and an ellipse labelled “ID”, underlined; connected by a line. |
Project | 1, 2, 3, … | a rectangle labelled “Project”, and an ellipse labelled “ID”, underlined; connected by a line. |
If your database model is long, you probably don’t want to tediously copy-paste the words in the “ERD notation” column, because they don’t really change. As you better understand the approach, you can use any shorthand notation for that, for example:
Attributes
Every attribute is also presented the same: an ellipse labelled with the short name of the attribute, connected by the line to its anchor rectangle. The name of the attribute is not underlined.
Anchor | Question | Logical type | Example value | ERD notation |
---|---|---|---|---|
Developer | What is the name of the Developer? | string | “B. Meyer” | An ellipse labelled “Name”, connected to its anchor rectangle |
Project | What is the name of the Project? | string | An ellipse labelled “Name”, connected to its anchor rectangle |
Here the main problem that you have is to choose an actual label for the ellipse. The name must be short but readable. We’ll talk more about that later.
Again, you probably want to save some tedious typing, so you can use shorthand notation such as “( Name )“.
Links / Relationships
Links correspond to relationships. They are represented by a diamond, labelled by a short label, that connects two corresponding anchor rectangles.
Here we have a M:N link (many-to-many relationship). Other cardinalities may be presented differently, we’ll discuss that in a longer text about ERD modeling.
Anchor1 : Anchor2 | Cardi-nality | Sentences | ERD notation |
---|---|---|---|
Developer : Project | M:N | A Developer is assigned to several Projects A Project could have several Developers assigned |
A diamond labelled “Assigned to”, connected to corresponding anchor rectangles. Lines are marked as “M” and “N”. |
You have to find a short label for each link — this may sometimes be challenging. We’ll talk more about that later.
Step 3. Drawing the ERD diagram
Now you can open your favorite diagramming tool (dozens of them), and again go over each anchor/attribute/link and place the ERD notation fragments on the canvas:
Cross-checking with the logical model: two rectangles for anchors, and also two ellipses, labelled “ID” and underlined. Two labelled ellipses, for each attribute. One diamond connected to rectangles, labeled as “Assigned to”, with lines marked as “M” and “N”, corresponds to a single link.
Done!
What’s next?
In the follow-up updates to this text we’ll cover more topics.
Physical ERD diagrams: from the same logical model we’re going to construct a different ERD diagram that shows physical database tables and connections between them.
We need to discuss how much information is available via the ERD diagram, compared to the tabular logical model. The logical model contains lots of human-readable text in the form of anchor names, descriptive questions and sentences, and also example values. Particularly, this information is readily available to feed to a LLM as a context for some prompting.
Why ERD diagrams are not very scalable, because bigger graphs quickly become unmanageable, as opposed to the tabular format.
Subscribe to the “Database Design Book” newsletter to receive future posts about database design: https://databasedesignbook.com/.