ERD diagrams, pt. II: physical diagrams
Author: Alexey Makhotkin squadette@gmail.com.
In the first part we’ve designed a logical ERD diagram based on the structured logical model. We built the structured logical model from the free-text business requirements.
What if we need to draw a physical ERD diagram for the same task? It turns out that we’ve already done maybe 80% of the work, and we can reuse the structured logical model verbatim. We’ll just use a different graphical notation.
(NB: You most certainly need to read the first part of this text to better understand this one.)
Table of contents
Table Design Strategy
A physical ERD diagram shows physical tables (the ones that you create with the CREATE TABLE statement). Physical tables are connected by some lines that show table relationships.
The first question is: what would the physical tables be? How many of them, and what are the fields?
To answer that question we first must decide on the table design strategy. That may sound scary, but you need to understand two things actually:
- most probably you need to use the “one table per entity/anchor” table strategy (explained here);
- there are maybe half a dozen different table design strategies (see “Other table design strategies” section below).
One table per entity/anchor
“One table per entity/anchor” is basically a textbook approach. It is also very common in real-world databases, and it’s used virtually everywhere in teaching. It’s basic and fundamental, and you need to understand it if you’re going to design databases.
It’s very simple to explain:
- for each entity/anchor there is a corresponding physical table;
- each attribute of an anchor corresponds to a column in its physical table;
- 1:N links correspond to a column in the physical table that sits on the N-side;
- for each M:N link there is a corresponding physical table with two columns;
If your task is to “draw a physical ERD diagram” then you most probably can just use this approach.
We’ll use it in this article too.
The diagram
Long story short, let me just show the final result for our toy problem (explained in the first part), and then we’ll discuss how it was created.
We’re going to reuse the same anchors/attributes/links structure as in the first part. (I’ve extended the schema a little bit: projects now belong to clients. This is needed to illustrate 1:N links.)
Anchors / Entities
In our “one table per anchor” table design strategy each anchor has its own table. (In other strategies it may not always be the case).
So, for let’s look closer at the “Projects” table (top left corner):
In the first line we see the name of the table: “Projects”. Choosing table names is often easy but sometimes not so easy (sorry for this piece of cryptic wisdom).
In the second line we see the primary key definition: “PK | id INTEGER NOT NULL”.
For the textbook approach this definition could be exactly the same for every anchor table: an integer column called “id”.
It is often used even in the real world, but primary keys could also be defined in other ways. In the book, this is discussed in sections 2.10 (“More on anchors”) and 5.7 (“Physical ID design”).
The remaining lines contain attributes of this anchor, and 1:N links that involve this anchor. Moreover, this anchor needs to be on the N-side of the link.
Now let’s look again at the logical model that we built in the first part. But now the last column is called “Physical ERD notation”, and the notation is different.
Anchor | ID example | Physical ERD notation |
---|---|---|
Developer | 1, 2, 3, … | standard anchor table called “Developers” |
Project | 1, 2, 3, … | standard anchor table called “Projects” |
Client | 1, 2, 3, … | standard anchor table called “Clients” |
In our example there are three anchors, and you can see three tables in the ERD diagram.
Attributes
In the “table per anchor” design strategy, each attribute is a column in the anchor table. (In other strategies it may not always be the case). Because of that, every attribute is also presented the same way: a line in the anchor table that contains a physical definition of the attribute, for example “name VARCHAR(50) NOT NULL”.
Here “name” is the name of the column, and the rest is data type and nullability. We chose sensible names and data types. More detailed discussion of physical attribute types could be found in section 5.4 “Attributes: choose column data types” of the book.
In our example all three attributes are called the same, but that’s a coincidence. More interesting attributes could be added, such as hourly rate of developer, project deadline and client support plan. This is left as the exercise for the reader.
Anchor | Question | Logical type | Example value | Physical ERD notation |
---|---|---|---|---|
Developer | What is the name of the Developer? | string | “B. Meyer” | Developers table: “name VARCHAR(50) NOT NULL” |
Project | What is the name of the Project? | string | “ |
Projects table: “name VARCHAR(50) NOT NULL” |
Client | What is the name of the Client? | string | “Olissipo, Inc.” | Clients table: “name VARCHAR(50) NOT NULL” |
Links / Relationships
In the “table per anchor” design strategy, 1:N and M:N links are represented very differently. ERD graphical notation reflects that.
In the “table per anchor” design strategy, 1:N link is stored in a column in the anchor table that sits on the N-side (In other strategies it may not always be the case).
M:N links, on the other hand, are represented by a separate two-column table: a column for first anchor ID and a column for second anchor ID.
Let’s look at both kinds of links in our toy example.
Anchor1 : Anchor2 | Cardi- nality |
Sentences | Physical ERD notation |
---|---|---|---|
Developer : Project | M:N | A Developer is assigned to several Projects A Project could have several Developers assigned |
“Project_Developers” table; a line connecting it to the “Projects” table; line connecting it to the “Developers” table; |
Client : Project | 1:N | A Client can have several Projects A Project belongs to only one Client |
a line connecting “Clients” and “Projects”; |
First link is outlined in blue. Yes, ALL of that is the graphical notation for a single data element (M:N link): two connecting lines and a whole separate link table.
The second link (1:N) is outlined in green, it’s just a line connecting two tables.
Now let’s look at the ends of lines connecting our tables. This is called “crow’s foot notation”. It is commonly used in physical ERD diagrams. Graphical elements represent cardinalities and whether the link is required. Detailed discussion is out of scope of this article.
Design workflow
At this point we can notice that the “Physical ERD notation” column is quite repetitive. Only the names and data types change.
If you realize that you need to add some new elements, like we did with the “clients/projects” stuff, you:
- add new anchors, attributes and links, focusing on the logical definition first;
- using the chosen table design strategy, fill in the “Physical ERD notation” column with names and data types;
- edit the ERD diagram, adding the corresponding graphical elements;
So, the source of truth for your work would basically be the tabular format. In the follow-up article we’ll discuss why the tabular format is more effective than the ERD diagram.
Why do you need the ERD diagram then? Because it’s an accepted way of showing your work to other people. You draw the diagram, and you send it to somebody for review: to your teacher, or to your colleagues, or as part of the job interview.
However, practice shows that keeping the ERD diagram up to date, and especially collaborating can be quite difficult, and ERD diagrams quickly become outdated and abandoned.
Other table design strategies
We cannot meaningfully talk about various table design strategies here. “Database Design Book” covers them in more depth:
- Chapter 3: “Building a physical schema”;
- Chapter 4: “Table-per-anchor table design strategy” (30 pages);
- Chapter 5: “Other table design strategies” (9 pages);
Here are some keywords:
- side tables;
- EAV (Entity-Attribute-Value);
- JSON-typed columns;
- physical 6NF design (aka “anchor modeling”);
If you would use a different table design strategy, your physical ERD diagram would look in a different way. Maybe some attributes would go to a side table, or there would be one EAV-table that contains many attributes.
But the logical model does not change, only the physical parts would be different. The logical ERD diagram also stays the same.
What’s next?
We need to discuss how much information is available via the ERD diagram, compared to the tabular logical model. The logical model contains lots of human-readable text in the form of anchor names, descriptive questions and sentences, and also example values. Particularly, this information is readily available to feed to a LLM as a context for some prompting.
Also, why ERD diagrams are not very scalable. Bigger graphs quickly become unmanageable, as opposed to the spreadsheet-like format.
Subscribe to the “Database Design Book” newsletter to receive future posts about database design: https://databasedesignbook.com/.