5NF and Database Design
Author: Alexey Makhotkin squadette@gmail.com, (~3900 words)
One of the goals of this publication is to deconstruct the traditional ways of teaching basic topics in relational databases. Previously we discussed the fourth normal form (4NF): “Historically, 4NF explanations are needlessly confusing”.
Let’s discuss the ultimate beast: fifth normal form (5NF). Often it’s presented even more confusingly than 4NF is, and we can show that this presentation is unnecessary, and the confusion is completely artificial.
Here is the roadmap of this post:
- It’s essential to have good, well-motivated examples. We start with a survey of examples used in various texts that present 5NF.
- Wikipedia should have a good baseline explanation of a concept, but in the case of 5NF, it does not. We discuss the problems with the Wikipedia example.
- We discuss a more logical sequence of designing tables: a) start with business requirements, b) create a logical model, c) design physical table schema.
- We discuss the “ice cream” example, and the AB-BC-AC triangle pattern that naturally appears here.
- We discuss the “musicians” example, and the ABC+D star pattern that appears here. For some business requirements, there is also a choice between using a composite or a synthetic primary key.
- Finally, we come to the conclusion that you don’t really need to involve 5NF to design your table schema. You begin with the logical model, and apply a textbook table design strategy that preserves normalization.
- In two extra chapters we discuss extending both examples, with two different teaching points.
Table of contents
Subscribe here to receive updates.
Sources
First, we need to list the teaching scenarios that are used in the commonly available sources.
Wikipedia: Salesman / Brand / Product Type, with the following relationships:
- Salesman sells product types (e.g. “vacuum cleaner”);
- Salesman sells brands (e.g. “Philips”);
- Brands offer product types;
https://en.wikipedia.org/w/index.php?title=Fifth_normal_form&oldid=1344596188#Example
Decomplexify: Ice cream brands / Flavours / Friends, with the following relationships:
- Brands offer flavours (e.g., Frosty’s offers Vanilla, Chocolate, Strawberry, and Mint);
- Friends like flavours (e.g. Jason likes vanilla and chocolate);
- Friends like brands (e.g., Jason likes Frosty’s and Alpine);
https://www.youtube.com/watch?v=GFQaEYEc8_8&t=1427s (see also the “corrections” in the pinned comment).
Data Demythed blog by Barry Johnson: Concerts / Musicians / Instruments / Performances.
- Musician participates in a Concert;
- Musician is able to play an Instrument;
- Musician plays an Instrument at a certain Concert;
https://datademythed.com/posts/5nf_missing_use_case/
(I particularly recommend the “In Conclusion” section)
Barry’s text also discusses the traditional poor treatment of 5NF. Back then both of us had a discussion in the comments, which helped me to make some progress in understanding. Here I’ll discuss my treatment of this business case, because it seems to be structurally different from the previous two examples.
Making sense of Wikipedia example
Wikipedia presents its example in a roundabout way. First it shows a three-column table (“Traveling salesman”, “Brand”, “Product type”).

What does this table mean? Let’s discuss later.
Then there is the following quote:
“In the absence of any rules restricting the valid possible combinations of traveling salespeople, brand, and product type, the three-attribute table above is necessary in order to model the situation correctly.
Suppose, however, that the following rule applies: A traveling salesperson has certain brands and certain product types in their repertoire. If brand B1 and brand B2 are in their repertoire, and product type P is in their repertoire, then (assuming brand B1 and brand B2 both make product type P), the traveling salesperson must offer product type P from both brands; that is, the salesperson cannot sell only B1’s product P or only B2’s product P.
In that case, it is possible to split the table into three:
In this case, it’s impossible for Louis Ferguson to refuse to offer vacuum cleaners made by Acme (assuming Acme makes vacuum cleaners) if he sells anything else made by Acme (lava lamp) and he also sells vacuum cleaners made by any other brand (Robusto).”
The “impossible to refuse” wording, and the entire premise of “must offer product type P from both brands” describe a weird scenario. I am not sure if it ever happens in real life.
Setting the tables aside: the rule itself is weird, it means that we cannot handle the normal case. If I want to start selling Robusto vacuum cleaners then I must also start selling Acme vacuum cleaners. If I want to sell Acme breadboxes, I must also sell Acme vacuum cleaners.
This makes no business sense, and because of that you won’t understand 5NF from this example. Nowadays I probably understand what this contrived wording is trying to achieve pedagogically, but I think that it’s failing at that. It’s not useful for understanding database design.
Begin with logical model
The entire premise is illogical. A table is presented, and we ask: what could be the meaning of this table? Then we split this table into three tables, and we ask again: what could be the meaning of those tables? We ask: what if there was this extra rule that makes very little sense, if we’d take it into account would we interpret the tables differently?
Instead, we must begin with the logical model that corresponds to the actual business scenario. When the logical model is complete, we can build a physical schema, using a textbook table design strategy. The result would be perfectly normalized: no redundancy, and no anomalies.
Two logical design patterns arise around 5NF:
- AB-BC-AC triangle (see the “ice cream” example below), and
- ABC+D star pattern (see the “musicians” example).
AB-BC-AC triangle: the “ice cream” example
Here is my rephrasing of the “ice cream” example, presented in the Decomplexify video on database normalization:
“There are several ice cream brands on the market, e.g. “Frosty’s”, “Alpine”, “Ice Queen”, etc. Each brand produces one or more flavours of ice cream, e.g. vanilla, strawberry, and rum raisin. We want to create a database of our friends’ ice cream preferences. Each friend likes some brands, and also some flavours. We assume that those preferences intersect: that is, if a friend likes brands A and B and flavours 1 and 2 that means that they like exactly A-1, A-2, B-1, and B-2 (restricted to pairs the brand actually produces). They won’t like A-3 or C-1.”
Two things to note: first, this example is more plausible than Wikipedia’s one, we’ll take it. Second, there is no requirement to handle specific tastes. For example, we don’t record the fact that the aforementioned friend likes D-4 additionally, nor the fact that they don’t like B-1.
Let’s write down the logical model that corresponds to the description above, using notation from “Database Design Book”. We’ll have three anchors (entities):
| Anchor | ID example | Table name |
|---|---|---|
| Brand | “Alpine” | brands |
| Flavour | “vanilla” | flavours |
| Friend | “Jason” | friends |
In the real database we’d use proper integer IDs, but this one is a tiny teaching database, so it makes sense to use so-called natural keys which are unique strings. We assume that all your friends have different names. In the textbook table design strategy, each anchor has a corresponding table. The table names are written in the last column.
We don’t need any attributes here, because all information about entities is basically contained in the ID.
But the most interesting part is the list of links. 5NF is mostly about the links.
| Anchor1 : Anchor2 | Cardi- nality |
Sentences | Table or column names |
|---|---|---|---|
| Brand : Flavour | M:N | A Brand produces several Flavours A Flavour is produced by several Brands |
brand_flavours |
| Friend : Brand | M:N | A Friend prefers several Brands A Brand is preferred by several Friends |
friend_brands |
| Friend : Flavour | M:N | A Friend prefers several Flavours A Flavour is preferred by several Friends |
friend_flavours |
A link is a relationship between two anchors. Here we have three links, connecting three anchors into a triangle, as we mentioned:

All three links have M:N cardinality. It is clearly visible because all the sentences contain the word “several”. In the textbook table design strategy, each M:N link has a corresponding two-column table (also known as junction table). The names of the tables are written down in the last column.
Here are three link tables, created based on the logical model. It uses the same dataset as presented in the Decomplexify video.

One thing that we can see here is that Jason likes chocolate, but none of the brands produce it.
A useful exercise for the interested reader would be to write an SQL query that returns the brand offerings that would suit each friend, based on their preferences.
Note that there would exists three more tables: brands, flavours, and friends, containing IDs used in the link tables.

Note that for each entity there is one more row that is not present in any of the link relationships. We know about the Coldflash brand, but we don’t know which flavours it provides, and nobody prefers it. We have a Kiwi flavour which is not produced by any brand and not preferred by anyone. And we have Suzy, whose preferences we have not yet registered.
ABC+D star pattern: the “musicians” example
Now let’s discuss an example from the “5NF: The Missing Use Case” post by Barry Johnson.
Suppose that we want to record information about concerts and musicians that played certain instruments in those concerts. We begin with the three anchors:
| Anchor | ID example | Table name |
|---|---|---|
| Concert | “christmas-2025” | concerts |
| Instrument | “violin” | instruments |
| Musician | “Patricia” | musicians |
Now let’s draft some possible links. This is one of the database modeling examples where natural language may mislead you, so we need to be careful and precise.
In plain English we’d say: “A Musician played an Instrument in a Concert”, but this sentence includes three anchors and not two. Links always connect two anchors. Even though the relational model allows relations with more than two elements, we insist on only using 2-element links because they help you design unambiguous tables.
Here is an example from a real-world concert, listing musicians and their instruments:
- Marc, violin;
- Gilles, violin;
- Vlad, viola;
- Yovan, cello.
You see that several musicians can play an instrument. If you think about it, it’s possible that one musician could play several instruments.
While we’re at it, why are we even building this database? Suppose that we want to track how much money musicians should get paid. Violin players get paid $x; maybe somebody was asked to help out and play some cymbals in the first half and a gong at the end, so they’d be paid $y + $z.
When you see a list of something, very often it would be an anchor. List items could be counted, and you can add one more item to the list: a classic anchor.
We need to find a word for this — let’s try “Performance”:
| Anchor | ID example | Table name |
|---|---|---|
| Performance | <two options possible, see below> | performances |
Finding words like this is often hard, because natural language is ambiguous.
(Note that we use a synthetic integer ID for the performances.)
Let’s try to find links between the four anchors now. A concert is made of individual performances. A performance involves a specific musician playing a specific instrument. Let’s formalize this:
| Anchor1 : Anchor2 | Cardi- nality |
Sentences | Table or column names |
|---|---|---|---|
| Concert : Performance | 1:N | A Concert consists of several Performances A Performance is a part of only one Concert |
performances.concert_id |
| Instrument : Performance | 1:N | An Instrument is played in several Performances A Performance involves only one Instrument |
performances.instrument_id |
| Musician : Performance | 1:N | A Musician may perform several Performances A Performance is performed by only one Musician |
performances.musician_id |
Read the sentences: they may sound a bit awkward, but you need to confirm that they make sense. It’s crucial to make sure that “only one” and “several” is used correctly, because that’s how you define cardinality. Read aloud if needed, or read to somebody else — that’s how you prevent design mistakes.
In this case, all three links have cardinality 1:N.
Here is a diagram that has a distinctive three-pointed star shape:

This is why we call it the ABC+D star pattern. We have three “easy” words: concert, musician, and instrument. But you must also realize that there is an underlying concept of performance. Using a formalized approach, such as the one presented in the “Database Design Book”, helps a lot, making the design process more reliable and error-proof.
Physical table design: primary keys
The concept of normal forms only appears when we talk about physical table design. The ice cream example was simple and straightforward, but the current scenario is a bit more complicated, because we have to talk about uniqueness constraints.
Textbook table design strategy could be applied directly:
- Performance anchor is implemented as “
performances” table; - All three links are 1:N, so they are implemented as columns in that table (“
concert_id”, “musician_id”, and “instrument_id”).
But here is the question: what would be the primary key of that table? In most anchor tables we’d use a synthetic ID column, but here we can notice that the combination of (concert_id, musician_id, instrument_id) must be unique. In awkward semi-formalized English: a musician is paid for performing an instrument only once for a given concert (this follows from business requirements).
If we use a synthetic ID column, we need to add a uniqueness constraint:
CREATE TABLE performances (
id INTEGER NOT NULL PRIMARY KEY,
concert_id INTEGER NOT NULL,
musician_id INTEGER NOT NULL,
instrument_id INTEGER NOT NULL,
UNIQUE (concert_id, musician_id, instrument_id)
);
The data looks like this:

But what the relational theory textbooks want you to know is that you can also design this table by using a composite primary key (which also serves as a uniqueness constraint):
CREATE TABLE performances (
concert_id INTEGER NOT NULL,
musician_id INTEGER NOT NULL,
instrument_id INTEGER NOT NULL,
PRIMARY KEY (concert_id, musician_id, instrument_id)
);
And the data will look like this:

The choice of composite vs synthetic primary key only appears when business requirements assume uniqueness. In other scenarios there is no uniqueness, and you must use the synthetic primary key.
Imagine an online game where users can buy certain items and give them as gifts to other users. The same user can give the same type of item to the same friend multiple times, so there is no uniqueness. We’d still have the same ABC+D star pattern, though.
Let’s go back to the three-column table variant. It’s the same structure as was used in Wikipedia and in many other textbooks and tutorials. We did not need to split it into three two-column tables because the business requirements are different from the “ice cream” example.
More importantly, we did not need to reason in terms of 5NF decomposition at all. We only needed to define business requirements, and construct the physical tables in a straightforward way.
Yet, we’ve seen both types of tables that are discussed in a typical 5NF tutorial:
- three two-column tables (AB-BC-AC triangle);
- one table (ABC+D star):
- three-column if uniqueness is required and we want a composite PK;
- one extra ID if there is no uniqueness or we want a synthetic PK;
Also, we did not need to perform this illogical operation of “splitting tables”, because we just designed all the tables correctly from the beginning. In practice they would not be split or joined as part of the database design process.
Extra chapter: specific ice cream preferences
Both presented patterns are not mutually exclusive. You can have several different links that connect the same anchors in a different way.
Let’s go back to the ice cream example and extend it. Some of our friends are extra picky: they prefer specific flavours from a specific brand, for example kiwi by Coldflash, but no other flavours by Coldflash, and no other kiwi brands.
We need to handle this example by extending the logical model defined above. It’s important to note that the original behavior stays as it is: all the logical schema and existing tables stay as they are. We only add new entries to the logical schema, and we’ll add some new tables.
Having discussed the “musicians” example, we can immediately recognize how to model “Frank loves Coldflash kiwi”. We introduce a new anchor called Preference, and three links arranged as an ABC+D star pattern.
Logical schema is basically a copy-paste of the “musicians” example. Anchors:
| Anchor | ID example | Table name |
|---|---|---|
| Preference | <two options are possible> | preferences |
And links:
| Anchor1 : Anchor2 | Cardi- nality |
Sentences | Table or column names |
|---|---|---|---|
| Friend : Preference | 1:N | A Friend can have several Preferences A Preference belongs to only one Friend |
preferences.friend_id |
| Brand : Preference | 1:N | A Brand is involved in several Preferences A Preference refers to only one Brand |
preferences.brand_id |
| Flavour : Preference | 1:N | A Flavour is involved in several Preferences A Preference refers to only one Flavour |
preferences.flavour_id |
Same as with “musicians” example, we have the option of synthetic ID or a composite PK. In this example each (Friend, Brand, Flavour) combination must be unique, so we can use a composite PK if we want.
From those new elements we create a new “preferences” table, that exists alongside the three other tables: “brand_flavours”, “friend_brands”, and “friend_flavours”. We still capture both the information about broader preferences and about specific preferences, using two sets of tables, designed accordingly.
The point of this section is to reinforce the idea that both patterns are equally useful, one is not a replacement for the other. Sometimes they could even be used simultaneously. Which of them is applicable depends only on your business requirements.
Extra chapter: skill issue
Let’s go back to the “musicians” example. If you read the “5NF: The Missing Use Case” post, you’ll see that at some point an idea of “skill” is introduced. Basically it’s a link that connects a Musician and an Instrument. This idea makes sense in natural language: “Alice can play violin, so she gets invited to play in the classical concerts”.
The data is modeled via this Skill in the text like this:

This differs from the model discussed in this post. Why? I’ve been thinking about this exact issue for quite some time, and came to look at the problem from a different angle.
I think that “A Musician can play an Instrument” is a valid link, but it corresponds to a different business process. We discussed capturing the information about who played which instrument in specific concerts.
But if you think about that, it’s possible to play an instrument in the concert without necessarily declaring that you have a skill to play it.
Consider the quote from that post:
“Looking at our example a bit further, for instance, it’s easy to conceive of a SkillRating in [Skill]. This could be used to help decide which [Musician] instance(s), with corresponding [Instrument] instance(s), will be invited to be part of a [Concert] instance.”
It makes total sense to extend our original logical model by capturing the information about which musicians are willing to be invited to the concerts to play certain instruments. Here is the link:
| Anchor1 : Anchor2 | Cardi- nality |
Sentences | Table or column names |
|---|---|---|---|
| Instrument : Musician | M:N | An Instrument can be played by several Musicians A Musician can play several Instruments |
musician_skills |
Note that this link has M:N cardinality. Because of that, we had to add another two-column table (also known as a junction table). It is completely independent from the “performances” table, in both ways.
First, we can record that Alice can play violin, but we don’t have any records about her playing anything. This is normal: maybe she just started, or our own dataset is far from complete.
Second, somebody can play in a concert, but they do not necessarily want to be invited to further concerts. Maybe they’ve retired and are only doing exceptions for special occasions.
Or, somebody needs to play hammer in Mahler’s Symphony No. 6 (there are only two or three blows in the entire piece). It’s probably not necessary to have a “skill” for that, but you certainly want to be mentioned in the lineup because it’s such an important part.
So, it’s a question of what you want to implement. Is it a sort of marketplace for musicians? Then you need “musician_skills”. Or, if you want to have a record of who played when — that’s what “performances” table is for. Maybe you want both. It’s up to you to decide.
Additional reading
Here are three more texts that may be useful if you want to understand more about the history of normal forms.
- William Kent “A Simple Guide to Five Normal Forms in Relational Database Theory” (1982)
5NF is discussed in chapter 4.2. We already used this text to better understand historical aspects of 4NF definition. There is a confusing sentence there:
“Roughly speaking, we may say that a record type is in fifth normal form when its information content cannot be reconstructed from several smaller record types, i.e., from record types each having fewer fields than the original record.”
It seems that there are too many negatives here, and it’s hard to follow. But I won’t attempt to edit this sentence: my goal is to present a completely different way of reaching the same result.
- Andrei Pall “Database Normalization” (2020)
This post contains a beautiful table that shows the full range of normal forms (including non-numerical ones), explaining which specific criteria motivated each of them.
- Hugh Darwen “An Introduction to Relational Database Theory” (2010)
The book is freely downloadable. On page 11 you can find “Note to Teachers” which alone is hugely enlightening. It says that 4NF is just a simple special case of 5NF, which is a useful piece of spiritual guidance.
Another quote:
“Also in Chapter 7, [I have aimed] to study the normal forms in reverse order to that in which they are normally presented. I put 6NF first because it is the simplest and also the most extreme. More important to me was to deal with 5NF and join dependencies before BCNF and functional dependencies (though I do leave to the end discussion of those pathological cases where BCNF is satisfied but not 5NF).”
This was a big confirmation that my own approach to teaching normal forms is plausible — I also think that 6NF is the foundational form.
5NF is presented in Section 7.4 (page 185), using the “Wives of Henry VIII” example. This is not a very good example because it is rather misaligned with the typical real-world scenarios. Nevertheless, the book, as well as other materials by Darwen, is a treasure.
“Database Design Book” (2025)
Learn how to get from business requirements to a database schema
If this post was useful, you may find this book useful too.
Table of contents and sample chapters
Book length: 145 pages, ~32.000 words. Available in both PDF and in EPUB format.
Conclusion
Traditionally 5NF is presented with an unmotivated example of a 3-column table. This table is then “split” into three 2-column tables. The splitting operation is not very well motivated and does not happen in practice.
This makes 5NF harder to understand and encourages the air of mysticism around the higher normal forms.
We propose a more straightforward way, better aligned with the practice of database design. We start from the business requirements and logical schema. Based on that, two patterns of physical table design follow naturally: AB-BC-AC triangle, or an ABC+D star pattern. Both patterns, when designed this way, are fully normalized: there are no anomalies and no redundancy.
I’d be happy to hear your feedback:
Alexey Makhotkin
squadette@gmail.com.

