Many yes/no attributes: table design study
Author: Alexey Makhotkin squadette@gmail.com.
I wanted to demonstrate the relationship between the logical model and a physical model. We’re going to design a commonly seen use case: many yes/no attributes of a single anchor (in our case, Restaurant). Then we’ll discuss how the physical tables would be designed. We’ll see that sometimes physical design strategy changes as the system becomes more mature. At the same time, logical design elements never change if the business requirement is still relevant.
We will briefly discuss three different physical table design strategies: a table per anchor, side table, and Entity-Attribute-Value (EAV).
This is a draft chapter from the upcoming book (https://databasedesignbook.com/).
Table of contents
Logical design: basic attributes
Physical design strategy 1: anchor per table
Physical design strategy 2: side table
Physical design strategy 3: Entity-Attribute-Value (EAV)
Description of the problem
First, let’s write down an informal description of the problem. Suppose that we are building a company that wants to become the largest source of information about restaurants. We want our users to find a restaurant according to their needs: a certain location, serving a certain cuisine, etc. We quickly learn that there are many yes/no attributes that are important for restaurant goers:
-
Is it pet-friendly?
-
Does it have a kids play area?
-
Is it vegan-only?
-
Is it halal?
-
Is it wheelchair-accessible?
-
Does it have a terrace?
-
Is it on the rooftop?
-
Is it romantic?
And so on, and so on.
The list is far from being complete: our product owners would always be listening to the users, looking for other interesting pieces of information that could sometimes be critical for the users. That’s fine: our database design approach just assumes that there would always be new and changed requirements. So we can go ahead and model whatever we know right now. When the new attributes are introduced we’ll just add them to the logical model and then extend the physical schema.
Logical design: anchors
Let’s design only the smallest subset of the logical model that allows us to discuss the topic at hand. First, we have the Restaurant anchor.
Anchor | ID example | Physical table |
---|---|---|
Restaurant | 1, 2, 3… | |
… | … | … |
To confirm that this is indeed a proper anchor, We can write down validating sentences: “How many restaurants are in our database?” and “This form adds a new restaurant to our database”. Those sentences make sense, so our anchor is correct. (This technique is described in the book.)
We leave the “Physical table” blank, to be discussed later.
Logical design: basic attributes
Just to make the design more believable, let’s introduce just one attribute: restaurant name. We won’t even be using it in the text, we just want the tables to look more familiar (and to plug De Klos, of course).
Anchor | Question | Logical type | Example value | Physical column | Physical type |
---|---|---|---|---|---|
Restaurant | What is the name of the Restaurant? | string | “De Klos” | ||
We leave “Physical column” and “Physical type” blank, to be discussed later.
Logical design: attributes
We have eight yes/no attributes in the current problem definition. Let’s write them all down:
Anchor | Question | Logical type | Example value | Physical column | Physical type |
---|---|---|---|---|---|
Restaurant | Is the Restaurant pet-friendly? | yes/no | yes | ||
Restaurant | Does the Restaurant have a kids play area? | yes/no | no | ||
Restaurant | Is the Restaurant vegan-only? | yes/no | no | ||
Restaurant | Is the Restaurant halal? | yes/no | no | ||
Restaurant | Is the Restaurant wheelchair-accessible? | yes/no | no | ||
Restaurant | Does the Restaurant have a terrace? | yes/no | yes | ||
Restaurant | Is the Restaurant on a rooftop? | yes/no | no | ||
Restaurant | Is the Restaurant romantic? | yes/no | no | ||
In the logical model, we write down every attribute that we are interested in. There is no way to present some sort of grab-bag of attributes that you don’t care about. If you don’t care about an attribute, you just don’t include it into the logical model. There is nothing wrong with that: somebody else, who does care, will.
We will see that this on the physical level this approach may be different.
We wrote down eight attributes. But we know that as time goes by, more attributes will be introduced. For a complex business domain (and the restaurant industry is certainly complex), we can have hundreds of yes/no attributes. (We’re not even talking about other sorts of data: different types of attributes, links, more anchors etc.)
We’re done with the logical model, now it’s time to design physical tables.
Physical design strategy 1: anchor per table
Let’s take the most common design strategy: one anchor per table, and apply it consistently. We’re going to have one table for our anchor, and 9 table columns (one for the name, and eight more for the attributes).
We assume that we use a popular relational database, such as MySQL or Postgres.
For the anchor, let’s choose a table name:
Anchor | ID example | Physical table |
---|---|---|
Restaurant | 1, 2, 3… | restaurants |
… | … | … |
For the attributes, let’s fill in the “Physical column” and “Physical type” (the left half of the table does not change):
Anchor | Question | Logical type | Example value | Physical column | Physical type |
---|---|---|---|---|---|
Restaurant | What is the name of the Restaurant? | string | “De Klos” | restaurants.name |
VARCHAR(128) NOT NULL |
Restaurant | Is the Restaurant pet-friendly? | yes/no | yes | restaurants. is_pet_friendly |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Does the Restaurant have a kids play area? | yes/no | no | restaurants. has_play_area |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Is the Restaurant vegan-only? | yes/no | no | restaurants. is_vegan_only |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Is the Restaurant halal? | yes/no | no | restaurants. is_halal |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Is the Restaurant wheelchair-accessible? | yes/no | no | restaurants. is_wheelchair_accessible |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Does the Restaurant have a terrace? | yes/no | yes | restaurants. has_terrace |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Is the Restaurant on a rooftop? | yes/no | no | restaurants. is_rooftop |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Is the Restaurant romantic? | yes/no | no | restaurants. is_romantic |
SMALLINT NOT NULL DEFAULT 0 |
We chose some short but readable names for table columns. We chose non-surprising physical data types: VARCHAR
for the string attribute, and SMALLINT
for the yes/no attribute (1 means yes, 0 means no, other values are unused). Most of the table is quite repetitive (but very useful for people).
Let’s write down CREATE TABLE
operators for our table:
CREATE TABLE restaurants (
id INTEGER NOT NULL AUTO_INCREMENT,
name VARCHAR(128) NOT NULL,
is_pet_friendly SMALLINT NOT NULL DEFAULT 0,
has_play_area SMALLINT NOT NULL DEFAULT 0,
is_vegan_only SMALLINT NOT NULL DEFAULT 0,
is_halal SMALLINT NOT NULL DEFAULT 0,
is_wheelchair_accessible SMALLINT NOT NULL DEFAULT 0,
has_terrace SMALLINT NOT NULL DEFAULT 0,
is_rooftop SMALLINT NOT NULL DEFAULT 0,
is_romantic SMALLINT NOT NULL DEFAULT 0
);
So far so good. Why would we ever need other table design strategies? This one is simple: it directly corresponds to the logical model.
But remember that we’re going to have many more attributes, maybe several hundreds. Technically we can add every single one into our table, but it’s going to become less usable for people. People are used to write “SELECT * FROM restaurants WHERE ...
”, and having 300+ columns, mostly filled with zeroes and ones is something that many people won’t appreciate.
Physical design strategy 2: side table
Let’s try a different approach that is also very common. Let’s put all those yes/no attributes into the side table, let’s call it “restaurant_flags
”. The main table (“restaurants
”) is going to stay pretty narrow.
Anchor | Question | Logical type | Example value | Physical column | Physical type |
---|---|---|---|---|---|
Restaurant | What is the name of the Restaurant? | string | “De Klos” | restaurants.name |
VARCHAR(128) NOT NULL |
Restaurant | Is the Restaurant pet-friendly? | yes/no | yes | restaurant_flags. is_pet_friendly |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Does the Restaurant have a kids play area? | yes/no | no | restaurant_flags. has_play_area |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Is the Restaurant vegan-only? | yes/no | no | restaurant_flags. is_vegan_only |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Is the Restaurant halal? | yes/no | no | restaurant_flags. is_halal |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Is the Restaurant wheelchair-accessible? | yes/no | no | restaurant_flags. is_wheelchair_accessible |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Does the Restaurant have a terrace? | yes/no | yes | restaurant_flags. has_terrace |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Is the Restaurant on a rooftop? | yes/no | no | restaurant_flags. is_rooftop |
SMALLINT NOT NULL DEFAULT 0 |
Restaurant | Is the Restaurant romantic? | yes/no | no | restaurant_flags. is_romantic |
SMALLINT NOT NULL DEFAULT 0 |
This table looks like the previous one, only the table name has changed.
Here is how this table schema looks like:
CREATE TABLE restaurants (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(128) NOT NULL
);
CREATE TABLE restaurant_flags (
restaurant_id INTEGER NOT NULL PRIMARY KEY,
is_pet_friendly SMALLINT NOT NULL DEFAULT 0,
has_play_area SMALLINT NOT NULL DEFAULT 0,
is_vegan_only SMALLINT NOT NULL DEFAULT 0,
is_halal SMALLINT NOT NULL DEFAULT 0,
is_wheelchair_accessible SMALLINT NOT NULL DEFAULT 0,
has_terrace SMALLINT NOT NULL DEFAULT 0,
is_rooftop SMALLINT NOT NULL DEFAULT 0,
is_romantic SMALLINT NOT NULL DEFAULT 0
);
The “restaurant_flags.restaurant_id
” column contains the same IDs as “restaurant.id
”. In terms of relational model, it is a foreign key. However, we won’t bother with specifying foreign keys here.
Nice. We now realize that we’ve elegantly swept under the carpet our original concern: around 300 more attributes are incoming. Our “restaurants
” table is neat, but now the other table is going to become problematic in the same way as before.
In addition to the relative unreadability of this design, we have another technical problem. Each time when we want to add a new attribute, we must execute the “ALTER TABLE” operator to add a new physical column. If the table has a lot of rows, and it is actively queried, this may cause locking for arbitrary periods of time (seconds, maybe minutes, maybe hours). This is not necessarily bad, but this is a concern. We will discuss this in more detail in the corresponding chapter.
Are there any other alternative designs?
Physical design strategy 3: Entity-Attribute-Value (EAV)
Let’s step back and create a physical table that looks like this:
CREATE TABLE restaurant_yesno_attributes (
restaurant_id INTEGER NOT NULL,
attr_name VARCHAR(32) NOT NULL,
PRIMARY KEY (id, attr_name),
INDEX (attr_name)
);
This is a table with two columns: a restaurant ID and the name of the attribute. Here is an example of data that we could store about some restaurants (say, with ID=1 and ID=3):
restaurant_id |
attr_name |
---|---|
1 | “is_pet_friendly” |
1 | “has_terrace” |
3 | “is_vegan_only” |
… | … |
What this set of rows means is that:
-
restaurant with ID=1 is pet-friendly and has a terrace;
-
restaurant with ID=3 is vegan-only;
-
etc.
This design allows us to add arbitrary yes/no attributes of the Restaurant without ever changing the structure of this table. You only have to choose a unique text string that corresponds to each attribute.
To set the value of the attribute to “yes”, you insert a row to this table. To set the value to “no” (or to unset the value), you delete a row from this table.
How to query these sorts of tables is left as an exercise to the reader (and would be explained in a separate chapter).
Also, we won’t go in depth about EAV tables design here. For example, if you need to store attributes of other data types, such as string, then you will have to add a third table column, attr_value
.
The point of this section is that a physical model may look quite unlike the logical model. Let’s see how we would document this design in our list of logical attributes (we show only a single yes/no attribute for brevity):
Anchor | Question | Logical type | Example value | Physical column | Physical type |
---|---|---|---|---|---|
Restaurant | Is the Restaurant pet-friendly? | yes/no | yes | SELECT * FROM restaurant_yesno_attributes WHERE ID = ? AND attr_name = 'is_pet_friendly' |
Row existence |
… | … | … | … | … |
… |
Here we must use an entire human-readable SQL statement to document how the attribute value is stored physically.
Also, we have an unusual physical type. In this particular case there is no column that contains the value. The value is represented by the presence or absence of a certain row in a table. So, we use “row existence”.
Remember that the list of attributes is primarily for human consumption. Implementing machine-readable logical catalog is outside of the scope of this text.
Game of Tables
We looked at three different physical table design strategies for yes/no attributes. But what would we choose if we were actually designing a website that lets you search for restaurants?
In this text we have eight attributes that were introduced in the very beginning all together. We say that there are going to be a few hundreds more, but we don’t really know when.
In reality, what would probably happen is that those attributes would be introduced one by one, with substantial time passing between each: weeks, maybe months. First somebody would request a “vegan/non-vegan” selector; then later somebody with kids would ask for the play area, and so on and so on.
As each new attribute arrives, we must make a decision: do we add columns to the main table (strategy 1)? Do we introduce a side table (strategy 2)? Are we fed up already and want to introduce a single EAV table so that we no longer have to worry about ALTER TABLE (strategy 3)?
Also, at every such point we can only guess what’s going to happen in the future. When we first implement the “vegan”/”non-vegan” distinction, we’re probably not even sure if our startup would still exist in few months, let alone how many restaurant attributes we are going to have eventually. So we would probably just quickly add a column to the main table without much thinking. Same for the second attribute.
Later, when the main table becomes wide enough to become concerning, we may decide to introduce the side table. When a couple of years later it becomes comically wide too, and we introduce a EAV table. If we started in the last decade then we’d probably use a JSON-typed table column instead.
We call this Game of Tables.
The rules are simple: at each turn a new attribute is introduced, and you must make a decision on how to store this attribute. You have maybe half a dozen physical design strategies, and you must choose one. Sometimes a choice is easy, sometimes it’s controversial, sometimes it is enforced by the engineering guide. You can’t not play.
(Also, your adversary is Date’s demon — we’ll talk about it in a separate chapter.)
Learning the Game of Tables
The ultimate goal of the Database Design book is to teach you the Game of Tables.
It is quite like the game of chess: you can learn the moves, you can learn strategy and tactics, but when you’re at the chessboard, nobody can tell you which moves to make: you must play.
Same with database design. We introduce a logical model that helps you a lot by clarifying what happens “on the chessboard”. We can explain the physical design strategies that are possible. We can explain the driving forces behind each design strategy: both technical and organizational. We can discuss use cases and examples. But at some point you will have to make a technical decision (make a move).
The good news though is that it’s often possible to change the decision by doing a database migration. This may be not easy (or even super hard) in a mature system, but at least there is a technical possibility. We’ll discuss database migrations in a separate chapter.
Conclusion
The goal of this post was to show a relationship between logical and physical models. We looked at three different physical design strategies: one table per anchor, side table and Entity-Attribute-Value. Those strategies are like chess moves: sometimes it’s clear which move is optimal, sometimes you find out that it would have been better if you were to play differently.
Logical model helps you to remove a lot of uncertainty before you make your move.
This is a draft chapter from the upcoming book, “Database Design using Minimal Modeling” (https://databasedesignbook.com/). Leave your email address to subscribe to updates about the book.
Next: we can also look at the design process from a different point of view: begin from physical tables and see how a logical model can be deduced from them. This is a topic for one of the followup posts.