Primary keys
Should I use surrogate keys or natural keys? What are they and which setup will give me the least amount of development issues?
What are natural keys?
A column (or a combination of columns) that has business meaning and is inherently unique.
Examples of natural keys
Table | Natural Key Example |
---|---|
Users | email |
Products | SKU (Stock Keeping Unit) |
Countries | ISO Country Code (US , IN , UK ) |
Employees | SSN (Social Security Number) |
What are surrogate keys?
Unique identifiers that are auto-generated and have no business meaning.
Types
Type | Example and use case |
---|---|
Auto-increment | Generated sequential numbers. |
UUID (Universally Unique Identifier) | Generated 128-bit random string e.g. 550e8400-e29b-41d4-a716-446655440000 . Uses too much space and makes inserts, searches and joins slower (B-tree indexes). |
TSID (Time-Sorted Unique Identifier) | 64-bit where the first 42-bits are time based and the rest is random. |
Surragate keys recommended usage
Rule of thumb
Use auto-incremented keys in most cases. If having predictable keys is an issue e.g. if you don't want to leak how many items there are.
Bridge/junction tables
Let's say you have a junction table of student_course with surragate foreign keys student_id and course_id. Instead of having composite surrogate keys, just create another key called student_course_id so that joins aren't unnecessarily complicated.
Why use surrogate keys instead of business keys?
- Uniqueness: There are cases where that natural key isn't truely unique and duplicates exist usually by error (e.g. South African IDs).
- Stability: Natural keys can change causing the need to update that key on various tables where it is used as a Foreign Key. Examples:
- Email on User table: A user may have changed their email or want to move their account to a different email.
- Country code in Country table: In 1993 Czechoslovakia (CS) split into Czech Republic (CZ) and Slovakia (SK). No more CS.
- Easier foreign key relationships and joins: No compound keys (also makes queries faster) and names are predictable from table names.