Primary keys

Should I use surrogate keys or natural keys? What are they and which setup will give me the least amount of development issues?

Primary keys
Photo by Aneta Pawlik / Unsplash

What are natural keys?

A column (or a combination of columns) that has business meaning and is inherently unique.

Examples of natural keys

TableNatural Key Example
Usersemail
ProductsSKU (Stock Keeping Unit)
CountriesISO Country Code (US, IN, UK)
EmployeesSSN (Social Security Number)

What are surrogate keys?

Unique identifiers that are auto-generated and have no business meaning.

Types

TypeExample and use case
Auto-incrementGenerated sequential numbers.
UUID (Universally Unique Identifier)Generated 128-bit random string e.g. 550e8400-e29b-41d4-a716-446655440000. Uses too much space and makes inserts, searches and joins slower (B-tree indexes).
TSID (Time-Sorted Unique Identifier)64-bit where the first 42-bits are time based and the rest is random.

Rule of thumb

Use auto-incremented keys in most cases. If having predictable keys is an issue e.g. if you don't want to leak how many items there are.

Bridge/junction tables

Let's say you have a junction table of student_course with surragate foreign keys student_id and course_id. Instead of having composite surrogate keys, just create another key called student_course_id so that joins aren't unnecessarily complicated.

Why use surrogate keys instead of business keys?

  • Uniqueness: There are cases where that natural key isn't truely unique and duplicates exist usually by error (e.g. South African IDs).
  • Stability: Natural keys can change causing the need to update that key on various tables where it is used as a Foreign Key. Examples:
    • Email on User table: A user may have changed their email or want to move their account to a different email.
    • Country code in Country table: In 1993 Czechoslovakia (CS) split into Czech Republic (CZ) and Slovakia (SK). No more CS.
  • Easier foreign key relationships and joins: No compound keys (also makes queries faster) and names are predictable from table names.