The options
Before picking, you should look at all the options.
Enums
For highly static definitions it's generally optimial to use an enumeration type (CREATE TYPE ... AS ENUM (...)
), with the caveat that PostgreSQL does not currently support removing values from enumerations. You can add/insert new values though.
Enums are great if you're defining something you don't expect to ever change.
Apps can easily look up the allowed values for an enum at runtime, but it's more typical to just code the enum into the application.
Look-up tables
Integer-keyed look-up tables are generally fine. If the string keys are short, there isn't a big advantage to using integers instead.
It's often both more efficient and more readable to use a "char"
key, which is PostgreSQL's extension type for a single character. The double-quotes are significant and required when naming this type. (I don't know if Rails supports "char"
though). There's no detectable performance difference over integer. Note that this is different to char
, (no quotes) which is a horrible legacy space-padded character type you should never use.
String-keyed lookup tables
You get a form of validation by using a key into a lookup table, but nothing forces that key to be a number rather than a short string. (Except possibly Rails, which is not noted for dealing well with tables that don't follow its convention of having a synthetic integer primary key.
It's perfectly valid to have a one-column lookup table with a string key.
Domains
With a string key you don't strictly even need the lookup table. You can also use a DOMAIN
type, e.g.
CREATE DOMAIN mytype AS varchar(10)
CONSTRAINT mytype_legal_values
CHECK (value IS NOT NULL AND value IN ('fred','bob','joe'));
Unlike an enum, you can re-define the constraint of a domain to remove currently-legal values and PostgreSQL will re-check all uses of the domain to make sure the change is OK before applying it.
It's difficult for an application to look up the list of allowed values for a domain at runtime.
What to choose?
Which option you choose depends on many factors:
- How well the tools/frameworks you use work with each feature;
- How often you expect the lookup/enum to change;
- What you're comfortable with / personal preference
For example, many ORMs deal very poorly with domains and enums, often requiring custom type-handling code because they don't understand the general concept of "domain type" or "enumerated type". I can't speak for Rails specifically in this regard.
Performance
In pure performance terms ... it depends.
An integer (e.g. a key to a lookup table) is stored as a fixed width field in PostgreSQL, with no TOAST table. It's fast to compare. It doesn't need any encoding-aware collation handling, etc. It's fast and simple.
However, for lookup table use you're incurring the cost of foreign key constraint lookups on inserts, and your selects have to do an additional join against the lookup table to get the "real" value, so it's far from free for write-heavy tables. Removing values is possible, but expensive due to the need to re-validate foreign key constraints.
An enum is just an integer with validation. You're not firing foreign key constraint triggers, so it's quite a lot faster for insert/update/delete. For SELECT there's just a cheap lookup of a cached system catalog table to get the name, and the extra data used to transfer the enum name instead of integer value. In practice it's the fastest option in almost all cases. However, PostgreSQL doesn't currently support dropping values from enums. (Patches welcome)
A string is stored as a varlena
and may require a TOAST table. However, if it's always stored inline (which you can force by limiting the string's length and/or setting the column's storage type to "MAIN") then there's no extra lookup to get the value. PostgreSQL still has to do correct text-encoding collations on sorts, though, which can be significantly slower than sorting integers.
You can mitigate the impact of collation by defining the column as COLLATE "C"
, forcing byte-oriented POSIX collations. I don't know whether Rails supports it natively, but you can tack the clause on as extra SQL, and it won't care about the column collation in day-to-day operation. You'll want a CHECK
constraint to restrict the column to allowable values, e.g.
CREATE TABLE mytable (
... blah ....,
thekey varchar(5)
COLLATE "C"
CONSTRAINT thekey_allowed_value
CHECK (thekey IN ('fred','bob','joe'))
NOT NULL,
... blah ...
);
ALTER TABLE mytable ALTER COLUMN thekey SET STORAGE MAIN;
so efficient string keys are a bit fiddlier, but not really problematic.
If you use a foreign key to a string-keyed lookup table instead of the CHECK constraint you get a single central value store, but you then have all the disadvantages of a lookup table for insert/update foreign key check costs.
Domains over strings are just a convenience, and are identical in performance terms to using a string with a check constraint. They give you the advantage of central control over the allowed values list, without the need for a foreign key lookup. However, applications can't easily determine the allowed values list from a DOMAIN at runtime, unlike an enum or lookup table.
String keys in the default collation with no constraints can be inefficient. If the key is big it might get stored out-of-line compressed in a TOAST table, where an extra seek may be required to fetch the actual value. Additional CPU time is required to perform locale-aware collation calculations if you ORDER BY
the key, but not if you simply test for equality.
In practice the difference is generally minimal.
ActiveRecord::Enum
. – toro2k Oct 14 '14 at 18:15