Using Postgres 9.5, how can I query a table with a JSONB column, joining two JSON fields, crucially utilising a GIN index to maximise performance?

I'm prototyping a system which can accommodate different schemas/data models for different customers on a single platform. Entity-Attribute-Value (EAV) is a common pattern for this and I'm keen to benchmark how well Postgres can handle queries on entities stored using its JSONB data type.

I have a single table which stores entities of different type e.g. customer, sales transaction.

CREATE TABLE entity_jsonb (
    id uuid PRIMARY KEY,
    type text,
    created_at timestamp without time zone,
    properties jsonb,
    org_id integer
);
CREATE INDEX entity_jsonb_created_at_idx ON entity_jsonb USING btree (created_at);
CREATE INDEX entity_jsonb_org_id_idx ON entity_jsonb USING btree (org_id);
CREATE INDEX entity_jsonb_type_idx ON entity_jsonb USING btree (type);
CREATE INDEX entity_jsonb_properties_idx ON entity_jsonb USING gin (properties);

I have a customer record in this table with type 'customer' and properties:

{"uuid": "8f8896c7-f41c-49f7-ad6e-4613f7b51a23", "email": "[email protected]", "lastname": "McCarthy", "createdAt": "May 27, 2015 12:06:45 PM", "firstname": "Nathan"}

I have a sales transaction record in this table with type 'sales_transaction' and properties:

{"uuid": "54243d48-e69f-4bb6-ab33-6defb8a0f626", "amount": 99817, "status": 0, "paymentType": 1, "currencyCode": "USD", "customerUuid": "8f8896c7-f41c-49f7-ad6e-4613f7b51a23", "transactionId": "471170"}

How can I take advantage of the GIN index (operators: @>, ?, ?&, ?|) to query transactions efficiently, but joining with customers to return data from both e.g. populating a list of transactions from the last 7 days, showing customer name as well as transaction amount?

I've tried this query to find a subset of transaction objects (for amount = 99817) and related customer details:

SELECT t.properties AS transaction, c.properties AS customer 
FROM entity_jsonb t 
JOIN entity_jsonb c ON (c.properties->>'uuid' = t.properties->>'customerUuid' AND c.type = 'customer') 
WHERE t.type = 'sales_transaction' AND t.properties @> '{"amount" : 99817}';

It works but it's not using the GIN index i.e. not as fast as I'd like it to be. I tried this approach (using the GIN @> containment operator) but it doesn't include the customer details. What am I doing wrong?

SELECT t.properties AS transaction, c.properties AS customer FROM entity_jsonb t LEFT JOIN entity_jsonb c ON (c.properties @> json_build_array(json_build_object('uuid', t.properties->'customerUuid'))::jsonb AND c.type = 'customer') WHERE t.type = 'sales_transaction' AND t.properties @> '{"amount" : 99817}';

I understand that this is not an optimum solution compared with a traditional relational design but I'm interested to see how efficient the query can be purely storing the entity data as JSON.

share|improve this question

My query wasn't far off. I didn't need json_build_array. The query now runs 10x faster than the version which does't use the GIN index, for a dataset of 100k customers:

SELECT t.properties AS transaction, c.properties AS customer 
FROM entity_jsonb t 
JOIN entity_jsonb c ON (c.properties @> json_build_object('uuid', t.properties->'customerUuid')::jsonb AND c.type = 'customer') 
WHERE t.type = 'sales_transaction' AND t.properties @> '{"amount" : 99817}';
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.