I recently got the chance to create a new Django app on the current stable version, which exposes some PostgreSQL-specific data types for models (docs), and I was excited to find an ArrayField. According to the Real Python folks, this could be a way to implement a tagging system (you can read the post here).
That implementation would lack referential integrity between rows (something a tagging system should probably enforce at the database level, because that's what RDBMSs do), but I think a more intriguing application is caching references to rows that span several tables.
The idea here would be something like (in pseudo-django):
from django.contrib.postgres.fields import ArrayField

class Customer(models.Model):
    # additional fields ...
    returned_products = ArrayField(models.IntegerField(), default=list)
where the returned_products array stores references to products that I would normally reach with something like:
# assuming Product has a foreign key to Order, and Order to Customer
closed_orders = (Customer.objects.get(pk=123)
                 .order_set
                 .filter(status='closed'))
returned = Product.objects.filter(order__in=closed_orders,
                                  status='returned')
Not exactly clean. Since that's a queryset it could be cached in some way, but returns don't really shrink; they only grow over time (in principle to arbitrary length, yikes!). Imagine storing the values of that queryset in the returned_products array, so that when the user visits their history, for example, they can see their returns across all orders with only the overhead of something like:
(Product.objects
.filter(id__in=request.user.customer.returned_products))
... instead of having to traverse a bunch of tables.
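One thing the scheme needs that the ORM won't provide for free is the write path: the array has to be appended to whenever a return is processed, and kept free of duplicates so re-running the sync is safe. Stripped of Django, the bookkeeping is just this (the names `record_return` and `returned` are hypothetical, for illustration only):

```python
# Illustrative maintenance logic for the denormalized array. In a real
# app this would run in whatever code marks a product as returned
# (a save() override, a signal handler, etc.) before saving the row.

def record_return(returned: list[int], product_id: int) -> list[int]:
    """Append product_id to the cached array, preserving insertion
    order and skipping duplicates (a product returned twice across
    orders is still listed once)."""
    if product_id not in returned:
        returned.append(product_id)
    return returned

cache = []
for pid in [7, 3, 7, 12]:
    record_return(cache, pid)
# cache is now [7, 3, 12]
```

The dedupe check is linear in the array length, which is fine at the sizes involved here; the bigger design question is the one below.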
Is this a sensible application of this kind of storage, or is it more sensible to lean on an additional caching layer?