I have a dataset I'm not sure how to store. The structure is simple: 30 numeric attributes identified by roughly 15 billion combinations of x, y, and t values. We're expecting ~17k t values and maybe 90k x/y combinations, though it could be some combination that gives us 20 billion records in the end.
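In case the concrete shape helps, this is roughly the table I'm picturing, as a minimal Python/psycopg2 sketch. The table name, column names, key types, and connection string are all placeholders rather than our real schema.

```python
import psycopg2

# Placeholder layout: 30 numeric attributes keyed by (x, y, t).
# "measurements", "attr_01".."attr_30", and the integer key types are guesses.
attr_cols = ",\n    ".join(f"attr_{i:02d} double precision" for i in range(1, 31))

ddl = f"""
CREATE TABLE measurements (
    x integer NOT NULL,
    y integer NOT NULL,
    t integer NOT NULL,
    {attr_cols},
    PRIMARY KEY (x, y, t)
)
"""

# Assumes a local database named "mydb"; adjust the DSN as needed.
with psycopg2.connect("dbname=mydb") as conn:
    with conn.cursor() as cur:
        cur.execute(ddl)
```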
The processing involves retrieving 1-10 of those columns for each (x, y) pair and storing various calculated numeric values. Are we nearing or passing the limit of fast response times for Postgres with this many rows?
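Each processing pass would look something like the sketch below: pull a few attribute columns for one (x, y) pair across all t, compute a summary, and write it back. The summaries table and the particular calculation are just stand-ins to show the access pattern, not what we'll actually compute.

```python
import psycopg2

def summarize_pair(conn, x, y):
    """Read a few attribute columns for a single (x, y) pair across all t,
    compute a summary, and store it. Table and column names are placeholders."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT t, attr_01, attr_02, attr_03 "
            "FROM measurements WHERE x = %s AND y = %s",
            (x, y),
        )
        rows = cur.fetchall()  # roughly 17k rows per (x, y) pair
        mean_01 = sum(r[1] for r in rows) / len(rows)
        cur.execute(
            "INSERT INTO summaries (x, y, mean_attr_01) VALUES (%s, %s, %s)",
            (x, y, mean_01),
        )

with psycopg2.connect("dbname=mydb") as conn:
    summarize_pair(conn, 123, 456)
```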
The processing is all done in-house by one person and shouldn't need to happen more than a couple dozen times as we settle on what summaries we want. So we're not worried about a high number of writes or connections, strict security, updates to existing records, table relations, or losing data to network connection issues; basically, we're not concerned about the kinds of things I understand the ACID guarantees of an RDBMS like Postgres bring to the table. But we also don't need replication or distribution of the data, high availability, on-the-fly schema changes, or the ability to handle an unusual number of writes (ours would be, say, a dozen for each of the 90k x/y pairs), which are the kinds of things I understand NoSQL databases offer.
So I guess the real issue is read speed out of Postgres for a table of this size, and the real question is whether there's a better storage mechanism for what we need to accomplish. Does the answer change if we have 40 billion records? 60 billion?
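For what it's worth, here is how I was planning to sanity-check the per-pair lookups, assuming the (x, y, t) primary key from the sketch above; the DSN and the literal x/y values are placeholders.

```python
import psycopg2

# Does the per-pair lookup use the (x, y, t) primary-key index,
# or does it fall back to scanning the table?
with psycopg2.connect("dbname=mydb") as conn:
    with conn.cursor() as cur:
        cur.execute(
            "EXPLAIN (ANALYZE, BUFFERS) "
            "SELECT attr_01, attr_02 FROM measurements "
            "WHERE x = %s AND y = %s",
            (123, 456),
        )
        for (line,) in cur.fetchall():
            print(line)
```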