
I approach you all humbly as one who is NOT a DBA, and I'm sure that my question is fraught with conceptual shortcomings and "it depends on" land mines. I'm also pretty sure that all of you who choose to answer are going to want a lot more in the way of specifics than I can currently deliver.

That said, I'm curious about the following scenario in general:

  • Say that I have two non-trivial queries.
  • Query 1 requires 2 minutes to complete on average.
  • Query 2 requires 5 minutes to complete on average.

If I run them serially, one right after the other, I'm expecting it will require 7 minutes to complete on average. Is this reasonable?

More than that, however, what if I run the two queries concurrently? Two separate connections at the same time.

  • Under what conditions would I expect to see a speedup? (Total time < 7 minutes)
  • Under what conditions would I expect to see a slowdown? (Total time > 7 minutes)

Now, if I had 1,000 non-trivial queries running concurrently, I have a hunch that it would result in an overall slowdown. In that case, where would the bottleneck likely be? Processor? RAM? Drives?

Again, I know it's probably impossible to answer the question precisely without knowing specifics (which I don't have). I'm looking for some general guidelines to think about when asking the following questions:

  • Under what circumstances do concurrent queries result in an overall speedup?
  • Under what circumstances do concurrent queries result in an overall slowdown?

1 Answer


If I run them serially, one right after the other, I'm expecting it will require 7 minutes to complete on average. Is this reasonable?

If they use unrelated data sets, then yes.

If they share a data set, and the cache is cold for the first query and the query is mostly I/O bound, then the second one might complete in moments. You need to consider caching effects when dealing with performance analysis and query timing.
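The caching effect described above can be sketched with a toy model. All names here (`DISK`, `read_page`, `run_query`) are hypothetical stand-ins, assuming a simple page-based buffer cache; the point is only that the second query over the same data set does no "disk" I/O at all:

```python
# Toy model of a buffer cache: a dict of "pages" on disk, a shared cache,
# and a counter for how many reads actually hit the disk.
DISK = {page: f"data-{page}" for page in range(1000)}
cache = {}
disk_reads = 0

def read_page(page):
    """Return a page, going to 'disk' only on a cache miss."""
    global disk_reads
    if page not in cache:
        disk_reads += 1          # cold read: costs real I/O
        cache[page] = DISK[page]
    return cache[page]

def run_query(pages):
    """A toy 'query' that scans a set of pages."""
    return [read_page(p) for p in pages]

run_query(range(1000))           # query 1: cold cache, 1000 disk reads
first = disk_reads
run_query(range(1000))           # query 2: same data, all cache hits
second = disk_reads - first

print(first, second)             # 1000 0
```

This is why benchmarking the same query twice in a row can give wildly different timings: the first run pays for the cold cache, the second mostly doesn't.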

More than that, however, what if I run the two queries concurrently? Two separate connections at the same time.

"It depends".

If they were both using sequential scans of the same table then in PostgreSQL it'd be a huge performance win because of its support for synchronized sequential scans.
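For reference, PostgreSQL exposes this behaviour through the `synchronize_seqscans` configuration setting (on by default since 8.3); a minimal snippet to check it on a live server:

```sql
-- Check whether synchronized sequential scans are enabled (default: on).
SHOW synchronize_seqscans;
```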

If they shared the same indexes then they'd likely benefit from each other's reads into cache.

If they're independent and touch different data then they might compete for I/O bandwidth, in which case they might take the same amount of time as running sequentially. If the I/O subsystem benefits from concurrency (higher net throughput with more clients) then the total time might be less. If the I/O subsystem handles concurrency poorly then they might take longer than running them sequentially. Or they might not be I/O bound at all, in which case if there's a free CPU for each they could well execute as if the other wasn't running at all.
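The "not I/O bound at all" case is easy to demonstrate with a toy model. Each "query" below is pure waiting (simulated with `sleep`, a deliberate simplification, not a real query), so two of them overlap almost completely and the concurrent wall time approaches the longer of the two rather than their sum:

```python
import threading
import time

def io_bound_query(seconds):
    """Stand-in for a query that spends all its time waiting on I/O."""
    time.sleep(seconds)

# Run the two "queries" concurrently on separate threads.
start = time.perf_counter()
t1 = threading.Thread(target=io_bound_query, args=(0.2,))
t2 = threading.Thread(target=io_bound_query, args=(0.5,))
t1.start(); t2.start()
t1.join(); t2.join()
concurrent = time.perf_counter() - start

# Run them serially, one after the other.
start = time.perf_counter()
io_bound_query(0.2)
io_bound_query(0.5)
serial = time.perf_counter() - start

# Concurrent wall time ~ max(0.2, 0.5); serial ~ 0.2 + 0.5.
print(concurrent < serial)
```

Real queries sit somewhere between this ideal overlap and the worst case where they fight over the same disk head; that's the whole "it depends".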

It depends a great deal on the hardware and system configuration, the data set, and on the queries themselves.

Now, if I had 1,000 non-trivial queries running concurrently, I have a hunch that it would result in an overall slowdown. In that case, where would the bottleneck likely be? Processor? RAM? Drives?

Yes, that'd very likely slow things down for a number of reasons.

  • PostgreSQL's own overheads in inter-process coordination, transaction and lock management, buffer management, etc. This can be quite a big cost, and PostgreSQL isn't really designed for high client counts - it works better if you queue work.

  • Competition for working memory, cache, etc.

  • OS scheduling overhead as it juggles 1000 competing processes all wanting time slices. This is fairly minor these days, since modern OSes have fast schedulers.

  • I/O thrashing. Most I/O systems have a client count at which throughput peaks. Sometimes it's 1, i.e. it's best with only one client, but it's often higher. Above that point, performance sometimes drops off again, and sometimes it just reaches a plateau.
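The "queue work" advice above can be sketched as follows. Instead of opening 1,000 connections at once, feed 1,000 jobs through a small worker pool so only a few run concurrently. Here `run_query` is a hypothetical stand-in for issuing one query over one pooled connection, and the pool size of 8 is an illustrative assumption, not a recommendation:

```python
from concurrent.futures import ThreadPoolExecutor

def run_query(n):
    """Stand-in for submitting one query over one pooled connection."""
    return n * n   # placeholder work

POOL_SIZE = 8      # assumed value; tune to your cores and I/O system

# All 1,000 jobs are queued, but at most POOL_SIZE run at any moment.
with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    results = list(pool.map(run_query, range(1000)))

print(len(results))   # 1000
```

In practice this is what connection poolers such as PgBouncer do for PostgreSQL: the database sees a small, steady number of backends while the application still submits thousands of queries.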

This is exactly the kind of explanation I was looking for. Clear, succinct, informative. Thanks! –  Aaron Johnson Jul 25 '14 at 14:38
