Skip to content

cockroachdb/cockroach

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

113809: kvstreamer: add limit to how many eager batches are issued r=yuzefovich a=yuzefovich

**kvstreamer: add limit to how many eager batches are issued**

This commit fixes extremely suboptimal behavior of the streamer in the
InOrder mode in some cases. In particular, previously it was possible
for the buffered responses to consume most of the working budget, so the
streamer would degrade to processing all requests effectively one
BatchRequest with one Get / Scan at a time, significantly increasing the
latency. For example, the query added as a regression test that performs
30k Gets across 10 ranges would usually take on the order of 1.5s (which
is not great already since in the non-streamer path it takes 400ms), but
in the degenerate cases it could be on the order of 20-30s.

Similar behavior could occur in the OutOfOrder mode too where we would
issue more BatchRequests in which only one request could be satisfied
(although in OutOfOrder mode the problem is not as severe - we don't
buffer any results since we can always return them right away).

This problem is now fixed by imposing the limit on the budget's usage at
which point the streamer stops issuing "eager" requests. Namely, now,
when there is at least one request in flight, the streamer won't issue
anymore requests once `limit * eagerFraction` is exceeded. This
effectively reserves a portion of the budget for the "head-of-the-line"
batch.

The "eager fraction" is controlled by a session variable, separate for
each mode. The defaults of 0.5 for InOrder and 0.8 for OutOfOrder modes
were chosen after running TPCH queries and the query that inspired this
commit. These values bring the number of gRPC calls for the reproduction
query from 1.5k-2k range to below 200 and the query latency to be
reliably around 400ms.

I don't really see any significant downsides to this change - in the
worst case, we'd be utilizing less of the available memory budget which
is not that big of a deal, so I intend to backport this change. Also,
setting the eager fractions to large values (greater than 1.0 is
allowed) would disable this limiting behavior and revert to the previous
behavior if we need it.

Fixes: #113729.

Release note (bug fix): Previously, when executing queries with
index / lookup joins when the ordering needs to be maintained,
CockroachDB in some cases could get into a pathological behavior
which would lead to increased query latency, possibly by 1 or 2 orders
of magnitude. This bug was introduced in 22.2 and is now fixed.

**kvstreamer: increase default avg response multiple**

This commit increases the default value for
`sql.distsql.streamer.avg_response_size_multiple` cluster setting from
1.5 to 3.0. This setting controls the factor by which the current "avg
response size" estimate is multiplied and allows for TargetBytes
parameter to grow over time. In the reproduction query from the
previous commit it was determined that the growth might not be as quick
as desirable.

The effect of this change is as follows:
- if we have responses of varying sizes, then we're now likely to be more
effective since we'll end up issuing less BatchRequests
- if we have responses of similar sizes, then we might pre-reserve too
much budget upfront, so we'll end up with lower concurrency across
ranges.

Thus, we don't want to increase the multiple by too much; however,
keeping it at 1.5 can be quite suboptimal in some cases - 3.0 seems like
a decent middle ground. This number was chosen based on running TPCH
queries (both via InOrder and OutOfOrder modes of the streamer) and the
reproduction query. (For the latter this change reduces the number of
gRPC calls by a factor of 3 or so.)

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
e174990

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
pkg
November 8, 2023 23:56
June 4, 2019 13:07


CockroachDB is a cloud-native distributed SQL database designed to build, scale, and manage modern, data-intensive applications.

What is CockroachDB?

CockroachDB is a distributed SQL database built on a transactional and strongly-consistent key-value store. It scales horizontally; survives disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention; supports strongly-consistent ACID transactions; and provides a familiar SQL API for structuring, manipulating, and querying data.

For more details, see our FAQ or architecture document.

Docs

For guidance on installation, development, deployment, and administration, see our User Documentation.

Starting with CockroachCloud

We can run CockroachDB for you, so you don't have to run your own cluster.

See our online documentation: Quickstart with CockroachCloud

Starting with CockroachDB

  1. Install CockroachDB: using a pre-built executable or build it from source.
  2. Start a local cluster and connect to it via the built-in SQL client.
  3. Learn more about CockroachDB SQL.
  4. Use a PostgreSQL-compatible driver or ORM to build an app with CockroachDB.
  5. Explore core features, such as data replication, automatic rebalancing, and fault tolerance and recovery.

Client Drivers

CockroachDB supports the PostgreSQL wire protocol, so you can use any available PostgreSQL client drivers to connect from various languages.

Deployment

  • CockroachCloud - Steps to create a free CockroachCloud cluster on your preferred Cloud platform.
  • Manual - Steps to deploy a CockroachDB cluster manually on multiple machines.
  • Cloud - Guides for deploying CockroachDB on various cloud platforms.
  • Orchestration - Guides for running CockroachDB with popular open-source orchestration systems.

Need Help?

Building from source

See our wiki for more details.

Contributing

We welcome your contributions! If you're looking for issues to work on, try looking at the good first issue list. We do our best to tag issues suitable for new external contributors with that label, so it's a great way to find something you can help with!

See our wiki for more details.

Engineering discussions take place on our public mailing list, cockroach-db@googlegroups.com. Also please join our Community Slack (there's a dedicated #contributors channel!) to ask questions, discuss your ideas, and connect with other contributors.

Design

For an in-depth discussion of the CockroachDB architecture, see our Architecture Guide. For the original design motivation, see our design doc.

Licensing

Current CockroachDB code is released under a combination of two licenses, the Business Source License (BSL) and the Cockroach Community License (CCL).

When contributing to a CockroachDB feature, you can find the relevant license in the comments at the top of each file.

For more information, see the Licensing FAQs.

Comparison with Other Databases

To see how key features of CockroachDB stack up against other databases, check out CockroachDB in Comparison.

See Also