Deletes
This document describes how to delete data stored in Cloud Bigtable tables, discusses when you should use each approach, and provides examples. Before you read this page, you should be familiar with the Bigtable overview and understand the concepts involved in schema design.
For consistency, descriptions on this page refer to the API methods that are used for each type of request. However, we strongly recommend that you always use one of the Bigtable client libraries to access the Bigtable APIs instead of using REST or RPC.
Examples on this page use sample data similar to the data that you might store in Bigtable.
To learn how many times per day you can use the operations described on this page, see Quotas and limits.
How Bigtable deletes data
When you send a delete request, cells are marked for deletion and cannot be read. The data is removed up to a week later during compaction, a background process that continuously optimizes the table. Deletion metadata can cause your data to take up slightly more space (several KB per row) for a few days after you send a delete request, until the next compaction occurs.
You can always send a delete request, even if your cluster has exceeded the storage limit and reads and writes are blocked.
Delete a range of rows
If you want to delete a large amount of data stored in contiguous rows, use dropRowRange. This operation deletes all rows in a range identified by starting and ending row keys or by a row key prefix.
After the deletion completes and you receive a response, you can safely write data to the same row range.
You can't call the dropRowRange method asynchronously. Attempting to send a
dropRowRange request to a table while another one is in progress results in an
error. If an error is returned, the caller should send the request again.
Using dropRowRange to delete data from a table stored in a single-cluster
instance has almost no impact on performance. On instances that use replication,
however, the request takes longer and you might notice an increase in
replication latency and CPU usage until the operation is complete. For this
reason, we recommend that you avoid dropping row ranges on replicated tables
if possible. To delete data from an instance that uses replication,
use the Data API to read and then delete your data.
The following code sample shows how to drop a range of rows that start with the row key prefix phone#5c10102. Instead of using a prefix, you can provide start and end row keys (not shown here).
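The following is a minimal sketch in Python, assuming the google-cloud-bigtable client library; the project, instance, and table names are placeholders, not values from this page:

```python
from google.cloud import bigtable

# Placeholder identifiers; substitute your own project, instance, and table.
client = bigtable.Client(project="my-project", admin=True)  # admin client required
table = client.instance("my-instance").table("mobile-time-series")

# Delete every row whose key begins with the prefix. drop_by_prefix issues
# the underlying dropRowRange request, so the replication caveats above apply.
table.drop_by_prefix(b"phone#5c10102")
```

Because the request is synchronous, wait for the response before sending another dropRowRange request to the same table.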
Delete data using Data API methods
If you need to delete small amounts of non-contiguous data, deleting data using a method that calls the Bigtable Data API is often the best choice. Use these methods if you are deleting MB, not GB, of data in a request. Using the Data API is the only way to delete data from a column (not column family).
Data API methods call MutateRows with one of three mutation types:
- DeleteFromColumn
- DeleteFromFamily
- DeleteFromRow
A delete request using the Data API is atomic: either the request succeeds and all data is deleted, or the request fails and no data is removed.
In most cases, avoid using CheckAndMutate methods to delete data. In the rare
event that you require strong consistency, you might want to use this
approach, but be aware that it is resource-intensive and performance might be
affected.
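If you do decide that a conditional delete is necessary, the following Python sketch shows one way to express it; the row key, column family, and filter are illustrative assumptions:

```python
from google.cloud import bigtable
from google.cloud.bigtable import row_filters

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("mobile-time-series")

# A conditional (CheckAndMutate) delete: the mutation is applied only if the
# row matches the filter. state=True attaches the delete to the matching branch.
row = table.conditional_row(
    b"phone#4c410523#20190501",  # hypothetical row key
    filter_=row_filters.ColumnQualifierRegexFilter(b"data_plan_01gb1"),
)
row.delete_cell("cell_plan", b"data_plan_01gb1", state=True)
row.commit()  # returns whether the filter matched
```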
To use MutateRows to delete data, you first send a readRows request with a
filter to determine what you want to delete, and then you send the deletion
request. For a list of the filters that are available, see
Filters.
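For example, a read that filters on a column qualifier to find candidate rows might look like this sketch (identifiers are placeholders):

```python
from google.cloud import bigtable
from google.cloud.bigtable import row_filters

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("mobile-time-series")

# Read only rows that contain the column you plan to delete from.
col_filter = row_filters.ColumnQualifierRegexFilter(b"data_plan_01gb1")
for partial_row in table.read_rows(filter_=col_filter):
    print(partial_row.row_key)  # candidate rows for the deletion request
```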
Samples in this section assume that you have already determined what data to delete.
Delete from a column
The following code sample demonstrates how to delete all the cells from a column in a row.
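A minimal Python sketch of a DeleteFromColumn mutation, using placeholder resource names and a hypothetical row key:

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("mobile-time-series")

# DeleteFromColumn: remove every cell in one column of one row.
row = table.direct_row(b"phone#4c410523#20190501")  # hypothetical row key
row.delete_cell("cell_plan", b"data_plan_01gb1")
row.commit()  # atomic: all targeted cells are deleted, or none are
```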
Delete from a column family
The following code sample demonstrates how to delete cells from a column family in a row.
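A minimal Python sketch of a DeleteFromFamily mutation, with the same placeholder names:

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("mobile-time-series")

# DeleteFromFamily: remove all cells in one column family of one row.
row = table.direct_row(b"phone#4c410523#20190501")  # hypothetical row key
row.delete_cells("cell_plan", row.ALL_COLUMNS)
row.commit()
```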
Delete from a row
The following code snippet demonstrates how to delete all the cells from a row.
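A minimal Python sketch of a DeleteFromRow mutation, with the same placeholder names:

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("mobile-time-series")

# DeleteFromRow: remove every cell in the row.
row = table.direct_row(b"phone#4c410523#20190501")  # hypothetical row key
row.delete()
row.commit()
```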
Delete by streaming and batching
Streaming and batching your delete requests is often the best way to delete large amounts of data. This strategy can be useful when you have finer-grained data retention requirements than garbage-collection policies allow.
The following code snippet starts a stream of data (reading rows), batches the rows, and then goes through the batch and deletes all the cells in the column data_plan_01gb1 in the cell_plan column family.
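A Python sketch of this pattern, assuming the google-cloud-bigtable client library and placeholder resource names; flush_count is an illustrative batch size:

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("mobile-time-series")

# Stream the rows, queue one delete mutation per row, and let the batcher
# send the queued mutations in bulk once flush_count mutations accumulate.
batcher = table.mutations_batcher(flush_count=100)
for partial_row in table.read_rows():
    mutation = table.direct_row(partial_row.row_key)
    mutation.delete_cell("cell_plan", b"data_plan_01gb1")
    batcher.mutate(mutation)
batcher.flush()  # send any mutations still queued
```

In practice, you would usually narrow the read_rows call with a filter so that you stream only the rows you intend to modify.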
What's next
- If you're using the HBase client library, review the list of unsupported deletes.
- Explore the ways that you can monitor your Bigtable resources.