
Deletes

This document describes how to delete data stored in Cloud Bigtable tables, discusses when you should use each approach, and provides examples. Before you read this page, you should be familiar with the Bigtable overview and understand the concepts involved in schema design.

For consistency, descriptions on this page refer to the API methods that are used for each type of request. However, we strongly recommend that you always use one of the Bigtable client libraries to access the Bigtable APIs instead of using REST or RPC.

Examples on this page use sample data similar to the data that you might store in Bigtable.

To learn the number of times that you can use the operations described on this page per day, see Quotas and limits.

How Bigtable deletes data

When you send a delete request, cells are marked for deletion and cannot be read. The data is removed up to a week later during compaction, a background process that continuously optimizes the table. Deletion metadata can cause your data to take up slightly more space (several KB per row) for a few days after you send a delete request, until the next compaction occurs.
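The mark-then-compact behavior can be illustrated with a toy in-memory model. This is purely illustrative; ToyTable and its methods are not part of any Bigtable client, and Bigtable runs compaction for you in the background:

```python
class ToyTable:
    """Toy model of delete-then-compact: a delete only marks cells,
    reads skip marked cells immediately, and the space is reclaimed
    later when compact() runs."""

    def __init__(self):
        self.cells = {}          # cell key -> value
        self.tombstones = set()  # cells marked for deletion

    def delete(self, key):
        self.tombstones.add(key)  # marked, not yet physically removed

    def read(self, key):
        if key in self.tombstones:
            return None  # deleted cells cannot be read
        return self.cells.get(key)

    def compact(self):
        # Physically remove marked cells and drop the deletion metadata.
        for key in self.tombstones:
            self.cells.pop(key, None)
        self.tombstones.clear()
```

Immediately after delete(), a read returns nothing, but the cell and its deletion metadata still occupy space until compact() runs.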

You can always send a delete request, even if your cluster has exceeded the storage limit and reads and writes are blocked.

Delete a range of rows

If you want to delete a large amount of data stored in contiguous rows, use dropRowRange. This operation deletes a range of rows identified by starting and ending row keys or by a row key prefix.

After a successful deletion is complete and you receive a response, you can safely write data to the same row range.

You can't call the dropRowRange method asynchronously. Attempting to send a dropRowRange request to a table while another one is in progress results in an error. If an error is returned, the caller should send the request again.
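The retry guidance above can be sketched with exponential backoff. In this sketch, send_request is a hypothetical stand-in for whatever callable issues your dropRowRange request; the attempt count and delays are illustrative, not Bigtable requirements:

```python
import random
import time

def retry_drop_row_range(send_request, max_attempts=5):
    """Retry a dropRowRange-style request with exponential backoff.

    `send_request` is a hypothetical callable that issues the request and
    raises RuntimeError while another dropRowRange is still in progress.
    """
    for attempt in range(max_attempts):
        try:
            return send_request()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Back off before retrying: 0.5s, 1s, 2s, ... plus jitter.
            time.sleep(0.5 * 2 ** attempt + random.uniform(0, 0.1))
```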

Using dropRowRange to delete data from a table stored in a single-cluster instance has almost no impact on performance. On instances that use replication, however, the request takes longer and you might notice an increase in replication latency and CPU usage until the operation is complete. For this reason, we recommend that if possible, you avoid dropping row ranges on replicated tables. To delete data from an instance that uses replication, use the Data API to read and then delete your data.

The following code samples show how to drop a range of rows that start with a row key prefix taken from the sample data. Instead of using a prefix, you can also provide start and end row keys (not shown in these samples).

Java

import com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient;
import java.io.IOException;

public class DropRowRangeExample {
  public void dropRowRange(String projectId, String instanceId, String tableId) throws IOException {
    try (BigtableTableAdminClient tableAdminClient =
        BigtableTableAdminClient.create(projectId, instanceId)) {
      tableAdminClient.dropRowRange(tableId, "phone#4c410523");
    }
  }
}

Python

from google.cloud import bigtable

def drop_row_range(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)
    row_key_prefix = "phone#4c410523"
    table.drop_by_prefix(row_key_prefix, timeout=200)

Node.js

await table.deleteRows('phone#5c10102');
await printRows();

Delete data using Data API methods

If you need to delete small amounts of non-contiguous data, deleting data using a method that calls the Bigtable Data API is often the best choice. Use these methods if you are deleting MB, not GB, of data in a request. Using the Data API is the only way to delete data from a column (not column family).

Data API methods call MutateRows with one of three mutation types:

  • DeleteFromColumn
  • DeleteFromFamily
  • DeleteFromRow

A delete request using the Data API is atomic: either the request succeeds and all data is deleted, or the request fails and no data is removed.
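To make that guarantee concrete, here is a toy in-memory model of an atomic delete request (not Bigtable code): mutations are staged on a copy of the row and committed only if every one succeeds, so a failing request removes no data at all.

```python
import copy

def apply_delete_request(row, mutations):
    """Toy model of an atomic delete request.

    `row` maps column family -> {column: cells}. Each mutation is a tuple
    named after the Data API mutation types listed above.
    """
    staged = copy.deepcopy(row)
    for mutation in mutations:
        kind = mutation[0]
        if kind == "DeleteFromColumn":
            _, family, column = mutation
            del staged[family][column]  # KeyError -> whole request fails
        elif kind == "DeleteFromFamily":
            _, family = mutation
            staged[family] = {}
        elif kind == "DeleteFromRow":
            staged = {}
        else:
            raise ValueError(f"unknown mutation type: {kind}")
    row.clear()
    row.update(staged)  # commit: all mutations take effect together
```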

In most cases, avoid using CheckAndMutate methods to delete data. In the rare event that you require strong consistency, you might want to use this approach, but be aware that it is resource-intensive and performance might be affected.

To use MutateRows to delete data, you first send a readRows request with a filter to determine what you want to delete, and then you send the deletion request. For a list of the filters that are available, see Filters.
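The two-step read-then-delete pattern can be sketched with stand-in functions. Here `rows` and `delete_cells` are hypothetical placeholders for the client's readRows results and mutation call, and `matches` plays the role of the filter:

```python
def read_then_delete(rows, matches, delete_cells):
    """Sketch of the two-step pattern: scan rows with a filter-like
    predicate, then send a delete mutation for each match."""
    to_delete = [key for key, cells in rows if matches(cells)]
    for key in to_delete:
        delete_cells(key)
    return to_delete
```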

Samples in this section assume that you have already determined what data to delete.

Delete from a column

The following code samples demonstrate how to delete all the cells from a column in a row.

Java

import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Mutation;
import com.google.cloud.bigtable.data.v2.models.RowMutation;
import java.io.IOException;

public class DeleteFromColumnExample {
  public void deleteFromColumnCells(String projectId, String instanceId, String tableId)
      throws IOException {
    try (BigtableDataClient dataClient = BigtableDataClient.create(projectId, instanceId)) {
      Mutation mutation = Mutation.create().deleteCells("cell_plan", "data_plan_01gb");
      dataClient.mutateRow(RowMutation.create(tableId, "phone#4c410523#20190501", mutation));
    }
  }
}

Python

from google.cloud import bigtable

def delete_from_column(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)
    row = table.row("phone#4c410523#20190501")
    row.delete_cell(column_family_id="cell_plan", column="data_plan_01gb")
    row.commit()

Node.js

await table.mutate({
  key: 'phone#4c410523#20190501',
  method: 'delete',
  data: {
    column: 'cell_plan:data_plan_05gb',
  },
});
await printRows();

Delete from a column family

The following code samples demonstrate how to delete cells from a column family in a row.

Java

import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.RowMutation;
import java.io.IOException;

public class DeleteFromColumnFamilyExample {
  public void deleteFromColumnFamily(String projectId, String instanceId, String tableId)
      throws IOException {
    try (BigtableDataClient dataClient = BigtableDataClient.create(projectId, instanceId)) {
      dataClient.mutateRow(
          RowMutation.create(tableId, "phone#5c10102#20190501").deleteFamily("stats_summary"));
    }
  }
}

Python

from google.cloud import bigtable

def delete_from_column_family(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)
    row = table.row("phone#4c410523#20190501")
    row.delete_cells(column_family_id="cell_plan", columns=row.ALL_COLUMNS)
    row.commit()

Node.js

await table.mutate({
  key: 'phone#4c410523#20190501',
  method: 'delete',
  data: {
    column: 'cell_plan',
  },
});
await printRows();

Delete from a row

The following code snippets demonstrate how to delete all the cells from a row.

Java

import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Mutation;
import com.google.cloud.bigtable.data.v2.models.RowMutation;
import java.io.IOException;

public class DeleteFromRowExample {
  public void deleteFromRow(String projectId, String instanceId, String tableId)
      throws IOException {
    try (BigtableDataClient dataClient = BigtableDataClient.create(projectId, instanceId)) {
      Mutation mutation = Mutation.create().deleteRow();
      dataClient.mutateRow(RowMutation.create(tableId, "phone#4c410523#20190501", mutation));
    }
  }
}

Python

from google.cloud import bigtable

def delete_from_row(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)
    row = table.row("phone#4c410523#20190501")
    row.delete()
    row.commit()

Node.js

const row = table.row('phone#4c410523#20190501');
await row.delete();
await printRows();

Delete by streaming and batching

Streaming and batching your delete requests is often the best way to delete large amounts of data. This strategy can be useful when you have finer-grained data retention requirements than garbage-collection policies allow.
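The batching pattern itself is independent of any client library. As a sketch, per-row mutation entries can be grouped into fixed-size batches like this, where `flush` is a stand-in for sending one bulk MutateRows request and the batch size is illustrative:

```python
def batch_mutations(entries, batch_size, flush):
    """Group per-row mutation entries into fixed-size batches and send
    each batch with `flush` (one bulk request per batch)."""
    batch = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == batch_size:
            flush(batch)
            batch = []
    if batch:
        flush(batch)  # send the final partial batch
```

The client libraries provide this behavior for you (for example, the batcher objects in the samples below), so this sketch is only to show the shape of the strategy.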

The following code snippets start a stream of data (reading rows), batch the rows, and then delete all the cells in a data plan column (data_plan_01gb or data_plan_05gb, depending on the sample) in the cell_plan column family.

Java

import com.google.api.gax.batching.Batcher;
import com.google.api.gax.rpc.ServerStream;
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;
import com.google.cloud.bigtable.data.v2.models.RowMutationEntry;
import java.io.IOException;

public class BatchDeleteExample {
  public void batchDelete(String projectId, String instanceId, String tableId)
      throws InterruptedException, IOException {
    try (BigtableDataClient dataClient = BigtableDataClient.create(projectId, instanceId)) {
      try (Batcher<RowMutationEntry, Void> batcher = dataClient.newBulkMutationBatcher(tableId)) {
        ServerStream<Row> rows = dataClient.readRows(Query.create(tableId));
        for (Row row : rows) {
          batcher.add(
              RowMutationEntry.create(row.getKey()).deleteCells("cell_plan", "data_plan_05gb"));
        }
        // Blocks until mutations are applied on all submitted row entries.
        batcher.flush();
      }
    }
  }
}

Python

from google.cloud import bigtable

def streaming_and_batching(project_id, instance_id, table_id):
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)
    batcher = table.mutations_batcher(flush_count=2)
    for row in table.read_rows():
        entry = table.row(row.row_key)
        entry.delete_cell(column_family_id="cell_plan", column="data_plan_01gb")
        # Add each per-row delete to the batcher; it flushes every 2 rows.
        batcher.mutate(entry)
    # Send any mutations still buffered in the batcher.
    batcher.flush()

Node.js

const rows = (await table.getRows({limit: 2}))[0];
const entries = rows.map(row => {
  return {
    key: row.id,
    method: 'delete',
    data: {
      column: 'cell_plan:data_plan_05gb',
    },
  };
});
await table.mutate(entries);
await printRows();

What's next