This document describes how to create datasets in BigQuery.
You can create datasets in the following ways:
- Using the Cloud Console.
- Using the
bq mkcommand in thebqcommand-line tool. - Calling the
datasets.insertAPI method. - Using the client libraries.
- Copying an existing dataset.
To see steps for copying a dataset, including across regions, see Copying datasets.
Before you begin
Before you begin, ensure that you have the following required permissions.
Required permissions
To create datasets, you need the bigquery.datasets.create
Identity and Access Management (IAM) permission.
Each of the following predefined IAM roles includes the permissions that you need to create a dataset:
bigquery.dataEditorbigquery.dataOwnerbigquery.userbigquery.admin
For more information about IAM roles in BigQuery, see Predefined roles and permissions.
Dataset limitations
BigQuery datasets are subject to the following limitations:
- You can set the geographic location at creation time only. After a dataset has
been created, the location becomes immutable and can't be changed by using the
Cloud Console, using the
bqcommand-line tool, or calling thepatchorupdateAPI methods. All tables that are referenced in a query must be stored in datasets in the same location.
When you copy a table, the datasets that contain the source table and destination table must reside in the same location.
Dataset names must be unique for each project.
Naming datasets
When you create a dataset in BigQuery, the dataset name must be unique for each project. The dataset name can contain the following:
- Up to 1,024 characters.
Letters (uppercase or lowercase), numbers, and underscores.
Dataset names are case-sensitive: mydataset and MyDataset can coexist in the
same project.
Dataset names cannot contain spaces or special characters such as -, &, @,
or %.
Creating a dataset
To create a dataset:
Console
Open the BigQuery page in the Cloud Console.
In the Explorer panel, select the project where you want to create the dataset.
Expand the Actions option and click Create dataset.
On the Create dataset page:
- For Dataset ID, enter a unique dataset name.
(Optional) For Data location, choose a geographic location for the dataset. If you leave the value set to Default, the location is set to
US. After a dataset is created, the location can't be changed.For Default table expiration, choose one of the following options:
- Never: (Default) Tables created in the dataset are never automatically deleted. You must delete them manually.
- Number of days after table creation: This value determines when a newly created table in the dataset is deleted. This value is applied if you do not set a table expiration when the table is created.
Click Create dataset.
SQL
To create a dataset, use the
CREATE SCHEMA statement.
The following example creates a dataset named mydataset with a default
table expiration.
CREATE SCHEMA mydataset
OPTIONS(
default_table_expiration_days=3.75,
labels=[("label1","value1"),("label2","value2")]
)
For information about how to run a SQL query in BigQuery, see Running interactive and batch query jobs.
bq
Use the bq mk command with the --location flag to create a new dataset.
Optional parameters include --default_table_expiration,
--default_partition_expiration, and --description.
To create a dataset in a project other than your default project, add the
project ID to the dataset name in the following format:
project_id:dataset.
bq --location=location mk \ --dataset \ --default_table_expiration integer1 \ --default_partition_expiration integer2 \ --description description \ project_id:dataset
Replace the following:
locationis the dataset's location. After a dataset is created, the location can't be changed. You can set a default value for the location by using the.bigqueryrcfile.integer1is the default lifetime (in seconds) for newly created tables. The minimum value is 3600 seconds (one hour). The expiration time evaluates to the current time plus the integer value. Any table created in the dataset is deletedinteger1seconds after its creation time. This value is applied if you do not set a table expiration when you create the table.integer2is the default lifetime (in seconds) for partitions in newly created partitioned tables. The default partition expiration has no minimum value. The expiration time evaluates to the partition's date plus the integer value. Any partition created in a partitioned table in the dataset is deletedinteger2seconds after the partition's date. If you supply the--time_partitioning_expirationflag when you create or update a partitioned table, the table-level partition expiration takes precedence over the dataset-level default partition expiration.descriptionis a description of the dataset in quotes.project_idis your project ID.datasetis the name of the dataset that you're creating.
For example, the following command creates a dataset named mydataset with data
location set to US, a default table expiration of 3600 seconds (1 hour), and a
description of This is my dataset. Instead of using the --dataset flag, the
command uses the -d shortcut. If you omit -d and --dataset, the command
defaults to creating a dataset.
bq --location=US mk -d \
--default_table_expiration 3600 \
--description "This is my dataset." \
mydataset
To confirm that the dataset was created, enter the bq ls command. Also,
you can create a table when you create a new dataset using the
following format: bq mk -t dataset.table.
For more information about creating tables, see
Creating a table.
API
Call the datasets.insert
method with a defined dataset resource.
C#
Before trying this sample, follow the C# setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery C# API reference documentation.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery PHP API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
Ruby
Before trying this sample, follow the Ruby setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Ruby API reference documentation.
Next steps
- For more information about assigning access controls to datasets, see Controlling access to datasets.
- For more information about listing datasets in a project, see Listing datasets.
- For more information about dataset metadata, see Getting information about datasets.
- For more information about changing dataset properties, see Updating datasets.
- For more information about creating and managing labels, see Creating and managing labels.
Try it for yourself
If you're new to Google Cloud, create an account to evaluate how BigQuery performs in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
Try BigQuery free