How to Create a Google Dataset in Google Cloud Platform

Google provides an Enterprise size Data Warehousing and Analytics Solution known as BigQuery. It is a fast, scalable, flexible, & reliable platform with an easy-to-use interface for performing several administrative tasks. BigQuery allows you to create datasets in your BigQuery Projects. Using the BigQuery Datasets, you can effectively maintain and manage access to your tables and views. For a table or view to be part of a dataset, you must create a BigQuery Dataset before loading data into BigQuery.

Before you start creating BigQuery Datasets, you need to have bigquery.datasets.create IAM (Identity & Access Management) permission. There are 4 predefined IAM roles that include the permissions you require for creating a BigQuery Datasets:

  • Data Editor: roles/bigquery.dataEditor
  • Data Owner: roles/bigquery.dataOwner
  • User: roles/bigquery.user
  • Admin: roles/bigquery.admin

Table of Contents

  • Best Practices for Creating a BigQuery Dataset
  • Steps for building BigQuery Datasets in Google Cloud Platform
    • Method 1: Google Console
    • Method 2: bq Commands
    • Method 3: BigQuery SQL
    • Method 4: IoT Cloud Tester
    • Method 5: API and Client Libraries 
  • Conclusion

Best Practices for Creating a BigQuery Dataset

While creating a Bigquery Dataset, you need to remember the following points:

  • While naming your BigQuery Dataset, make sure to provide a unique name for each project. 
  • The BigQuery Dataset name can include up to 1024 characters with Letters (uppercase or lowercase), numbers, and underscores. However, you cannot use spaces or special characters such as -, &, @, or % in your BigQuery Dataset name. 
  • Also, the names are case-sensitive. For example, sampledataset and SampleDataset are 2 different datasets.
  • Ensure to double-check where you want to store your Datasets as GCP allows you to set the geographic location of your dataset only during its creation.
  • You need to make sure that the tables that are referenced in a query should be stored in datasets at the same location.
  • When copying a table, the datasets containing the source and target tables must be present in the same location.

Steps for building BigQuery Datasets in Google Cloud Platform

You can create BigQuery Datasets using any of the following methods given below:

  • Method 1: Google Console
  • Method 2: bq Commands
  • Method 3: BigQuery SQL
  • Method 4: IoT Cloud Tester
  • Method 5: API and Client Libraries 

Method 1: Google Console

Follow the simple steps given below using the intuitive UI of Google Console:

  • Step 1: Go to the BigQuery page on the Cloud Console.
  • Step 2: Navigate to the Explorer panel to choose the project where you want to create your BigQuery Dataset. You can close add a new project by clicking on the New Project button.
  • Step 3: Click on the ⠇ Actions option and then select the Create Dataset option.
  • Step 4: A Create Dataset page will pop up on your screen. You can fill in the following details in it:
    • Dataset ID: Type in a unique dataset name according to the guidelines discussed above.
    • Data Location: Carefully select the geographic location of your dataset. Note that once the dataset is created, you can’t modify this location.
    • Default Table Expiration: You can set the value of the number of days after which a newly created table in a dataset is deleted. If you set this to Never, then you would need to manually delete the datasets when required as now they will not be deleted automatically. 
  • Step 5: Once you are done filling in all the details, click on the Create Dataset button to complete the process.

Method 2: bq Commands

BigQuery also provides a Python-based command-line tool known as bq. You can employ the bq mk command with the –location flag to create a new dataset. You can also include additional parameters such as –default_table_expiration, –default_partition_expiration, and –description.

For example, you can create a SampleDataset and store the dataset in the data centers present in the United States. The default table expiration time is set to 2 hours(7200 seconds).

bq –location=US mk -d \

–default_table_expiration 7200 \

–description “An example dataset.” \

SampleDataset

Method 3: BigQuery SQL

Data Analysts familiar with SQL can create a dataset using the CREATE SCHEMA statement.

Consider the example code given below that creates a dataset named SampleDataset along with the Default Table Expiration time in days and KMS(Cloud Key Management Service) Key. The KMS key is used to secure the newly created tables in this dataset unless a different key is provided at the time of creation.

 CREATE SCHEMA SampleDataset

 OPTIONS(

   default_table_expiration_days=2,

   default_kms_key_name=”projects/sample-project/locations/eu/keyRings/mykeyring/cryptoKeys/mykey”,

   labels=[(“label1″,”value1”),(“label2″,”value2”)]

 )

Method 4: IoT Cloud Tester

You can also use the IoT Cloud Tester application to create your BigQuery Dataset. Follow the easy steps given below to use this tool:

  • Step 1: Open your IoT Cloud Tester application and navigate to the BigQuery Tab > Datasets Tab.
  • Step 2: Click on the Add Dataset button. An Add Dataset window will pop up on your screen. Provide a unique Dataset Name for the Dataset ID and click on the Add button.

IoT Cloud Tester

  • Step 3: IoT Cloud Tester application will now make a POST request for creating the BigQuery Dataset. You can click on the Console button to check the request made where the Dataset ID is passed as a JSON Object.

JSON Object

  • Step 4: You can go to your Google Console to check if the required dataset has been created.

API and Client LibrariesMethod 5: API and Client Libraries 

Owing to its flexibility, BigQuery allows you to create datasets using several client libraries such as C#, Go, Java, Node.js, PHP, Python & Ruby. You can also call the datasets. Insert method along with a defined dataset resource.

Conclusion

In this article, you have learned how to easily create BigQuery Dataset using 5 different methods. The Google Cloud Platform provides an easy-to-use Interface i.e the Google Cloud Console to perform the basic administrative tasks. You can also SQL, Bq commands, or programming languages like C#, Go, Java, Node.js, PHP, Python & Ruby to create your datasets.