GCP BigQuery Google

Loading CSV data into BigQuery

洪堂瑋 Tangwei Hung 2023/04/29 01:24:29
1230

Goal:

Ingest CSV data into tables inside of BigQuery

 

Required roles and permissions:

Permissions to load data into BigQuery

IAM permissions:

  • bigquery.tables.create
  • bigquery.tables.updateData
  • bigquery.tables.update
  • bigquery.jobs.create

predefined IAM roles:

  • roles/bigquery.dataEditor
  • roles/bigquery.dataOwner
  • roles/bigquery.admin (includes the bigquery.jobs.create permission)
  • bigquery.user (includes the bigquery.jobs.create permission)
  • bigquery.jobUser (includes the bigquery.jobs.create permission)

Permissions to load data from Cloud Storage

IAM permissions:

  • storage.buckets.get
  • storage.objects.get
  • storage.objects.list (required if you are using a URI wildcard)

 

Task 1. Create a new dataset to store tables

  1. In the BigQuery console, click on the View actions icon () next to your Project ID and click Create dataset.
  2. Set the Dataset ID to movie. Leave the other fields at their default values.

  3. Click Create dataset.

You’ll now see the movie dataset under your project name.

 

Task 2. Ingest a new dataset from a CSV

  1. In the BigQuery Console, click View actions icon next to nyctaxi dataset then click Create table.
  2. Specify the following table options:

Source:

  • Create table from: Upload
  • Select file: Browse the file you downloaded locally earlier
  • File format: CSV

Destination:

  • Table name: rating

Leave all other settings at default.

Schema:

  • Check Auto Detect (tip: Not seeing the checkbox? Ensure the file format is CSV and not Avro)

Advanced Options:

  • Leave at default values

 

3. Click Create Table.

You should now see the rating table below the movie dataset.

 

Task 3. Ingest a new dataset from Google Cloud Storage

Now, try to load another subset of the same rating data that is available on Cloud Storage. And this time, let’s use the CLI tool to do it.

  1. In your Cloud Shell, run the following command:
bq load \
--source_format=CSV \
--autodetect \
--noreplace  \
movie.rating1 \
gs://cloud-sql-postgres-samples/rating.csv

You should now see the rating1 table below the movie dataset.

 

洪堂瑋 Tangwei Hung