This is an English translation of a Japanese blog. Some content may not be fully translated.

Understanding the Basics of GCP BigQuery

In AWS terms, I think of it as something like Athena, Redshift, and Aurora rolled into a single service.

BigQuery Components

  • BigQuery Managed Storage
    • Scalable data storage
  • BigQuery Analysis
    • Parallel SQL engine based on Dremel query engine technology

Architecture

Data Storage Format

Distributed Data Placement

Parallel Query Processing

Data Types

https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types?hl=ja

Data Storage Approach

With on-demand pricing, BigQuery charges a query by the amount of data it reads, so the following features, which reduce the data a query has to scan, are worth actively considering.

  • Partitioned tables
    • Partition pruning, per-partition export, etc.
  • Clustered tables
    • Data placement and order adjusted based on clustering columns
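
To make the effect of partition pruning concrete, here is a minimal sketch in Python. It is a toy model, not BigQuery itself: the `PARTITIONS` table, its sizes, and the `bytes_scanned` helper are all hypothetical names invented for illustration. It only shows why a filter on the partitioning column reduces the data read.

```python
from datetime import date

GIB = 1024**3

# Hypothetical table: each daily partition mapped to its stored size in bytes.
PARTITIONS = {
    date(2024, 1, 1): 40 * GIB,
    date(2024, 1, 2): 35 * GIB,
    date(2024, 1, 3): 25 * GIB,
}


def bytes_scanned(partitions, start=None, end=None):
    """Return the bytes read by a query that filters on the partition column.

    With no filter, every partition is read (a full scan). With a date range,
    partitions outside [start, end] are pruned and contribute nothing.
    """
    total = 0
    for day, size in partitions.items():
        if start is not None and day < start:
            continue  # pruned: partition is before the requested range
        if end is not None and day > end:
            continue  # pruned: partition is after the requested range
        total += size
    return total


full_scan = bytes_scanned(PARTITIONS)
one_day = bytes_scanned(PARTITIONS, start=date(2024, 1, 2), end=date(2024, 1, 2))

print(full_scan // GIB, "GiB read without a partition filter")
print(one_day // GIB, "GiB read with a filter on a single day")
```

In this model the unfiltered query reads all three partitions, while the single-day query reads only one. Clustering works at a finer grain inside each partition and is not modeled here.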

Slots

Slots represent the degree of processing parallelism, and by default a project can use a maximum of 2,000. BigQuery gets its fast parallel processing from distributed storage and from spreading work across slots, but note that a query is not guaranteed to scale up to this limit. A slot also does not seem to correspond directly to a CPU core count.

https://cloud.google.com/bigquery/docs/slots?hl=ja

BigQuery slots are virtual CPUs used by BigQuery to run SQL queries. BigQuery automatically calculates the number of slots required per query based on the size and complexity of the query.

The unknown world of Google BigQuery - Qiita https://qiita.com/AkiQ/items/9c5eefb7953409aa2eda

As mentioned, by default a project is given a maximum of 2,000 slots. Query speed is achieved through slot parallel processing. Slots are allocated from resources currently available in BigQuery, which makes sense when you think about it. Slots are essentially a global resource. Therefore, even though you can use up to 2,000 slots, it doesn’t mean you can always use all 2,000 slots simultaneously.
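
The reasoning above can be sketched as a small toy model in Python. This is not how BigQuery's scheduler is actually implemented. The `granted_slots` function and its parameters are hypothetical, and the only figure taken from the text is the default limit of 2,000. It simply shows that the slots a query gets are bounded both by the project limit and by whatever the shared pool has free at that moment.

```python
PROJECT_SLOT_LIMIT = 2000  # default per-project maximum mentioned in the text


def granted_slots(requested, available_in_pool, project_limit=PROJECT_SLOT_LIMIT):
    """Toy model of slot allocation (an assumption, not BigQuery's algorithm).

    A query receives at most what it asks for, at most the project limit,
    and at most what the shared pool currently has available.
    """
    if requested < 0 or available_in_pool < 0 or project_limit < 0:
        raise ValueError("slot counts must be non-negative")
    return min(requested, project_limit, available_in_pool)


# Capped by the project limit even though the pool has room.
print(granted_slots(requested=3000, available_in_pool=5000))
# Capped by the shared pool: the limit is 2,000 but fewer slots are free.
print(granted_slots(requested=2000, available_in_pool=1200))
# A small query simply gets what it needs.
print(granted_slots(requested=500, available_in_pool=5000))
```

The second call is the case the quote warns about: being allowed up to 2,000 slots does not mean 2,000 slots are available right now.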

BigQuery Hierarchy

Cost Optimization

Pricing

Pricing | BigQuery: Cloud Data Warehouse | Google Cloud https://cloud.google.com/bigquery/pricing?hl=ja

  • Query pricing
  • Storage pricing
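
For query pricing, a rough estimate can be derived from the bytes a query reads. The sketch below is only an illustration: `estimate_query_cost` is a hypothetical helper, and the price used in the example is a placeholder, not the actual rate. Check the pricing page above for the current value in your region. It also ignores any free tier and minimum charges.

```python
TIB = 1024**4  # bytes in one tebibyte


def estimate_query_cost(bytes_processed, price_per_tib):
    """Estimate an on-demand query charge from the bytes the query reads.

    price_per_tib must be supplied by the caller; it is deliberately not
    hard-coded because the real rate depends on region and changes over time.
    """
    if bytes_processed < 0 or price_per_tib < 0:
        raise ValueError("bytes_processed and price_per_tib must be non-negative")
    return bytes_processed / TIB * price_per_tib


# Placeholder price used only to make the arithmetic concrete (an assumption).
EXAMPLE_PRICE_PER_TIB = 5.0

# A query that reads 512 GiB, which is half a TiB.
cost = estimate_query_cost(512 * 1024**3, EXAMPLE_PRICE_PER_TIB)
print(f"Estimated cost: {cost:.2f}")
```

Because the charge scales linearly with bytes read, halving the data scanned through partitioning or clustering halves the estimate.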

Transferring Data from Other Clouds

Without data, an analytics platform is useless. Using BigQuery Data Transfer Service for Amazon S3, you can schedule recurring load jobs that run automatically from Amazon S3 to BigQuery. Moving data in the reverse direction, from GCP to AWS, is also possible.

Amazon S3 Transfer | BigQuery Data Transfer Service | Google Cloud https://cloud.google.com/bigquery-transfer/docs/s3-transfer?hl=ja

Thinking about and summarizing data migration from GCP to AWS | DevelopersIO https://dev.classmethod.jp/articles/data-migration-from-gcp-to-aws-matome/#a-4

References

BigQuery Documentation | Google Cloud https://cloud.google.com/bigquery/docs
