In AWS terms, I understand it as a service similar to Athena, Redshift, and Aurora all in one.
BigQuery Components
- BigQuery Managed Storage
  - Scalable data storage
- BigQuery Analysis
  - Parallel SQL engine based on Dremel query engine technology
Architecture
Data Storage Format
Distributed Data Placement
Parallel Query Processing
Data Types
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types?hl=ja
Data Storage Approach
With on-demand pricing, BigQuery charges by the amount of data a query reads, so the following features, which cut down the data scanned, are worth actively considering.
- Partitioned tables
  - Partition pruning, per-partition export, etc.
- Clustered tables
  - Data placement and order adjusted based on clustering columns
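To get a feel for why partition pruning matters when billing follows bytes read, here is a minimal toy model in Python. It does not call BigQuery; the table is just a dictionary of daily partitions, and the sizes (10 GiB per day) and helper names (`build_table`, `bytes_scanned`) are assumptions made up for illustration. Clustering is not modeled.

```python
from datetime import date, timedelta

GIB = 1024 ** 3


def build_table(first_day, days, bytes_per_day):
    """Toy date-partitioned table: {partition_date: bytes_stored}."""
    return {first_day + timedelta(days=i): bytes_per_day for i in range(days)}


def bytes_scanned(table, start=None, end=None):
    """Sum the bytes of the partitions a query would have to read.

    Without a filter on the partitioning column every partition is read.
    With a filter, partitions outside [start, end] are skipped (pruned).
    """
    total = 0
    for day, size in table.items():
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        total += size
    return total


# Hypothetical table: 365 daily partitions of 10 GiB each.
table = build_table(date(2024, 1, 1), days=365, bytes_per_day=10 * GIB)

full_scan = bytes_scanned(table)
pruned = bytes_scanned(table, start=date(2024, 6, 1), end=date(2024, 6, 7))

print(full_scan // GIB)  # 3650
print(pruned // GIB)     # 70
```

In this sketch, a one-week filter reads 70 GiB instead of 3,650 GiB. In a real table the saving depends on how the data is actually distributed across partitions.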
Slots
A slot is the unit of processing parallelism, and by default a project can use up to 2,000 of them. BigQuery gets its speed from distributed storage combined with spreading work across slots, but note that a query is not guaranteed to scale all the way up to this limit. A slot does not seem to map directly onto a physical CPU core count.
https://cloud.google.com/bigquery/docs/slots?hl=ja
BigQuery slots are virtual CPUs used by BigQuery to run SQL queries. BigQuery automatically calculates the number of slots required per query based on the size and complexity of the query.
The unknown world of Google BigQuery - Qiita https://qiita.com/AkiQ/items/9c5eefb7953409aa2eda
As mentioned, by default a project is given a maximum of 2,000 slots. Query speed is achieved through slot parallel processing. Slots are allocated from resources currently available in BigQuery, which makes sense when you think about it. Slots are essentially a global resource. Therefore, even though you can use up to 2,000 slots, it doesn’t mean you can always use all 2,000 slots simultaneously.
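The relationship between available slots and query time can be sketched with a deliberately crude model: treat a query stage as a number of equal work units, let each slot process one unit at a time, and count the "waves" needed. This is only my simplification for intuition. The real scheduler allocates slots dynamically, and the function name and numbers below are hypothetical.

```python
import math


def stage_wall_time(work_units, available_slots, seconds_per_unit=1.0):
    """Rough wall-clock time for one stage under a wave-based toy model.

    Each slot handles one work unit at a time, so the stage needs
    ceil(work_units / available_slots) waves to finish.
    """
    if available_slots <= 0:
        raise ValueError("available_slots must be positive")
    waves = math.ceil(work_units / available_slots)
    return waves * seconds_per_unit


# Hypothetical stage with 10,000 work units.
with_all_slots = stage_wall_time(10_000, available_slots=2_000)
with_fewer_slots = stage_wall_time(10_000, available_slots=500)

print(with_all_slots)    # 5.0
print(with_fewer_slots)  # 20.0
```

The point of the sketch is that the same query still completes when fewer slots are available; it just takes more waves.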
BigQuery Hierarchy
Cost Optimization
- The “BigQuery bankruptcy” problem has been discussed before: running analytical queries over large volumes of data can get expensive, so keep a close eye on costs.
- BigQuery Cost Optimization Best Practices | Google Cloud Blog https://cloud.google.com/blog/ja/products/data-analytics/cost-optimization-best-practices-for-bigquery?utm_source=pocket_mylist
Pricing
Pricing | BigQuery: Cloud Data Warehouse | Google Cloud https://cloud.google.com/bigquery/pricing?hl=ja
- Query pricing
- Storage pricing
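As a rough way to reason about the two components, the arithmetic can be sketched as below. The rates are passed in as arguments because actual prices vary by region and change over time; the values used in the example are placeholders, not real prices, so check the pricing page above for current figures. Free tiers and other pricing models are ignored here.

```python
import math

GIB = 1024 ** 3
TIB = 1024 ** 4


def query_cost(bytes_scanned, price_per_tib):
    """Query charge when billing is proportional to bytes scanned."""
    return bytes_scanned / TIB * price_per_tib


def storage_cost(bytes_stored, price_per_gib_month):
    """Monthly storage charge proportional to bytes stored."""
    return bytes_stored / GIB * price_per_gib_month


# Placeholder rates for illustration only (NOT real BigQuery prices).
EXAMPLE_PRICE_PER_TIB = 5.0
EXAMPLE_PRICE_PER_GIB_MONTH = 0.02

monthly_query = query_cost(2 * TIB, EXAMPLE_PRICE_PER_TIB)
monthly_storage = storage_cost(500 * GIB, EXAMPLE_PRICE_PER_GIB_MONTH)

print(round(monthly_query, 2))
print(round(monthly_storage, 2))
```

Plugging in the bytes a query is expected to scan before running it is a cheap habit for avoiding surprises on the bill.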
Transferring Data from Other Clouds
Without data, an analytics platform is useless. With BigQuery Data Transfer Service for Amazon S3, you can schedule recurring load jobs that automatically bring data from Amazon S3 into BigQuery. Moving data in the opposite direction, from GCP to AWS, is also possible.
Amazon S3 Transfer | BigQuery Data Transfer Service | Google Cloud https://cloud.google.com/bigquery-transfer/docs/s3-transfer?hl=ja
Thinking about and summarizing data migration from GCP to AWS | DevelopersIO https://dev.classmethod.jp/articles/data-migration-from-gcp-to-aws-matome/#a-4
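For the S3 to BigQuery direction, the transfer configuration boils down to a handful of settings. The sketch below only assembles them as a plain dictionary and prints JSON; it does not call any Google Cloud API. The field and parameter names (`data_source_id`, `data_path`, `destination_table_name_template`, and so on) follow my reading of the S3 transfer documentation linked above and should be verified there. The bucket, dataset, and table names are hypothetical.

```python
import json


def s3_transfer_config(dataset_id, table, data_path,
                       access_key_id, secret_access_key,
                       file_format="CSV", schedule="every 24 hours"):
    """Assemble settings for a recurring S3 -> BigQuery load (sketch only)."""
    if not data_path.startswith("s3://"):
        raise ValueError("data_path must be an s3:// URI")
    return {
        "destination_dataset_id": dataset_id,
        "display_name": f"s3-to-{table}",
        "data_source_id": "amazon_s3",
        "schedule": schedule,
        "params": {
            "destination_table_name_template": table,
            "data_path": data_path,
            "access_key_id": access_key_id,
            "secret_access_key": secret_access_key,
            "file_format": file_format,
        },
    }


# Hypothetical names and dummy credentials; never hard-code real keys.
config = s3_transfer_config(
    dataset_id="my_dataset",
    table="my_table",
    data_path="s3://my-bucket/exports/*.csv",
    access_key_id="DUMMY_ACCESS_KEY_ID",
    secret_access_key="DUMMY_SECRET",
)

# Print the settings with the secret masked.
masked = json.loads(json.dumps(config))
masked["params"]["secret_access_key"] = "***"
print(json.dumps(masked, indent=2))
```

In practice the AWS credentials would come from a secret store or environment variables rather than from literals in the source.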
References
BigQuery Documentation | Google Cloud https://cloud.google.com/bigquery/docs