This is an English translation of a Japanese blog. Some content may not be fully translated.
Snowflake

Querying S3 External Tables in Snowflake

Overview

  • You can query Parquet, CSV, and other files placed on S3
    • Understood as a feature similar to Redshift Spectrum

Notes

Creating a File Format

CREATE FILE FORMAT csv_format
    TYPE = 'CSV'
    COMPRESSION = 'AUTO'
    FIELD_DELIMITER = ','
    RECORD_DELIMITER = '\n'
    SKIP_HEADER = 0
    FIELD_OPTIONALLY_ENCLOSED_BY = NONE
    DATE_FORMAT = 'AUTO'
    TIMESTAMP_FORMAT = 'AUTO'
    NULL_IF = ('');

Creating an External Stage

Using the external stage created below. Refer to this for IAM settings as well.

Loading Data from S3 (External Stage) into Snowflake | my opinion is my own

zatoima#COMPUTE_WH@TESTDB.PUBLIC>show stages;
+-------------------------------+--------------+---------------+-------------+-----------------------------+-----------------+--------------------+--------------+---------+----------------+----------+-------+----------------------+------------------------------+
| created_on                    | name         | database_name | schema_name | url                         | has_credentials | has_encryption_key | owner        | comment | region         | type     | cloud | notification_channel | storage_integration          |
|-------------------------------+--------------+---------------+-------------+-----------------------------+-----------------+--------------------+--------------+---------+----------------+----------+-------+----------------------+------------------------------|
| 2022-08-23 22:45:45.523 -0700 | S3_EXT_STAGE | TESTDB        | PUBLIC      | s3://snowflake-bucket-work/ | N               | N                  | ACCOUNTADMIN |         | ap-northeast-1 | EXTERNAL | AWS   | NULL                 | SNOWFLAKE_AND_S3_INTEGRATION |
+-------------------------------+--------------+---------------+-------------+-----------------------------+-----------------+--------------------+--------------+---------+----------------+----------+-------+----------------------+------------------------------+
1 Row(s) produced. Time Elapsed: 0.077s

Building an External Table

CREATE EXTERNAL TABLE t1
  WITH location = @S3_EXT_STAGE
  auto_refresh = true
  file_format = csv_format;

Even when auto_refresh is set to TRUE, when files on the S3 side change, you need to perform a refresh. For AWS S3, the following settings must be configured in advance.

zatoima#COMPUTE_WH@TESTDB.PUBLIC>ALTER EXTERNAL TABLE t1 refresh;

+--------+-------------------+-----------------------------------------+
| file   | status            | description                             |
|--------+-------------------+-----------------------------------------|
| t1.csv | REGISTERED_UPDATE | File registered (updated) successfully. |
+--------+-------------------+-----------------------------------------+
1 Row(s) produced. Time Elapsed: 0.480s
zatoima#COMPUTE_WH@TESTDB.PUBLIC>
zatoima#COMPUTE_WH@TESTDB.PUBLIC>select * from t1;
+-----------------------------------------+
| VALUE                                   |
|-----------------------------------------|
| {                                       |
|   "c1": "0",                            |
|   "c2": "111111",                       |
|   "c3": "222222",                       |
|   "c4": " 1",                           |
|   "c5": "2022-08-23 22:47:44.214 -0700" |
| }                                       |
| {                                       |
|   "c1": "1",                            |
|   "c2": "111111",                       |
|   "c3": "222222",                       |
|   "c4": " 2",                           |
|   "c5": "2022-08-23 22:47:44.214 -0700" |
| }                                       |
+-----------------------------------------+
2 Row(s) produced. Time Elapsed: 1.077s
zatoima#COMPUTE_WH@TESTDB.PUBLIC>

The update status can be checked with SYSTEM$EXTERNAL_TABLE_PIPE_STATUS('table_name');.

zatoima#COMPUTE_WH@TESTDB.PUBLIC>select SYSTEM$EXTERNAL_TABLE_PIPE_STATUS('T1');
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SYSTEM$EXTERNAL_TABLE_PIPE_STATUS('T1')                                                                                                                                                                                                                                        |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| {"executionState":"RUNNING","pendingFileCount":0,"notificationChannelName":"arn:aws:sqs:ap-northeast-1:xxxxx:sf-snowpipe-xxxx-88QZUsuy_HLwocQAHjHNUA","numOutstandingMessagesOnChannel":0,"lastPulledFromChannelTimestamp":"2022-08-25T07:24:54.217Z"} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Execution log

zatoima#COMPUTE_WH@TESTDB.PUBLIC>CREATE EXTERNAL TABLE t1
                                   WITH location = @S3_EXT_STAGE
                                   auto_refresh = true
                                   file_format = csv_format;
+--------------------------------+
| status                         |
|--------------------------------|
| Table T1 successfully created. |
+--------------------------------+
1 Row(s) produced. Time Elapsed: 4.079s
zatoima#COMPUTE_WH@TESTDB.PUBLIC>
zatoima#COMPUTE_WH@TESTDB.PUBLIC>show tables;
+-------------------------------+----------+---------------+-------------+-------+---------+------------+------+-------+--------------+----------------+----------------------+-----------------+-------------+
| created_on                    | name     | database_name | schema_name | kind  | comment | cluster_by | rows | bytes | owner        | retention_time | automatic_clustering | change_tracking | is_external |
|-------------------------------+----------+---------------+-------------+-------+---------+------------+------+-------+--------------+----------------+----------------------+-----------------+-------------|
| 2022-08-14 05:17:22.196 -0700 | LINEITEM | TESTDB        | PUBLIC      | TABLE |         |            |    0 |     0 | ACCOUNTADMIN | 1              | OFF                  | OFF             | N           |
| 2022-08-25 00:05:01.262 -0700 | T1       | TESTDB        | PUBLIC      | TABLE |         |            | NULL |     0 | ACCOUNTADMIN | 1              | OFF                  | OFF             | Y           |
+-------------------------------+----------+---------------+-------------+-------+---------+------------+------+-------+--------------+----------------+----------------------+-----------------+-------------+
2 Row(s) produced. Time Elapsed: 0.146s
zatoima#COMPUTE_WH@TESTDB.PUBLIC>
zatoima#COMPUTE_WH@TESTDB.PUBLIC>
zatoima#COMPUTE_WH@TESTDB.PUBLIC>select * from t1;
+-----------------------------------------+
| VALUE                                   |
|-----------------------------------------|
| {                                       |
|   "c1": "0",                            |
|   "c2": "111111",                       |
|   "c3": "222222",                       |
|   "c4": "1",                            |
|   "c5": "2022-08-23 22:47:44.214 -0700" |
| }                                       |
| {                                       |
|   "c1": "1",                            |
|   "c2": "111111",                       |
|   "c3": "222222",                       |
|   "c4": "2",                            |
|   "c5": "2022-08-23 22:47:44.214 -0700" |
| }                                       |
~~omitted~~
zatoima#COMPUTE_WH@TESTDB.PUBLIC>SELECT value :c1, value :c2, value :c3, value :c4, value :c5 FROM t1 LIMIT 10 ;
+-----------+-----------+-----------+-----------+---------------------------------+
| VALUE :C1 | VALUE :C2 | VALUE :C3 | VALUE :C4 | VALUE :C5                       |
|-----------+-----------+-----------+-----------+---------------------------------|
| "0"       | "111111"  | "222222"  | "1"       | "2022-08-23 22:47:44.214 -0700" |
| "1"       | "111111"  | "222222"  | "2"       | "2022-08-23 22:47:44.214 -0700" |
| "2"       | "111111"  | "222222"  | "3"       | "2022-08-23 22:47:44.214 -0700" |
| "3"       | "111111"  | "222222"  | "4"       | "2022-08-23 22:47:44.214 -0700" |
| "4"       | "111111"  | "222222"  | "5"       | "2022-08-23 22:47:44.214 -0700" |
| "5"       | "111111"  | "222222"  | "6"       | "2022-08-23 22:47:44.214 -0700" |
| "6"       | "111111"  | "222222"  | "7"       | "2022-08-23 22:47:44.214 -0700" |
| "7"       | "111111"  | "222222"  | "8"       | "2022-08-23 22:47:44.214 -0700" |
| "8"       | "111111"  | "222222"  | "9"       | "2022-08-23 22:47:44.214 -0700" |
| "9"       | "111111"  | "222222"  | "10"      | "2022-08-23 22:47:44.214 -0700" |
+-----------+-----------+-----------+-----------+---------------------------------+
10 Row(s) produced. Time Elapsed: 0.653s

Reference

Suggest an edit on GitHub