This is an English translation of a Japanese blog. Some content may not be fully translated.
❄️

Loading Data from S3 (External Stage) into Snowflake

Loading data from S3 (external stage) into Snowflake.

Creating an IAM Policy

Created as sf_policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion"
            ],
            "Resource": "arn:aws:s3:::snowflake-bucket-work/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::snowflake-bucket-work"
        }
    ]
}

Creating an IAM Role

Created as sf_role:

  • Select AWS account
  • Select another AWS account and temporarily enter your own AWS Account ID
    • Note: The trust relationship will be modified later to grant access to Snowflake
  • Click “Require external ID” and enter a dummy ID
    • The trust relationship will be modified later to specify the Snowflake stage’s external ID. An external ID is required to allow a third party (Snowflake) to access AWS resources (i.e., S3)
  • Attach the sf_policy created above

image-20220824142339311

image-20220824142551527

Creating Cloud Storage Integration on Snowflake Side

CREATE STORAGE INTEGRATION snowflake_and_s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = S3
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<AWS Account>:role/sf_role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://snowflake-bucket-work/')
;

Note: Only account administrators (users with ACCOUNTADMIN role) or roles with global CREATE INTEGRATION privilege can execute this command.

zatoima#COMPUTE_WH@TESTDB.PUBLIC>CREATE STORAGE INTEGRATION snowflake_and_s3_integration
                                   TYPE = EXTERNAL_STAGE
                                   STORAGE_PROVIDER = S3
                                   ENABLED = TRUE
                                   STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<AWS Account>:role/sf_role'
                                   STORAGE_ALLOWED_LOCATIONS = ('s3://snowflake-bucket-work/')
                                 ;
+----------------------------------------------------------------+
| status                                                         |
|----------------------------------------------------------------|
| Integration SNOWFLAKE_AND_S3_INTEGRATION successfully created. |
+----------------------------------------------------------------+
1 Row(s) produced. Time Elapsed: 1.015s
zatoima#COMPUTE_WH@TESTDB.PUBLIC>desc integration snowflake_and_s3_integration;
+---------------------------+---------------+------------------------------------------------+------------------+
| property                  | property_type | property_value                                 | property_default |
|---------------------------+---------------+------------------------------------------------+------------------|
| ENABLED                   | Boolean       | true                                           | false            |
| STORAGE_PROVIDER          | String        | S3                                             |                  |
| STORAGE_ALLOWED_LOCATIONS | List          | s3://snowflake-bucket-work/                    | []               |
| STORAGE_BLOCKED_LOCATIONS | List          |                                                | []               |
| STORAGE_AWS_IAM_USER_ARN  | String        | arn:aws:iam::xxxxxxxxx:user/qvl6-s-jpss8756 |                  |
| STORAGE_AWS_ROLE_ARN      | String        | arn:aws:iam::<AWS Account>:role/sf_role         |                  |
| STORAGE_AWS_EXTERNAL_ID   | String        | <Snowflake Account>_SFCRole=3_ZGmPmtzjlkklddeysL+zqym2qW8= |                  |
| COMMENT                   | String        |                                                |                  |
+---------------------------+---------------+------------------------------------------------+------------------+
8 Row(s) produced. Time Elapsed: 0.713s
zatoima#COMPUTE_WH@TESTDB.PUBLIC>

Note the values of STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID.

Modifying IAM Role Trust Relationship

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Principal": {
				"AWS": "<STORAGE_AWS_IAM_USER_ARN>"
			},
			"Action": "sts:AssumeRole",
			"Condition": {
				"StringEquals": {
					"sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>"
				}
			}
		}
	]
}

Creating an External Stage

CREATE STAGE s3_ext_stage STORAGE_INTEGRATION = snowflake_and_s3_integration URL = 's3://snowflake-bucket-work/';
zatoima#COMPUTE_WH@TESTDB.PUBLIC>CREATE STAGE s3_ext_stage STORAGE_INTEGRATION = snowflake_and_s3_integration URL = 's3://snowflake-bucket-work/';
+-----------------------------------------------+
| status                                        |
|-----------------------------------------------|
| Stage area S3_EXT_STAGE successfully created. |
+-----------------------------------------------+
zatoima#COMPUTE_WH@TESTDB.PUBLIC>show stages;
+-------------------------------+--------------+---------------+-------------+-----------------------------+-----------------+--------------------+--------------+---------+----------------+----------+-------+----------------------+------------------------------+
| created_on                    | name         | database_name | schema_name | url                         | has_credentials | has_encryption_key | owner        | comment | region         | type     | cloud | notification_channel | storage_integration          |
|-------------------------------+--------------+---------------+-------------+-----------------------------+-----------------+--------------------+--------------+---------+----------------+----------+-------+----------------------+------------------------------|
| 2022-08-23 22:45:45.523 -0700 | S3_EXT_STAGE | TESTDB        | PUBLIC      | s3://snowflake-bucket-work/ | N               | N                  | ACCOUNTADMIN |         | ap-northeast-1 | EXTERNAL | AWS   | NULL                 | SNOWFLAKE_AND_S3_INTEGRATION |
+-------------------------------+--------------+---------------+-------------+-----------------------------+-----------------+--------------------+--------------+---------+----------------+----------+-------+----------------------+------------------------------+
1 Row(s) produced. Time Elapsed: 0.098s

Loading Data

copy into t1 from @s3_ext_stage pattern='test.csv';
zatoima#COMPUTE_WH@TESTDB.PUBLIC>copy into t1 from @s3_ext_stage pattern='test.csv';
+-------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file                                | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|-------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| s3://snowflake-bucket-work/test.csv | LOADED |          10 |          10 |           1 |           0 | NULL        |             NULL |                  NULL | NULL                    |
+-------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.675s
zatoima#COMPUTE_WH@TESTDB.PUBLIC>select * from t1;
+---+--------+--------+----+-------------------------------+
| A | B      | C      |  D | E                             |
|---+--------+--------+----+-------------------------------|
| 0 | 111111 | 222222 |  1 | 2022-08-23 22:47:44.214 -0700 |
| 1 | 111111 | 222222 |  2 | 2022-08-23 22:47:44.214 -0700 |
| 2 | 111111 | 222222 |  3 | 2022-08-23 22:47:44.214 -0700 |
| 3 | 111111 | 222222 |  4 | 2022-08-23 22:47:44.214 -0700 |
| 4 | 111111 | 222222 |  5 | 2022-08-23 22:47:44.214 -0700 |
| 5 | 111111 | 222222 |  6 | 2022-08-23 22:47:44.214 -0700 |
| 6 | 111111 | 222222 |  7 | 2022-08-23 22:47:44.214 -0700 |
| 7 | 111111 | 222222 |  8 | 2022-08-23 22:47:44.214 -0700 |
| 8 | 111111 | 222222 |  9 | 2022-08-23 22:47:44.214 -0700 |
| 9 | 111111 | 222222 | 10 | 2022-08-23 22:47:44.214 -0700 |
+---+--------+--------+----+-------------------------------+
10 Row(s) produced. Time Elapsed: 0.198s
zatoima#COMPUTE_WH@TESTDB.PUBLIC>

Load Options

ON_ERROR options:

CONTINUE If an error is found, continue loading the file. Rows where errors are found will not be loaded.
SKIP_FILE If an error is found, skip the file.
SKIP_FILE_n If the number of error rows found in the file is n or more, skip the file.
SKIP_FILE_n% If the percentage of error rows found in the file exceeds the specified percentage, skip the file.
ABORT_STATEMENT If an error is found in the data file, abort the load operation.

Unloading Data

copy into @s3_ext_stage/unload/ from t1;

By default, data is unloaded in compressed form.

References

Suggest an edit on GitHub