I see that Snowflake now supports executing scripts from GitHub (public preview), but none of the examples show actually reading data from a .csv.
Is this possible?
Reading a CSV from GitHub is easy. For example, with a public repository that needs no authentication, we first need to set up the connection in Snowflake:
create or replace api integration git_plotly_datasets_integration
api_provider = git_https_api
api_allowed_prefixes = ('https://github.com/plotly')
-- allowed_authentication_secrets = (git_secret)
enabled = true;
create or replace git repository plotly_datasets
api_integration = git_plotly_datasets_integration
-- git_credentials = myco_git_secret
origin = 'https://github.com/plotly/datasets.git';
(I skipped good security practices and did all of this as accountadmin.)
Once the git repository is set up, we can select from any file in that repo:
select $1, $2, $3, $16
from @plotly_datasets/branches/master/1962_2006_walmart_store_openings.csv;
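Stage query syntax also accepts an inline file format, which should work here too, for example to skip the header row instead of getting it back as data. A sketch, assuming a hypothetical format named csv_skip_header:

create or replace file format csv_skip_header
type = csv
skip_header = 1
field_optionally_enclosed_by = '"';

select $1, $2, $3, $16
from @plotly_datasets/branches/master/1962_2006_walmart_store_openings.csv
(file_format => 'csv_skip_header');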
That's great, but especially for CSVs I'd love some schema auto-detection. However, this doesn't work:
create or replace file format my_csv_format
type = csv
parse_header = true
field_optionally_enclosed_by = '"';
select *
from table(
infer_schema(
location=>'@plotly_datasets/branches/master/1962_2006_walmart_store_openings.csv'
, file_format=>'my_csv_format'
)
);
-- Remote file '[...]/git_repository/e7f0f8a8-bd17-4834-adeb-5ab5bcd86192/null' was not found. There are several potential causes. The file might not exist. The required credentials may be missing or invalid. If you are running a copy command, please make sure files are not deleted when they are being loaded or files are not being loaded into two different tables concurrently with auto purge option.
-- File 'git_repository/e7f0f8a8-bd17-4834-adeb-5ab5bcd86192/null'
-- Row 0 starts at line 0, column
(I'll file this internally as a bug.)
To get a CSV from GitHub into a Snowflake table with schema auto-detection, you need to copy the CSV to a traditional stage first:
create stage files_stage;
copy files
into @files_stage
from @plotly_datasets/branches/master/1962_2006_walmart_store_openings.csv;
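Before running infer_schema, it's worth confirming the copy actually landed in the stage:

list @files_stage;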
create or replace table mytable
using template (
select array_agg(object_construct(*))
from table(
infer_schema(
location=>'@files_stage/1962_2006_walmart_store_openings.csv',
file_format=>'my_csv_format'
)
)
);
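To see what infer_schema detected, describing the new table shows the generated column names and types before any data is loaded:

desc table mytable;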
copy into mytable
from @files_stage/1962_2006_walmart_store_openings.csv
file_format = my_csv_format
match_by_column_name = case_insensitive;
select *
from mytable;