Can I read a CSV directly from GitHub in Snowflake?


I see that Snowflake now supports executing scripts from GitHub (public preview), but none of the examples show actually reading data from a .csv.

Is this possible?

git csv github snowflake-cloud-data-platform
1 Answer

Reading a CSV from GitHub is easy. Take, for example, a public repository that needs no authentication. We first need to set up the connection in Snowflake:

create or replace api integration git_plotly_datasets_integration
api_provider = git_https_api
api_allowed_prefixes = ('https://github.com/plotly')
--  allowed_authentication_secrets = (git_secret)
enabled = true;

create or replace git repository plotly_datasets
api_integration = git_plotly_datasets_integration
-- git_credentials = myco_git_secret
origin = 'https://github.com/plotly/datasets.git';
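
For a private repository, the two commented-out lines above hint at what changes. A minimal sketch, assuming GitHub authentication with a personal access token; the names git_secret, git_private_integration, my_org and my_private_repo are hypothetical, and the angle-bracket values are placeholders:

```sql
-- Hypothetical secret holding GitHub credentials (placeholders, not real values).
create or replace secret git_secret
  type = password
  username = '<github_username>'
  password = '<personal_access_token>';

-- The integration must list the secret it is allowed to use.
create or replace api integration git_private_integration
  api_provider = git_https_api
  api_allowed_prefixes = ('https://github.com/my_org')
  allowed_authentication_secrets = (git_secret)
  enabled = true;

-- The repository object references both the integration and the secret.
create or replace git repository my_private_repo
  api_integration = git_private_integration
  git_credentials = git_secret
  origin = 'https://github.com/my_org/my_private_repo.git';
```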

(I skipped good security practices and did all of this as accountadmin)
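
As a rough sketch of what a less privileged setup might look like, the grants below give a dedicated role what it needs instead of using accountadmin for everything. The role names git_admin and analyst and the location my_db.my_schema are assumptions for illustration, not part of the original setup:

```sql
use role accountadmin;

-- Hypothetical role that owns the integration and the repository object.
create role if not exists git_admin;
grant create integration on account to role git_admin;
grant usage on database my_db to role git_admin;
grant usage on schema my_db.my_schema to role git_admin;
grant create git repository on schema my_db.my_schema to role git_admin;

-- Hypothetical read-only role, granted after the repository exists.
create role if not exists analyst;
grant read on git repository my_db.my_schema.plotly_datasets to role analyst;
```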

Once the git repository is set up, we can select from any file in that repository:

select $1, $2, $3, $16
from @plotly_datasets/branches/master/1962_2006_walmart_store_openings.csv;
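
To find a path like the one used above, or to pick up new commits, it may help to refresh the repository and list its contents first. A small sketch using the repository created earlier; the branch name master is taken from the query above:

```sql
-- Pull the latest commits from the remote into the Snowflake repository object.
alter git repository plotly_datasets fetch;

-- See which branches are available.
show git branches in plotly_datasets;

-- List the files on a branch to find the exact CSV path.
ls @plotly_datasets/branches/master;
```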

This is great, but for CSVs in particular I'd love to have some schema auto-detection. However, this doesn't work:


create or replace file format my_csv_format
type = csv
parse_header = true
field_optionally_enclosed_by = '"';

select *
from table(
infer_schema(
  location=>'@plotly_datasets/branches/master/1962_2006_walmart_store_openings.csv'
  , file_format=>'my_csv_format'
  )
);
-- Remote file '[...]/git_repository/e7f0f8a8-bd17-4834-adeb-5ab5bcd86192/null' was not found. There are several potential causes. The file might not exist. The required credentials may be missing or invalid. If you are running a copy command, please make sure files are not deleted when they are being loaded or files are not being loaded into two different tables concurrently with auto purge option.
--  File 'git_repository/e7f0f8a8-bd17-4834-adeb-5ab5bcd86192/null'
--  Row 0 starts at line 0, column 

(I'll file this internally as a bug)

To get a CSV from GitHub into a Snowflake table with schema auto-detection, you need to copy the CSV to a traditional stage first:

create stage files_stage;

copy files
into @files_stage
from @plotly_datasets/branches/master/1962_2006_walmart_store_openings.csv;

create or replace table mytable
using template (
    select array_agg(object_construct(*))
    from table(
        infer_schema(
          location=>'@files_stage/1962_2006_walmart_store_openings.csv',
          file_format=>'my_csv_format'
        )
    )
);

copy into mytable
from @files_stage/1962_2006_walmart_store_openings.csv
file_format = my_csv_format
match_by_column_name = case_insensitive;


select *
from mytable;
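
Before creating the table, it can be worth previewing what infer_schema detects, as a check that could run right after the copy files step. A minimal sketch that reuses files_stage and my_csv_format from above:

```sql
-- Show the detected column names and types in file order.
select column_name, type, nullable
from table(
    infer_schema(
      location=>'@files_stage/1962_2006_walmart_store_openings.csv',
      file_format=>'my_csv_format'
    )
)
order by order_id;
```

If a detected type looks wrong, the table can be created with explicit column definitions instead of using template.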
