Error reading a CSV from Azure Data Lake with Great Expectations: TypeError: read_csv() got an unexpected keyword argument 'connect_options'

Question

I am running Great Expectations locally and trying to connect it to Azure Data Lake. To test the connection, I am simply reading a CSV file from the data lake with Pandas.

The code fails with this error:

TypeError: read_csv() got an unexpected keyword argument 'connect_options'

Code to reproduce:

import great_expectations as gx

context = gx.get_context()
datasource = context.sources.add_pandas_abs(
    name="great_expectations_azure_test",
    azure_options={"conn_str": "<CONN_STR>"}
)
data_asset = datasource.add_csv_asset(
    name="taxi_data_asset",
    batching_regex=r"data/taxi_yellow_tripdata_samples/yellow_tripdata_2019-01\.csv",
    abs_container="yellowtaxis",
    abs_name_starts_with="data/taxi_yellow_tripdata_samples/",
)
batch_request = data_asset.build_batch_request()
data_batch = data_asset.get_batch_list_from_batch_request(batch_request)

The data lake contains only one file:

data/taxi_yellow_tripdata_samples/yellow_tripdata_2019-01.csv

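I debugged this and was able to confirm that GE successfully downloads the data from Azure Data Lake into a StreamIO buffer inside Great Expectations. The buffer should then be read with pandas.read_csv, but for some reason the connect_options argument is passed on to pandas.read_csv(), which causes the error.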
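The failure mode itself can be reproduced without Azure or Great Expectations. The sketch below is only an illustration of why the call fails, not of how GE builds its reader arguments internally: pandas.read_csv has a fixed set of keyword parameters, so any unknown keyword is rejected before parsing starts. The in-memory CSV content and the helper name are made up for the example.

```python
import io

import pandas as pd


def read_with_extra_kwarg():
    """Call pandas.read_csv with a keyword it does not define.

    Returns the TypeError message, or None if no error was raised.
    """
    # Hypothetical in-memory CSV standing in for the downloaded blob.
    buffer = io.StringIO("vendor_id,fare\n1,7.5\n")
    try:
        # connect_options is not a read_csv parameter, so Python rejects
        # the call before pandas reads anything from the buffer.
        pd.read_csv(buffer, connect_options={})
    except TypeError as exc:
        return str(exc)
    return None


message = read_with_extra_kwarg()
print(message)
```

Seeing the same message here suggests the data download is not the problem; the extra keyword reaching read_csv is.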

I could not find a ready-made example of connecting GE to Azure Data Lake, so I would like to ask: is something wrong with my configuration, or is this a bug in Great Expectations?

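The code combines snippets from these pages in the GE documentation:

How to set up Great Expectations to work with data in Azure Blob Storage

How to connect to data on Azure Blob Storage using Pandas

How to request data from a Data Asset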

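Note: I know I could use some other client to download the file from the data lake and pass it to Great Expectations as a regular CSV. I am just getting started with GE, and for now I would prefer the built-in way of accessing Azure Data Lake, but I will look into the alternative if the built-in way turns out not to be viable.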
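For reference, the fallback mentioned in the note could look roughly like the sketch below. It assumes the azure-storage-blob package is installed and uses its BlobServiceClient; the function names download_blob_bytes and parse_csv_bytes are hypothetical helpers, and the connection string placeholder and container name are taken from the code above. How the resulting DataFrame is then handed to Great Expectations is left out, since that depends on the GE version in use.

```python
import io

import pandas as pd


def download_blob_bytes(conn_str, container, blob_name):
    """Download one blob and return its raw bytes (hypothetical helper)."""
    # Imported here so the module still loads when the Azure SDK is absent.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(conn_str)
    blob_client = service.get_blob_client(container=container, blob=blob_name)
    return blob_client.download_blob().readall()


def parse_csv_bytes(data):
    """Parse CSV bytes into a DataFrame with plain pandas."""
    return pd.read_csv(io.BytesIO(data))


if __name__ == "__main__":
    raw = download_blob_bytes(
        "<CONN_STR>",  # same placeholder as in the question
        "yellowtaxis",
        "data/taxi_yellow_tripdata_samples/yellow_tripdata_2019-01.csv",
    )
    print(parse_csv_bytes(raw).head())
```

Keeping the download and the parsing in separate functions makes it possible to check the parsing step on local bytes before involving the storage account.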

Answer 1

This was fixed in Great Expectations version 0.16.10, and the code now works as-is.
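To check whether an installed copy already includes the fix, a small version comparison is enough. This is a minimal sketch using only the standard library; parse_release and has_fix are hypothetical helper names, and the parser deliberately ignores pre-release suffixes, so it treats a release candidate of 0.16.10 the same as the final release.

```python
from importlib.metadata import PackageNotFoundError, version

# Release named in the answer as containing the fix.
FIXED_IN = (0, 16, 10)


def parse_release(text):
    """Turn a version string like '0.16.10' into a tuple of three ints."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_fix(installed):
    """Return True if the given version string is at least FIXED_IN."""
    return parse_release(installed) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("great_expectations")
    except PackageNotFoundError:
        print("great_expectations is not installed")
    else:
        print(installed, "includes the fix:", has_fix(installed))
```

If the check reports an older version, upgrading the package with pip should pick up the fix.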