ValueError "Protocol not known" when loading data with the "datasets" package


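I want to train a pasta classifier and am currently preprocessing the data. I split it into training, validation, and test sets and saved it to my laptop with the following code: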

datasets.DatasetDict(
    {
        "train": train_dataset,
        "valid": valid_dataset,
        "test": test_dataset,
    }
).save_to_disk("split_pasta_dataset")

So far, so good. I then created a new Jupyter Notebook in which I will start building the model. First, though, I need to load the dataset from my laptop. I tried to do that as follows:

dataset = datasets.load_from_disk('split_pasta_dataset')

I used this approach because it worked well on another classification project I did in the past. This time, however, it fails with the following error:

`---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 2
1 # Loading the dataset from the laptop
----> 2 dataset = datasets.load_from_disk('split_pasta_dataset')

File /opt/homebrew/anaconda3/envs/pasta_classifier/lib/python3.11/site-packages/datasets/load.py:1892, in load_from_disk(dataset_path, fs, keep_in_memory, storage_options)
1890     return Dataset.load_from_disk(dataset_path, keep_in_memory=keep_in_memory, storage_options=storage_options)
1891 elif fs.isfile(path_join(dest_dataset_path, config.DATASETDICT_JSON_FILENAME)):
-> 1892     return DatasetDict.load_from_disk(dataset_path, keep_in_memory=keep_in_memory, storage_options=storage_options)
1893 else:
1894     raise FileNotFoundError(
1895         f"Directory {dataset_path} is neither a Dataset directory nor a DatasetDict directory."
1896     )

File /opt/homebrew/anaconda3/envs/pasta_classifier/lib/python3.11/site-packages/datasets/dataset_dict.py:1319, in DatasetDict.load_from_disk(dataset_dict_path, fs, keep_in_memory, storage_options)
1313 for k in splits:
1314     dataset_dict_split_path = (
1315         dataset_dict_path.split("://")[0] + "://" + path_join(dest_dataset_dict_path, k)
1316         if is_remote_filesystem(fs)
1317         else path_join(dest_dataset_dict_path, k)
1318     )
-> 1319     dataset_dict[k] = Dataset.load_from_disk(
1320         dataset_dict_split_path, keep_in_memory=keep_in_memory, storage_options=storage_options
1321     )
1322 return dataset_dict

File /opt/homebrew/anaconda3/envs/pasta_classifier/lib/python3.11/site-packages/datasets/arrow_dataset.py:1627, in Dataset.load_from_disk(dataset_path, fs, keep_in_memory, storage_options)
1620     warnings.warn(
1621         "'fs' was deprecated in favor of 'storage_options' in version 2.8.0 and will be removed in 3.0.0.\n"
1622         "You can remove this warning by passing 'storage_options=fs.storage_options' instead.",
1623         FutureWarning,
1624     )
1625     storage_options = fs.storage_options
-> 1627 fs_token_paths = fsspec.get_fs_token_paths(dataset_path, storage_options=storage_options)
1628 fs: fsspec.AbstractFileSystem = fs_token_paths[0]
1630 if is_remote_filesystem(fs):

File /opt/homebrew/anaconda3/envs/pasta_classifier/lib/python3.11/site-packages/fsspec/core.py:610, in get_fs_token_paths(urlpath, mode, num, name_function, storage_options, protocol, expand)
608 if protocol:
609     storage_options["protocol"] = protocol
--> 610 chain = _un_chain(urlpath0, storage_options or {})
611 inkwargs = {}
612 # Reverse iterate the chain, creating a nested target_* structure

File /opt/homebrew/anaconda3/envs/pasta_classifier/lib/python3.11/site-packages/fsspec/core.py:325, in _un_chain(path, kwargs)
323 for bit in reversed(bits):
324     protocol = kwargs.pop("protocol", None) or split_protocol(bit)[0] or "file"
--> 325     cls = get_filesystem_class(protocol)
326     extra_kwargs = cls._get_kwargs_from_urls(bit)
327     kws = kwargs.pop(protocol, {})

File /opt/homebrew/anaconda3/envs/pasta_classifier/lib/python3.11/site-packages/fsspec/registry.py:232, in get_filesystem_class(protocol)
230 if protocol not in registry:
231     if protocol not in known_implementations:
--> 232         raise ValueError(f"Protocol not known: {protocol}")
233     bit = known_implementations[protocol]
234     try:

ValueError: Protocol not known: split_pasta_dataset`

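I tried restarting the kernel and clearing the output of all cells, then ran the cells again, but got the same error. I also checked that I had typed the exact name, and it appears to be correct. I searched for the error online as well, but could not find a similar discussion. Can someone help me solve this?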

python dataset protocols
1 Answer

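This is an fsspec version issue: the installed fsspec release appears to be incompatible with the installed version of datasets.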
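One way to read the traceback: at dataset_dict.py lines 1314-1318, the per-split path gets a "protocol://" prefix whenever the filesystem is treated as remote. If a local directory is wrongly treated as remote, the directory name itself ends up in the protocol position. The sketch below reproduces only that string logic in plain Python. `build_split_path` and `naive_protocol` are hypothetical stand-ins written for illustration, not functions from datasets or fsspec, and the join is simplified.

```python
def build_split_path(dataset_dict_path: str, split: str, treated_as_remote: bool) -> str:
    """Mimic the path expression shown in the traceback (dataset_dict.py:1314-1318)."""
    # Simplified stand-in for path_join(dest_dataset_dict_path, k).
    joined = f"{dataset_dict_path}/{split}"
    if treated_as_remote:
        # Same expression as in the traceback: text before "://" is reused as the prefix.
        return dataset_dict_path.split("://")[0] + "://" + joined
    return joined


def naive_protocol(path: str) -> str:
    """Simplified protocol detection: text before '://', else 'file' (an assumption)."""
    return path.split("://", 1)[0] if "://" in path else "file"


local = build_split_path("split_pasta_dataset", "train", treated_as_remote=False)
remote = build_split_path("split_pasta_dataset", "train", treated_as_remote=True)

print(local, "->", naive_protocol(local))
print(remote, "->", naive_protocol(remote))
```

In the second case the detected protocol is the directory name, which matches the "Protocol not known: split_pasta_dataset" message in the question.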

Try pinning fsspec to an older release, then restart the kernel:

pip install fsspec==2023.6.0
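Before and after changing the pin, it may help to confirm which versions the notebook's environment actually has. A minimal sketch using only the standard library follows; `installed_version` is a hypothetical helper name, and it assumes the packages are registered under the distribution names "datasets" and "fsspec".

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Collect the versions of the two packages involved in the traceback.
report = {name: installed_version(name) for name in ("datasets", "fsspec")}
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Run this in the same kernel as the failing cell, since a Jupyter kernel can point at a different environment than the terminal where pip was run.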

Also see this issue in the Hugging Face datasets repository: https://github.com/huggingface/datasets/issues/6353
