I can't download a dataset from Huggingface

from datasets import load_dataset

dataset = load_dataset(path='seamew/THUCNewsTitle', split='train')

My network connection is fine, but the console always shows:

Traceback (most recent call last):
  File "/Users/yuanyang_lee/Desktop/HuggingFace/demo2.py", line 11, in <module>
    dataset = load_dataset(path='seamew/THUCNewsTitle', split='train')
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/load.py", line 2153, in load_dataset
    builder_instance.download_and_prepare(
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/builder.py", line 954, in download_and_prepare
    self._download_and_prepare(
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/builder.py", line 1717, in _download_and_prepare
    super()._download_and_prepare(
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/builder.py", line 1027, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yuanyang_lee/.cache/huggingface/modules/datasets_modules/datasets/seamew--THUCNewsTitle/b3df30999854cbe65ae45110e895b2fa88c14975f2185c1f43d9b7ca85b5f679/THUCNewsTitle.py", line 30, in _split_generators
    train_path = dl_manager.download_and_extract(_TRAIN_DOWNLOAD_URL)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/download/download_manager.py", line 565, in download_and_extract
    return self.extract(self.download(url_or_urls))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/download/download_manager.py", line 428, in download
    downloaded_path_or_paths = map_nested(
                               ^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 456, in map_nested
    return function(data_struct)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/download/download_manager.py", line 454, in _download
    return cached_path(url_or_filename, download_config=download_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/utils/file_utils.py", line 182, in cached_path
    output_path = get_from_cache(
                  ^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/utils/file_utils.py", line 596, in get_from_cache
    raise FileNotFoundError(f"Couldn't find file at {url}")
FileNotFoundError: Couldn't find file at https://drive.google.com/u/0/uc?id=1xnicHROZsgtxKodf8sZiRiXoWJ7fpQt2&export=download

I tried switching to a different Wi-Fi connection, but it didn't help. I can open Huggingface in my browser without any problem.

1 answer

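Huggingface datasets can include custom code that runs when you try to load the dataset. For example, the code for the dataset you're using is here. As the traceback shows, this loading script (THUCNewsTitle.py, line 30) calls dl_manager.download_and_extract(_TRAIN_DOWNLOAD_URL), and that URL is a Google Drive link that can no longer be downloaded. The failing request goes to Google Drive, not to Huggingface, which is why changing networks or being able to open Huggingface in a browser makes no difference. There are several possible causes, for example the file now requiring authentication or having been deleted.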
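To check which URL a loading script tries to fetch, you can read it out of the cached copy of the script whose path appears in the traceback. This is a minimal sketch, assuming the script defines its download URLs as module-level string constants whose names end in _URL (as the name _TRAIN_DOWNLOAD_URL suggests). find_url_constants is a hypothetical helper, not part of the datasets library, and the hash directory in the cache path differs from machine to machine.

```python
import ast
from pathlib import Path


def find_url_constants(source: str) -> dict[str, str]:
    """Map module-level string constants whose names end in '_URL' to their values."""
    found: dict[str, str] = {}
    for node in ast.parse(source).body:  # top-level statements only
        if (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id.endswith("_URL"):
                    found[target.id] = node.value.value
    return found


if __name__ == "__main__":
    # Cache layout taken from the traceback; the hash subdirectory varies per machine.
    cache = (
        Path.home()
        / ".cache/huggingface/modules/datasets_modules/datasets/seamew--THUCNewsTitle"
    )
    for script in cache.glob("*/THUCNewsTitle.py"):
        print(script, find_url_constants(script.read_text(encoding="utf-8")))
```

If the assumption about the script holds, this should list the same Google Drive URL that appears in the FileNotFoundError, which confirms that the problem is the external link rather than your connection.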

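You could try contacting the dataset's maintainer. One way to reach them is to open a discussion on the dataset's Huggingface page: at the link above, go to the Community tab and click New discussion.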

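It's worth noting that some version of the dataset appears to exist in the repository as .arrow files. You could try downloading it from there and writing your own code to load the dataset.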
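Here is a minimal sketch of that approach. It assumes the .arrow files were written by the datasets library itself (Dataset.from_file reads that streaming Arrow format) and that the split name appears somewhere in each file's path. The actual file names in the repository aren't known here, so the code lists them with list_repo_files instead of hard-coding them. pick_arrow_files and load_arrow_split are hypothetical helpers; the code needs the datasets and huggingface_hub packages.

```python
from collections.abc import Iterable


def pick_arrow_files(repo_files: Iterable[str], split: str = "train") -> list[str]:
    """Return the .arrow files whose path mentions the split name (case-insensitive), sorted."""
    wanted = split.lower()
    return sorted(f for f in repo_files if f.endswith(".arrow") and wanted in f.lower())


def load_arrow_split(repo_id: str, split: str = "train"):
    """Download one split's .arrow files from a dataset repo and load them without the loading script."""
    # Imported here so pick_arrow_files() can be used without these packages installed.
    from datasets import Dataset, concatenate_datasets
    from huggingface_hub import hf_hub_download, list_repo_files

    files = pick_arrow_files(list_repo_files(repo_id, repo_type="dataset"), split)
    if not files:
        raise FileNotFoundError(f"no .arrow files mentioning {split!r} in {repo_id}")

    # hf_hub_download caches each file locally and returns its path;
    # Dataset.from_file memory-maps an Arrow file in the format datasets writes.
    parts = [
        Dataset.from_file(hf_hub_download(repo_id, filename=name, repo_type="dataset"))
        for name in files
    ]
    return parts[0] if len(parts) == 1 else concatenate_datasets(parts)
```

Usage: dataset = load_arrow_split("seamew/THUCNewsTitle", split="train"). Because this bypasses the loading script entirely, you get whatever version of the data was uploaded as .arrow files, which may not match what the script would have produced.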
