Error when calling Hugging Face load_dataset("glue", "mrpc")


I'm following the Hugging Face tutorial here, and it gives me a strange error. When I run the following code:

from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding
from torch.utils.data import DataLoader

raw_datasets = load_dataset("glue", "mrpc")

this is what I see:

Downloading data: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 151k/151k [00:00<00:00, 3.35MB/s]
Downloading data: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11.1k/11.1k [00:00<00:00, 6.63MB/s]
Downloading data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:32<00:00, 10.89s/it]
Extracting data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 127.92it/s]
Traceback (most recent call last):
  File "/Users/ameenizhac/Downloads/transformers_playground.py", line 5, in <module>
    raw_datasets = load_dataset("glue", "mrpc")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/datasets/load.py", line 1782, in load_dataset
    builder_instance.download_and_prepare(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/datasets/builder.py", line 872, in download_and_prepare
    self._download_and_prepare(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/datasets/builder.py", line 967, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/datasets/builder.py", line 1709, in _prepare_split
    split_info = self.info.splits[split_generator.name]
                 ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/datasets/splits.py", line 530, in __getitem__
    instructions = make_file_instructions(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/datasets/arrow_reader.py", line 112, in make_file_instructions
    name2filenames = {
                     ^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/datasets/arrow_reader.py", line 113, in <dictcomp>
    info.name: filenames_for_dataset_split(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/datasets/naming.py", line 70, in filenames_for_dataset_split
    prefix = filename_prefix_for_split(dataset_name, split)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/datasets/naming.py", line 54, in filename_prefix_for_split
    if os.path.basename(name) != name:
       ^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen posixpath>", line 142, in basename
TypeError: expected str, bytes or os.PathLike object, not NoneType

I don't know where to start, because I don't understand where the error is coming from.

python nlp huggingface huggingface-datasets
1 Answer

I tried this on my own machine and on Google Colab. Strangely, it works on Colab but not on my PC.
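Since the behavior differs between the two environments, one reasonable first check (this is my assumption, not something verified in the original answer) is to compare the installed datasets version with the one on Colab and to bypass the local cache:

import datasets

# Compare this with the version reported on Colab; an outdated local
# install is a plausible cause of the NoneType split error.
print(datasets.__version__)

# Force a fresh download to rule out a corrupted local cache.
raw_datasets = datasets.load_dataset(
    "glue", "mrpc", download_mode="force_redownload"
)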

In any case, a possible workaround is the following:

raw_datasets = load_dataset("SetFit/mrpc")

If you print it, you'll see that the dataset is the same; only the column names differ (a sketch for renaming them back follows the output):

DatasetDict({
    train: Dataset({
        features: ['text1', 'text2', 'label', 'idx', 'label_text'],
        num_rows: 3668
    })
    test: Dataset({
        features: ['text1', 'text2', 'label', 'idx', 'label_text'],
        num_rows: 1725
    })
    validation: Dataset({
        features: ['text1', 'text2', 'label', 'idx', 'label_text'],
        num_rows: 408
    })
})
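If you continue with the tutorial, note that these column names differ from the original glue/mrpc ones ("sentence1"/"sentence2"). Here is a minimal sketch of renaming them to match, assuming the rest of your code expects the original names:

from datasets import load_dataset

raw_datasets = load_dataset("SetFit/mrpc")

# Rename the SetFit columns back to the names used by glue/mrpc so the
# tokenization code from the tutorial works unchanged.
raw_datasets = raw_datasets.rename_column("text1", "sentence1")
raw_datasets = raw_datasets.rename_column("text2", "sentence2")

# "label_text" has no counterpart in glue/mrpc and can be dropped.
raw_datasets = raw_datasets.remove_columns("label_text")

print(raw_datasets)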