Google Colab Flair NLP ValueError: num_samples should be a positive integer value, but got num_samples=0


I'm using Google Colab with Flair for simple text classification, but I'm getting an error I don't understand. Here is my code:

import pandas as pd

# dataset: path to my CSV file (not shown here)
data = pd.read_csv(dataset, header = None, encoding = "ISO-8859-1")
data = data[[0, 1]].rename(columns={0:"text", 1:"label"})
print(data.columns)
print(data.head())
data['label'] = '__label__' + data['label'].astype(str)
data.iloc[0:int(len(data)*0.8)].to_csv('train.csv', sep='\t', index = False, header = False)
data.iloc[int(len(data)*0.8):int(len(data)*0.9)].to_csv('test.csv', sep='\t', index = False, header = False)
data.iloc[int(len(data)*0.9):].to_csv('dev.csv', sep='\t', index = False, header = False);

!ls
!pwd

gives me:

content  drive     sample_data  train.csv     weights.txt dev.csv  loss.tsv  test.csv   training.log  zip
/content

But when I continue with:

from flair.data_fetcher import NLPTaskDataFetcher
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentLSTMEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer
from pathlib import Path
corpus = NLPTaskDataFetcher.load_classification_corpus(Path('/content'), test_file='test.csv', dev_file='dev.csv', train_file='train.csv')
word_embeddings = [WordEmbeddings('glove'), FlairEmbeddings('news-forward-fast'), FlairEmbeddings('news-backward-fast')]
document_embeddings = DocumentLSTMEmbeddings(word_embeddings, hidden_size=512, reproject_words=True, reproject_words_dimension=256)
classifier = TextClassifier(document_embeddings, label_dictionary=corpus.make_label_dictionary(), multi_label=False)
trainer = ModelTrainer(classifier, corpus)
trainer.train('/content', max_epochs=10)

I get this error:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-dc73d615edca> in <module>
      3 classifier = TextClassifier(document_embeddings, label_dictionary=corpus.make_label_dictionary(), multi_label=False)
      4 trainer = ModelTrainer(classifier, corpus)
----> 5 trainer.train('/content', max_epochs=10)

3 frames
/usr/local/lib/python3.6/dist-packages/torch/utils/data/sampler.py in __init__(self, data_source, replacement, num_samples)
     92         if not isinstance(self.num_samples, int) or self.num_samples <= 0:
     93             raise ValueError("num_samples should be a positive integer "
---> 94                              "value, but got num_samples={}".format(self.num_samples))
     95 
     96     @property

ValueError: num_samples should be a positive integer value, but got num_samples=0
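The last frame is PyTorch's `RandomSampler`, which raises this `ValueError` when the dataset it is asked to shuffle has length 0, so the traceback most likely means Flair loaded no training sentences from `train.csv`. Flair's classification format is based on the FastText format, where each line starts with one or more `__label__<class>` tokens followed by the text. A quick check is to count how many lines in each file actually start with a label. The sketch below is a simplified stdlib approximation of that rule (an assumption, not Flair's own parser); the helper names `leading_labels` and `count_labeled_lines` are hypothetical.

```python
LABEL_PREFIX = "__label__"  # label marker used by the FastText-style classification format


def leading_labels(line: str) -> list:
    """Return labels found at the *start* of a line, stopping at the first non-label token.

    Simplified approximation of a FastText-style reader (assumption), not Flair's code.
    """
    labels = []
    for token in line.split():  # split() treats tabs and spaces alike
        if not token.startswith(LABEL_PREFIX):
            break  # labels must come before any text
        labels.append(token[len(LABEL_PREFIX):])
    return labels


def count_labeled_lines(lines) -> tuple:
    """Return (non-empty lines with at least one leading label, total non-empty lines)."""
    labeled = total = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        total += 1
        if leading_labels(line):
            labeled += 1
    return labeled, total
```

In Colab this could be run as `with open('/content/train.csv', encoding='utf-8') as f: print(count_labeled_lines(f))`. A labeled count of 0 for files written with the text column first would point at the file format rather than at the embeddings or trainer setup.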
Answer:

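I ran into a similar error, but was able to fix it by reorganizing the data so that the label column comes first. Flair's text classification format is based on the FastText format, in which each line begins with the `__label__<class>` token(s) followed by the text. With the text in the first column, lines apparently get no label and are not loaded, which leaves the training set empty and triggers `num_samples=0`.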
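A minimal sketch of the fix described above, using only the standard library so it can be checked outside Colab. It takes rows in the question's original `(text, label)` order with the raw label (without the `__label__` prefix the question adds) and writes each line as `__label__<label>`, a tab, then the text. The helper names `to_fasttext_line`, `make_splits` and `write_splits` are hypothetical, and the 80/10/10 positional split simply mirrors the question's `iloc` slices.

```python
from pathlib import Path

LABEL_PREFIX = "__label__"


def to_fasttext_line(text, label) -> str:
    """Format one (text, label) row with the label FIRST, FastText-style."""
    # Labels are whitespace-delimited in this format, so a label must be a single token.
    label_token = "_".join(str(label).split())
    # Keep each document on one line by collapsing tabs, newlines and repeated spaces.
    clean_text = " ".join(str(text).split())
    return f"{LABEL_PREFIX}{label_token}\t{clean_text}"


def make_splits(rows):
    """Split (text, label) rows 80/10/10 by position, like the iloc slices in the question."""
    n = len(rows)
    train_end, test_end = int(n * 0.8), int(n * 0.9)
    parts = {
        "train.csv": rows[:train_end],
        "test.csv": rows[train_end:test_end],
        "dev.csv": rows[test_end:],
    }
    return {name: [to_fasttext_line(text, label) for text, label in part]
            for name, part in parts.items()}


def write_splits(rows, out_dir="/content"):
    """Write train/test/dev files as UTF-8, one labeled document per line."""
    for name, lines in make_splits(rows).items():
        Path(out_dir, name).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

With the question's DataFrame, the equivalent pandas change is to write `data[['label', 'text']]` instead of `data` in the three `to_csv` calls. Since `__label__` is already prepended to the label column there, the label then comes first on every line.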
