After loading a Hugging Face dataset:
from datasets import load_dataset, DownloadConfig

download_config = DownloadConfig()
dataset = load_dataset(hf_dataset_name, download_config=download_config)  # hf_dataset_name: the dataset's Hub ID
dataset_split = dataset['train']
suppose a row contains "" or None in the "answer" column. How can I remove such rows?
A Hugging Face dataset wraps a PyArrow table, and that table can be filtered with a PyArrow expression:
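import pyarrow.dataset as ds

# Keep rows whose 'answer' is neither null (None) nor an empty string
mask = ds.field('answer').is_valid() & (ds.field('answer') != '')
filtered = dataset_split.data.table.filter(mask)  # .data.table is the underlying pyarrow.Table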