Error when inserting a spacy.tokens.span.Span into a pandas DataFrame


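I'm using scispacy and trying out its Hearst patterns feature, which returns spacy.tokens.span.Span objects. When I try to put the results into a DataFrame, I get an error: each Span is treated as a sequence of words rather than as a single object.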

Following the example:

import pandas as pd
import spacy
from scispacy.hyponym_detector import HyponymDetector

nlp = spacy.load("en_core_sci_sm")
nlp.add_pipe("hyponym_detector", last=True, config={"extended": False})

doc = nlp("Keystone plant species such as fig trees are good for the soil.")

print(doc._.hearst_patterns)
# [('such_as', Keystone plant species, fig trees)]

doc_hp = doc._.hearst_patterns
print(type(doc_hp[0][1]))
# <class 'spacy.tokens.span.Span'>

# Each Span is passed as a column value here, which is what triggers the error below.
dict = {
    "hp_connector": doc_hp[0][0],
    "hp_entity_1": doc_hp[0][1],
    "hp_entity_2": doc_hp[0][2],
}

df = pd.DataFrame.from_dict(dict)

This raises the error:

Traceback (most recent call last):
  File "extract_hearst_patterns.py", line 42, in <module>
    df = pd.DataFrame.from_dict(dict)
  File "/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 1760, in from_dict
    return cls(data, index=index, columns=columns, dtype=dtype)
  File "/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 709, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/venv/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 481, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/venv/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 115, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/venv/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 655, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
python dataframe nlp spacy spacy-3
1 Answer

This is what ended up working for me:

doc_hp = doc._.hearst_patterns
list_of_pattern_dicts = []
for pattern in doc_hp:
    # One record (row) per Hearst pattern.
    pattern_dict = {
        "hp object": [pattern],               # keep the original tuple in a single cell
        "hp_connector": str(pattern[0]),
        "hp_entity_1": pattern[1].text,       # Span -> plain string
        "hp_entity_2": pattern[2].text,
    }
    list_of_pattern_dicts.append(pattern_dict)
# A list of dicts becomes one row per dict.
df = pd.DataFrame(list_of_pattern_dicts)
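
As far as I can tell, the original error happens because pandas treats any sized, iterable dict value as a whole column, and a Span is a sequence of tokens. The two Spans have 3 and 2 tokens, so the "columns" end up with different lengths. Below is a minimal sketch of that behaviour that runs without scispacy. FakeSpan is a hypothetical stand-in I made up for illustration, not a spaCy class; it only mimics the parts of Span that matter here (a length, iteration, and a .text attribute).

```python
import pandas as pd


class FakeSpan:
    """Hypothetical stand-in for spacy.tokens.span.Span (assumption, not spaCy code)."""

    def __init__(self, text):
        self.text = text
        self._tokens = text.split()

    def __len__(self):
        return len(self._tokens)

    def __iter__(self):
        return iter(self._tokens)


pattern = ("such_as", FakeSpan("Keystone plant species"), FakeSpan("fig trees"))

# Passing the span-like objects directly as dict values: pandas reads them
# as columns with 3 and 2 elements and refuses to build the frame.
try:
    pd.DataFrame({"hp_entity_1": pattern[1], "hp_entity_2": pattern[2]})
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Converting each span to a plain string and building one record per pattern
# gives a single row with scalar cells.
rows = [
    {
        "hp_connector": pattern[0],
        "hp_entity_1": pattern[1].text,
        "hp_entity_2": pattern[2].text,
    }
]
df = pd.DataFrame(rows)
print(df)
```

With real Span objects the same idea applies: use .text (or str(span)) when you only need the string, or wrap the object in a list if you want to keep the Span itself in a single cell.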