Entity linking with spaCy / Wikipedia

Question (votes: 1, answers: 1)

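I am trying to follow this example: https://github.com/explosion/spaCy/tree/master/bin/wiki_entity_linking. But I am confused about what the training data contains. Is it all of Wikipedia? Suppose I only need training data for a few entities, for example E1, E2 and E3. Does the example let me specify just the entities I want to disambiguate?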

python spacy entity-linking
1 answer
2 votes

If you run the scripts provided in https://github.com/explosion/spaCy/tree/master/bin/wiki_entity_linking, they do indeed create a training dataset from Wikipedia, which you can use to train a generic model.

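If you want to train a more limited model, you can of course feed in your own training set. A toy example can be found at https://github.com/explosion/spaCy/blob/master/examples/training/train_entity_linker.py, from which you can deduce the format of the training data: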

def sample_train_data():
    train_data = []

    # Q2146908 (Russ Cochran): American golfer
    # Q7381115 (Russ Cochran): publisher

    text_1 = "Russ Cochran his reprints include EC Comics."
    dict_1 = {(0, 12): {"Q7381115": 1.0, "Q2146908": 0.0}}
    train_data.append((text_1, {"links": dict_1}))

    text_2 = "Russ Cochran has been publishing comic art."
    dict_2 = {(0, 12): {"Q7381115": 1.0, "Q2146908": 0.0}}
    train_data.append((text_2, {"links": dict_2}))

    text_3 = "Russ Cochran captured his first major title with his son as caddie."
    dict_3 = {(0, 12): {"Q7381115": 0.0, "Q2146908": 1.0}}
    train_data.append((text_3, {"links": dict_3}))

    text_4 = "Russ Cochran was a member of University of Kentucky's golf team."
    dict_4 = {(0, 12): {"Q7381115": 0.0, "Q2146908": 1.0}}
    train_data.append((text_4, {"links": dict_4}))

    return train_data

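This example from train_entity_linker.py shows how the model learns to disambiguate "Russ Cochran" the golfer (Q2146908) from "Russ Cochran" the publisher (Q7381115). Note that it is only a toy example: a realistic application would require a larger knowledge base with accurate prior frequencies (such as you can obtain by running the Wikipedia / Wikidata scripts), and you would need many more sentences and more lexical variety before you can expect the machine learning model to pick up the right clues and generalize well to unseen text.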
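If you only care about a handful of entities, one way to build such a training set by hand is a small helper that produces the same (text, {"links": ...}) tuples as sample_train_data above. This is a minimal sketch in plain Python, not part of spaCy: the function name make_example and the CANDIDATES list are hypothetical, and it assumes the mention occurs exactly once in each sentence.

```python
from typing import Dict, List, Tuple

# Type of one training example, mirroring sample_train_data():
# (text, {"links": {(start_char, end_char): {kb_id: probability}}})
Links = Dict[Tuple[int, int], Dict[str, float]]
Example = Tuple[str, Dict[str, Links]]


def make_example(text: str, mention: str, gold_id: str,
                 candidate_ids: List[str]) -> Example:
    """Hypothetical helper (not a spaCy API): annotate the first occurrence
    of `mention` in `text`, giving the gold entity probability 1.0 and
    every other candidate 0.0."""
    start = text.find(mention)
    if start < 0:
        raise ValueError(f"mention {mention!r} not found in text")
    if gold_id not in candidate_ids:
        raise ValueError(f"gold id {gold_id!r} is not among the candidates")
    end = start + len(mention)
    probs = {kb_id: (1.0 if kb_id == gold_id else 0.0)
             for kb_id in candidate_ids}
    return (text, {"links": {(start, end): probs}})


# Only the entities you want to disambiguate (here the two from the toy example).
CANDIDATES = ["Q7381115", "Q2146908"]

train_data = [
    make_example("Russ Cochran has been publishing comic art.",
                 "Russ Cochran", "Q7381115", CANDIDATES),
    make_example("Russ Cochran was a member of University of Kentucky's golf team.",
                 "Russ Cochran", "Q2146908", CANDIDATES),
]
```

The resulting train_data list has the same shape as the one returned by sample_train_data, so restricting the model to E1, E2 and E3 comes down to listing only those IDs as candidates and collecting sentences that mention them. The knowledge base you train against would still need entries for the same IDs.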
