Data preparation for NER in CoNLL-2003 BIO format


To train my own NER model with custom entities, I need to convert my dataset into the CoNLL-2003 format specified at https://github.com/yongyuwen/sequence-tagging-ner.

How can I convert a plain text document (.txt) into the specified CoNLL-2003 format, with the columns [Word POS CHUNK NER]?

Note: I already have custom NER labels for the given text document.

Sample data (training_data.txt):

(Sample 1)
This Agreement of Work is made pursuant to the Global Developer Master Services Agreement effective as  of May 24, 2018, as amended on March 28, 2016, between MA[CUSTOM_ENTITY], lnc.[CUSTOM_ENTITY] whose registered office or principal place of  business is at 520 Madison Avenue, Ahmedabad, India, whose registered  office or principal place of business is at Building A, Atlantis de la,  Switzerland, collectively and ABC[CUSTOM_ENTITY] LLC[CUSTOM_ENTITY] a wholly owned subsidiary of  Amazon Services Ltd and having its registered office at 113 Red Avenue, 10th Floor, New York, NY 13027.

(Sample 2)
This Agreement of Work is subject to the terms and conditions of the Master Agreement for Technology  Consulting Services between Vignesh[CUSTOM_ENTITY] Services[CUSTOM_ENTITY] Limited[CUSTOM_ENTITY] and ABD[CUSTOM_ENTITY] LLC[CUSTOM_ENTITY], an  entity wholly owned by ABC[CUSTOM_ENTITY] Holdings[CUSTOM_ENTITY] LLC[CUSTOM_ENTITY].

(Sample 3)
This Agreement of Work dated October 22, 2013 between Google[CUSTOM_ENTITY] Services[CUSTOM_ENTITY] Limited[CUSTOM_ENTITY]  and Avaya[CUSTOM_ENTITY] Communications[CUSTOM_ENTITY] Management[CUSTOM_ENTITY], LLC[CUSTOM_ENTITY] and any of its operating subsidiaries and  affiliates which receive Services from Vendor incorporates and is governed by the terms and  conditions contained in the Master Services Agreement Services, by and between Avaya and Vendor.

[CUSTOM_ENTITY] is the label of the new entity type that the NER model should be trained on.
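One possible way to approach the conversion is sketched below. It is only a minimal illustration under several assumptions, not a method taken from the linked repository: words are split on whitespace, a tagged word is any token of the form `word[LABEL]`, trailing punctuation (`. , ; :`) becomes its own token, and consecutive tagged words with the same label are joined into one entity (`B-` for the first word, `I-` for the rest). The POS and CHUNK columns are filled with a hypothetical placeholder `_`, since the source text carries no such annotations; real values would have to come from a separate tagger. The names `to_bio`, `to_conll`, and `FILLER` are invented for this example.

```python
import re

# Assumption: a tagged token looks like "word[LABEL]" with optional
# trailing punctuation, e.g. "LLC[CUSTOM_ENTITY],".
TAGGED = re.compile(r"^(.+?)\[([A-Z_]+)\]([.,;:]*)$")
# A plain token, with any trailing punctuation captured separately.
PLAIN = re.compile(r"^(.*?)([.,;:]*)$")
# Hypothetical placeholder for the POS and CHUNK columns.
FILLER = "_"


def to_bio(text):
    """Return a list of (word, tag) pairs for one sample, using BIO tags."""
    pairs = []
    prev_label = None  # label of the preceding token, None if it was "O"
    for token in text.split():
        match = TAGGED.match(token)
        if match:
            word, label, trail = match.groups()
            prefix = "I" if prev_label == label else "B"
            pairs.append((word, f"{prefix}-{label}"))
            prev_label = label
        else:
            word, trail = PLAIN.match(token).groups()
            if word:
                pairs.append((word, "O"))
                prev_label = None
        # Each trailing punctuation mark becomes its own "O" token
        # and ends the current entity.
        for mark in trail:
            pairs.append((mark, "O"))
            prev_label = None
    return pairs


def to_conll(text):
    """Format one sample as 'Word POS CHUNK NER' lines."""
    return "\n".join(f"{w} {FILLER} {FILLER} {t}" for w, t in to_bio(text))


if __name__ == "__main__":
    sample = ("between Vignesh[CUSTOM_ENTITY] Services[CUSTOM_ENTITY] "
              "Limited[CUSTOM_ENTITY] and ABD[CUSTOM_ENTITY] LLC[CUSTOM_ENTITY], an entity")
    print(to_conll(sample))
```

With this sketch, "Vignesh Services Limited" is tagged as B-, I-, I-CUSTOM_ENTITY and "ABD LLC" as a second entity. Because punctuation ends an entity here, a name such as "MA, lnc." in Sample 1 would be split into two entities; if that is not wanted, the reset of `prev_label` in the punctuation loop would need to be relaxed. In CoNLL-2003 style files, sentences are separated by a blank line, so each sample would be written out followed by an empty line.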

python nlp lstm named-entity-recognition ner
1 Answer

@Vignesh Prajapati, did you manage to complete this task? If so, could you explain how you did it? Thanks.
