Creating BIO-format sentences from a JSON file - training an NER model

Question

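I have a JSON file that will be used as data for an NER model. Each object contains a sentence and the entities that occur in that sentence. I want to write a function that generates a BIO-tagged string for each sentence based on its entities.

For example, take the following object from the JSON file: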

{
      "request": "I want to fly to New York on the 13.3",
      "entities": [
        {"start": 16, "end": 23, "text": "New York", "category": "DESTINATION"},
        {"start": 32, "end": 35, "text": "13.3", "category": "DATE"}
      ]
}

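For the sentence "I want to fly to New York on the 13.3", the corresponding BIO tags would be "O O O O O B-DESTINATION I-DESTINATION O O B-DATE", where B-<category> marks the first token of an entity of that category, I-<category> marks a token inside it, and O marks a token outside any entity.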

I am looking for Python code that iterates over every object in the JSON file and generates the BIO tags for it.

Feel free to change the JSON format if necessary.

Answer 1

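This is only a quick implementation of the task above. There is plenty of room for optimization that can be explored later, but as a first pass here is the function: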

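def BIO_converter(r, entities):
    to_replace = {}  # maps each entity token to its BIO tag
    for i in entities:
        # In the sample data, "start"/"end" are one less than Python's 0-based
        # indices and "end" is inclusive, hence the +1 / +2 when slicing.
        sub = r[i['start']+1:i['end']+2].split(' ')
        # The first token gets B-, any remaining tokens get I-.
        vals = [f"B-{i['category']}"] + [f"I-{i['category']}"] * (len(sub)-1)
        to_replace.update(zip(sub, vals))  # dict | dict would need Python 3.9+

    # Caveat: tags are looked up by token text, so a word that also occurs
    # outside an entity receives the entity's tag as well.
    r = r.split(' ')
    r = [to_replace.get(i, 'O') for i in r]
    return ' '.join(r)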

js = {
        "request": "I want to fly to New York on the 13.3",
        "entities": [
          {"start": 16, "end": 23, "text": "New York", "category": "DESTINATION"},
          {"start": 32, "end": 35, "text": "13.3", "category": "DATE"}
        ]
      }
print(BIO_converter(js['request'], js['entities']))

This should output:

O O O O O B-DESTINATION I-DESTINATION O O B-DATE
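
The function above looks tags up by token text, so a word that appears both inside and outside an entity would be tagged in both places. A possible variant, sketched below, tags tokens by character position instead and also loops over a whole file, which is what the question asks for. The names `bio_tags`, `convert_file` and `offset_shift` are made up for this illustration, and the file is assumed to hold a JSON list of objects shaped like the sample. Treat it as a minimal sketch under those assumptions rather than a definitive implementation.

```python
import json
import re


def bio_tags(text, entities, offset_shift=1):
    """Return a BIO tag string for the whitespace-separated tokens of text.

    offset_shift=1 matches the sample data, where "start"/"end" are one less
    than Python's 0-based indices and "end" is inclusive. Pass 0 if your
    offsets are plain 0-based (this parameter is an assumption of the sketch).
    """
    tags = []
    for match in re.finditer(r"\S+", text):  # one match per token
        tag = "O"
        for ent in entities:
            first = ent["start"] + offset_shift  # 0-based index of first char
            last = ent["end"] + offset_shift     # 0-based index of last char
            if first <= match.start() <= last:
                # Token starting exactly at the entity start opens the entity.
                prefix = "B" if match.start() == first else "I"
                tag = f"{prefix}-{ent['category']}"
                break
        tags.append(tag)
    return " ".join(tags)


def convert_file(path):
    """Assumes the file contains a JSON list of {"request", "entities"} objects."""
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)
    return [bio_tags(rec["request"], rec["entities"]) for rec in records]


sample = {
    "request": "I want to fly to New York on the 13.3",
    "entities": [
        {"start": 16, "end": 23, "text": "New York", "category": "DESTINATION"},
        {"start": 32, "end": 35, "text": "13.3", "category": "DATE"},
    ],
}
print(bio_tags(sample["request"], sample["entities"]))
```

Tokens are split on whitespace only, so punctuation attached to a word stays part of that token. If the model uses a different tokenizer, the same position check could be applied to that tokenizer's character offsets.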
