Stanford NLP RegexNERAnnotator apostrophe

Question

RegexNERAnnotator does not seem to recognize apostrophes.

    Properties properties = new Properties();
    properties.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions,regexner,tokensregex");
    properties.put("regexner.mapping", "regexfile.txt");
    properties.put("regexner.ignorecase", "true");

    StanfordCoreNLP pipeline = new StanfordCoreNLP(properties);

In regexfile.txt:

Bachelor of (Arts|Laws|Science|Engineering) DEGREE
Lalor   LOCATION    PERSON
Labor   ORGANIZATION

With this file, "Bachelor of Arts" is identified as a DEGREE. Unfortunately, after I changed the file to:

Bachelor's of (Arts|Laws|Science|Engineering)   DEGREE
Lalor   LOCATION    PERSON
Labor   ORGANIZATION

it can no longer identify "Bachelor's of Arts" as a DEGREE.

Any help would be much appreciated. Thanks in advance. :)

Answer 1

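RegexNERAnnotator operates on the output of the tokenizer, so its rules are matched against tokens, not raw text.

Consider a sentence containing the phrase "Bachelor's of Arts". Tokenization splits the word Bachelor from the apostrophe suffix, producing two separate tokens: Bachelor and 's.

In the tab-delimited file regexfile.txt, a space within the pattern column marks the start of a new token. This means your custom rule would only match a single token that is exactly "Bachelor's". Because of the tokenizer, that token never occurs.

Write the rule so that every token you want to match is separated by a space, and everything will work: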

Bachelor 's of (Arts|Laws|Science|Engineering)   DEGREE
Lil ' Jon    RAPPER
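
To illustrate why the split rule matches and the original one does not, here is a minimal sketch of token-by-token matching in plain Java. It is a simplified model, not CoreNLP's actual implementation: the class name TokenRuleSketch and the method ruleMatches are hypothetical, and the token list is hard-coded on the assumption that the tokenizer splits "Bachelor's" into Bachelor and 's as described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not part of Stanford CoreNLP.
class TokenRuleSketch {

    // Each space-separated piece of the rule is treated as a regex that must
    // match one whole token. The rule matches if some contiguous run of tokens
    // satisfies every piece in order.
    static boolean ruleMatches(String rule, List<String> tokens) {
        String[] pieces = rule.split(" ");
        for (int start = 0; start + pieces.length <= tokens.size(); start++) {
            boolean allMatch = true;
            for (int i = 0; i < pieces.length; i++) {
                if (!Pattern.matches(pieces[i], tokens.get(start + i))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed tokenizer output for "She has a Bachelor's of Arts."
        List<String> tokens =
                Arrays.asList("She", "has", "a", "Bachelor", "'s", "of", "Arts", ".");

        String singleTokenRule = "Bachelor's of (Arts|Laws|Science|Engineering)";
        String splitTokenRule = "Bachelor 's of (Arts|Laws|Science|Engineering)";

        // No token is literally "Bachelor's", so the first rule cannot match.
        System.out.println(ruleMatches(singleTokenRule, tokens)); // false
        // The second rule lines up with the tokens Bachelor, 's, of, Arts.
        System.out.println(ruleMatches(splitTokenRule, tokens));  // true
    }
}
```

In a real pipeline the tokens would come from the tokenize annotator, so printing the tokens of a sample sentence is a quick way to check how a phrase is actually split before writing a rule for it.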
