How to use TokenSequencePattern

Question · Votes: 0 · Answers: 1

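I have just started using CoreNLP's TokenSequencePattern and I can't get a simple match to work. All I am trying to do is match a token in the input text. The code below runs without errors but matches nothing. However, if you change the match expression to [], it matches both sentences.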

     Properties props = new Properties();
     props.put("annotators", "tokenize, ssplit, parse");
     StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
     Annotation document = new Annotation("This is sent 1. And here is sent 2");
     pipeline.annotate(document);
     List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

     Env env = TokenSequencePattern.getNewEnv();
     env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
     env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);

     TokenSequencePattern pattern = TokenSequencePattern.compile(env,"[ { word:\"sent\" } ]");
     TokenSequenceMatcher matcher = pattern.getMatcher(sentences);

     while (matcher.find()) {
         System.out.println(matcher.group());
     }

Thanks!

regex stanford-nlp
1 Answer

Votes: -1
     // Match against the document's tokens rather than the list of sentences.
     List<CoreLabel> tokens = document.get(CoreAnnotations.TokensAnnotation.class);
     TokenSequencePattern pattern = TokenSequencePattern.compile("[ { word:\"sent\" } ]");
     TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
     while (matcher.find()) {
         // group() is the matched text; groupNodes() are the matched tokens.
         String matchedString = matcher.group();
         List<? extends CoreMap> matchedTokens = matcher.groupNodes();
         System.out.println(matchedString + " " + matchedTokens);
     }
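
To see why the original code matched nothing while [] matched twice, it may help to think of the matcher as walking over whatever list it is given, one element at a time. The sketch below is a plain-Java toy model, not CoreNLP code: the class `NodeMatchSketch` and the helpers `matchWord` and `matchAny` are hypothetical, and the token list is written by hand rather than produced by a tokenizer. It assumes that `word` is compared against the full text of each element, which as far as I can tell is what happens when sentences are passed in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NodeMatchSketch {
    // Toy stand-in for a single-node pattern like [ { word:"sent" } ]:
    // returns the indices of elements whose whole text equals the target,
    // ignoring case.
    static List<Integer> matchWord(List<String> nodes, String target) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            if (nodes.get(i).equalsIgnoreCase(target)) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Toy stand-in for the wildcard pattern []: every element matches.
    static List<Integer> matchAny(List<String> nodes) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            hits.add(i);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two elements: this is what the matcher sees when given sentences.
        List<String> sentences = Arrays.asList("This is sent 1.", "And here is sent 2");
        // Ten elements: a hand-written approximation of the token list.
        List<String> tokens = Arrays.asList(
                "This", "is", "sent", "1", ".", "And", "here", "is", "sent", "2");

        System.out.println(matchWord(sentences, "sent")); // prints []
        System.out.println(matchAny(sentences));          // prints [0, 1]
        System.out.println(matchWord(tokens, "sent"));    // prints [2, 8]
    }
}
```

In this model, no sentence as a whole equals "sent", so the word pattern finds nothing at the sentence level, while the wildcard matches each of the two sentences once. Handing the matcher the token list is what lets the word pattern match.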