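I am trying to parse medical research reports with Stanford NLP. I can get the GrammaticalRelation of every node except the first, or root, node. How do I get this value?
I wrote a Java program that parses a report by getting its dependency graph, and it can get the child pairs of every node except the root node.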
public void DocAnnotationParse(String Input_text) {
    Annotation document = new Annotation(Input_text);
    Properties props = new Properties();
    //props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
    props.setProperty("annotators", "tokenize,ssplit,pos,parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);
    int sentNum = 0;
    // A map that contains one map per sentence
    Map<String, Map<String, Map<String, IndexedWord>>> sentMap = new LinkedHashMap<>();
    for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
        SemanticGraph dependencyParse = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
        IndexedWord firstVertex = dependencyParse.getFirstRoot();
        Map<String, Map<String, IndexedWord>> outterMap = new LinkedHashMap<>();
        RecursiveChild(outterMap, dependencyParse, firstVertex, 0);
        sentMap.put(Integer.toString(++sentNum), outterMap);
        logger.debug("outtermap: " + outterMap);
    }
    logger.debug("all sentMaps: " + sentMap);
    PrettyPrintBySentence(sentMap);
}
public void RecursiveChild(Map<String, Map<String, IndexedWord>> outterMap,
                           SemanticGraph dependencyParse,
                           IndexedWord vertex, int hierLevel) {
    Map<String, IndexedWord> pairMap = new LinkedHashMap<>();
    pairMap.put("Root", vertex);
    List<IndexedWord> indxwdsL = dependencyParse.getChildList(vertex);
    List<Pair<GrammaticalRelation, IndexedWord>> childPairs = dependencyParse.childPairs(vertex);
    List<IndexedWord> nxtLevalAL = new ArrayList<>();
    if (!indxwdsL.isEmpty()) {
        ++hierLevel;
        for (Pair<GrammaticalRelation, IndexedWord> aPair : childPairs) { // at level hierLevel x
            logger.debug(aPair);
            String grammRel = aPair.first.toString(); // grammatical relation
            IndexedWord indxwd = aPair.second;
            pairMap.put(grammRel, indxwd);
            List<Pair<GrammaticalRelation, IndexedWord>> childPairs2 = dependencyParse.childPairs(indxwd);
            if (!childPairs2.isEmpty()) {
                nxtLevalAL.add(indxwd);
            }
        }
    }
    String level = Integer.toString(hierLevel);
    outterMap.put(level, pairMap);
    // Go to each lower level
    for (IndexedWord nxtIwd : nxtLevalAL) {
        RecursiveChild(outterMap, dependencyParse, nxtIwd, hierLevel);
    }
}
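The childPairs of the root vertex do not contain the grammatical relation I want. When I look at the dependency graph there is no value for it, only the string root. How do I get the grammatical relation for that node? For example, the simple sentence "I love French fries." gives this graph: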
-> love/VBP (root)
  -> I/PRP (nsubj)
  -> fries/NNS (dobj)
    -> French/JJ (amod)
  -> ./. (punct)
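Hi, I'm not a linguist, but my understanding is that there is a single ROOT node that sits outside the SemanticGraph, and that root edges go from ROOT to a word in the sentence.
So in your example, the ROOT node is attached to the word love by the root relation.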
If you look at the source code of SemanticGraph, it says so explicitly:
* The root is not at present represented as a vertex in the graph.
* At present you need to get a root/roots
* from the separate roots variable and to know about it.
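You can access the list of roots with the getRoots() method (so I guess there could be more than one?). I take this to mean that root edges flow from the ROOT node into those words.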
If you want an actual Java object to represent this rather than a String, edu.stanford.nlp.trees.GrammaticalRelation.ROOT represents the relation between the "fake ROOT node" and the root of the sentence.
/**
* The "root" grammatical relation between a faked "ROOT" node, and the root of the sentence.
*/
public static final GrammaticalRelation ROOT =
new GrammaticalRelation(Language.Any, "root", "root", null);
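To make the idea concrete, here is a minimal sketch that uses only the Java standard library. TinyGraph, Edge, relationsOf and example are hypothetical stand-ins invented for illustration and are not part of CoreNLP. They only mimic the layout described above: the roots are kept in a list separate from the edges, so the traversal has to supply the "root" relation itself. With the real library, the analogous pieces would presumably be getRoots(), childPairs() and GrammaticalRelation.ROOT.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RootRelationSketch {

    // Hypothetical stand-in for one dependency edge: relation name plus dependent word.
    public static final class Edge {
        final String relation;
        final String dependent;

        Edge(String relation, String dependent) {
            this.relation = relation;
            this.dependent = dependent;
        }
    }

    // Hypothetical stand-in for SemanticGraph: roots are stored apart from the edges.
    // Words are used as map keys for brevity, so repeated words would collide.
    public static final class TinyGraph {
        final List<String> roots = new ArrayList<>();
        final Map<String, List<Edge>> children = new LinkedHashMap<>();

        void addEdge(String governor, String relation, String dependent) {
            children.computeIfAbsent(governor, k -> new ArrayList<>())
                    .add(new Edge(relation, dependent));
        }

        List<Edge> childPairs(String word) {
            return children.getOrDefault(word, new ArrayList<>());
        }
    }

    // Plays the role that GrammaticalRelation.ROOT plays in the answer above.
    static final String ROOT_RELATION = "root";

    // Lists every relation depth-first, starting each root with the synthetic root relation.
    public static List<String> relationsOf(TinyGraph graph) {
        List<String> out = new ArrayList<>();
        for (String root : graph.roots) {
            // No edge in the graph points at a root, so the relation is supplied here.
            out.add(ROOT_RELATION + "(ROOT, " + root + ")");
            collect(graph, root, out);
        }
        return out;
    }

    private static void collect(TinyGraph graph, String governor, List<String> out) {
        for (Edge e : graph.childPairs(governor)) {
            out.add(e.relation + "(" + governor + ", " + e.dependent + ")");
            collect(graph, e.dependent, out);
        }
    }

    // Builds the graph from the question: "I love French fries."
    public static TinyGraph example() {
        TinyGraph g = new TinyGraph();
        g.roots.add("love");
        g.addEdge("love", "nsubj", "I");
        g.addEdge("love", "dobj", "fries");
        g.addEdge("fries", "amod", "French");
        g.addEdge("love", "punct", ".");
        return g;
    }

    public static void main(String[] args) {
        relationsOf(example()).forEach(System.out::println);
    }
}
```

Running main prints root(ROOT, love) first, followed by the four ordinary relations. In the question's RecursiveChild, the same effect could probably be had by recording the root relation once for each root before recursing into its children.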