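I have a list of prefixes that I want to use for text replacement. My program works whenever I replace the entire matched text with the corresponding value, but it fails when I want to keep part of the matched text and replace the rest using groups: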
import re

prefixes = {
r"http://www.w3.org/2002/07/owl#([a-z]+)": r"owl:\1",
r"http://www.w3.org/1999/02/22-rdf-syntax-ns#([a-z]+)": r"rdf:\1",
r"http://www.w3.org/2000/01/rdf-schema#([a-z]+)": r"rdfs:\1",
r"http://schema.org/": "schema",
r"http://www.w3.org/2001/XMLSchema#([a-z]+)": r"xsd:\1",
r"http://purl.org/linked-data/sdmx#([a-z]+)": r"sdmx:\1",
r"http://www.w3.org/XML/1998/namespace": r"xml"
}
# test = "http://www.w3.org/XML/1998/namespace" # works for this
test = "http://www.w3.org/2000/01/rdf-schema#a" # Does not work!
regex = re.compile("|".join(map(re.escape, prefixes.keys())))
test = regex.sub(lambda match:prefixes[match.group(0)], test)
I want test to become "rdfs:a", but this approach doesn't work. How should I change the code so that it handles this case?
What you are trying to do is fairly complicated. That said, I couldn't resist the challenge...
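For context on why the question's version fails, here is a small sketch (the variable names are illustrative and not from the original code). It suggests two separate problems: `re.escape` turns the group syntax into literal characters, and `match.group(0)` returns the matched text rather than the pattern string used as the dictionary key.

```python
import re

pattern = r"http://www.w3.org/2000/01/rdf-schema#([a-z]+)"
test = "http://www.w3.org/2000/01/rdf-schema#a"

# Problem 1: re.escape makes "([a-z]+)" literal, so the escaped pattern
# only matches text that contains those exact characters.
escaped = re.compile(re.escape(pattern))
escaped_matches_url = escaped.search(test) is not None
escaped_matches_itself = escaped.search(pattern) is not None

# Problem 2: even without escaping, group(0) is the matched text,
# which differs from the pattern string used as the dict key.
m = re.search(pattern, test)
matched_text = m.group(0)
key_lookup_would_fail = matched_text != pattern

print(escaped_matches_url, escaped_matches_itself, key_lookup_would_fail)
```

This would also explain why the group-free `xml` entry works: its escaped key matches itself, and the matched text is identical to the key.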
>>> import re
>>> test = "http://www.w3.org/2000/01/rdf-schema#a"
>>> # Each key is the 0-based index, in m.groups(), of that pattern's outer group.
>>> # Backreferences use the 1-based group numbers of the combined pattern.
>>> # This relies on dicts preserving insertion order (Python 3.7+).
>>> prefixes = {
...     0: (r"http://www.w3.org/2002/07/owl#([a-z]+)", r"owl:\2"),
...     2: (r"http://www.w3.org/1999/02/22-rdf-syntax-ns#([a-z]+)", r"rdf:\4"),
...     4: (r"http://www.w3.org/2000/01/rdf-schema#([a-z]+)", r"rdfs:\6"),
...     6: (r"http://schema.org/", "schema"),
...     7: (r"http://www.w3.org/2001/XMLSchema#([a-z]+)", r"xsd:\9"),
...     9: (r"http://purl.org/linked-data/sdmx#([a-z]+)", r"sdmx:\11"),
...     11: (r"http://www.w3.org/XML/1998/namespace", r"xml")
... }
>>> # Wrap every pattern in its own outer group and join them with "|".
>>> expr = '(' + ')|('.join(p[0] for p in prefixes.values()) + ')'
>>>
>>> regex = re.compile(expr)
>>>
>>> regex.findall(test)
[('', '', '', '', 'http://www.w3.org/2000/01/rdf-schema#a', 'a', '', '',
'', '', '', '')]
>>> # The first non-empty group identifies which pattern matched.
>>> regex.sub(lambda m: m.expand(prefixes[next(i for i, v in
...                                            enumerate(m.groups())
...                                            if v)][1]),
...           test)
'rdfs:a'
>>> test2 = "http://www.w3.org/XML/1998/namespace"
>>> regex.sub(lambda m: m.expand(prefixes[next(i for i, v in
... enumerate(m.groups())
... if v)][1]),
... test2)
'xml'
>>>
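If keeping the group numbers in sync by hand feels fragile, one possible alternative is sketched below. It is not the approach above: it keeps the question's original dictionary (with the dots escaped, which is my addition), finds candidates with a combined non-capturing regex, and then re-matches the hit against each individual pattern so that every template can keep using `\1`. The names `PREFIXES`, `COMPILED`, `COMBINED`, `_replace`, and `shorten` are hypothetical.

```python
import re

# The question's mapping, with dots escaped so "." only matches a literal dot.
PREFIXES = {
    r"http://www\.w3\.org/2002/07/owl#([a-z]+)": r"owl:\1",
    r"http://www\.w3\.org/1999/02/22-rdf-syntax-ns#([a-z]+)": r"rdf:\1",
    r"http://www\.w3\.org/2000/01/rdf-schema#([a-z]+)": r"rdfs:\1",
    r"http://schema\.org/": "schema",
    r"http://www\.w3\.org/2001/XMLSchema#([a-z]+)": r"xsd:\1",
    r"http://purl\.org/linked-data/sdmx#([a-z]+)": r"sdmx:\1",
    r"http://www\.w3\.org/XML/1998/namespace": "xml",
}

# Each pattern compiled on its own, so its groups keep their local numbers.
COMPILED = [(re.compile(p), repl) for p, repl in PREFIXES.items()]

# One combined regex, used only to locate candidate substrings.
COMBINED = re.compile("|".join("(?:%s)" % p for p in PREFIXES))


def _replace(match):
    """Re-match the found text against each pattern and expand its template."""
    text = match.group(0)
    for pattern, repl in COMPILED:
        inner = pattern.fullmatch(text)
        if inner is not None:
            return inner.expand(repl)
    return text  # fallback: leave the text unchanged


def shorten(text):
    """Replace every known namespace URL in text with its prefixed form."""
    return COMBINED.sub(_replace, text)


print(shorten("http://www.w3.org/2000/01/rdf-schema#a"))
print(shorten("http://www.w3.org/XML/1998/namespace"))
```

The trade-off is a second match per hit, in exchange for not having to renumber backreferences whenever an entry is added or reordered.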