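Task: we have an English Wikipedia page and need to retrieve the address of the same page in Russian.
I know the Semantic Web solution, a simple query to DBpedia, but I'm curious whether there is a conventional solution. I asked the same question on semanticoverflow.com, where Toby Inkster suggested parsing the result of http://en.wikipedia.org/wiki/Colugo?action=raw (the links to other languages are at the bottom), but that approach is too inefficient. Are there other ways, or is DBpedia the only real option?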
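Sometimes the page does not exist in the target language, for example when looking up the Japanese (ja) equivalent of the page title https://en.wikipedia.org/wiki/Aframomum_corrorima. The Wikidata API can still be queried for the label: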
import json  # used further down to pretty-print the sitelinks
import requests

site = "enwiki"  # the titles below are English Wikipedia titles, hence `&sites=enwiki`
page = "Aframomum_corrorima"
trg_lang = "ja"  # language code of the labels to return
url = f"https://www.wikidata.org/w/api.php?action=wbgetentities&sites={site}&titles={page}&languages={trg_lang}&format=json"
result = requests.get(url).json()
# One `labels` dict per matched entity, keyed by language code
translations = [entity['labels'] for entity in result['entities'].values()]
print(translations)
[out]:
[{'ja': {'language': 'ja', 'value': 'コロリマ'}}]
You will then find that https://ja.wikipedia.org/w/index.php?title=コロリマ has not been written yet, but the Wikidata API is still able to find the correct translation of the entity.
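Whether an article actually exists can be read from the same kind of response, because an entity can carry a `sitelinks` map next to its `labels`. Here is a minimal sketch, assuming the response contains both keys; `label_and_sitelink` is a hypothetical helper name, and the sample dict is hand-trimmed for illustration (including a placeholder entity id), not a live response.

```python
def label_and_sitelink(result, lang):
    """Return one (label, article_title) pair per entity in a wbgetentities response.

    article_title is None when the entity has no sitelink for `<lang>wiki`.
    """
    pairs = []
    for entity in result.get("entities", {}).values():
        label = entity.get("labels", {}).get(lang, {}).get("value")
        title = entity.get("sitelinks", {}).get(f"{lang}wiki", {}).get("title")
        pairs.append((label, title))
    return pairs


# Hand-trimmed sample in the shape of the responses shown in this answer.
sample = {
    "entities": {
        "Q0": {  # placeholder entity id, for illustration only
            "labels": {"ja": {"language": "ja", "value": "コロリマ"}},
            "sitelinks": {
                "enwiki": {"site": "enwiki", "title": "Aframomum corrorima", "badges": []}
            },
        }
    }
}

print(label_and_sitelink(sample, "ja"))  # [('コロリマ', None)]
```

A label paired with `None` is exactly the situation above: Wikidata knows a Japanese name for the entity, but no Japanese Wikipedia article is linked to it.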
To extract all the available links (the entity's sitelinks), do this:
url = f"https://www.wikidata.org/w/api.php?action=wbgetentities&sites={site}&titles={page}&props=sitelinks&format=json"
result = requests.get(url).json()
# One `sitelinks` dict per matched entity, keyed by site id (e.g. "enwiki")
links = [entity['sitelinks'] for entity in result['entities'].values()]
print(json.dumps(links, indent=2))
[out]:
[
  {
    "amwiki": {
      "site": "amwiki",
      "title": "\u12ae\u1228\u122a\u121b",
      "badges": []
    },
    "cebwiki": {
      "site": "cebwiki",
      "title": "Aframomum corrorima",
      "badges": []
    },
    "commonswiki": {
      "site": "commonswiki",
      "title": "Category:Aframomum corrorima",
      "badges": []
    },
    "elwiki": {
      "site": "elwiki",
      "title": "Aframomum corrorima",
      "badges": []
    },
    "enwiki": {
      "site": "enwiki",
      "title": "Aframomum corrorima",
      "badges": []
    },
    "eswiki": {
      "site": "eswiki",
      "title": "Aframomum corrorima",
      "badges": []
    },
    "frwiki": {
      "site": "frwiki",
      "title": "Aframomum corrorima",
      "badges": []
    },
    "kowiki": {
      "site": "kowiki",
      "title": "\ucf54\ub7ec\ub9ac\ub9c8",
      "badges": []
    },
    "lawiki": {
      "site": "lawiki",
      "title": "Aframomum corrorima",
      "badges": []
    },
    "specieswiki": {
      "site": "specieswiki",
      "title": "Aframomum corrorima",
      "badges": []
    },
    "svwiki": {
      "site": "svwiki",
      "title": "Korarima",
      "badges": []
    },
    "ukwiki": {
      "site": "ukwiki",
      "title": "Aframomum corrorima",
      "badges": []
    },
    "viwiki": {
      "site": "viwiki",
      "title": "Aframomum corrorima",
      "badges": []
    },
    "warwiki": {
      "site": "warwiki",
      "title": "Aframomum corrorima",
      "badges": []
    }
  }
]
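To get back to the original question (the address of the page in another language), a sitelinks dict like the one above can be turned into URLs. This is only a sketch under assumptions: `sitelink_url` is a hypothetical helper, and deriving the host from the site id (`ukwiki` to `uk.wikipedia.org`) is a heuristic that I would only trust for ordinary language editions. Sister projects such as `commonswiki` and `specieswiki` also end in `wiki`, so they are skipped explicitly.

```python
from urllib.parse import quote

# Site ids that end in "wiki" but are not Wikipedia language editions
# (partial list, taken from the output above).
NON_WIKIPEDIA = {"commonswiki", "specieswiki"}


def sitelink_url(sitelinks, lang):
    """Build a Wikipedia URL for `lang` from a Wikidata sitelinks dict, or None."""
    site = f"{lang.replace('-', '_')}wiki"
    link = sitelinks.get(site)
    if link is None or site in NON_WIKIPEDIA:
        return None
    # MediaWiki URLs use underscores for spaces; percent-encode the rest.
    title = quote(link["title"].replace(" ", "_"), safe=":/()_,")
    return f"https://{lang}.wikipedia.org/wiki/{title}"


# Two entries copied from the output above.
sitelinks = {
    "svwiki": {"site": "svwiki", "title": "Korarima", "badges": []},
    "ukwiki": {"site": "ukwiki", "title": "Aframomum corrorima", "badges": []},
}

print(sitelink_url(sitelinks, "uk"))  # https://uk.wikipedia.org/wiki/Aframomum_corrorima
print(sitelink_url(sitelinks, "ru"))  # None
```

For this particular plant the output above has no `ruwiki` entry, so the Russian lookup returns `None`. The API also appears to accept `props=sitelinks/urls`, which should return a ready-made `url` for each sitelink and would avoid the host heuristic; check the API help page before relying on it.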