Retrieve a Wikipedia page in another language


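Task: we have an English Wikipedia page and need to retrieve the address of the same page in Russian.

I know the Semantic Web solution, a simple query to DBpedia, but I'm curious whether there is a conventional solution. I asked the same question on semanticoverflow.com, where Toby Inkster suggested parsing the result of http://en.wikipedia.org/wiki/Colugo?action=raw (the links to other languages are at the bottom), but that approach is too inefficient. Is there another way, or is DBpedia the only real option?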

wikipedia wikipedia-api mediawiki-api
2 answers
11
votes

Wikipedia has an extensive API that can provide language-link information, among other things. In this particular case, you are looking for

api.php?action=query&prop=langlinks&titles=...
See an example here.
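As a rough sketch of how that query could be issued from Python: the snippet below builds the `prop=langlinks` URL, narrows it to one language with `lllang`, and pulls the title out of the JSON reply. The function names, the `User-Agent` string, and the choice of `formatversion=2` are illustrative assumptions, not part of the answer above.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_langlinks_url(title, target_lang):
    """Build the action=query&prop=langlinks URL for one title and one language."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": target_lang,   # restrict the result to a single language
        "format": "json",
        "formatversion": "2",    # assumption: use the newer list-based layout
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_langlink(response, target_lang):
    """Return the linked title in target_lang, or None if the page has no such link."""
    for page in response.get("query", {}).get("pages", []):
        for link in page.get("langlinks", []):
            if link.get("lang") == target_lang:
                # formatversion=2 uses "title"; the older layout uses "*"
                return link.get("title", link.get("*"))
    return None


def fetch_langlink(title, target_lang):
    """Query the live API (needs network access) and return the linked title."""
    request = urllib.request.Request(
        build_langlinks_url(title, target_lang),
        # Placeholder User-Agent; replace with something identifying your client.
        headers={"User-Agent": "langlinks-example/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_langlink(json.load(reply), target_lang)
```

Calling `fetch_langlink("Colugo", "ru")` should then return the Russian title if the English page has a Russian language link, and `None` otherwise.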


0
votes

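Sometimes the page has no counterpart in the target language yet, e.g. when looking up the Japanese (ja) equivalent of the title of https://en.wikipedia.org/wiki/Aframomum_corrorima. The Wikidata API can still be queried for the label: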

import json
import requests

site = "enwiki"  # For English queries, set `&sites=enwiki`
page = "Aframomum_corrorima"
trg_lang = "ja"

url = f"https://www.wikidata.org/w/api.php?action=wbgetentities&sites={site}&titles={page}&languages={trg_lang}&format=json"

result = json.loads(requests.get(url).content.decode('utf8'))

translations = [result['entities'][k]['labels'] for k in result['entities']]
print(translations)

[out]:

[{'ja': {'language': 'ja', 'value': 'コロリマ'}}]

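You will then find that https://ja.wikipedia.org/w/index.php?title=コロリマ has not been written yet, but the Wikidata API is still able to find the correct translation of the entity.

To extract all available links, do the following: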

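# wbgetentities takes `props` (not `prop`); `sitelinks` limits the reply to the per-wiki page titles
url = f"https://www.wikidata.org/w/api.php?action=wbgetentities&sites={site}&titles={page}&props=sitelinks&format=json"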

result = json.loads(requests.get(url).content.decode('utf8'))

links = [result['entities'][e]['sitelinks'] for e in result['entities'].keys()]

print(json.dumps(links, indent=4))

[out]:

[
    {
        "amwiki": {
            "site": "amwiki",
            "title": "\u12ae\u1228\u122a\u121b",
            "badges": []
        },
        "cebwiki": {
            "site": "cebwiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "commonswiki": {
            "site": "commonswiki",
            "title": "Category:Aframomum corrorima",
            "badges": []
        },
        "elwiki": {
            "site": "elwiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "enwiki": {
            "site": "enwiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "eswiki": {
            "site": "eswiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "frwiki": {
            "site": "frwiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "kowiki": {
            "site": "kowiki",
            "title": "\ucf54\ub7ec\ub9ac\ub9c8",
            "badges": []
        },
        "lawiki": {
            "site": "lawiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "specieswiki": {
            "site": "specieswiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "svwiki": {
            "site": "svwiki",
            "title": "Korarima",
            "badges": []
        },
        "ukwiki": {
            "site": "ukwiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "viwiki": {
            "site": "viwiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "warwiki": {
            "site": "warwiki",
            "title": "Aframomum corrorima",
            "badges": []
        }
    }
]
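To get from these sitelinks back to a page address, which is what the question asks for, one possible sketch is the helper below. It assumes the usual `<lang>wiki` site-id convention for Wikipedia editions (so it deliberately ignores entries such as `commonswiki` or `specieswiki`); the function name is hypothetical.

```python
from urllib.parse import quote


def wikipedia_url_from_sitelinks(sitelinks, lang):
    """Return the Wikipedia URL for `lang` from a Wikidata sitelinks dict, or None.

    Assumption: Wikipedia site ids follow the "<lang>wiki" pattern, with
    hyphens in the language code written as underscores.
    """
    site_id = lang.replace("-", "_") + "wiki"
    entry = sitelinks.get(site_id)
    if entry is None:
        return None  # no article in that language is linked to the entity
    # Wikipedia URLs use underscores for spaces; percent-encode the rest.
    title = entry["title"].replace(" ", "_")
    return f"https://{lang}.wikipedia.org/wiki/{quote(title)}"
```

With the `links` list printed above, `wikipedia_url_from_sitelinks(links[0], "sv")` gives `https://sv.wikipedia.org/wiki/Korarima`, while asking for `"ja"` gives `None` because no `jawiki` entry exists.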