How to download a word category from Wiktionary?


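I'd like to download all countable nouns from Wiktionary (Category:English countable nouns).
I tried some of the dump files listed at Index of /enwiktionary/latest/, but it looks hard to extract the category I want from them. Can anyone tell me which file I should use and how to extract the word list for a specific category? Or is there another way to do this, such as using an API?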

mediawiki-api wiktionary
2 Answers
6
votes

The categorymembers API, https://en.wiktionary.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:English_countable_nouns&cmprop=title, gives:

{
"warnings": {
    "query": {
        "*": "Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."
    }
},
"query-continue": {
    "categorymembers": {
        "cmcontinue": "page|302d342d30|474610"
    }
},
"query": {
    "categorymembers": [
        {
            "ns": 0,
            "title": "$100 hamburger"
        },
        {
            "ns": 0,
            "title": "%ile"
        },
        {
            "ns": 0,
            "title": "&lit"
        },
        {
            "ns": 0,
            "title": ".com"
        },
        {
            "ns": 0,
            "title": "/b/tard"
        },
        {
            "ns": 0,
            "title": "0"
        },
        {
            "ns": 0,
            "title": "0-10-0"
        },
        {
            "ns": 0,
            "title": "0-10-2"
        },
        {
            "ns": 0,
            "title": "0-12-0"
        },
        {
            "ns": 0,
            "title": "0-2-2"
        }
    ]
}
}
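
The `query-continue` block in the response above carries the token for the next batch; newer MediaWiki versions return it under `continue.cmcontinue` instead. As a rough sketch of how that token feeds the follow-up request, the URL could be assembled as below. The helper names `urlencode` and `build_url` are made up for illustration, the encoder assumes ASCII input, and `cmlimit=max` and `format=json` are additions to the URL quoted in the answer.

```shell
#!/usr/bin/env bash
# Sketch: build the categorymembers request URL, optionally resuming from a
# continuation token. Helper names are illustrative only.

# Percent-encode a string for use in a query parameter (ASCII input assumed).
urlencode() {
    local LC_ALL=C
    local s="$1" out="" c hex i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [a-zA-Z0-9.~_-]) out+="$c" ;;
            *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
        esac
    done
    printf '%s\n' "$out"
}

# $1 = category name without the "Category:" prefix, $2 = optional cmcontinue token.
build_url() {
    local url="https://en.wiktionary.org/w/api.php?action=query&list=categorymembers"
    url+="&cmtitle=Category:$(urlencode "$1")&cmprop=title&cmlimit=max&format=json"
    if [[ -n "${2:-}" ]]; then
        url+="&cmcontinue=$(urlencode "$2")"
    fi
    printf '%s\n' "$url"
}

# Resume after the batch shown above, using the token from its response.
build_url "English_countable_nouns" "page|302d342d30|474610"
```

The `|` characters in the token come out as `%7C`, so the URL can be passed to curl or pasted into a browser without further quoting concerns.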


0
votes

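To expand on Nemo's answer, here is a bash script that loops over the API. It is somewhat slow, but it should work fine for smaller categories. It requires jq to be installed and writes the word list to standard output.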

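#!/usr/bin/env bash
# Download all words in a category from Wiktionary.
# Usage: ./download-wiktionary-category.sh "English_contractions" > contractions.txt

set -euo pipefail
IFS=$'\n\t'

download() {
    local category="$1"
    # cmcontinue token from the previous response; empty on the first call.
    local continue="${2:-}"
    local resp next
    resp="$(curl --silent "https://en.wiktionary.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:${category}&cmprop=title&cmlimit=max&cmcontinue=${continue}&format=json")"

    # Print main-namespace (ns 0) titles only. jq runs without -e here so that a
    # batch containing no ns 0 entries does not abort the script under "set -e".
    echo "$resp" | jq -r '.query.categorymembers[] | select(.ns == 0) | .title'

    # jq prints "null" when there is no continuation token, i.e. on the last batch.
    next="$(echo "$resp" | jq -r '.continue.cmcontinue')"
    if [[ $next != null ]]; then
        download "$category" "$next"
    fi
}
download "$1"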
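
Building on the script above, a loop-based variant might look like the following sketch. It is not the answer author's method: `list_category` is a hypothetical name, and the hand-built URL is swapped for curl's `--get` and `--data-urlencode` options so that category names containing spaces, and the `|` characters in the token, are percent-encoded. It assumes curl and jq are installed.

```shell
#!/usr/bin/env bash
# Sketch: iterative variant of the category download loop (requires curl and jq).

# list_category is a hypothetical helper name, not part of any tool.
list_category() {
    local category="$1" token="" resp
    while :; do
        # --get turns the --data-urlencode pairs into a percent-encoded query string.
        local args=(--silent --get "https://en.wiktionary.org/w/api.php"
            --data-urlencode "action=query"
            --data-urlencode "list=categorymembers"
            --data-urlencode "cmtitle=Category:${category}"
            --data-urlencode "cmprop=title"
            --data-urlencode "cmlimit=max"
            --data-urlencode "format=json")
        if [[ -n $token ]]; then
            args+=(--data-urlencode "cmcontinue=${token}")
        fi
        resp="$(curl "${args[@]}")" || return 1

        # Print main-namespace (ns 0) titles only.
        jq -r '.query.categorymembers[] | select(.ns == 0) | .title' <<<"$resp" || return 1

        # "// empty" yields no output when there is no continuation token.
        token="$(jq -r '.continue.cmcontinue // empty' <<<"$resp")"
        [[ -n $token ]] || break
    done
}

# Only call the function when a category name is given on the command line.
if [[ $# -gt 0 ]]; then
    list_category "$1"
fi
```

Saved under a file name of your choice, it is invoked the same way as the script above, with the category name as the only argument and the output redirected to a file.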