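I only want to extract a specific section from a Wikipedia page.
Example: I want to extract the text of the "Parts" section of the Wikipedia article "House".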
https://en.wikipedia.org/wiki/House
The resulting text would be:
Many houses have several large rooms ..... sections of the home (including in more recent eras a garage).
We can get the full text of the article as follows:
But how do I get the text of one specific section?
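Do you need the plain wikitext or the HTML generated by the parser?
The example below gives you the "Layout" section (the third section of the article; you can use any other section index as well).
When you want to retrieve the parsed HTML of a specific section, use the parse API: https://en.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=json&page=house&prop=text&section=3&disabletoc=1 or, as an API request outside the sandbox: https://en.wikipedia.org/w/api.php?action=parse&format=json&page=house&prop=text&section=3&disabletoc=1
If you want the wikitext of a specific section, use the wikitext prop instead of the text prop: https://en.wikipedia.org/w/api.php?action=parse&format=json&page=house&prop=wikitext&section=3&disabletoc=1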
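The two request URLs above differ only in the `prop` parameter, so they can be built from one helper. This is a minimal sketch, not part of the original answer: the function name `build_parse_url` and the constant `API_ENDPOINT` are made up for illustration, and the parameters are simply the ones that appear in the URLs above.

```python
from urllib.parse import urlencode

# Endpoint taken from the URLs in the answer.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page, section, prop="text"):
    """Build an action=parse URL for one section of a page.

    prop="text" asks for the parser-generated HTML,
    prop="wikitext" asks for the raw wikitext.
    (Hypothetical helper, named here for illustration only.)
    """
    params = {
        "action": "parse",
        "format": "json",
        "page": page,
        "prop": prop,
        "section": section,
        "disabletoc": 1,
    }
    return API_ENDPOINT + "?" + urlencode(params)


# Same request as the wikitext URL above (section index 3).
print(build_parse_url("house", 3, prop="wikitext"))
```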
To find out which section has which index, query the "sections" prop without passing any section index: https://en.wikipedia.org/w/api.php?action=parse&format=json&page=house&prop=sections&disabletoc=1
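Once the sections response is decoded from JSON, looking up the index by heading is a short loop. The sketch below assumes the response has the shape returned by `prop=sections` (a `parse.sections` list whose entries carry `line` and `index`) and that the heading is plain text. The helper name `find_section_index` and the abbreviated `sample_response` are illustrative only.

```python
def find_section_index(parse_response, heading):
    """Return the "index" value of the section whose heading matches.

    parse_response is the decoded JSON of a prop=sections request.
    Returns None if no section has that heading.
    (Hypothetical helper, named here for illustration only.)
    """
    for section in parse_response["parse"]["sections"]:
        if section["line"] == heading:
            return section["index"]
    return None


# Abbreviated stand-in for a sections response (only the fields used here).
sample_response = {
    "parse": {
        "title": "House",
        "sections": [
            {"line": "Etymology", "index": "1"},
            {"line": "Elements", "index": "2"},
            {"line": "Layout", "index": "3"},
            {"line": "Parts", "index": "4"},
        ],
    }
}

print(find_section_index(sample_response, "Parts"))
```

The returned value is a string, as in the API response, and can be passed as the `section` parameter of the next request.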
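So, as a complete example of retrieving the text of the Layout section using only the API, you would first request the section list to find the index, then request the wikitext of that section.
Response to the sections request: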
{
"parse": {
"title": "House",
"pageid": 13590,
"sections": [
{
"toclevel": 1,
"level": "2",
"line": "Etymology",
"number": "1",
"index": "1",
"fromtitle": "House",
"byteoffset": 3549,
"anchor": "Etymology"
},
{
"toclevel": 1,
"level": "2",
"line": "Elements",
"number": "2",
"index": "2",
"fromtitle": "House",
"byteoffset": 4960,
"anchor": "Elements"
},
{
"toclevel": 2,
"level": "3",
"line": "Layout",
"number": "2.1",
"index": "3",
"fromtitle": "House",
"byteoffset": 4976,
"anchor": "Layout"
},
{
"toclevel": 2,
"level": "3",
"line": "Parts",
"number": "2.2",
"index": "4",
"fromtitle": "House",
"byteoffset": 6432,
"anchor": "Parts"
},
{
"toclevel": 2,
"level": "3",
"line": "History of the interior",
"number": "2.3",
"index": "5",
"fromtitle": "House",
"byteoffset": 7539,
"anchor": "History_of_the_interior"
},
{
"toclevel": 3,
"level": "4",
"line": "Communal rooms",
"number": "2.3.1",
"index": "6",
"fromtitle": "House",
"byteoffset": 8786,
"anchor": "Communal_rooms"
},
{
"toclevel": 3,
"level": "4",
"line": "Interconnecting rooms",
"number": "2.3.2",
"index": "7",
"fromtitle": "House",
"byteoffset": 9736,
"anchor": "Interconnecting_rooms"
},
{
"toclevel": 3,
"level": "4",
"line": "Corridor",
"number": "2.3.3",
"index": "8",
"fromtitle": "House",
"byteoffset": 11126,
"anchor": "Corridor"
},
{
"toclevel": 3,
"level": "4",
"line": "Employment-free house",
"number": "2.3.4",
"index": "9",
"fromtitle": "House",
"byteoffset": 13092,
"anchor": "Employment-free_house"
},
{
"toclevel": 2,
"level": "3",
"line": "Work location, technology and doctors",
"number": "2.4",
"index": "10",
"fromtitle": "House",
"byteoffset": 15969,
"anchor": "Work_location,_technology_and_doctors"
},
{
"toclevel": 3,
"level": "4",
"line": "Technology and privacy",
"number": "2.4.1",
"index": "11",
"fromtitle": "House",
"byteoffset": 17291,
"anchor": "Technology_and_privacy"
},
{
"toclevel": 1,
"level": "2",
"line": "Construction",
"number": "3",
"index": "12",
"fromtitle": "House",
"byteoffset": 18782,
"anchor": "Construction"
},
{
"toclevel": 2,
"level": "3",
"line": "Energy efficiency",
"number": "3.1",
"index": "13",
"fromtitle": "House",
"byteoffset": 21899,
"anchor": "Energy_efficiency"
},
{
"toclevel": 2,
"level": "3",
"line": "Earthquake protection",
"number": "3.2",
"index": "14",
"fromtitle": "House",
"byteoffset": 23057,
"anchor": "Earthquake_protection"
},
{
"toclevel": 1,
"level": "2",
"line": "Found materials",
"number": "4",
"index": "15",
"fromtitle": "House",
"byteoffset": 25172,
"anchor": "Found_materials"
},
{
"toclevel": 1,
"level": "2",
"line": "Legal issues",
"number": "5",
"index": "16",
"fromtitle": "House",
"byteoffset": 26235,
"anchor": "Legal_issues"
},
{
"toclevel": 2,
"level": "3",
"line": "United Kingdom",
"number": "5.1",
"index": "17",
"fromtitle": "House",
"byteoffset": 26644,
"anchor": "United_Kingdom"
},
{
"toclevel": 1,
"level": "2",
"line": "Identifying houses",
"number": "6",
"index": "18",
"fromtitle": "House",
"byteoffset": 26922,
"anchor": "Identifying_houses"
},
{
"toclevel": 1,
"level": "2",
"line": "Animal houses",
"number": "7",
"index": "19",
"fromtitle": "House",
"byteoffset": 27397,
"anchor": "Animal_houses"
},
{
"toclevel": 1,
"level": "2",
"line": "Houses and symbolism",
"number": "8",
"index": "20",
"fromtitle": "House",
"byteoffset": 27826,
"anchor": "Houses_and_symbolism"
},
{
"toclevel": 1,
"level": "2",
"line": "See also",
"number": "9",
"index": "21",
"fromtitle": "House",
"byteoffset": 28620,
"anchor": "See_also"
},
{
"toclevel": 1,
"level": "2",
"line": "References",
"number": "10",
"index": "22",
"fromtitle": "House",
"byteoffset": 29690,
"anchor": "References"
},
{
"toclevel": 1,
"level": "2",
"line": "External links",
"number": "11",
"index": "23",
"fromtitle": "House",
"byteoffset": 29720,
"anchor": "External_links"
}
]
}
}
Response to the wikitext request (section=3):
{
"parse": {
"title": "House",
"pageid": 13590,
"wikitext": {
"*": "=== Layout ===\n[[File:Gingerbread House Essex CT.jpg|thumb|Example of an early [[Victorian architecture|Victorian]] \"Gingerbread House\" in [[Connecticut]], United States, built in 1855]]\n\nIdeally, [[architect]]s of houses design [[room]]s to meet the needs of the people who will live in the house. [[Feng shui]], originally a [[China|Chinese]] method of moving houses according to such factors as rain and micro-climates, has recently expanded its scope to address the design of interior spaces, with a view to promoting harmonious effects on the people living inside the house, although no actual effect has ever been demonstrated. Feng shui can also mean the \"aura\" in or around a dwelling, making it comparable to the [[real estate|real-estate]] sales concept of \"indoor-outdoor flow\".\n\nThe [[square footage]] of a house in the United States reports the area of \"living space\", excluding the garage and other non-living spaces. The \"square metres\" figure of a house in Europe <!-- including Malta ? --> reports the area of the walls enclosing the home, and thus includes any attached garage and non-living spaces.<ref>{{Cite book|title=Land Management: Challenges and Strategies (First Edition)|last=Iyyer|first=Chaitanya|publisher=Global India Publications Pvt Ltd|year=2009|isbn=978-9380228488|location=|pages=}}</ref>{{Citation needed|date=February 2007}} The number of floors or levels making up the house can affect the square footage of a home."
}
}
}
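In the wikitext response the section text sits under `parse.wikitext["*"]`. The following is a rough sketch of fetching and unpacking it with the Python standard library. The function names and the `USER_AGENT` string are placeholders rather than anything from the original answer, and section index 3 only means "Layout" as long as the article's structure has not changed.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent; replace it with one that identifies your own tool.
USER_AGENT = "section-fetch-example/0.1 (illustrative script)"


def extract_wikitext(parse_response):
    """Pull the section wikitext out of a decoded prop=wikitext response."""
    # The "*" key matches the response format shown above.
    return parse_response["parse"]["wikitext"]["*"]


def fetch_section_wikitext(page, section):
    """Request one section's wikitext from the parse API (needs network access)."""
    params = urlencode({
        "action": "parse",
        "format": "json",
        "page": page,
        "prop": "wikitext",
        "section": section,
        "disabletoc": 1,
    })
    request = Request(API_ENDPOINT + "?" + params,
                      headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return extract_wikitext(json.load(response))
```

Calling `fetch_section_wikitext("house", 3)` would then return the string that starts with `=== Layout ===`, assuming the section numbering still matches the listing above.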
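Background: The idea of sections within a page is not integrated into revisions. A revision is "just" the content of the whole page plus additional metadata (for example in multiple other slots), whereas sections are part of the content, which occupies only one slot of a revision. That is why the revisions query API can only give you the whole text. The page has to be parsed to know what its sections are, because a section is a wikitext concept, and that is where the parser comes in.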