How to extract a Wikipedia infobox using the API in PHP [duplicate]

Question  Votes: -1  Answers: 1

I want to extract a Wikipedia infobox using the Wikipedia API from PHP, but I don't know the correct way to do it.

This is the code I tried.

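<?php
  // Request section 0 of the "YouTube" article; rvparse asks the API for rendered HTML instead of wikitext.
  $url = "http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=YouTube&rvsection=0&rvparse";
  $data = json_decode(file_get_contents($url), true);
  // "pages" is keyed by page ID, so take the first (and only) entry.
  $data = current($data['query']['pages']);
  // Match from an opening <table ...> tag to a closing </table> tag.
  $regex = '#<\s*?table\b[^>]*>(.*)</table\b[^>]*>#s';
  $code = preg_match($regex, $data["revisions"][0]['*'], $matches);
  echo($matches[0]);
?>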

This code produces output like this: (image)

But I want to remove the extra content and display it like this: (image)

I also tried this URL:

https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0

This URL works, but it returns output like this:

{
  "batchcomplete": "",
  "warnings": {
    "main": {
      "*": "Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes. Use [[Special:ApiFeatureUsage]] to see usage of deprecated features by your application."
    },
    "revisions": {
      "*": "Because \"rvslots\" was not specified, a legacy format has been used for the output. This format is deprecated, and in the future the new format will always be used."
    }
  },
  "query": {
    "pages": {
      "29314104": {
        "pageid": 29314104,
        "ns": 0,
        "title": "Scary Monsters and Nice Sprites",
        "revisions": [
          {
            "contentformat":"text/x-wiki","contentmodel":"wikitext","*":"{{about|the
 EP|the title song|Scary Monsters and Nice Sprites (song)}}\n{{Infobox
 album\n| name       = Scary Monsters and Nice Sprites\n| type       =
 EP\n| artist     = [[Skrillex]]\n| cover      = Skrillex scary
 monsters.jpg\n| alt        =\n| released   = October 22, 2010\n|
 recorded   = 2010\n| studio     =\n| genre      = [[Dubstep]],
 [[electro (music)|electro]]\n| length     = 44:02\n| label      =
 {{hlist|[[Big Beat Records (Atlantic Records subsidiary)|Big
 Beat]]|[[mau5trap]]}}\n| producer   = {{hlist|Bare Noize|[[Foreign
 Beggars]]|[[Noisia]]|[[Skrillex]]|[[Zedd (producer)|Zedd]]}}\n|
 prev_title = [[My Name Is Skrillex]]\n| prev_year  = 2010\n|
 next_title = [[More Monsters and Sprites]]\n| next_year  = 2011\n|
 misc       = {{Singles\n | name        = Scary Monsters and Nice
 Sprites\n | type        = EP\n | single1     = [[Scary Monsters and
 Nice Sprites (song)|Scary Monsters and Nice Sprites]]\n | single1date
 = October 22, 2010\n | single2     = Rock n' Roll (Will Take You to the Mountain)\n | single2date = June 20, 2011\n}}\n}}\n\n'''''Scary
 Monsters and Nice Sprites''''' is the second [[extended play]] (EP) by
 American [[electronic music]] producer [[Skrillex]]. It was released
 exclusively through [[Beatport]] on October 22, 2010 through
 [[mau5trap]] and [[Big Beat Records (Atlantic Records subsidiary)|Big
 Beat Records]], while being released on December 20 for [[Music
 download|digital download]] via other online retailers and on March 1,
 2011 as a physical release. It was recorded in 2010 at Skrillex's
 apartment using a laptop.<ref name=\"neontapedeck\" /> The EP features
 guest contributions from Penny, [[Foreign Beggars]] and Bare Noize as
 well as remixes done by [[Noisia]], [[Zedd (musician)|Zedd]] and Bare
 Noize. It won two Grammys at the 54th Annual Grammy Awards: one for
 Best Dance Recording, and another for Best Dance/Electronica
 Album.\n\nThe EP received generally positive reviews from music
 critics and became a moderate commercial success, reaching number 49
 on the ''[[Billboard (magazine)|Billboard]]'' [[Billboard
 200|200]],<ref
 name=bb200>{{BillboardURLbyName|artist=skrillex|chart=Billboard
 200}}</ref> while also topping the ''Billboard'' [[Top
 Heatseekers|Heatseekers Albums]] chart and reaching number 28 in
 Australia.<ref name=\"australian-charts.com\">{{cite web
 |url=http://australian-charts.com/showinterpret.asp?interpret=Skrillex
 |title=Discography Skrillex |website=australian-charts.com
 |accessdate=2013-07-30}}</ref> As of November 2011, the EP was
 certified Gold in Canada with sales exceeding 40,000
 copies.<ref>http://www.musiccanada.com/GPSearchResult.aspx?st=&ica=False&sa=&sl=&smt=0&sat=-1&ssd=11/1/2011&sed=12/1/2011&ssb=Cert.%20Date&ssdir=ascending {{webarchive|url=https://web.archive.org/web/20131224084912/http://www.musiccanada.com/GPSearchResult.aspx?st=&ica=False&sa=&sl=&smt=0&sat=-1&ssd=11%2F1%2F2011&sed=12%2F1%2F2011&ssb=Cert.%20Date&ssdir=ascending
 |date=2013-12-24 }}</ref> The EP's [[lead single]], \"[[Scary Monsters
 and Nice Sprites (song)|Scary Monsters and Nice Sprites]]\", was a
 moderate commercial success internationally, peaking within the charts
 of the United States, United Kingdom, Australia, Canada and
 Sweden.<ref>{{cite
 web|url=http://pandora.nla.gov.au/pan/23790/20110730-0000/Issue1114.pdf
 |title=Pandora Archive |publisher=Pandora.nla.gov.au |date=2006-08-23
 |accessdate=2013-07-30}}</ref><ref>{{cite
 web|url=http://acharts.us/song/64937 |title=Skrillex - Scary Monsters
 And Nice Sprites - Music Charts |website=Acharts.us
 |accessdate=2013-07-30}}</ref><ref>{{cite web
 |url=http://swedishcharts.com/showitem.asp?interpret=Skrillex&titel=Scary+Monsters+And+Nice+Sprites&cat=s
 |title=Skrillex - Scary Monsters And Nice Sprites
 |website=swedishcharts.com |date=
 |accessdate=2013-07-30}}</ref><ref>[http://www.theofficialcharts.com/archive-chart/_/1/2011-05-15
 ]{{dead link|date=July 2013}}</ref><ref>{{cite
 web|url=https://www.billboard.com/charts/2012-01-07/hot-100?order=gainer
 |title=The Hot 100 : Jan 07, 2012 &#124; Billboard Chart Archive
 |publisher=Billboard.com |date=2012-01-07
 |accessdate=2013-07-30}}</ref> It was later certified [[RIAA
 certification|Gold]] by the [[Recording Industry Association of
 America]] (RIAA), with sales exceeding 500,000.<ref>{{cite
 web|url=https://www.riaa.com/goldandplatinumdata.php?artist=%22SCARY+MONSTERS+AND+NICE+SPRITE%22
 |title=Gold & Platinum Searchable Database - July 30, 2013
 |publisher=RIAA |date= |accessdate=2013-07-30}}</ref> On November 30,
 2011, it was announced that the EP was nominated at the [[54th Grammy
 Awards]] for [[Grammy Award for Best Dance/Electronica Album|Best
 Dance/Electronica Album]]. On February 12, 2012, the album won the
 Grammy for that category. A follow-up EP, ''[[More Monsters and
 Sprites]]'', contains several remixes of the title track done by
 [[Dirtyphonics]], Phonat, The Juggernaut and [[Kaskade]]. It is named
 after [[David Bowie]]'s 1980 album ''[[Scary Monsters (And Super
 Creeps)]]''.<ref>[http://inflatableferret.com/flipflop/flipflop-skrillex-scary-monsters-and-nice-sprites/
 Scary Monsters and Nice Sprites]. ''Inflatable Ferret''.</ref>"
          }
        ]
      }
    }
  }
}

I'm confused by this output. Can anyone tell me the correct way to do this?

php json regex wikipedia-api
1 answer
0
votes

I think your regex is almost correct!

I made this change:

$regex = '#<\s*?table\b[^>]*?>(.*?)<\s*/\s*table\s*>#si';

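As Wiktor mentioned, you have to make the quantifiers "ungreedy" (lazy) by appending ? to them, so that the match stops at the first closing table tag instead of capturing too much.

I also added the case-insensitive flag (i), just to be safe.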
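To make the difference concrete, here is a minimal sketch comparing the pattern from the question with the adjusted one. The $html and $upper strings are made-up samples standing in for the API response, not real Wikipedia output.

```php
<?php
// Hypothetical sample: two tables, standing in for the HTML returned by the API.
$html = '<table class="infobox"><tr><td>first</td></tr></table>'
      . '<p>between</p>'
      . '<table class="other"><tr><td>second</td></tr></table>';

// Pattern from the question: the greedy (.*) runs on to the LAST closing table tag.
$greedy = '#<\s*?table\b[^>]*>(.*)</table\b[^>]*>#s';
// Adjusted pattern: lazy quantifiers stop at the FIRST closing tag; i ignores case.
$lazy = '#<\s*?table\b[^>]*?>(.*?)<\s*/\s*table\s*>#si';

preg_match($greedy, $html, $greedyMatch);
preg_match($lazy, $html, $lazyMatch);

echo "greedy match length: ", strlen($greedyMatch[0]), PHP_EOL;
echo "lazy match:          ", $lazyMatch[0], PHP_EOL;

// Hypothetical upper-case markup, to show what the i flag changes.
$upper = '<TABLE><TR><TD>x</TD></TR></TABLE>';
$greedyFoundUpper = preg_match($greedy, $upper);
$lazyFoundUpper = preg_match($lazy, $upper);
echo "upper-case markup matched without i: ", $greedyFoundUpper, PHP_EOL;
echo "upper-case markup matched with i:    ", $lazyFoundUpper, PHP_EOL;
```

With this sample the greedy pattern swallows both tables and the paragraph between them, while the lazy one returns only the first table. Note that a lazy match still ends too early if a table contains a nested table.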

See the result here: https://regex101.com/r/mZmk8J/1

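Overall, though, I would use a DOM parser, because the content of the table seems quite long and complex. What happens if the returned HTML contains several tables?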
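A rough sketch of the DOM approach, using PHP's built-in DOMDocument and DOMXPath, could look like the following. The function name extractInfobox and the sample markup are hypothetical, and the code assumes the infobox is a table element whose class attribute contains "infobox", which should be checked against the HTML the API actually returns.

```php
<?php
/**
 * Return the outer HTML of the first <table> whose class list contains
 * "infobox", or null if there is none. (Hypothetical helper for illustration.)
 */
function extractInfobox(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely clean; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "infobox" as a whole class token, not as a substring of another class.
    $query = '//table[contains(concat(" ", normalize-space(@class), " "), " infobox ")]';
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return $doc->saveHTML($nodes->item(0));
}

// Hypothetical sample with two tables; only the second one is the infobox.
$sample = '<div><table class="wikitable"><tr><td>other</td></tr></table>'
        . '<table class="infobox vevent"><tr><th>Type</th><td>EP</td></tr></table></div>';

$infobox = extractInfobox($sample);
echo $infobox, PHP_EOL;
```

Because the table is selected by its class rather than by its position in the text, other tables in the same response and tables nested inside the infobox no longer affect the result.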
