I want to extract a Wikipedia infobox using the Wikipedia API from PHP, but I don't know the correct way to do it.
Here is the code I have tried.
<?php
$url = "http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=YouTube&rvsection=0&rvparse";
$data = json_decode(file_get_contents($url), true);
$data = current($data['query']['pages']);
$regex = '#<\s*?table\b[^>]*>(.*)</table\b[^>]*>#s';
$code = preg_match($regex, $data["revisions"][0]['*'], $matches);
echo($matches[0]);
?>
This code gives output like this (image).
But I want to strip the extra content and display it like this (image).
I also tried this URL:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0
This URL works, but it returns output like this:
{ "batchcomplete":"","warnings": {"main": { "*":"Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce for notice of API deprecations and breaking changes. Use [[Special:ApiFeatureUsage]] to see usage of deprecated features by your application."}, "revisions": { "*":"Because \"rvslots\" was not specified, a legacy format has been used for the output. This format is deprecated, and in the future the new format will always be used."}}, "query": { "pages": { "29314104":{ "pageid":29314104,"ns":0,"title":"Scary Monsters and Nice Sprites","revisions": [ { "contentformat":"text/x-wiki","contentmodel":"wikitext","*":"{{about|the EP|the title song|Scary Monsters and Nice Sprites (song)}}\n{{Infobox album\n| name = Scary Monsters and Nice Sprites\n| type = EP\n| artist = [[Skrillex]]\n| cover = Skrillex scary monsters.jpg\n| alt =\n| released = October 22, 2010\n| recorded = 2010\n| studio =\n| genre = [[Dubstep]], [[electro (music)|electro]]\n| length = 44:02\n| label = {{hlist|[[Big Beat Records (Atlantic Records subsidiary)|Big Beat]]|[[mau5trap]]}}\n| producer = {{hlist|Bare Noize|[[Foreign Beggars]]|[[Noisia]]|[[Skrillex]]|[[Zedd (producer)|Zedd]]}}\n| prev_title = [[My Name Is Skrillex]]\n| prev_year = 2010\n| next_title = [[More Monsters and Sprites]]\n| next_year = 2011\n| misc = {{Singles\n | name = Scary Monsters and Nice Sprites\n | type = EP\n | single1 = [[Scary Monsters and Nice Sprites (song)|Scary Monsters and Nice Sprites]]\n | single1date = October 22, 2010\n | single2 = Rock n' Roll (Will Take You to the Mountain)\n | single2date = June 20, 2011\n}}\n}}\n\n'''''Scary Monsters and Nice Sprites''''' is the second [[extended play]] (EP) by American [[electronic music]] producer [[Skrillex]]. 
It was released exclusively through [[Beatport]] on October 22, 2010 through [[mau5trap]] and [[Big Beat Records (Atlantic Records subsidiary)|Big Beat Records]], while being released on December 20 for [[Music download|digital download]] via other online retailers and on March 1, 2011 as a physical release. It was recorded in 2010 at Skrillex's apartment using a laptop.<ref name=\"neontapedeck\" / The EP features guest contributions from Penny, [[Foreign Beggars]] and Bare Noize as well as remixes done by [[Noisia]], [[Zedd (musician)|Zedd]] and Bare Noize. It won two Grammys at the 54th Annual Grammy Awards: one for Best Dance Recording, and another for Best Dance/Electronica Album.\n\nThe EP received generally positive reviews from music critics and became a moderate commercial success, reaching number 49 on the ''[[Billboard (magazine)|Billboard]]'' [[Billboard 200|200]],<ref name=bb200{{BillboardURLbyName|artist=skrillex|chart=Billboard 200}}</ref while also topping the ''Billboard'' [[Top Heatseekers|Heatseekers Albums]] chart and reaching number 28 in Australia.<ref name=\"australian-charts.com\"{{cite web |url=http://australian-charts.com/showinterpret.asp?interpret=Skrillex |title=Discography Skrillex |website=australian-charts.com |accessdate=2013-07-30}}</ref As of November 2011, the EP was certified Gold in Canada with sales exceeding 40,000 copies.<refhttp://www.musiccanada.com/GPSearchResult.aspx?st=&ica=False&sa=&sl=&smt=0&sat=-1&ssd=11/1/2011&sed=12/1/2011&ssb=Cert.%20Date&ssdir=ascending {{webarchive|url=https://web.archive.org/web/20131224084912/http://www.musiccanada.com/GPSearchResult.aspx?st=&ica=False&sa=&sl=&smt=0&sat=-1&ssd=11%2F1%2F2011&sed=12%2F1%2F2011&ssb=Cert.%20Date&ssdir=ascending |date=2013-12-24 }}</ref The EP's [[lead single]], \"[[Scary Monsters and Nice Sprites (song)|Scary Monsters and Nice Sprites]]\", was a moderate commercial success internationally, peaking within the charts of the United States, United Kingdom, Australia, 
Canada and Sweden.<ref{{cite web|url=http://pandora.nla.gov.au/pan/23790/20110730-0000/Issue1114.pdf |title=Pandora Archive |publisher=Pandora.nla.gov.au |date=2006-08-23 |accessdate=2013-07-30}}</ref<ref{{cite web|url=http://acharts.us/song/64937 |title=Skrillex - Scary Monsters And Nice Sprites - Music Charts |website=Acharts.us |accessdate=2013-07-30}}</ref<ref{{cite web |url=http://swedishcharts.com/showitem.asp?interpret=Skrillex&titel=Scary+Monsters+And+Nice+Sprites&cat=s |title=Skrillex - Scary Monsters And Nice Sprites |website=swedishcharts.com |date= |accessdate=2013-07-30}}</ref<ref[http://www.theofficialcharts.com/archive-chart/_/1/2011-05-15 ]{{dead link|date=July 2013}}</ref<ref{{cite web|url=https://www.billboard.com/charts/2012-01-07/hot-100?order=gainer |title=The Hot 100 : Jan 07, 2012 | Billboard Chart Archive |publisher=Billboard.com |date=2012-01-07 |accessdate=2013-07-30}}</ref It was later certified [[RIAA certification|Gold]] by the [[Recording Industry Association of America]] (RIAA), with sales exceeding 500,000.<ref{{cite web|url=https://www.riaa.com/goldandplatinumdata.php?artist=%22SCARY+MONSTERS+AND+NICE+SPRITE%22 |title=Gold & Platinum Searchable Database - July 30, 2013 |publisher=RIAA |date= |accessdate=2013-07-30}}</ref On November 30, 2011, it was announced that the EP was nominated at the [[54th Grammy Awards]] for [[Grammy Award for Best Dance/Electronica Album|Best Dance/Electronica Album]]. On February 12, 2012, the album won the Grammy for that category. A follow-up EP, ''[[More Monsters and Sprites]]'', contains several remixes of the title track done by [[Dirtyphonics]], Phonat, The Juggernaut and [[Kaskade]]. It is named after [[David Bowie]]'s 1980 album ''[[Scary Monsters (And Super Creeps)]]''.<ref[http://inflatableferret.com/flipflop/flipflop-skrillex-scary-monsters-and-nice-sprites/ Scary Monsters and Nice Sprites]. ''Inflatable Ferret''.</ref" } ] } } } }
I'm confused by this output. Can anyone tell me the correct way to do this?
I think your regex is almost correct!
I made this change:
$regex = '#<\s*?table\b[^>]*?>(.*?)<\s*/\s*table\s*>#si';
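As Wiktor mentioned, you have to use the "ungreedy" ? after the quantifier to avoid capturing too much.
I also added the case-insensitive flag (i), just to be safe.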
See the result here: https://regex101.com/r/mZmk8J/1
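To make the greedy versus ungreedy difference concrete, here is a minimal sketch that runs both patterns against a small hand-written HTML string. The sample markup is an assumption for illustration only, not the real API response; the greedy pattern is the original one with the i flag added so that both patterns see the uppercase closing tag.

```php
<?php
// Hypothetical sample: two tables separated by a paragraph (not real API output).
$html = '<table class="infobox"><tr><td>A</td></tr></table>'
      . '<p>text</p>'
      . '<TABLE><tr><td>B</td></tr></TABLE>';

// Greedy: (.*) runs to the LAST closing table tag, so everything is captured.
$greedy = '#<\s*?table\b[^>]*>(.*)</table\b[^>]*>#si';

// Ungreedy: (.*?) stops at the FIRST closing table tag.
$lazy = '#<\s*?table\b[^>]*?>(.*?)<\s*/\s*table\s*>#si';

preg_match($greedy, $html, $greedyMatch);
preg_match($lazy, $html, $lazyMatch);

echo "greedy length: ", strlen($greedyMatch[0]), "\n";
echo "lazy match:    ", $lazyMatch[0], "\n";
```

With this sample, the greedy pattern swallows both tables and the paragraph between them, while the ungreedy one returns only the first table. Note that the ungreedy pattern stops at the first closing tag it meets, so a table nested inside the infobox would still cut the match short.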
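More generally, though, I would use a DOM parser, because the table content looks long and complex. What happens if the returned HTML contains more than one table?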
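A rough sketch of the DOM parser approach, using PHP's built-in DOMDocument and DOMXPath. The function name extractInfobox is hypothetical, and the XPath query assumes that the infobox is a table whose class attribute contains the word "infobox"; check the actual HTML returned by the API before relying on that.

```php
<?php
// Hypothetical helper: returns the HTML of the first infobox table, or null.
function extractInfobox(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Assumption: the infobox is a <table> with "infobox" among its class names.
    $query = '//table[contains(concat(" ", normalize-space(@class), " "), " infobox ")]';
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialize only the matched table, including any tables nested inside it.
    return $doc->saveHTML($nodes->item(0));
}

// Usage with a hand-written sample containing two tables (not real API output).
$sample = '<table><tr><td>B</td></tr></table>'
        . '<table class="infobox vcard"><tr><td>A</td></tr></table>';

$infobox = extractInfobox($sample);
echo $infobox === null ? "no infobox found\n" : $infobox . "\n";
```

Because the parser tracks opening and closing tags itself, multiple tables and nested tables are handled without any changes to the query. In the question's code, the string passed in would be $data["revisions"][0]['*'].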