Reading data from Wikipedia using the API [closed]

Question

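For my project, I am trying to read data from Wikipedia, and I am not sure how to go about it.

I am mainly interested in reading the event, date, location, and subject. As a first step, I am trying to read this information for the 91st Academy Awards.

I tried the Wikipedia Query Service, but it did not help much.

Then I came across the API and ran the following URL: https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=sections&page=91st_Academy_Awards

But the response did not contain the information I was looking for.

I am trying to read the information marked with a red box in the image below.

Can someone help me with this and tell me how to read the part mentioned above?

PS: I am writing the algorithm in MATLAB.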

Answer 1

One possible solution is to read the page with webread and process the data with functions from the Text Analytics Toolbox:

% Read HTML data.
raw = webread('https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=text&page=91st_Academy_Awards');

% Specify sections of interest.
SectionsOfInterest = ["Date","Site","Preshow hosts","Produced by","Directed by"];

% Parse HTML data.
myTree = htmlTree(raw.parse.text.x_);

% Find table element.
tableElements = findElement(myTree,'Table');
tableOfInterest = tableElements(1);

% Find header cell elements.
thElements = findElement(tableOfInterest,"th");
% Find cell elements.
tdElements = findElement(tableOfInterest,"td");

% Extract text.
thHTML = thElements.extractHTMLText;
tdHTML = tdElements.extractHTMLText;

for section = 1:numel(SectionsOfInterest)

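   sectionName = SectionsOfInterest(section);
   % Exact match on the header text; the resulting index is reused on the
   % data cells, which assumes the i-th <th> pairs with the i-th <td>.
   sectIndex = strcmp(sectionName,thHTML);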

   % Remove spaces if present from section name.
   sectionName = strrep(sectionName,' ','');

   % Clean up data.
   sectData = regexprep(tdHTML(sectIndex),'\n+','.');

   % Create structure.
   s.(sectionName) = sectData;
end
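
Note that the request above uses prop=text, which returns the rendered HTML of the page, whereas prop=sections (as in the question) returns only the section outline. As a minimal sketch, the same request can also be written with the query parameters passed to webread as name-value pairs, which avoids assembling the query string by hand. The variable names apiUrl, opts, and html are only illustrative, and the 30-second timeout is an arbitrary choice:

```matlab
% Base endpoint of the English Wikipedia API (same one used in the answer).
apiUrl = 'https://en.wikipedia.org/w/api.php';

% Optional: allow more time than the default for large pages.
% The value 30 (seconds) is an arbitrary choice for this sketch.
opts = weboptions('Timeout',30);

% Query parameters are passed as name-value pairs; webread encodes them
% into the request URL. The weboptions object goes last.
raw = webread(apiUrl, ...
    'action','parse', ...
    'format','json', ...
    'prop','text', ...
    'page','91st_Academy_Awards', ...
    opts);

% The rendered HTML sits under the JSON key "*", which MATLAB renames to x_.
html = raw.parse.text.x_;
```

Changing the 'page' value should be enough to fetch another article, although the table layout may differ from page to page.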

Displaying the structure s gives:

>> s
s = 

struct with fields:

        Date: "February 24, 2019"
        Site: "Dolby Theatre.Hollywood, Los Angeles, California, U.S."
Preshowhosts: "Ashley Graham.Maria Menounos.Elaine Welteroth.Billy Porter.Ryan Seacrest. "
  Producedby: "Donna Gigliotti.Glenn Weiss"
  Directedby: "Glenn Weiss"
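
Because the loop replaces line breaks with '.', values that contain periods themselves (such as "U.S." in Site) cannot be split back into separate entries reliably. If a list of entries is needed, one option is to split the extracted text on line breaks instead of joining it. The following is only a sketch: cellText is a hand-written sample that imitates what extractHTMLText might return for one cell, and the variable names are hypothetical.

```matlab
% Hand-written sample imitating the text of one <td> cell
% (an assumption for illustration, not live data).
cellText = "Donna Gigliotti" + newline + "Glenn Weiss" + newline + " ";

% Split on line breaks and trim surrounding whitespace.
entries = strtrim(split(cellText, newline));

% Drop entries that are empty after trimming.
entries = entries(strlength(entries) > 0);

disp(entries)
```

Inside the loop, the same steps could be applied to tdHTML(sectIndex) in place of the regexprep call, so that each field holds a string array rather than one joined string.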
