Scraping the infobox from a wiki (not Wikipedia)

Question   Votes: 0   Answers: 1

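I can scrape the infobox from any Wikipedia page using rvest, but I would like to do the same on a wiki page hosted elsewhere and can't get it to work...

Link: https://dc.fandom.com/wiki/Wonder_Woman_(Diana_Prince)
On the page there is an infobox (it looks like a normal Wikipedia table), and the CSS selector seems to be ".pi-layout-default".

I want a data frame containing Real Name, Aliases, etc.

Any ideas on how to do this?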

r web-scraping rvest wiki
1 Answer
2
votes

Using rvest and SelectorGadget:

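library(rvest)      # read_html(), html_nodes(), html_text()
library(tidyverse)  # %>% and as_tibble()

# Select the infobox labels and values (on this page they alternate in
# document order), then fold the flat character vector into two columns.
read_html("https://dc.fandom.com/wiki/Wonder_Woman_(Diana_Prince)") %>%
  html_nodes(".pi-font , .pi-data-label") %>%
  html_text() %>%
  matrix(ncol = 2, byrow = TRUE) %>%
  as_tibble()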
# A tibble: 21 x 2
   V1                V2                                                                                                               
   <chr>             <chr>                                                                                                            
 1 Real Name         Diana of Themyscira                                                                                              
 2 Current Alias     Wonder Woman                                                                                                     
 3 Aliases           Diana Prince, Princess Diana, Miss America, Goddess of Truth, Dinanna Truthqueen                                 
 4 Relatives         Ares (grandfather)[1]Hippolyta (mother)Antiope (aunt, deceased)Theseus (uncle by Antiope, deceased)Hippolytus (c~
 5 Affiliation       Justice League · formerly Department of Metahuman Affairs, Star Sapphire Corps, Female Furies, White Lantern Cor~
 6 Base Of Operatio~ Washington, D.C. · Themyscira · JLA Watchtower, Hall of Justice · formerly Boston, Gateway City                  
 7 Alignment         Good                                                                                                             
 8 Identity          Public Identity                                                                                                  
 9 Race              Amazon                                                                                                           
10 Citizenship       Amazon                                                                                                           
# ... with 11 more rows
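
The matrix() step above only works while labels and values strictly alternate. If a label has no value, every later row shifts by one. A possible alternative, sketched here under assumptions, is to select each infobox row first and then pull the label and value out of that row, so a missing piece becomes NA instead of misaligning the table. The inline HTML below is a hypothetical, simplified stand-in for the real infobox: the ".pi-data" and ".pi-data-value" class names are assumptions and should be checked against the live page with SelectorGadget.

```r
library(rvest)

# Hypothetical, simplified infobox markup (an assumption, not copied from
# the live page). The second row deliberately has no value element.
page <- read_html('<aside class="pi-layout-default">
  <div class="pi-data">
    <h3 class="pi-data-label">Real Name</h3>
    <div class="pi-data-value pi-font">Diana of Themyscira</div>
  </div>
  <div class="pi-data">
    <h3 class="pi-data-label">Alignment</h3>
  </div>
  <div class="pi-data">
    <h3 class="pi-data-label">Race</h3>
    <div class="pi-data-value pi-font">Amazon</div>
  </div>
</aside>')

# One node per infobox row.
rows <- html_nodes(page, ".pi-layout-default .pi-data")

# html_node() (singular) returns exactly one result per row, with a
# missing node where nothing matches; html_text() turns that into NA.
infobox <- data.frame(
  label = html_text(html_node(rows, ".pi-data-label"), trim = TRUE),
  value = html_text(html_node(rows, ".pi-data-value"), trim = TRUE),
  stringsAsFactors = FALSE
)

print(infobox)
```

To try this on the real page, replace the inline string with the URL from the question and adjust the row selector if the markup differs.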