Scraping information from the Planned Parenthood website using R

Question

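I am trying to use the rvest library to scrape some information from the Planned Parenthood website. The page I am looking at is here. For now I want to pull out the "Services Offered" listed on the right side of the page, such as "Abortion Services", "Birth Control", and so on. I have the code below; is it close?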

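    library(rvest)

    # State page listing the Tennessee health centers
    URL <- "https://www.plannedparenthood.org/health-center/tn"
    Webpage <- read_html(URL)
    # Collect the relative links and turn them into absolute URLs
    all_links <- Webpage %>%
      html_nodes("p a") %>%
      html_attr('href') %>%
      paste0('https://www.plannedparenthood.org', .)
    # Open the first facility's page
    URL <- all_links[1]
    Website <- URL
    Webpage <- read_html(URL)
    # Attempt at the services: this returns the href of every list-item link
    Services <- Webpage %>% html_nodes("ul li a") %>% html_attr("href")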

I start from the main Planned Parenthood page and navigate to the first facility in Tennessee. Can anyone help me get the services offered?

Answer 1

This should solve the problem.

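library(rvest)

# State page listing the Tennessee health centers
URL <- "https://www.plannedparenthood.org/health-center/tn"
Webpage <- read_html(URL)
# Collect the relative links and turn them into absolute URLs
all_links <- Webpage %>%
  html_nodes("p a") %>%
  html_attr('href') %>%
  paste0('https://www.plannedparenthood.org', .)
# Open the first facility's page
URL <- all_links[1]
Website <- URL
Webpage <- read_html(URL)
# Select only the links inside the .services element and keep their text
Services <- Webpage %>% html_nodes(".services a") %>% html_text()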

This gives:

> Services
[1] "Abortion Services"                            "Birth Control"                                "HIV Testing"                                  "LGBTQ Services"                              
[5] "Men's Health Care"                            "Morning-After Pill (Emergency Contraception)" "Pregnancy Testing & Services"                 "STD Testing, Treatment & Vaccines"           
[9] "Women's Health Care"

The only change is in the last line: %>% html_nodes(".services a") %>% html_text()

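So I used a more specific CSS selector, and then took only the text of the HTML nodes that the selector matches instead of their href attribute.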
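
To see the difference between the two last lines without hitting the live site, here is a minimal sketch on an inline HTML fragment. The markup is made up for illustration and is only assumed to resemble the real page (a navigation list plus a block with class "services"); the actual structure on plannedparenthood.org may differ.

```r
library(rvest)

# Hypothetical fragment standing in for a health-center page.
page <- read_html('
  <ul class="nav">
    <li><a href="/about">About</a></li>
  </ul>
  <div class="services">
    <a href="/learn/abortion">Abortion Services</a>
    <a href="/learn/birth-control">Birth Control</a>
  </div>')

# Broad selector + html_attr(): every link inside a list item, as URLs.
broad <- page %>% html_nodes("ul li a") %>% html_attr("href")

# Narrow selector + html_text(): only links inside .services, as labels.
services <- page %>% html_nodes(".services a") %>% html_text()

print(broad)
print(services)
```

On this fragment the broad selector picks up the navigation link, while the narrow one returns only the service names.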

If you are not familiar with CSS, you can try this Google Chrome extension, which makes it easier to find the right CSS selector.
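
A candidate selector can also be sanity-checked from R itself by counting how many nodes it matches. The sketch below again uses made-up markup, and the list of candidate selectors is purely illustrative; for a real page, replace the inline string with read_html(URL).

```r
library(rvest)

# Hypothetical markup; swap in read_html(URL) for a real page.
page <- read_html('
  <ul><li><a href="/a">Home</a></li></ul>
  <div class="services">
    <a href="/b">Birth Control</a>
    <a href="/c">HIV Testing</a>
  </div>')

# Candidate selectors to compare (illustrative choices).
candidates <- c("a", "ul li a", ".services a")

# Number of nodes each selector matches on the page.
match_counts <- sapply(candidates, function(sel) length(html_nodes(page, sel)))
print(match_counts)
```

A selector that matches far more or far fewer nodes than the number of items visible on the page is probably too broad or too narrow.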
