Is there a way to extract the "event_type" of Forex Factory events from the page source using RStudio?


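I am trying to extract the "event_type" of the events on Forex Factory's calendar page. For example, if I inspect the source of the page "https://www.forexfactory.com/calendar?week=sep17.2023", I can see that line 95 declares the event types as: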

"event_types":{"1":"Growth","2":"Inflation","3":"Employment","4":"Central Bank","5":"Bonds","7":"Housing","8":"Consumer Surveys","9":"Business Surveys","10":"Speeches","11":"Misc"}

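Now, I would expect every event to have some kind of event_type_id associated with it. However, I cannot find where that value lives in the source. Line 88 lists the events. One of the events listed there looks like this: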

{"id":131238,"ebaseId":852,"name":"BusinessNZ Services Index","dateline":1694989800,"country":"NZ","currency":"NZD","hasLinkedThreads":true,"hasNotice":false,"hasGraph":true,"checkedIn":false,"isMasterList":false,"firstInDay":true,"showGridLine":true,"greyed":true,"upNext":false,"releaser":"JS","checker":"TR","impactClass":"icon--ff-impact-yel","impactTitle":"Low Impact Expected","timeLabel":"6:30pm","actual":"47.1","previous":"47.8","revision":"48.0","forecast":"","leaked":false,"actualBetterWorse":0,"revisionBetterWorse":0,"isSubscribable":true,"isSubscribed":false,"showDetails":false,"showGraph":false,"enableDetailComponent":false,"enableExpandComponent":false,"enableActualComponent":false,"showExpanded":false,"siteId":1,"editUrl":"","date":"Sep 17, 2023","url":"\/calendar?day=sep17.2023#detail=131238"}

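I don't see anything here that could refer to the "event_type". So how does the site know which events to keep when the user filters the calendar? There must be a way to extract the information I am missing. For example, if this event corresponds to event type ID "1" (i.e. "Growth"), where is that information stored?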

I want to scrape this information from the web!

html r web-scraping rvest
1 Answer

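This may be what you are looking for. As QHarr suggested, you can filter by each event type separately, which lets you work backwards to determine which event type each event belongs to.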
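Stripped of the HTTP details, the idea can be sketched offline as follows. This is only an illustration: `fetch_events_for_type()` is a hypothetical stand-in for one filtered calendar request, and the rows it returns are made-up toy data, not Forex Factory output.

```r
# Hypothetical stand-in for "ask the calendar for a single event type".
# The rows are invented for illustration only.
fetch_events_for_type <- function(type_id) {
  toy <- list(
    "1" = data.frame(id = c(1001L, 1002L),
                     name = c("Toy GDP", "Toy Retail Sales"),
                     stringsAsFactors = FALSE),
    "2" = data.frame(id = 1003L, name = "Toy CPI",
                     stringsAsFactors = FALSE)
  )
  toy[[as.character(type_id)]]  # NULL when no events match this type
}

# Request each type on its own, stamp the rows with the type that was
# requested, then stack everything into one data frame.
tag_events_by_type <- function(type_ids, fetch = fetch_events_for_type) {
  tagged <- lapply(type_ids, function(type_id) {
    events <- fetch(type_id)
    if (is.null(events) || nrow(events) == 0) return(NULL)
    events$event_type_id <- type_id
    events
  })
  do.call(rbind, Filter(Negate(is.null), tagged))
}

events <- tag_events_by_type(c(1L, 2L, 3L))
print(events)
```

Because every row comes back from a request that asked for exactly one type, the type id never needs to be present in the event record itself.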

library(httr2)

## ids listed in the page's "event_types" object (note that there is no 6)
event_type_ids <- c(1:5, 7:11)

## build one JSON POST body per event type; each body mirrors the site's own
## filter request, but asks for a single event type
post_data <- stringr::str_c(
  '{"default_view":"this_week","impacts":[3,2,1,0],"event_types":[',
  event_type_ids,
  '],"currencies":[1,2,3,4,5,6,7,8,9],"begin_date":"September 17, 2023","end_date":"September 23, 2023"}'
)

result <- purrr::map2_dfr( ## loop over the event types and row-bind the results
  post_data, ## .x: the request body for one event type
  event_type_ids, ## .y: the matching event type id
  ~ request(
    "https://www.forexfactory.com/calendar/apply-settings/1?navigation=0"
  ) |> ## create the request in httr2
    req_headers("User-Agent" = "macosx, Rstudio, httr2") |> ## set a custom user agent, because the default httr2 one seems to be blocked already
    req_verbose() |> ## print each request so we can check what is sent
    req_throttle(rate = 10 / 60) |> ## throttle to 10 requests per minute so we don't get blocked instantly
    req_body_raw(.x) |> ## send the JSON body for this event type
    req_perform() |> ## perform the request
    resp_body_json(simplifyVector = TRUE) |> ## parse the JSON response
    purrr::pluck("days") |> ## get the per-day event information
    tidyr::unnest(events, names_repair = "unique") |> ## flatten the nested events into one row per event
    dplyr::mutate(event_type_id = .y) ## record which event type was requested
)
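Once each row carries a type id, the readable name can be attached with a plain lookup. The sketch below copies the id-to-name map from the "event_types" object quoted in the question; `toy_result` and its column name `event_type_id` are assumptions standing in for the real data frame, so adjust them to whatever your result actually contains.

```r
# Id-to-name map copied from the "event_types" object in the question.
event_type_names <- c(
  "1" = "Growth", "2" = "Inflation", "3" = "Employment",
  "4" = "Central Bank", "5" = "Bonds", "7" = "Housing",
  "8" = "Consumer Surveys", "9" = "Business Surveys",
  "10" = "Speeches", "11" = "Misc"
)

# Toy stand-in for the scraped result; ids are invented for illustration.
toy_result <- data.frame(
  id = c(1001L, 1002L, 1003L),
  event_type_id = c(1L, 10L, 6L)
)

# Look names up by id; ids missing from the map (such as 6) become NA.
toy_result$event_type <- unname(
  event_type_names[as.character(toy_result$event_type_id)]
)
print(toy_result)
```

Indexing by `as.character()` matters here: indexing the vector by the integer 10 would pick the tenth element ("Misc") rather than the entry named "10" ("Speeches"), because the ids skip 6.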