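I'm trying to extract the "event_type" of events from the Forex Factory calendar page. For example, if I inspect the source of the page "https://www.forexfactory.com/calendar?week=sep17.2023", I can see that line 95 declares the event types as: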
"event_types":{"1":"Growth","2":"Inflation","3":"Employment","4":"Central Bank","5":"Bonds","7":"Housing","8":"Consumer Surveys","9":"Business Surveys","10":"Speeches","11":"Misc"}
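Now, I would expect each event to have some kind of event_type_id associated with it. However, I can't seem to find where that value lives in the source. Line 88 lists the events. One of the events listed there is: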
{"id":131238,"ebaseId":852,"name":"BusinessNZ Services Index","dateline":1694989800,"country":"NZ","currency":"NZD","hasLinkedThreads":true,"hasNotice":false,"hasGraph":true,"checkedIn":false,"isMasterList":false,"firstInDay":true,"showGridLine":true,"greyed":true,"upNext":false,"releaser":"JS","checker":"TR","impactClass":"icon--ff-impact-yel","impactTitle":"Low Impact Expected","timeLabel":"6:30pm","actual":"47.1","previous":"47.8","revision":"48.0","forecast":"","leaked":false,"actualBetterWorse":0,"revisionBetterWorse":0,"isSubscribable":true,"isSubscribed":false,"showDetails":false,"showGraph":false,"enableDetailComponent":false,"enableExpandComponent":false,"enableActualComponent":false,"showExpanded":false,"siteId":1,"editUrl":"","date":"Sep 17, 2023","url":"\/calendar?day=sep17.2023#detail=131238"}
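I don't see anything here that could refer to an "event_type". So how does the site know which events to keep when the user filters the calendar? There must be a way to extract the information I'm missing. For example, if this event corresponds to event type ID "1" (i.e. "Growth"), where is that information stored?
I'd like to scrape this information from the web!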
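This might be what you're looking for. As QHarr suggested, you can filter on each event type separately, which lets you work backwards to determine which event type each event belongs to.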
library(httr2)

## for reference: the basic POST body for a single event type (here: 1)
data <- '{"default_view":"this_week","impacts":[3,2,1,0],"event_types":[1],"currencies":[1,2,3,4,5,6,7,8,9],"begin_date":"September 17, 2023","end_date":"September 23, 2023"}'

## build one POST body per event type ID (str_c is vectorised over 1:11)
## note: ID 6 does not appear in the page's "event_types" list
post_data <- stringr::str_c(
  '{"default_view":"this_week","impacts":[3,2,1,0],"event_types":[',
  1:11,
  '],"currencies":[1,2,3,4,5,6,7,8,9],"begin_date":"September 17, 2023","end_date":"September 23, 2023"}'
)

result <- purrr::map2_dfr( ## loop over the event types and row-bind the results
  post_data, ## .x: one request body per event type
  1:11, ## .y: the matching event type ID
  ~ request(
    "https://www.forexfactory.com/calendar/apply-settings/1?navigation=0"
  ) |> ## create the request with the URL in httr2
    req_headers("User-Agent" = "macosx, Rstudio, httr2") |> ## set a custom user agent, because the default httr2 one seems to be blocked already
    req_verbose() |> ## make the requests verbose so we can check what is sent
    req_throttle(rate = 10 / 60) |> ## throttle the requests so we don't get blocked right away
    req_body_raw(.x) |> ## send this event type's body as the POST data
    req_perform() |> ## perform the request
    resp_body_json(simplifyVector = TRUE) |> ## parse the JSON response
    purrr::pluck("days") |> ## get the per-day event information
    tidyr::unnest(events, names_repair = "unique") |> ## flatten the nested events into a plain data frame
    dplyr::mutate(event_id = .y) ## record which event type ID was requested (the type, not the event's own "id")
)
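Once `result` carries the requested type ID in `event_id`, you will probably want the readable label as well. Here is a minimal sketch in base R, assuming the `event_types` mapping quoted in the question is still what the page serves. The helper `add_event_type_name()` and the `demo` data frame are made up for illustration and are not part of the site or of any package.

```r
## lookup table copied from the "event_types" object in the page source
event_types <- c(
  "1" = "Growth", "2" = "Inflation", "3" = "Employment",
  "4" = "Central Bank", "5" = "Bonds", "7" = "Housing",
  "8" = "Consumer Surveys", "9" = "Business Surveys",
  "10" = "Speeches", "11" = "Misc"
)

## hypothetical helper: add a label column for the type IDs stored in `id_col`
add_event_type_name <- function(df, id_col = "event_id") {
  ## index the lookup by name; IDs without an entry (e.g. 6) become NA
  df$event_type_name <- unname(event_types[as.character(df[[id_col]])])
  df
}

## made-up stand-in for `result`, so the sketch runs without a network call
demo <- data.frame(name = c("Event A", "Event B"), event_id = c(1L, 8L))
labelled <- add_event_type_name(demo)
```

With the real data this would be called as `add_event_type_name(result)`. Any ID that is missing from the lookup gets `NA` rather than raising an error, which makes unexpected IDs easy to spot.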