How to summarise out-of-group observations in R?


I am trying to replicate Gayle & Wu (2013), and I have similar data for two time periods:

df_2016 <- structure(list(YEAR = c(2016L, 2016L, 2016L, 2016L, 2016L, 2016L
), MARKET = c("ATL-AUS", "ATL-AUS", "ATL-AUS", "ATL-AUS", "ATL-AUS", 
"ATL-AUS"), AIRLINE = c("WN", "UA", "DL", "AA", "F9", "WN"), 
    DIRECT = c(1L, 0L, 1L, 0L, 1L, 0L)), row.names = c(NA, 6L
), class = "data.frame")

  YEAR  MARKET AIRLINE DIRECT
1 2016 ATL-AUS      WN      1
2 2016 ATL-BIR      UA      0
3 2016 BIR-OGC      DL      1
4 2016 BIR-OGC      AA      0
5 2016 CFR-GHV      F9      1
6 2016 GHV-OFV      WN      0

df_2017 <- structure(list(YEAR = c(2017L, 2017L, 2017L, 2017L, 2017L, 2017L
), MARKET = c("ATL-AUS", "ATL-AUS", "ATL-AUS", "ATL-AUS", "ATL-AUS", 
"ATL-AUS"), AIRLINE = c("WN", "UA", "AA", "DL", "F9", "WN"), 
    DIRECT = c(1L, 0L, 0L, 1L, 1L, 0L)), row.names = c(NA, 6L
), class = "data.frame")

  YEAR  MARKET AIRLINE DIRECT
1 2017 ATL-AUS      WN      1
2 2017 ATL-BOS      UA      0
3 2017 GHV-OFV      AA      0
4 2017 ATL-AUS      DL      1
5 2017 ATL-AUS      F9      1
6 2017 ATL-AUS      WN      0

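For a given market (in 2017), I want to use the 2016 data to count the number of airlines that operated at each endpoint but did not operate in the market itself. Any help would be appreciated.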

r variables dplyr tidyverse data-cleaning
1 Answer

Please check whether this logic suits your task:

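library(dplyr)
library(tidyr)

# Split each 2016 market code into its two endpoints
df_2016_split <- df_2016 %>%
  separate_wider_delim(MARKET, names = c("Origin", "Destination"), delim = "-")

airlines_not_in_market_2016 <- df_2016_split %>%
  # Collect the airlines serving each market into a list column
  summarise(Airlines = list(AIRLINE), .by = c(Origin, Destination)) %>%
  left_join(df_2016_split, join_by(Origin, Destination)) %>%
  # Note: `%in%` compares AIRLINE with whole list elements, not the codes inside them
  filter(!AIRLINE %in% Airlines) %>%
  # Rename so the column does not clash with AIRLINE in df_2017
  select(Origin, Destination, AIRLINE_2016 = AIRLINE)

df_2017 %>%
  separate_wider_delim(MARKET, names = c("Origin", "Destination"), delim = "-") %>%
  left_join(airlines_not_in_market_2016, join_by(Origin, Destination),
            relationship = "many-to-many") %>%
  # Count the distinct 2016 airlines matched to each 2017 market
  summarise(Count = n_distinct(AIRLINE_2016), .by = c(YEAR, Origin, Destination))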

  YEAR Origin Destination Count
1 2017    ATL         AUS     5
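
If the goal is read as "airlines present at both endpoints in 2016 that did not serve the market itself in 2016", one possible sketch is shown here. It rests on several assumptions that are not in the question: "each endpoint" is taken to mean both endpoints, market codes are written in a consistent airport order, and the data (`toy_2016`, `toy_2017`) and helper names (`presence_2016`, `incumbents_2016`, `potential_entrants`) are hypothetical, chosen only so that the counts are non-trivial.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not the asker's data
toy_2016 <- data.frame(
  MARKET  = c("ATL-AUS", "ATL-BOS", "AUS-BOS", "ATL-BOS", "AUS-DEN", "ATL-DEN"),
  AIRLINE = c("WN", "UA", "UA", "DL", "DL", "AA")
)
toy_2017 <- data.frame(MARKET = c("ATL-AUS", "ATL-BOS", "BOS-DEN"))

# One row per (airport, airline) pair observed in 2016
presence_2016 <- toy_2016 %>%
  separate_wider_delim(MARKET, names = c("Origin", "Destination"), delim = "-") %>%
  pivot_longer(c(Origin, Destination), values_to = "AIRPORT") %>%
  distinct(AIRPORT, AIRLINE)

# Airlines that already served each market in 2016
incumbents_2016 <- distinct(toy_2016, MARKET, AIRLINE)

potential_entrants <- toy_2017 %>%
  distinct(MARKET) %>%
  separate_wider_delim(MARKET, names = c("Origin", "Destination"),
                       delim = "-", cols_remove = FALSE) %>%
  # Airlines present at the origin in 2016
  inner_join(presence_2016, join_by(Origin == AIRPORT),
             relationship = "many-to-many") %>%
  # Keep only those also present at the destination
  semi_join(presence_2016, join_by(Destination == AIRPORT, AIRLINE)) %>%
  # Drop airlines that served the market itself in 2016
  anti_join(incumbents_2016, join_by(MARKET, AIRLINE)) %>%
  count(MARKET, name = "Count")

# Re-attach markets with no qualifying airline and report them as 0
result <- toy_2017 %>%
  distinct(MARKET) %>%
  left_join(potential_entrants, join_by(MARKET)) %>%
  mutate(Count = coalesce(Count, 0L))

result
```

On the toy data this gives 2 for ATL-AUS (UA and DL), 0 for ATL-BOS, and 1 for BOS-DEN (DL). To count airlines present at either endpoint instead of both, the `semi_join()` step could be replaced by a union of origin and destination matches.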