I am trying to replicate Gayle & Wu (2013) and have similar data for two time periods:
df_2016 <- structure(list(YEAR = c(2016L, 2016L, 2016L, 2016L, 2016L, 2016L
), MARKET = c("ATL-AUS", "ATL-AUS", "ATL-AUS", "ATL-AUS", "ATL-AUS",
"ATL-AUS"), AIRLINE = c("WN", "UA", "DL", "AA", "F9", "WN"),
DIRECT = c(1L, 0L, 1L, 0L, 1L, 0L)), row.names = c(NA, 6L
), class = "data.frame")
  YEAR  MARKET AIRLINE DIRECT
1 2016 ATL-AUS      WN      1
2 2016 ATL-BIR      UA      0
3 2016 BIR-OGC      DL      1
4 2016 BIR-OGC      AA      0
5 2016 CFR-GHV      F9      1
6 2016 GHV-OFV      WN      0
df_2017 <- structure(list(YEAR = c(2017L, 2017L, 2017L, 2017L, 2017L, 2017L
), MARKET = c("ATL-AUS", "ATL-AUS", "ATL-AUS", "ATL-AUS", "ATL-AUS",
"ATL-AUS"), AIRLINE = c("WN", "UA", "AA", "DL", "F9", "WN"),
DIRECT = c(1L, 0L, 0L, 1L, 1L, 0L)), row.names = c(NA, 6L
), class = "data.frame")
  YEAR  MARKET AIRLINE DIRECT
1 2017 ATL-AUS      WN      1
2 2017 ATL-BOS      UA      0
3 2017 GHV-OFV      AA      0
4 2017 ATL-AUS      DL      1
5 2017 ATL-AUS      F9      1
6 2017 ATL-AUS      WN      0
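For a given market (in 2017), I want to use the 2016 data to count the number of airlines that operated at each endpoint but did not operate in that market. Any help would be appreciated.
Please check whether this logic suits your task: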
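library(dplyr)
library(tidyr)

# Split the 2016 MARKET column into its two endpoints
df_2016_split <- df_2016 %>%
  separate_wider_delim(MARKET, names = c("Origin", "Destination"), delim = "-")

# Collect the airlines seen in each 2016 market, join them back, then filter.
# Caveat: Airlines is a list column, so %in% matches against whole list
# elements, not the codes inside them; on the sample data no rows are removed.
airlines_not_in_market_2016 <- df_2016_split %>%
  summarise(Airlines = list(AIRLINE), .by = c(Origin, Destination)) %>%
  left_join(df_2016_split, join_by(Origin, Destination)) %>%
  filter(!AIRLINE %in% Airlines) %>%
  select(Origin, Destination, AIRLINE)

# Count distinct 2016 airlines attached to each 2017 market
df_2017 %>%
  separate_wider_delim(MARKET, names = c("Origin", "Destination"), delim = "-") %>%
  distinct(YEAR, Origin, Destination) %>%
  left_join(airlines_not_in_market_2016, join_by(Origin, Destination)) %>%
  summarise(Count = n_distinct(AIRLINE), .by = c(YEAR, Origin, Destination))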
  YEAR Origin Destination Count
1 2017    ATL         AUS     5
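If "operating at each endpoint" means the airline appears in some 2016 market touching the origin and in some 2016 market touching the destination, a count along those lines could be sketched as below. This is only one possible reading of the question, not necessarily the definition used by Gayle & Wu (2013). The data frames `toy_2016` and `toy_2017` are made-up toy data, and the names `presence_2016`, `incumbents_2016`, `markets_2017` and `result` are hypothetical helpers. The sketch assumes an airline counts as present at an airport whether the airport is the origin or the destination, and that incumbency is matched on the exact MARKET string. It needs dplyr >= 1.1.0 and tidyr >= 1.3.0.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, NOT the asker's data
toy_2016 <- data.frame(
  YEAR    = 2016L,
  MARKET  = c("ATL-AUS", "ATL-AUS", "ATL-BOS", "AUS-BOS",
              "ATL-DEN", "AUS-DEN", "AUS-DEN"),
  AIRLINE = c("WN", "DL", "UA", "UA", "AA", "F9", "AA")
)
toy_2017 <- data.frame(
  YEAR    = 2017L,
  MARKET  = c("ATL-AUS", "ATL-AUS", "ATL-BOS"),
  AIRLINE = c("WN", "UA", "UA")
)

# One row per (airport, airline) seen at either end of any 2016 market
presence_2016 <- toy_2016 %>%
  separate_wider_delim(MARKET, names = c("Origin", "Destination"), delim = "-") %>%
  pivot_longer(c(Origin, Destination), names_to = "END", values_to = "AIRPORT") %>%
  distinct(AIRPORT, AIRLINE)

# Airlines that already served each market in 2016
incumbents_2016 <- toy_2016 %>% distinct(MARKET, AIRLINE)

# The 2017 markets to evaluate, with their endpoints split out
markets_2017 <- toy_2017 %>%
  distinct(YEAR, MARKET) %>%
  separate_wider_delim(MARKET, names = c("Origin", "Destination"),
                       delim = "-", cols_remove = FALSE)

result <- markets_2017 %>%
  # airlines present at the origin in 2016
  inner_join(presence_2016, join_by(Origin == AIRPORT),
             relationship = "many-to-many") %>%
  # keep those also present at the destination in 2016
  semi_join(presence_2016, join_by(Destination == AIRPORT, AIRLINE)) %>%
  # drop those that already served the market itself in 2016
  anti_join(incumbents_2016, join_by(MARKET, AIRLINE)) %>%
  count(YEAR, MARKET, name = "Count") %>%
  # restore markets with no such airline, with a count of 0
  right_join(distinct(markets_2017, YEAR, MARKET), join_by(YEAR, MARKET)) %>%
  mutate(Count = coalesce(Count, 0L)) %>%
  arrange(MARKET)

print(result)
```

On this toy data ATL-AUS gets a count of 2 (UA and AA are at both airports in 2016 but not in the market) and ATL-BOS gets 0. If markets should be treated as non-directional, the MARKET strings would need to be normalised (for example by sorting the two endpoints) before the anti-join.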