My data has the following structure:
data <- data.frame(
uniqueid = c(1, 1, 2, 2, 3, 3),
year = c(2010, 2011, 2010, 2011, 2010, 2011),
agency = c("SZ", "SZ", "SZ", NA, "SZ", "HE"),
switch = c(0, 0, 0, NA, 0, 1)
)
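As you can see, the data is organized by uniqueid within a given year. Keep in mind that the agency column can contain up to 13 distinct strings across the different uniqueids. I would like the data to look like this: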
data <- data.frame(
uniqueid = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
year = c(2010, 2010, 2011, 2011, 2010, 2010, 2011, 2011, 2011, 2010, 2010, 2011, 2011),
agency = c("SZ", "HE", "SZ", "HE", "SZ", "HE", "SZ", NA, "HE", "SZ", "HE", "SZ", "HE"),
switch = c(0, 0, 0, 0, 0, 0, NA, NA, NA, 0, 0, 0, 1)
)
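In this transformation, each uniqueid and year gets one row per distinct value of the agency variable, and the switch variable largely keeps the value it had before. I'm not sure how to do this in R, although I'd prefer a tidyverse solution. Thanks!
I've been trying something like the following, but it doesn't give me what I want: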
data1 <- data %>%
pivot_wider(names_from = agency, values_from = lead, names_prefix = "agency_", values_fill = "0") %>%
gather(key = agency, value = lead, starts_with("agency_")) %>%
arrange(uniqueid, year, agency)
You can split this problem into 3 steps. Try this:
Create raw_data:
library(tidyverse)
raw_data <- data.frame(
uniqueid = c(1, 1, 2, 2, 3, 3),
year = c(2010, 2011, 2010, 2011, 2010, 2011),
agency = c("SZ", "SZ", "SZ", NA, "SZ", "HE"),
switch = c(0, 0, 0, NA, 0, 1)
)
Extract the rows where the agency column is NA:
NA_rows <- raw_data |> filter(is.na(agency))
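# Build every uniqueid/year/agency combination, then restore the known switch values
filled_rows <- raw_data |>
  complete(uniqueid, year, agency) |>   # add the missing combinations
  select(-switch) |>                    # drop switch; the join below re-attaches it
  filter(!is.na(agency)) |>             # NA agencies are kept separately in NA_rows
  left_join(raw_data, join_by(uniqueid, year, agency)) |>  # join_by() needs dplyr >= 1.1.0
  mutate(switch = case_when(
    is.na(switch) ~ 0,                  # combinations absent from raw_data get 0
    TRUE ~ switch
  ))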
Bind filled_rows and NA_rows together:
bind_rows(filled_rows, NA_rows)
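Note that with the steps above, uniqueid 2 in 2011 gets switch = 0 for SZ and HE, whereas the desired output in the question shows NA for those rows. A minimal sketch of a variant that propagates the NA and sorts the result follows. It assumes the rule is "if any agency is NA for a uniqueid/year pair, switch is unknown for every agency of that pair", which is only inferred from the example, and the names na_rows, filled and result are illustrative. It also assumes every uniqueid and year has at least one non-NA agency row; otherwise it would drop out of complete().

```r
library(dplyr)
library(tidyr)

raw_data <- data.frame(
  uniqueid = c(1, 1, 2, 2, 3, 3),
  year = c(2010, 2011, 2010, 2011, 2010, 2011),
  agency = c("SZ", "SZ", "SZ", NA, "SZ", "HE"),
  switch = c(0, 0, 0, NA, 0, 1)
)

# Keep the rows with an unknown agency aside
na_rows <- raw_data |> filter(is.na(agency))

# Expand the remaining rows to every uniqueid/year/agency combination;
# combinations that were not in raw_data get switch = 0
filled <- raw_data |>
  filter(!is.na(agency)) |>
  complete(uniqueid, year, agency, fill = list(switch = 0))

# Assumption: an NA agency makes switch unknown for the whole uniqueid/year pair
result <- bind_rows(filled, na_rows) |>
  group_by(uniqueid, year) |>
  mutate(switch = if (any(is.na(agency))) NA_real_ else switch) |>
  ungroup() |>
  arrange(uniqueid, year, agency)

result
```

The rows come out sorted by uniqueid, year and agency (with the NA agency last within its group), so the row order differs from the hand-written example in the question even though the contents should match. If the 0 values from the original steps are what you want, drop the group_by/mutate/ungroup lines.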
I'm sure there are more elegant ways to do this, but I hope it helps.