I'm trying to use pivot_wider to get a binary indicator for each country for every year from 1991 to 1995, like the table below:
+------+-------+--------+--------+
| year | USA | Israel | Sweden |
| 1991 | FALSE | TRUE | TRUE |
| 1992 | FALSE | FALSE | TRUE |
| 1993 | FALSE | TRUE | TRUE |
| 1994 | FALSE | FALSE | TRUE |
| 1995 | TRUE | TRUE | TRUE |
+------+-------+--------+--------+
Of course, any binary indicator other than TRUE/FALSE would be fine too.
However, my data frame looks like this:
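country = c("Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Israel", "Israel",
"Israel", "USA")
year = c(1991,1992,1993,1994,1995,1991,1993,1995,1995)
# note: cbind() builds a character matrix, so year is stored as text here
# (as a factor in R < 4.0); data.frame(year, country) would keep it numeric
df = as.data.frame(cbind(year,country))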
df
+---------+------+
| country | year |
| Sweden | 1991 |
| Sweden | 1992 |
| Sweden | 1993 |
| Sweden | 1994 |
| Sweden | 1995 |
| Israel | 1991 |
| Israel | 1993 |
| Israel | 1995 |
| USA | 1995 |
+---------+------+
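I tried the following code and got the result below, which is not what I want:
library(dplyr)
library(tidyr)  # pivot_wider() comes from tidyr, not dplyr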
df2 = df %>%
group_by(country) %>%
mutate(row = row_number()) %>%
pivot_wider(names_from = country, values_from = year) %>%
select(-row)
df2
+------+--------+--------+
| USA | Israel | Sweden |
| 1995 | 1991 | 1991 |
| NA | 1993 | 1992 |
| NA | 1995 | 1993 |
| NA | NA | 1994 |
| NA | NA | 1995 |
+------+--------+--------+
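You can try this: add a constant indicator column, pivot it wide, then turn the missing year/country combinations into FALSE:
library(dplyr)
library(tidyr)
df %>%
  mutate(val = 1) %>%                                   # flag every observed (year, country) pair
  pivot_wider(names_from = country, values_from = val) %>%
  mutate(across(-year, ~replace_na(.x, 0))) %>%         # unobserved combinations -> 0
  mutate(across(-year, ~ifelse(.x == 1, TRUE, FALSE)))  # 1/0 -> TRUE/FALSE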
Output:
# A tibble: 5 x 4
year Sweden Israel USA
<fct> <lgl> <lgl> <lgl>
1 1991 TRUE TRUE FALSE
2 1992 TRUE FALSE FALSE
3 1993 TRUE TRUE FALSE
4 1994 TRUE FALSE FALSE
5 1995 TRUE TRUE TRUE
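A shorter variant of the same idea, assuming tidyr >= 1.1.0 (where pivot_wider() accepts a scalar values_fill): fill the missing cells with FALSE during the pivot instead of patching them afterwards. The indicator column name present is only illustrative, and the data frame is built with data.frame() so that year stays numeric.

```r
library(dplyr)
library(tidyr)

country <- c("Sweden", "Sweden", "Sweden", "Sweden", "Sweden",
             "Israel", "Israel", "Israel", "USA")
year <- c(1991, 1992, 1993, 1994, 1995, 1991, 1993, 1995, 1995)
df <- data.frame(year, country)  # year stays numeric

wide <- df %>%
  mutate(present = TRUE) %>%              # every observed (year, country) pair is TRUE
  pivot_wider(names_from = country,
              values_from = present,
              values_fill = FALSE) %>%    # unobserved combinations become FALSE
  arrange(year)
wide
```

Because the indicator is already logical, no replace_na()/ifelse() step is needed, and the country columns keep their order of first appearance (Sweden, Israel, USA).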
Here is a data.table solution:
library( data.table )
# custom function: TRUE if the year/country group has at least one row (length > 0), FALSE otherwise
cust_fun <- function(x) length(x) > 0
#cast to wide, aggregating with the custom function above
dcast( setDT(df), year ~ country, fun.aggregate = cust_fun )
# year Israel Sweden USA
# 1: 1991 TRUE TRUE FALSE
# 2: 1992 FALSE TRUE FALSE
# 3: 1993 TRUE TRUE FALSE
# 4: 1994 FALSE TRUE FALSE
# 5: 1995 TRUE TRUE TRUE
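As a quick cross-check that needs neither tidyr nor data.table, here is a base R sketch of the same count-then-test logic the data.table answer uses: table() counts the rows for every year/country combination (0 for combinations that never occur), and comparing those counts with > 0 gives the TRUE/FALSE indicator. The result is a logical matrix rather than a data frame, with countries in alphabetical order as in the dcast() output.

```r
country <- c("Sweden", "Sweden", "Sweden", "Sweden", "Sweden",
             "Israel", "Israel", "Israel", "USA")
year <- c(1991, 1992, 1993, 1994, 1995, 1991, 1993, 1995, 1995)
df <- data.frame(year, country)

# cross-tabulate: one row per year, one column per country, cells = row counts
counts <- table(df$year, df$country)

# any count above zero means that country appears in that year
present <- counts > 0
present
```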