I have the following data frame:
id = c(1,2,3)
where_home = c(1, 0, NA)
where_work = c(0, 1, NA)
with_alone = c(0,0,0)
with_parents = c(0,1,1)
with_colleagues = c(1,1,0)
gender_male = c(1,0,1)
gender_female = c(0,1,0)
p_affect = c(10,14,20)
n_affect = c(20,30,10)
df = data.frame(id, where_home, where_work,
                with_alone, with_parents, with_colleagues,
                gender_male, gender_female, p_affect, n_affect)
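There are 3 IDs, several groups of one-hot encoded columns (where, with, gender), and columns that are not one-hot encoded (p_affect, n_affect).
What I want is to convert the one-hot encoded columns while leaving the other columns unchanged.
I did the following: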
library(dplyr)
df_transformed <- df %>%
  rowwise() %>%
  mutate(Gender = case_when(
           gender_male == 1 ~ "Male",
           gender_female == 1 ~ "Female",
           TRUE ~ NA_character_
         ),
         Context = paste(
           ifelse(with_alone == 1, "Alone", ""),
           ifelse(with_parents == 1, "Parents", ""),
           ifelse(with_colleagues == 1, "Colleagues", ""),
           collapse = " and "
         ),
         Location = trimws(ifelse(
           where_home == 1 & where_work == 1,
           'Home and Work',
           paste(
             ifelse(where_home == 1, 'Home', ''),
             ifelse(where_work == 1, 'Work', '')
           )
         ))) %>%
  select(-starts_with("gender_"), -starts_with("with_"))
df_transformed <- df_transformed %>%
  select(id, Gender, Context, Location, p_affect, n_affect)
Result:
     id Gender Context               Location p_affect n_affect
  <dbl> <chr>  <chr>                 <chr>       <dbl>    <dbl>
1     1 Male   " Colleagues"         Home           10       20
2     2 Female " Parents Colleagues" Work           14       30
3     3 Male   " Parents "           NA             20       10
This seems to work, but there are some problems:
pseudocode:
vector_of_columns_that_are_hot_encoded = c('where', 'with', 'gender')
for column in vector_of_columns_that_are_hot_encoded:
# modify the hot-encoded columns and make a new data frame while keeping the columns that are not in the vector_of_columns_that_are_hot_encoded as they are
# mind that some hot-encoded columns are binary (gender), while others have multiple values. If multiple values are present, put them in the data frame using "Value 1 and Value 2 and ..."
I think there must be a simple way to do this. Since I am a beginner with dplyr, please explain the code and keep it simple if possible.
With your existing code, you can apply some post-processing to tidy up the formatting:
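library(stringr)  # provides str_squish() and str_replace_all()

df_transformed |>
  mutate(
    # trim both ends and collapse the runs of spaces left by empty labels
    Context = str_squish(Context),
    # the remaining single spaces separate labels, so join them with " and "
    Context = str_replace_all(Context, " ", " and ")
  )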
#> # A tibble: 3 × 6
#> # Rowwise:
#>      id Gender Context                Location p_affect n_affect
#>   <dbl> <chr>  <chr>                  <chr>       <dbl>    <dbl>
#> 1     1 Male   Colleagues             Home           10       20
#> 2     2 Female Parents and Colleagues Work           14       30
#> 3     3 Male   Parents                <NA>           20       10
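
If you would rather not write one block per group, the loop from your pseudocode can be sketched in base R. This is only a sketch under some assumptions: every one-hot column is named prefix_label, a value of 1 means "selected", and collapse_one_hot() is a hypothetical helper defined here, not a dplyr function. Labels are taken from the column names and capitalised, and a row with no 1 in a group (all 0 or all NA) gets NA.

```r
# Example data from the question.
df <- data.frame(
  id = c(1, 2, 3),
  where_home = c(1, 0, NA), where_work = c(0, 1, NA),
  with_alone = c(0, 0, 0), with_parents = c(0, 1, 1),
  with_colleagues = c(1, 1, 0),
  gender_male = c(1, 0, 1), gender_female = c(0, 1, 0),
  p_affect = c(10, 14, 20), n_affect = c(20, 30, 10)
)

# Hypothetical helper: for one prefix such as "with", return one string
# per row naming every column in that group whose value is 1.
collapse_one_hot <- function(data, prefix, sep = " and ") {
  pattern <- paste0("^", prefix, "_")
  cols <- grep(pattern, names(data), value = TRUE)
  labels <- sub(pattern, "", cols)
  # Capitalise the first letter: "parents" -> "Parents".
  labels <- paste0(toupper(substring(labels, 1, 1)), substring(labels, 2))
  flags <- as.matrix(data[cols])
  vapply(seq_len(nrow(flags)), function(i) {
    chosen <- labels[which(flags[i, ] == 1)]  # which() skips NA flags
    if (length(chosen) == 0) NA_character_ else paste(chosen, collapse = sep)
  }, character(1))
}

prefixes <- c("where", "with", "gender")

# Start from the columns that are not one-hot encoded ...
is_encoded <- grepl(
  paste0("^(", paste(prefixes, collapse = "|"), ")_"),
  names(df)
)
result <- df[!is_encoded]

# ... then add one collapsed column per prefix.
for (prefix in prefixes) {
  result[[prefix]] <- collapse_one_hot(df, prefix)
}

result
```

For the example data this keeps id, p_affect and n_affect untouched and adds the columns where, with and gender. Binary groups such as gender and multi-valued groups such as with go through the same code path, so adding another group only means adding its prefix to prefixes.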