I have a data frame like this:
id  institution  name_a      info_a  bfullname       idb
1   A            Chet Baker  666     Clifford Brown  123
I need to reshape it, keeping id and institution and pairing up the remaining columns so that the values end up like this:
id  institution  role     name            id_name
1   A            student  Chet Baker      666
1   A            teacher  Clifford Brown  123
The role column is determined by the column name, and I have a lookup (an id vector) that identifies each one as follows:
value      id
name_a     student
bfullname  teacher
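The problem is that I have many columns with different names, so I need a way to specify which columns belong together, or perhaps a solution that lets me rename the columns.
I have seen many topics on reshape, dcast, melt and so on, but I still cannot figure it out.
Any ideas on how to do this?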
library(data.table)
setDT(df)
melt(
  df,
  id.vars = 1:2,
  measure.vars = list(name = c(3, 5), id_name = c(4, 6)),
  variable.name = "role"
)
#>    id institution role           name id_name
#> 1:  1           A    1     Chet Baker     666
#> 2:  1           A    2 Clifford Brown     123
where df is:
df <- read.table(text = '
id institution name_a info_a bfullname idb
1 A "Chet Baker" 666 "Clifford Brown" 123
', header = TRUE)
Created on 2019-02-14 by the reprex package (v0.2.1)
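In the melt output above, role comes back as the position (1, 2) of each column within the measure.vars vectors rather than as student/teacher. A minimal sketch of one way to turn that index into the labels from the question's lookup table follows. The names roles, name_cols, id_cols and idx are illustrative assumptions, not part of the original answer, and columns are addressed by name rather than by position.

```r
library(data.table)

# Same single-row example as in the question.
df <- data.table(
  id = 1, institution = "A",
  name_a = "Chet Baker", info_a = 666,
  bfullname = "Clifford Brown", idb = 123
)

# Lookup from the question: which name column carries which role
# (hypothetical object name `roles`).
roles <- data.table(
  value = c("name_a", "bfullname"),
  id    = c("student", "teacher")
)

# Paired columns, listed in the same order in both vectors.
name_cols <- c("name_a", "bfullname")
id_cols   <- c("info_a", "idb")

long <- melt(
  df,
  id.vars = c("id", "institution"),
  measure.vars = list(name = name_cols, id_name = id_cols),
  variable.name = "role"
)

# `role` holds the position within name_cols; convert it to an integer
# index, recover the source column name, then look up its label.
idx <- as.integer(as.character(long$role))
long[, role := roles$id[match(name_cols[idx], roles$value)]]

long
```

Because the pairing is driven only by the order of name_cols and id_cols, adding another role should just mean appending one entry to each vector and one row to the lookup.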
Forget reshape; use tidyr:
library(dplyr)
library(tidyr)

# Example data: one name column and one id column per role.
df <- tribble(
  ~id, ~institution, ~name_a, ~info_a, ~bfullname, ~idb,
  1, "A", "Chet Baker", 666, "Clifford Brown", 123,
  2, "B", "George Baker", 123, "Charlie Brown", 234,
  3, "C", "Banket Baker", 456, "James Brown", 647,
  4, "D", "Koeken Baker", 789, "Golden Brown", 967
)

# Lookup table: name column, its role, and the matching id column.
def <- tribble(
  ~value,      ~roleid,   ~info,
  "name_a",    "student", "info_a",
  "bfullname", "teacher", "idb"
)
def

# Stack every non-key column into key/value pairs (values become character).
dflong <- df %>%
  gather(key, value, -id, -institution)

# Keep the name rows, attach the role, then look up the paired id value.
dflong %>%
  filter(key %in% def$value) %>%
  rename(role = key, name = value) %>%
  inner_join(def, by = c('role' = 'value')) %>%
  left_join(dflong %>% select(-institution), by = c('id' = 'id', 'info' = 'key'))
This results in:
# A tibble: 8 x 7
     id institution role      name           roleid  info   value
  <dbl> <chr>       <chr>     <chr>          <chr>   <chr>  <chr>
1     1 A           name_a    Chet Baker     student info_a 666
2     2 B           name_a    George Baker   student info_a 123
3     3 C           name_a    Banket Baker   student info_a 456
4     4 D           name_a    Koeken Baker   student info_a 789
5     1 A           bfullname Clifford Brown teacher idb    123
6     2 B           bfullname Charlie Brown  teacher idb    234
7     3 C           bfullname James Brown    teacher idb    647
8     4 D           bfullname Golden Brown   teacher idb    967
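The table above still carries the original column names and helper columns, so it is not yet in the id / institution / role / name / id_name layout the question asks for. Since the question also mentions renaming columns as an option, here is a hedged sketch of that route: rename each paired column to a common "field__role" pattern using the lookup table, then let pivot_longer split the names. It assumes tidyr 1.0.0 or later (for pivot_longer) and a dplyr version whose rename accepts all_of() with a named vector. The "__" separator and the new_names object are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Two rows of the example data are enough to show the idea.
df <- tribble(
  ~id, ~institution, ~name_a, ~info_a, ~bfullname, ~idb,
  1, "A", "Chet Baker", 666, "Clifford Brown", 123,
  2, "B", "George Baker", 123, "Charlie Brown", 234
)

# Lookup table from the answer: name column, role, matching id column.
def <- tribble(
  ~value,      ~roleid,   ~info,
  "name_a",    "student", "info_a",
  "bfullname", "teacher", "idb"
)

# Named vector of the form c(new_name = "old_name"), for example
# c(name__student = "name_a", id_name__student = "info_a", ...).
new_names <- c(
  setNames(def$value, paste0("name__", def$roleid)),
  setNames(def$info,  paste0("id_name__", def$roleid))
)

result <- df %>%
  rename(all_of(new_names)) %>%
  pivot_longer(
    cols = -c(id, institution),
    # The part before "__" becomes an output column (.value),
    # the part after it becomes the role.
    names_to = c(".value", "role"),
    names_sep = "__"
  )

result
```

With this approach id_name keeps its numeric type, because the name and id columns are never stacked into a single value column.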