Here is a data example and the code that reshapes it, along with the current output:
library(data.table)
library(stringr)  # provides the `words` vector used below
DT <- data.table(
parent.name = words[101:103],
parent.dob = as.Date(1:3, origin="2020-01-01"),
child_boy = words[11:13],
child_girl = words[21:23],
child_trans = words[201:203],
dob_boy= as.Date(1:3, origin="2010-01-01"),
dob_girk= as.Date(1:3, origin="2012-01-01"),
dob_trans= as.Date(1:3, origin="2022-01-01")
)
DT
> DT
parent.name parent.dob child_boy child_girl child_trans dob_boy dob_girk dob_trans
<char> <Date> <char> <char> <char> <Date> <Date> <Date>
1: boat 2020-01-02 actual against course 2010-01-02 2012-01-02 2022-01-02
2: body 2020-01-03 add age court 2010-01-03 2012-01-03 2022-01-03
3: book 2020-01-04 address agent cover 2010-01-04 2012-01-04 2022-01-04
DT2 <- melt(DT, id.vars = c("parent.name", "parent.dob"), measure.vars = patterns(dob="^dob_", name="^child_"), value.factor=TRUE, variable.name = "child")
DT2
> DT2
parent.name parent.dob child dob name
<char> <Date> <fctr> <Date> <char>
1: boat 2020-01-02 1 2010-01-02 actual
2: body 2020-01-03 1 2010-01-03 add
3: book 2020-01-04 1 2010-01-04 address
4: boat 2020-01-02 2 2012-01-02 against
5: body 2020-01-03 2 2012-01-03 age
6: book 2020-01-04 2 2012-01-04 agent
7: boat 2020-01-02 3 2022-01-02 course
8: body 2020-01-03 3 2022-01-03 court
9: book 2020-01-04 3 2022-01-04 cover
DT2[child == "1", child := "boy"][child == "2", child := "girl"][child == "3", child := "trans"]
DT2
> DT2
parent.name parent.dob child dob name
<char> <Date> <fctr> <Date> <char>
1: boat 2020-01-02 boy 2010-01-02 actual
2: body 2020-01-03 boy 2010-01-03 add
3: book 2020-01-04 boy 2010-01-04 address
4: boat 2020-01-02 girl 2012-01-02 against
5: body 2020-01-03 girl 2012-01-03 age
6: book 2020-01-04 girl 2012-01-04 agent
7: boat 2020-01-02 trans 2022-01-02 course
8: body 2020-01-03 trans 2022-01-03 court
9: book 2020-01-04 trans 2022-01-04 cover
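In the code above, I manually reassign the values of the new variable to the suffixes used in the original column headers.
So the question is: can this step be automated?
Imagine there are dozens of such columns being melted into one. You don't want to risk introducing errors by renaming them by hand.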
With data.table you can melt everything, then use tstrsplit to split the variable column before casting again.
x <- melt(DT, id.vars = c("parent.name", "parent.dob"))
x[, c("variable", "name") := tstrsplit(variable, "_", fixed = TRUE)]
dcast(x, parent.name + parent.dob + name ~ variable)
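Result (the girk rows come from the dob_girk typo in the question's column names, which keeps that column from pairing with child_girl):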
parent.name parent.dob name child dob
1: boat 2020-01-02 boy actual 2010-01-02
2: boat 2020-01-02 girk <NA> 2012-01-02
3: boat 2020-01-02 girl against <NA>
4: boat 2020-01-02 trans course 2022-01-02
5: body 2020-01-03 boy add 2010-01-03
6: body 2020-01-03 girk <NA> 2012-01-03
7: body 2020-01-03 girl age <NA>
8: body 2020-01-03 trans court 2022-01-03
9: book 2020-01-04 boy address 2010-01-04
10: book 2020-01-04 girk <NA> 2012-01-04
11: book 2020-01-04 girl agent <NA>
12: book 2020-01-04 trans cover 2022-01-04
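If you would rather keep the original melt() call with patterns(), the numeric labels can be mapped back from the column names instead of being typed by hand. The following is only a sketch under a few assumptions: the sample data is retyped inline with the column spelled dob_girl (not dob_girk), the helper names child_cols and child_labels are made up for illustration, and the child_ and dob_ columns are assumed to appear in the same suffix order (the code checks this and stops otherwise).

```r
library(data.table)

# Sample data retyped inline; note the assumed spelling dob_girl
DT <- data.table(
  parent.name = c("boat", "body", "book"),
  parent.dob  = as.Date(1:3, origin = "2020-01-01"),
  child_boy   = c("actual", "add", "address"),
  child_girl  = c("against", "age", "agent"),
  child_trans = c("course", "court", "cover"),
  dob_boy     = as.Date(1:3, origin = "2010-01-01"),
  dob_girl    = as.Date(1:3, origin = "2012-01-01"),
  dob_trans   = as.Date(1:3, origin = "2022-01-01")
)

# Derive the labels from the headers (hypothetical helper names)
child_cols   <- grep("^child_", names(DT), value = TRUE)
child_labels <- sub("^child_", "", child_cols)

# The mapping is only valid if both column groups share the same suffix order
dob_labels <- sub("^dob_", "", grep("^dob_", names(DT), value = TRUE))
stopifnot(identical(child_labels, dob_labels))

DT2 <- melt(
  DT,
  id.vars       = c("parent.name", "parent.dob"),
  measure.vars  = patterns("^dob_", "^child_"),
  value.name    = c("dob", "name"),
  variable.name = "child"
)

# melt() numbers the groups 1, 2, 3 in column order; index the labels with them
DT2[, child := factor(child_labels[as.integer(child)], levels = child_labels)]
DT2
```

The guard matters: with a misspelled header such as dob_girk, the positional pairing would still run silently, so checking the suffixes first is what catches the mismatch.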
I think what you need is the development version, which supports data.table::measure. Until then, perhaps you can use this:
tidyr::pivot_longer(DT, -(1:2),
names_pattern = "(.*)_(.*)", names_to = c(".value", "name"))
# # A tibble: 12 × 5
# parent.name parent.dob name child dob
# <chr> <chr> <chr> <chr> <chr>
# 1 boat 2020-01-02 boy actual 2010-01-02
# 2 boat 2020-01-02 girl against <NA>
# 3 boat 2020-01-02 trans course 2022-01-02
# 4 boat 2020-01-02 girk <NA> 2012-01-02
# 5 body 2020-01-03 boy add 2010-01-03
# 6 body 2020-01-03 girl age <NA>
# 7 body 2020-01-03 trans court 2022-01-03
# 8 body 2020-01-03 girk <NA> 2012-01-03
# 9 book 2020-01-04 boy address 2010-01-04
# 10 book 2020-01-04 girl agent <NA>
# 11 book 2020-01-04 trans cover 2022-01-04
# 12 book 2020-01-04 girk <NA> 2012-01-04
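For comparison, the data.table::measure route might look roughly like the sketch below. It assumes an installed data.table that exports measure(), retypes the sample data inline with the column spelled dob_girl, and uses kind as a made-up name for the new grouping column (it cannot be called child here, because child is already one of the value columns).

```r
library(data.table)

# Sample data retyped inline; note the assumed spelling dob_girl
DT <- data.table(
  parent.name = c("boat", "body", "book"),
  parent.dob  = as.Date(1:3, origin = "2020-01-01"),
  child_boy   = c("actual", "add", "address"),
  child_girl  = c("against", "age", "agent"),
  child_trans = c("course", "court", "cover"),
  dob_boy     = as.Date(1:3, origin = "2010-01-01"),
  dob_girl    = as.Date(1:3, origin = "2012-01-01"),
  dob_trans   = as.Date(1:3, origin = "2022-01-01")
)

# Split each measured column name on "_":
#   the first piece (child / dob) names the output value column (value.name),
#   the second piece (boy / girl / trans) is stored in the new column `kind`.
DT_long <- melt(
  DT,
  id.vars      = c("parent.name", "parent.dob"),
  measure.vars = measure(value.name, kind, sep = "_")
)
DT_long
```

Because the labels are read straight from the headers, nothing has to be renamed afterwards; a misspelled header would show up as its own kind value with NA in the other column, much like the girk rows above.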
Data
DT <- data.table::as.data.table(structure(list(parent.name = c("boat", "body", "book"), parent.dob = c("2020-01-02", "2020-01-03", "2020-01-04"), child_boy = c("actual", "add", "address"), child_girl = c("against", "age", "agent"), child_trans = c("course", "court", "cover"), dob_boy = c("2010-01-02", "2010-01-03", "2010-01-04"), dob_girk = c("2012-01-02", "2012-01-03", "2012-01-04"), dob_trans = c("2022-01-02", "2022-01-03", "2022-01-04")), row.names = c(NA, -3L), class = c("data.table", "data.frame")))