Melt a data.table using patterns, with measure names extracted automatically


Here is a data example and the code that reshapes it, together with the current output:

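  library(data.table)
  library(stringr)  # assumed source of the `words` vector (stringr::words)

  DT <- data.table(
    parent.name = words[101:103],
    parent.dob = as.Date(1:3, origin = "2020-01-01"),
    child_boy = words[11:13],
    child_girl = words[21:23],
    child_trans = words[201:203],
    dob_boy = as.Date(1:3, origin = "2010-01-01"),
    dob_girk = as.Date(1:3, origin = "2012-01-01"),  # typo for dob_girl, kept because the answers' output reflects it
    dob_trans = as.Date(1:3, origin = "2022-01-01")
  )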
  DT

>   DT
   parent.name parent.dob child_boy child_girl child_trans    dob_boy   dob_girk  dob_trans
        <char>     <Date>    <char>     <char>      <char>     <Date>     <Date>     <Date>
1:        boat 2020-01-02    actual    against      course 2010-01-02 2012-01-02 2022-01-02
2:        body 2020-01-03       add        age       court 2010-01-03 2012-01-03 2022-01-03
3:        book 2020-01-04   address      agent       cover 2010-01-04 2012-01-04 2022-01-04


  DT2 <- melt(DT,
    id.vars = c("parent.name", "parent.dob"),
    measure.vars = patterns(dob = "^dob_", name = "^child_"),
    value.factor = TRUE, variable.name = "child")
  DT2

>  DT2
   parent.name parent.dob  child        dob    name
        <char>     <Date> <fctr>     <Date>  <char>
1:        boat 2020-01-02      1 2010-01-02  actual
2:        body 2020-01-03      1 2010-01-03     add
3:        book 2020-01-04      1 2010-01-04 address
4:        boat 2020-01-02      2 2012-01-02 against
5:        body 2020-01-03      2 2012-01-03     age
6:        book 2020-01-04      2 2012-01-04   agent
7:        boat 2020-01-02      3 2022-01-02  course
8:        body 2020-01-03      3 2022-01-03   court
9:        book 2020-01-04      3 2022-01-04   cover

  DT2[child == "1", child := "boy"][child == "2", child := "girl"][child == "3", child := "trans"]
  DT2

>   DT2
   parent.name parent.dob  child        dob    name
        <char>     <Date> <fctr>     <Date>  <char>
1:        boat 2020-01-02    boy 2010-01-02  actual
2:        body 2020-01-03    boy 2010-01-03     add
3:        book 2020-01-04    boy 2010-01-04 address
4:        boat 2020-01-02   girl 2012-01-02 against
5:        body 2020-01-03   girl 2012-01-03     age
6:        book 2020-01-04   girl 2012-01-04   agent
7:        boat 2020-01-02  trans 2022-01-02  course
8:        body 2020-01-03  trans 2022-01-03   court
9:        book 2020-01-04  trans 2022-01-04   cover

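In the code above, I manually reassign the values of the new variable to the labels used in the original column headers.

So the question is: can this step be done automatically?

Imagine there are dozens of such columns being melted into one. You would not want to risk introducing errors by renaming them by hand.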

automation data.table reshape melt
2 answers
2 votes

With data.table you can melt first, then use tstrsplit to split the variable names into their two parts before casting back to wide format.

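# Melt every non-id column into long (variable, value) pairs
x <- melt(DT, id.vars = c("parent.name", "parent.dob"))
# Split e.g. "child_boy" into variable = "child" and name = "boy"
x[, c("variable", "name") := tstrsplit(variable, "_", fixed = TRUE)]
# Cast back so that "child" and "dob" become columns again
dcast(x, parent.name + parent.dob + name ~ variable, value.var = "value")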

Result (the girk rows and the NA values come from the dob_girk typo in the question's column names)

    parent.name parent.dob  name   child        dob
 1:        boat 2020-01-02   boy  actual 2010-01-02
 2:        boat 2020-01-02  girk    <NA> 2012-01-02
 3:        boat 2020-01-02  girl against       <NA>
 4:        boat 2020-01-02 trans  course 2022-01-02
 5:        body 2020-01-03   boy     add 2010-01-03
 6:        body 2020-01-03  girk    <NA> 2012-01-03
 7:        body 2020-01-03  girl     age       <NA>
 8:        body 2020-01-03 trans   court 2022-01-03
 9:        book 2020-01-04   boy address 2010-01-04
10:        book 2020-01-04  girk    <NA> 2012-01-04
11:        book 2020-01-04  girl   agent       <NA>
12:        book 2020-01-04 trans   cover 2022-01-04
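
If you would rather keep the patterns-style melt from the question, the labels can be derived from the column names instead of being typed by hand. The following is only a sketch under some assumptions: the helper names (child_cols, dob_cols, suffix) are made up for illustration, the data is a shortened copy of the question's table with the dob_girk typo corrected, and the approach relies on the child_ and dob_ columns appearing in the same order.

```r
library(data.table)

# Shortened copy of the question's data (typo dob_girk corrected to dob_girl)
DT <- data.table(
  parent.name = c("boat", "body", "book"),
  child_boy   = c("actual", "add", "address"),
  child_girl  = c("against", "age", "agent"),
  dob_boy     = as.Date(1:3, origin = "2010-01-01"),
  dob_girl    = as.Date(1:3, origin = "2012-01-01")
)

# Hypothetical helpers: collect both column groups and the shared suffixes
child_cols <- grep("^child_", names(DT), value = TRUE)
dob_cols   <- grep("^dob_",   names(DT), value = TRUE)
suffix     <- sub("^child_", "", child_cols)

# Fail early if the two groups do not line up (this would catch dob_girk)
stopifnot(identical(suffix, sub("^dob_", "", dob_cols)))

DT2 <- melt(DT,
  id.vars       = "parent.name",
  measure.vars  = list(child_cols, dob_cols),
  value.name    = c("name", "dob"),
  variable.name = "child")

# The melted factor has levels "1", "2", ... in column order; relabel them
DT2[, child := factor(child, labels = suffix)]
DT2
```

The stopifnot check is the part that guards against silent mislabeling: with the original dob_girk column it stops before melting rather than pairing mismatched columns.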

1 vote

I think what you need is supported in the development version through data.table::measure. Until then, perhaps you can use this:

tidyr::pivot_longer(DT, -(1:2),
  names_pattern = "(.*)_(.*)", names_to = c(".value", "name"))
# # A tibble: 12 × 5
#    parent.name parent.dob name  child   dob       
#    <chr>       <chr>      <chr> <chr>   <chr>     
#  1 boat        2020-01-02 boy   actual  2010-01-02
#  2 boat        2020-01-02 girl  against <NA>      
#  3 boat        2020-01-02 trans course  2022-01-02
#  4 boat        2020-01-02 girk  <NA>    2012-01-02
#  5 body        2020-01-03 boy   add     2010-01-03
#  6 body        2020-01-03 girl  age     <NA>      
#  7 body        2020-01-03 trans court   2022-01-03
#  8 body        2020-01-03 girk  <NA>    2012-01-03
#  9 book        2020-01-04 boy   address 2010-01-04
# 10 book        2020-01-04 girl  agent   <NA>      
# 11 book        2020-01-04 trans cover   2022-01-04
# 12 book        2020-01-04 girk  <NA>    2012-01-04

Data

DT <- data.table::as.data.table(structure(list(parent.name = c("boat", "body", "book"), parent.dob = c("2020-01-02", "2020-01-03", "2020-01-04"), child_boy = c("actual", "add", "address"), child_girl = c("against", "age", "agent"), child_trans = c("course", "court", "cover"), dob_boy = c("2010-01-02", "2010-01-03", "2010-01-04"), dob_girk = c("2012-01-02", "2012-01-03", "2012-01-04"), dob_trans = c("2022-01-02", "2022-01-03", "2022-01-04")), row.names = c(NA, -3L), class = c("data.table", "data.frame")))
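
For reference, the data.table::measure approach mentioned at the top of this answer might look roughly like the sketch below. It assumes an installed data.table that already exports measure() (to my knowledge this is 1.15.0 or later, but check your version's NEWS), uses a copy of the question's data with dob_girk corrected to dob_girl, and names the new column kind, which is an arbitrary choice made here to avoid clashing with the child value column.

```r
library(data.table)  # assumes a version that provides measure()

DT <- data.table(
  parent.name = c("boat", "body", "book"),
  parent.dob  = as.Date(1:3, origin = "2020-01-01"),
  child_boy   = c("actual", "add", "address"),
  child_girl  = c("against", "age", "agent"),
  child_trans = c("course", "court", "cover"),
  dob_boy     = as.Date(1:3, origin = "2010-01-01"),
  dob_girl    = as.Date(1:3, origin = "2012-01-01"),  # typo corrected
  dob_trans   = as.Date(1:3, origin = "2022-01-01")
)

# Split each measure column name on "_":
#   first part  -> name of the output value column (child, dob)
#   second part -> value stored in the new `kind` column (boy, girl, trans)
# Columns without "_" (parent.name, parent.dob) stay as id columns.
long <- melt(DT, measure.vars = measure(value.name, kind, sep = "_"))
long
```

Because the labels come straight from the column names, nothing has to be renamed by hand afterwards.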