How can I combine two melt/gather/pivot_longer calls in R into a single command?


I have what I believe is a fairly simple problem, but so far I have not been able to find a solution.

I have a dataset with 4 columns whose names start with time (time1, time2, ...) and 4 columns whose names start with count (count1, count2, ...).

#Step 1: Create data
time_cols <- data.frame(time1 = sample(1:50, 10, replace = TRUE),
                        time2 = sample(1:50, 10, replace = TRUE),
                        time3 = sample(1:50, 10, replace = TRUE),
                        time4 = sample(1:50, 10, replace = TRUE))

count_cols <- data.frame(count1 = sample(1:4, 10, replace = TRUE),
                         count2 = sample(1:4, 10, replace = TRUE),
                         count3 = sample(1:4, 10, replace = TRUE),
                         count4 = sample(1:4, 10, replace = TRUE))

# Step 2: Create the "id" column
id <- 1:10

# Step 3: Combine all the columns to form the dataframe
df <- cbind(time_cols, count_cols, id)


   time1 time2 time3 time4 count1 count2 count3 count4 id
1     40    47    24    48      1      2      3      2  1
2     16    15    39    16      1      4      2      1  2
3     16    41    16    21      1      3      1      4  3
4     16    47    14     3      4      2      1      1  4
5     31    28    29    30      3      4      4      1  5
6      5    15    41    13      4      2      2      4  6
7     46    19    29    30      1      1      3      1  7
8     28    43    10    27      2      2      3      2  8
9     23    37    35    49      2      4      2      2  9
10    43    28     6    20      3      3      3      1 10

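I want to reshape the dataset from wide to long in a single command: the time columns should be gathered into a variable column "timepoint" and a value column "time_count_value". At the same time, the count columns should be gathered in the same way into "count_no" and "count_value". The output should look like this: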

  timepoint time_count_value count_no count_value id
1      time1               40   count1           1  1
2      time2               47   count2           2  1
3      time3               24   count3           3  1
4      time4               48   count4           2  1
5      time1               16   count1           1  2
6      time2               15   count2           4  2
7      time3               39   count3           2  2
8      time4               16   count4           1  2
9      time1               16   count1           1  3
10     time2               41   count2           3  3

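So I do not want to gather multiple columns into one (there are several answers on Stack Overflow for that problem); instead I want to, in effect, run two melt/gather operations at the same time.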

I tried the following melt call:

example_trials2 <- setnames(
  reshape2::melt(example_trials2, measure = data.table:::patterns("^time", "^count"),
                 value.name =  c("time_count_value", "count_value"),
                 variable.name = c("timepoint", "count_no")))

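However, it first told me that it could not find the function "patterns". I therefore added data.table::: in front of patterns, and now it says "Error: patterns not found". I also tried reinstalling data.table and installing an older version of the package, without success.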

I believe this function actually does what I am looking for, but I do not understand why it keeps throwing an error about the regex/patterns.

So I would be glad to hear how to fix the patterns error, or whether you have a different idea using melt or another package, e.g. pivot_longer.

r data.table multiple-columns reshape2 melt
1 Answer

Here is a base R option using reshape + transform:
dfout <- transform(
    reshape(
        setNames(df, gsub("(\\d+)", ".\\1", names(df))),
        direction = "long",
        idvar = "id",
        varying = -length(df),
        timevar = "grp"
    ),
    timepoint = paste0("time", grp),
    count_no = paste0("count", grp)
)[c("timepoint", "time", "count_no", "count", "id")]


dfout <- `row.names<-`(dfout[order(dfout$id), ], NULL)
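The rename step is what makes the `reshape` call above work: by default `reshape` splits column names at a "." (its `sep` argument) to guess the variable stem and the time index. A minimal sketch of what the `gsub` call does to the names, using a shortened, made-up vector of column names:

```r
# Hypothetical shortened set of column names with the same layout as df
nms <- c("time1", "time2", "count1", "count2", "id")

# Insert a "." before each run of digits so that reshape() can split
# "time.1" into the stem "time" and the index 1
renamed <- gsub("(\\d+)", ".\\1", nms)
print(renamed)
```

Names without digits, such as `id`, are left unchanged, which is why `id` can still be used as `idvar`.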

This gives:

> head(dfout, 10)
   timepoint time count_no count id
1      time1   14   count1     4  1
2      time2   21   count2     3  1
3      time3   41   count3     2  1
4      time4   33   count4     4  1
5      time1    4   count1     2  2
6      time2   21   count2     4  2
7      time3   25   count3     2  2
8      time4   20   count4     2  2
9      time1   39   count1     4  3
10     time2   42   count2     2  3
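On the `patterns` error from the question: as far as I know, `reshape2::melt` has no support for `patterns()`. That helper is interpreted by data.table's own `melt` method, which is only used when the input is a data.table. The following is a sketch under that assumption, not a tested fix for the original `example_trials2` object. It uses a small made-up data set with the same column layout, and the helper column name `grp` is my own choice.

```r
library(data.table)

# Small illustrative data set (values made up, same layout as df)
df <- data.frame(
  time1 = c(14L, 4L), time2 = c(21L, 21L),
  count1 = c(4L, 2L), count2 = c(3L, 4L),
  id = 1:2
)

# melt() must receive a data.table for patterns() to be understood
dt <- as.data.table(df)

# One melt call, two groups of measure columns, two value columns
long <- melt(
  dt,
  id.vars = "id",
  measure.vars = patterns("^time", "^count"),
  variable.name = "grp",
  value.name = c("time_count_value", "count_value")
)

# With several measure groups, grp holds the position within each group
# (1, 2, ...), so the label columns are rebuilt from it
long[, `:=`(timepoint = paste0("time", grp), count_no = paste0("count", grp))]

# Sort by id and put the columns in the order requested in the question
long <- long[order(id, grp),
             .(timepoint, time_count_value, count_no, count_value, id)]
print(long)
```

Note that `grp` is the position of a column within its pattern match, not the numeric suffix itself; the two coincide here only because the columns are numbered consecutively from 1.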

Data

> dput(df)
structure(list(time1 = c(14L, 4L, 39L, 1L, 34L, 23L, 43L, 14L,
18L, 33L), time2 = c(21L, 21L, 42L, 46L, 10L, 7L, 9L, 15L, 21L,
37L), time3 = c(41L, 25L, 46L, 37L, 37L, 34L, 42L, 25L, 44L,
15L), time4 = c(33L, 20L, 35L, 6L, 10L, 42L, 38L, 47L, 20L, 28L
), count1 = c(4L, 2L, 4L, 1L, 3L, 2L, 1L, 4L, 4L, 1L), count2 = c(3L,
4L, 2L, 2L, 3L, 3L, 2L, 2L, 4L, 4L), count3 = c(2L, 2L, 1L, 2L, 
2L, 2L, 2L, 1L, 3L, 3L), count4 = c(4L, 2L, 3L, 3L, 4L, 2L, 4L,
3L, 3L, 1L), id = 1:10), class = "data.frame", row.names = c(NA,
-10L))

> df
   time1 time2 time3 time4 count1 count2 count3 count4 id
1     14    21    41    33      4      3      2      4  1
2      4    21    25    20      2      4      2      2  2
3     39    42    46    35      4      2      1      3  3
4      1    46    37     6      1      2      2      3  4
5     34    10    37    10      3      3      2      4  5
6     23     7    34    42      2      3      2      2  6
7     43     9    42    38      1      2      2      4  7
8     14    15    25    47      4      2      1      3  8
9     18    21    44    20      4      4      3      3  9
10    33    37    15    28      1      4      3      1 10
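Since the question also mentions pivot_longer, here is a sketch of the same reshape with tidyr, assuming tidyr 1.0.0 or later is installed. The special name `.value` in `names_to` tells `pivot_longer` to keep the first part of each column name ("time" or "count") as a separate output column. The helper column name `grp` and the small data set are my own choices for illustration.

```r
library(tidyr)

# Small illustrative data set (values made up, same layout as df)
df <- data.frame(
  time1 = c(14L, 4L), time2 = c(21L, 21L),
  count1 = c(4L, 2L), count2 = c(3L, 4L),
  id = 1:2
)

# Split each name into a stem (".value" -> columns time and count)
# and a numeric suffix (stored in grp)
long <- pivot_longer(
  df,
  cols = -id,
  names_to = c(".value", "grp"),
  names_pattern = "^(time|count)(\\d+)$"
)

# Rebuild the label columns and rename the value columns as requested
long <- transform(
  as.data.frame(long),
  timepoint = paste0("time", grp),
  count_no = paste0("count", grp)
)
names(long)[names(long) == "time"] <- "time_count_value"
names(long)[names(long) == "count"] <- "count_value"
long <- long[c("timepoint", "time_count_value", "count_no", "count_value", "id")]
print(long)
```

Unlike the data.table sketch, `grp` here is the actual suffix taken from the column name, so it stays correct even if the columns are not numbered consecutively.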