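As far as I can tell, I have a fairly simple problem, but so far I have not been able to find a solution.
I have a dataset with 4 columns whose names start with time (time1, time2, ...) and 4 columns whose names start with count (count1, count2, ...).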
#Step 1: Create data
time_cols <- data.frame(time1 = sample(1:50, 10, replace = TRUE),
time2 = sample(1:50, 10, replace = TRUE),
time3 = sample(1:50, 10, replace = TRUE),
time4 = sample(1:50, 10, replace = TRUE))
count_cols <- data.frame(count1 = sample(1:4, 10, replace = TRUE),
count2 = sample(1:4, 10, replace = TRUE),
count3 = sample(1:4, 10, replace = TRUE),
count4 = sample(1:4, 10, replace = TRUE))
# Step 2: Create the "id" column
id <- 1:10
# Step 3: Combine all the columns to form the dataframe
df <- cbind(time_cols, count_cols, id)
time1 time2 time3 time4 count1 count2 count3 count4 id
1 40 47 24 48 1 2 3 2 1
2 16 15 39 16 1 4 2 1 2
3 16 41 16 21 1 3 1 4 3
4 16 47 14 3 4 2 1 1 4
5 31 28 29 30 3 4 4 1 5
6 5 15 41 13 4 2 2 4 6
7 46 19 29 30 1 1 3 1 7
8 28 43 10 27 2 2 3 2 8
9 23 37 35 49 2 4 2 2 9
10 43 28 6 20 3 3 3 1 10
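I want to reshape the dataset from wide to long in a single command: the time columns should be gathered into a key column "timepoint" and a value column "time_count_value". At the same time, the count columns should be gathered the same way into "count_no" and "count_value". The output should look like this: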
timepoint time_count_value count_no count_value id
1 time1 40 count1 1 1
2 time2 47 count2 2 1
3 time3 24 count3 3 1
4 time4 48 count4 2 1
5 time1 16 count1 1 2
6 time2 15 count2 4 2
7 time3 39 count3 2 2
8 time4 16 count4 1 2
9 time1 16 count1 1 3
10 time2 41 count2 3 3
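So I do not want to gather multiple columns into one (there are several answers for that on Stack Overflow); instead, I want to somehow perform two melt/gather operations at the same time.
I tried the following melt: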
example_trials2 <- setnames(
reshape2::melt(example_trials2, measure = data.table:::patterns("^time", "^count"),
value.name = c("time_count_value", "count_value"),
variable.name = c("timepoint", "count_no")))
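However, it first told me: could not find function "patterns". So I added data.table::: in front of patterns, and now it says "Error: Pattern not found". I also tried reinstalling data.table and installing an older version of the package, without success.
I believe this function actually does what I am looking for, but I do not understand why it keeps throwing the regex/pattern error.
So: I would be glad to hear how to fix the pattern error, or whether you have a different idea using melt or another package, e.g. pivot_longer.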
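One likely cause of the pattern error, as far as I can tell: `patterns()` is only interpreted by data.table's own `melt()` method, and `reshape2::melt()` does not understand it. Below is a minimal sketch, assuming data.table is installed and the data has the same layout as `df` above. The seed, the helper column `grp`, and the result name `long` are my additions for illustration.

```r
library(data.table)

# Rebuild data with the same layout as in the question
# (set.seed is an assumption, added only for reproducibility)
set.seed(1)
df <- data.frame(
  time1  = sample(1:50, 10, replace = TRUE),
  time2  = sample(1:50, 10, replace = TRUE),
  time3  = sample(1:50, 10, replace = TRUE),
  time4  = sample(1:50, 10, replace = TRUE),
  count1 = sample(1:4, 10, replace = TRUE),
  count2 = sample(1:4, 10, replace = TRUE),
  count3 = sample(1:4, 10, replace = TRUE),
  count4 = sample(1:4, 10, replace = TRUE),
  id     = 1:10
)

# Convert to a data.table so that data.table's melt method is dispatched;
# patterns() then melts the time* and count* columns in parallel
long <- melt(
  as.data.table(df),
  id.vars       = "id",
  measure.vars  = patterns("^time", "^count"),
  variable.name = "grp",   # hypothetical helper column holding the index 1..4
  value.name    = c("time_count_value", "count_value")
)

# With several patterns, the variable column holds the group index,
# so the original column names are rebuilt from it
long[, `:=`(timepoint = paste0("time", grp),
            count_no  = paste0("count", grp))]
setorder(long, id, grp)
long <- long[, .(timepoint, time_count_value, count_no, count_value, id)]

head(long, 10)
```

Note that `variable.name` takes a single name here, because both melts share one index column; the two label columns are derived from it afterwards.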
Here is a base R option using reshape + transform:
dfout <- transform(
reshape(
setNames(df, gsub("(\\d+)", ".\\1", names(df))),
direction = "long",
idvar = "id",
varying = -length(df),
timevar = "grp"
),
timepoint = paste0("time", grp),
count_no = paste0("count", grp)
)[c("timepoint", "time", "count_no", "count", "id")]
dfout <- `row.names<-`(dfout[order(dfout$id), ], NULL)
You will see
> head(dfout, 10)
timepoint time count_no count id
1 time1 14 count1 4 1
2 time2 21 count2 3 1
3 time3 41 count3 2 1
4 time4 33 count4 4 1
5 time1 4 count1 2 2
6 time2 21 count2 4 2
7 time3 25 count3 2 2
8 time4 20 count4 2 2
9 time1 39 count1 4 3
10 time2 42 count2 2 3
> dput(df)
structure(list(time1 = c(14L, 4L, 39L, 1L, 34L, 23L, 43L, 14L,
18L, 33L), time2 = c(21L, 21L, 42L, 46L, 10L, 7L, 9L, 15L, 21L,
37L), time3 = c(41L, 25L, 46L, 37L, 37L, 34L, 42L, 25L, 44L,
15L), time4 = c(33L, 20L, 35L, 6L, 10L, 42L, 38L, 47L, 20L, 28L
), count1 = c(4L, 2L, 4L, 1L, 3L, 2L, 1L, 4L, 4L, 1L), count2 = c(3L,
4L, 2L, 2L, 3L, 3L, 2L, 2L, 4L, 4L), count3 = c(2L, 2L, 1L, 2L,
2L, 2L, 2L, 1L, 3L, 3L), count4 = c(4L, 2L, 3L, 3L, 4L, 2L, 4L,
3L, 3L, 1L), id = 1:10), class = "data.frame", row.names = c(NA,
-10L))
> df
time1 time2 time3 time4 count1 count2 count3 count4 id
1 14 21 41 33 4 3 2 4 1
2 4 21 25 20 2 4 2 2 2
3 39 42 46 35 4 2 1 3 3
4 1 46 37 6 1 2 2 3 4
5 34 10 37 10 3 3 2 4 5
6 23 7 34 42 2 3 2 2 6
7 43 9 42 38 1 2 2 4 7
8 14 15 25 47 4 2 1 3 8
9 18 21 44 20 4 4 3 3 9
10 33 37 15 28 1 4 3 1 10
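Since the question also mentions pivot_longer, here is a hedged sketch of that route, assuming tidyr is installed. It uses only the first two rows of the data above to stay short; the helper column `grp` and the result name `long` are my own choices, not something from the question.

```r
library(tidyr)

# First two rows of the data shown above
df <- data.frame(
  time1 = c(14L, 4L),  time2 = c(21L, 21L),
  time3 = c(41L, 25L), time4 = c(33L, 20L),
  count1 = c(4L, 2L),  count2 = c(3L, 4L),
  count3 = c(2L, 2L),  count4 = c(4L, 2L),
  id = 1:2
)

# ".value" tells pivot_longer that the first captured group (time/count)
# names the output value columns; the second group becomes "grp"
long <- pivot_longer(
  df,
  cols          = -id,
  names_to      = c(".value", "grp"),
  names_pattern = "^(time|count)(\\d+)$"
)

# Rebuild the label columns and arrange columns as requested in the question
long <- as.data.frame(long)
long$timepoint <- paste0("time", long$grp)
long$count_no  <- paste0("count", long$grp)
long <- long[c("timepoint", "time", "count_no", "count", "id")]
names(long)[c(2, 4)] <- c("time_count_value", "count_value")

long
```

The design choice mirrors the reshape approach: both value columns are pivoted together on a shared index, and the "timepoint" and "count_no" labels are derived from that index afterwards.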