group1 group2 value
chr1 a 1
chr1 a 1
chr1 a 1
chr1 b 2.2
chr1 b 2.5
chr1 b 2.5
chr1 b 2.8
chr2 c 3.1
chr2 c -3.2
chr2 c -3.7
chr2 c -3.1
chr2 d 4
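For the values in the "value" column that belong to the same group2 and group1: if there are three or more consecutive values greater than 2 or less than -2, the mean of those values is calculated; otherwise the original value is kept. The output should be: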
group1 group2 value mean
chr1 a 1 1 # does not change because it's smaller than 2
chr1 a 1 1
chr1 a 1 1
chr1 b 2.2 2.5 # mean of 2.2, 2.5, 2.5, 2.8
chr1 b 2.5 2.5
chr1 b 2.5 2.5
chr1 b 2.8 2.5
chr2 c 3.1 3.1 # not used for mean calculation above (different group)
chr2 c -3.2 -3.3 # mean of -3.2, -3.7, -3.1
chr2 c -3.7 -3.3
chr2 c -3.1 -3.3
chr2 d 4 4
My actual problem is slightly different. Follow-up question:
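As shown below, for the values in the "value" column that belong to the same group2 and group1: if there are three or more consecutive values in "value" greater than 2 or less than -2, then "mean" is the mean of the corresponding values in the "log2FC" column; otherwise the original value in the "value" column is kept.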
So, a data format similar to my real data is:
group1 group2 value log2FC
chr1 a 1 1.1
chr1 a 1 1.1
chr1 a 1 1.1
chr1 b 2.2 2
chr1 b 2.5 3
chr1 b 2.5 3
chr1 b 2.8 4
chr2 c 3.1 1
chr2 c -3.2 2
chr2 c -3.7 3
chr2 c -3.1 4
chr2 d 4 1
The output I want is:
group1 group2 value log2FC mean
chr1 a 1 1.1 1 # the original value in column "value" kept
chr1 a 1 1.1 1
chr1 a 1 1.1 1
chr1 b 2.2 2 3 # mean of 2,3,3,4 in column "log2FC" because there are more than 3 numbers (belonging to group2 and group1) bigger than 2 in column "value"
chr1 b 2.5 3 3 # mean of 2,3,3,4 in column "log2FC"
chr1 b 2.5 3 3 # mean of 2,3,3,4 in column "log2FC"
chr1 b 2.8 4 3 # mean of 2,3,3,4 in column "log2FC"
chr2 c 3.1 1 3.1 # not counted in the mean calculation (different group)
chr2 c -3.2 2 3 # mean of 2,3,4 from column "log2FC"
chr2 c -3.7 3 3 # mean of 2,3,4 from column "log2FC"
chr2 c -3.1 4 3 # mean of 2,3,4 from column "log2FC"
chr2 d 4 1 1
Thanks for any help. This is my df (data.frame):
group1 group2 value
chr1 a 1
chr1 a 1
chr1 a 1
chr1 b 2.2
chr1 b 2.5
chr1 b 2.5
chr1 b 2 ....
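Starting from DF, create a grouping variable g using rleid from data.table; data.table is not otherwise used. Then create a mean function that applies the rule from the question. Finally, apply that function to each component of value by g.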
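The steps above could be sketched roughly as follows. This is a minimal illustration and not necessarily the answerer's original code. It assumes "three or more" consecutive values qualify (as the -3.2, -3.7, -3.1 run in the question suggests), and it builds the run id in base R as a stand-in for rleid so that it runs without data.table installed. The names `side`, `key`, `run_mean`, `n_run`, `qualifies`, and `mean_log2FC` are made up for this example.

```r
# Sample data copied from the question.
DF <- data.frame(
  group1 = rep(c("chr1", "chr2"), times = c(7, 5)),
  group2 = rep(c("a", "b", "c", "d"), times = c(3, 4, 4, 1)),
  value  = c(1, 1, 1, 2.2, 2.5, 2.5, 2.8, 3.1, -3.2, -3.7, -3.1, 4),
  log2FC = c(1.1, 1.1, 1.1, 2, 3, 3, 4, 1, 2, 3, 4, 1),
  stringsAsFactors = FALSE
)

# Threshold class of each value: 1 if > 2, -1 if < -2, 0 otherwise.
side <- sign(DF$value) * (abs(DF$value) > 2)

# Run id g: increments whenever group1, group2 or the threshold class
# changes between consecutive rows. With data.table installed,
# data.table::rleid(DF$group1, DF$group2, side) should give the same ids.
key <- paste(DF$group1, DF$group2, side)
DF$g <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

# Rule from the question: a run of 3 or more values that are all > 2
# or all < -2 is replaced by its mean; anything else is kept as-is.
run_mean <- function(x) {
  qualifies <- length(x) >= 3 && (all(x > 2) || all(x < -2))
  if (qualifies) rep(mean(x), length(x)) else x
}

# Apply the rule to each run of value.
DF$mean <- ave(DF$value, DF$g, FUN = run_mean)

# Follow-up variant: for qualifying runs take the mean of log2FC,
# otherwise keep the original value.
n_run <- ave(DF$value, DF$g, FUN = length)
qualifies <- n_run >= 3 & side != 0
DF$mean_log2FC <- ifelse(qualifies, ave(DF$log2FC, DF$g), DF$value)

print(DF)
```

Because the threshold class is part of the run key, the single 3.1 in chr2 c forms its own run and is not averaged with the negative values that follow it. Under the rule as stated, a lone value such as the chr2 d row keeps its original value in both result columns.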