Group data, then fill a column based on the grouped data in R


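For any combination of columns A, C, D, E, F and G, I am trying to find the value in column B where the value in column C is within 5% of the value in column D. Once found, I want to paste that value into a new column on every row where that combination of columns occurs.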

Here is a sample of the data I am working with:

structure(list(A = c(500L, 10000L, 5000L, 500L, 100L, 500L, 1000L, 
10000L, 5000L, 1000L, 500L, 5000L, 100L, 5000L, 500L, 500L, 500L, 
1000L, 10000L, 500L), B = c(1.53147891704226, 5.51999984066968, 
1.69897000433602, 3.49996186559619, 2.8668778143375, 2.27415784926368, 
2.69983772586725, 4.30000820255381, 4.28000895310819, 1.14612803567824, 
3.40001963506516, 4.88000138832177, 2.3747483460101, 4, 3.03342375548695, 
3.04999285692014, 2.59988307207369, 3.51666755909904, 4.40000234592796, 
2.82477646247555), C = c(0.118917162666339, 32.46875, 0.00120927734375, 
6.69645182291667e-06, 38.1009114583333, 0.03888505859375, 0.984812890625, 
181.953125, 0.0079256796875, 0.0397203010315885, 1.693359375, 
0.25630859375, 0.00419210611979167, 1.4658203125, 0.00764973958333333, 
0.294973113716194, 8.8974609375, 0.0014642802734375, 67.609375, 
0.00205580344395639), D = c(4.63125661725864, 34.1632795742744, 
0.262987871586425, 9.53427792464916e-06, 38.7106620745277, 0.187395038620314, 
0.99014163328848, 211.108639904501, 0.0108561099088211, 9.82604248822947, 
1.95692192890506, 0.262987871586425, 0.00616933538501461, 2.23297962243741, 
0.020686261349356, 0.53228350287947, 26.4570757028734, 0.00221508528097736, 
68.1735822402243, 0.00495578134094092), E = c(2, 2, 2, 100, 2, 
100, 2, 2, 100, 2, 2, 2, 2, 100, 100, 2, 2, 100, 100, 2), F = c(1e-05, 
1e-06, 1e-07, 1e-08, 1e-05, 1e-06, 1e-04, 1e-05, 1e-06, 1e-05, 
1e-06, 1e-07, 1e-07, 1e-07, 1e-08, 1e-06, 1e-06, 1e-06, 1e-05, 
1e-08), G = c("Effective Number of Haplotypes", "Number of Polymorphic Sites", 
"Gene Diversity", "Nucleotide Diversity", "Number of Heterozygotes", 
"Gene Diversity", "Gene Diversity", "Number of Polymorphic Sites", 
"Nucleotide Diversity", "Effective Number of Haplotypes", "Number of Haplotypes", 
"Gene Diversity", "Gene Diversity", "Number of Haplotypes", "Number of Polymorphic Sites", 
"Effective Number of Haplotypes", "Number of Heterozygotes", 
"Nucleotide Diversity", "Number of Heterozygotes", "Effective Number of Haplotypes"
)), .Names = c("A", "B", "C", "D", "E", "F", "G"), row.names = c("11025", 
"13649", "37612", "178511", "9864", "15883", "2469", "7104", 
"15089", "11140", "18719", "47812", "36151", "31315", "66810", 
"17609", "16501", "14975", "10860", "45318"), class = "data.frame")

The line of code I have is:

min(df[which(df$C>=(0.05*df$D) & df$G == 'Nucleotide Diversity' & df$F==1e-6 & df$A==5000 & df$E==100),]$B)

This returns a single number for the combination of columns A, C, D, E, F and G that I want.

Issue/Question 1: I am stuck on pasting this number into a new column H on every row where that combination of A, C, D, E, F and G is found.

Issue/Question 2: Is there an automated way to do this without plugging in the values for df$G==, df$F==, df$A== and df$E== by hand?

Ideal output

    A        B            C            D   E     F                              G        H
  500 1.531479 1.189172e-01 4.631257e+00   2 1e-05 Effective Number of Haplotypes
10000 5.520000 3.246875e+01 3.416328e+01   2 1e-06    Number of Polymorphic Sites
 5000 1.698970 1.209277e-03 2.629879e-01   2 1e-07                 Gene Diversity
 5000 3.499962 6.696452e-06 9.534278e-06 100 1e-06           Nucleotide Diversity 4.280009
  100 2.866878 3.810091e+01 3.871066e+01   2 1e-05        Number of Heterozygotes
  500 2.274158 3.888506e-02 1.873950e-01 100 1e-06                 Gene Diversity
 1000 2.699838 9.848129e-01 9.901416e-01   2 1e-04                 Gene Diversity
10000 4.300008 1.819531e+02 2.111086e+02   2 1e-05    Number of Polymorphic Sites
 5000 4.280009 7.925680e-03 1.085611e-02 100 1e-06           Nucleotide Diversity 4.280009
...

I assume there is a way to do this, but I must not be searching for the right terms.

r
1 Answer

Do you mean something like this?

library(dplyr)
# H = smallest B in each (A, G) group among rows where C >= 0.05 * D
df %>% group_by(A, G) %>% mutate(H = min(B[C >= 0.05 * D]))

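This returns the minimum value of B among the rows where the C value is greater than or equal to 5% of D. You may want to add more variables to group_by, such as E and F.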
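As a rough sketch of that suggestion, the grouping could be extended to A, E, F and G, which are the columns the question filters on by hand. The `toy` data frame and the `min_or_na` helper below are hypothetical and only for illustration; they are not part of the question or of dplyr. The helper is there because plain `min()` on an empty selection returns `Inf` with a warning, so groups with no qualifying row get `NA` instead.

```r
library(dplyr)

# Toy data (made-up values, not the question's data) with the same column names.
toy <- data.frame(
  A = c(5000, 5000, 5000, 500, 500),
  B = c(3.5, 4.28, 4.9, 2.0, 1.5),
  C = c(0.5, 0.8, 0.01, 0.001, 0.002),
  D = c(1, 1, 1, 1, 1),
  E = c(100, 100, 100, 2, 2),
  F = c(1e-6, 1e-6, 1e-6, 1e-5, 1e-5),
  G = c("Nucleotide Diversity", "Nucleotide Diversity", "Nucleotide Diversity",
        "Gene Diversity", "Gene Diversity"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: smallest value of x among rows where keep is TRUE,
# or NA when no row in the group qualifies.
min_or_na <- function(x, keep) {
  hits <- x[keep]
  if (length(hits) == 0) NA_real_ else min(hits)
}

result <- toy %>%
  group_by(A, E, F, G) %>%                  # one group per combination
  mutate(H = min_or_na(B, C >= 0.05 * D)) %>% # same H on every row of the group
  ungroup()

print(result)
```

Because `mutate()` keeps every row, each row of a combination carries the same H, which matches the layout of the ideal output. Whether C and D should also be part of the grouping depends on the real data; with continuous values they would likely split almost every row into its own group.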
