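I have a data frame that I want to subset according to a hierarchy of preferences among the factor levels in one column. In the example below, for each level of "ID" I want to select only one "method". Specifically, I want to keep CACL if it is present; if CACL does not exist for that ID, keep "KCL"; and if that does not exist either, keep "H2O".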
ID<-c(1,1,1,2,2,3)
method<-c("CACL","KCL","H2O","H2O","KCL","H2O")
df1<-data.frame(ID,method)
ID method
1 1 CACL
2 1 KCL
3 1 H2O
4 2 H2O
5 2 KCL
6 3 H2O
The result I am after would look like this:
ID<-c(1,2,3)
method<-c("CACL","KCL","H2O")
df2<-data.frame(ID,method)
ID method
1 1 CACL
2 2 KCL
3 3 H2O
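I have done a similar kind of subsetting before by picking the smallest number within a level, but I could not adapt it to this case. Should I be using ifelse here as well?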
#if present, choose rows containing "number" 2 instead of 1 (this column contained only the two numbers 1 and 2)
library(dplyr)
new<-df %>%
  group_by(col1,col2,col3) %>%
  summarize(number = ifelse(any(number > 1), min(number[number>1]),1))
dfnew<-merge(new,df,by=c("colxyz","number"),all.x=T)
You can use order together with match, and then simply keep the rows that are !duplicated:
df1 <- df1[order(match(df1$method, c("CACL","KCL","H2O"))),]
df1[!duplicated(df1$ID),]
# ID method
#1 1 CACL
#5 2 KCL
#6 3 H2O
#Variant not changing df1
i <- order(match(df1$method, c("CACL","KCL","H2O")))
df1[i[!duplicated(df1$ID[i])],]
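To see why this works, it may help to look at the intermediate values. The sketch below rebuilds the example data from the question and prints what match returns; the names pref, pref_rank, sorted and result are only illustrative and are not part of the original answer.

```r
# Same example data as in the question
df1 <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 3),
  method = c("CACL", "KCL", "H2O", "H2O", "KCL", "H2O")
)

# Preference order, most preferred first (illustrative name)
pref <- c("CACL", "KCL", "H2O")

# match() returns each method's position in pref: lower means more preferred
pref_rank <- match(df1$method, pref)
pref_rank
# 1 2 3 3 2 3

# Sort rows by that position, then keep the first row seen for each ID
sorted <- df1[order(pref_rank), ]
result <- sorted[!duplicated(sorted$ID), ]
as.character(result$method)
# "CACL" "KCL" "H2O"
```

A method that is not listed in pref gets NA from match, and order puts NA last by default, so such rows would only be kept when an ID has no listed method at all.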
An option using dplyr:
library(dplyr)
df1 %>%
  mutate(preference = match(method, c("CACL","KCL","H2O"))) %>%
  group_by(ID) %>%
  filter(preference == min(preference)) %>%
  select(-preference)
# A tibble: 3 x 2
# Groups: ID [3]
ID method
<dbl> <fct>
1 1 CACL
2 2 KCL
3 3 H2O
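One thing to be aware of: filter(preference == min(preference)) keeps every row that ties for the best preference, so an ID that lists the same method twice would return two rows. If exactly one row per ID is needed, a possible variant is slice_min with with_ties = FALSE. This is only a sketch, assuming dplyr 1.0.0 or later (where slice_min is available); pref and res are illustrative names.

```r
library(dplyr)  # slice_min() assumes dplyr >= 1.0.0

# Same example data as in the question
df1 <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 3),
  method = c("CACL", "KCL", "H2O", "H2O", "KCL", "H2O")
)

# Preference order, most preferred first (illustrative name)
pref <- c("CACL", "KCL", "H2O")

res <- df1 %>%
  group_by(ID) %>%
  # Keep the single row with the lowest position in pref for each ID
  slice_min(match(method, pref), n = 1, with_ties = FALSE) %>%
  ungroup()

res
```

Unlike the version above, this one ends with ungroup(), so the result is a plain tibble rather than a grouped one.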