Subsetting a data frame based on a hierarchical preference of factor levels within a column in R

Question (votes: 2, answers: 2)

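I have a data frame that I want to subset based on a hierarchical preference of factor levels within one column. In the example below, I want to keep only one "method" for each level of "ID". Specifically: keep CACL if possible; if CACL is not present for that ID, subset to KCL; and if KCL is not present either, subset to H2O.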

ID <- c(1, 1, 1, 2, 2, 3)
method <- c("CACL", "KCL", "H2O", "H2O", "KCL", "H2O")
df1 <- data.frame(ID, method)

  ID  method
1  1    CACL
2  1     KCL
3  1     H2O
4  2     H2O
5  2     KCL
6  3     H2O

# Desired result
ID <- c(1, 2, 3)
method <- c("CACL", "KCL", "H2O")
df2 <- data.frame(ID, method)

  ID  method
1  1    CACL
2  2     KCL
3  3     H2O

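I have done a similar subsetting before by choosing the smallest number within a level, but I can't adapt it to this case. Should I be using ifelse here as well?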

# If present, choose rows containing "number" 2 instead of 1 (this column contained only the two numbers 1 and 2)

library(dplyr)
new <- df %>%
  group_by(col1, col2, col3) %>%
  summarize(number = ifelse(any(number > 1), min(number[number > 1]), 1))
dfnew <- merge(new, df, by = c("colxyz", "number"), all.x = TRUE)
r dataframe subset
2 Answers

2 votes

You can use order together with match, and then simply keep the rows that are !duplicated:

df1 <- df1[order(match(df1$method, c("CACL","KCL","H2O"))),]
df1[!duplicated(df1$ID),]
#  ID method
#1  1   CACL
#5  2    KCL
#6  3    H2O

#Variant not changing df1
i <- order(match(df1$method, c("CACL","KCL","H2O")))
df1[i[!duplicated(df1$ID[i])],]
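
To see why this works, it may help to look at what match returns on its own. The following is a minimal sketch that rebuilds the example data from the question; the names `pref`, `pref_rank`, and `res` are illustrative and not part of the original answer.

```r
# Example data from the question
df1 <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 3),
  method = c("CACL", "KCL", "H2O", "H2O", "KCL", "H2O")
)

# Preference order, most preferred first (hypothetical name)
pref <- c("CACL", "KCL", "H2O")

# match() returns each method's position in pref: lower means more preferred
pref_rank <- match(df1$method, pref)
pref_rank
# [1] 1 2 3 3 2 3

# order() sorts row indices by that rank; ties keep their original order,
# and a method missing from pref would get NA and be sorted last
i <- order(pref_rank)

# !duplicated() keeps the first (most preferred) row seen for each ID
res <- df1[i[!duplicated(df1$ID[i])], ]
res
#   ID method
# 1  1   CACL
# 5  2    KCL
# 6  3    H2O
```

Because the preference lives in a single vector, changing the hierarchy only requires reordering `pref`.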

0 votes

An option using dplyr:

library(dplyr)

df1 %>% 
  mutate(preference = match(method,  c("CACL","KCL","H2O"))) %>%  # rank by preference
  group_by(ID) %>% 
  filter(preference == min(preference)) %>%                       # keep the best rank per ID
  select(-preference)

# A tibble: 3 x 2
# Groups:   ID [3]
     ID method
  <dbl> <fct> 
1     1 CACL  
2     2 KCL   
3     3 H2O 
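
Note that filter(preference == min(preference)) keeps every row tied for the best rank, so an ID that lists the same method twice would return two rows. If exactly one row per ID is needed, a possible variant is to sort by preference and keep the first row per ID with distinct. This is only a sketch under that assumption; `pref` and `res` are illustrative names.

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 3),
  method = c("CACL", "KCL", "H2O", "H2O", "KCL", "H2O")
)

# Preference order, most preferred first (hypothetical name)
pref <- c("CACL", "KCL", "H2O")

res <- df1 %>%
  arrange(match(method, pref)) %>%     # most preferred methods first
  distinct(ID, .keep_all = TRUE) %>%   # keep the first row for each ID
  arrange(ID)                          # restore ID order

res
#   ID method
# 1  1   CACL
# 2  2    KCL
# 3  3    H2O
```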