Group by and aggregate to create a new column based on a condition

Question (votes: 1, answers: 3)

Here is the data:

DPS Comodity    Std Issue
111 Hard drive  No Post
111 MBD         NoBoot
111 LCD         Flicker
222 MBD         No Post
222 LCD         No Post
333 MBD         No power

I need to get it into the following format:

DPS Comodity            Std Issue
111 Hard drive,MBD,LCD  Hard drive-No Post,MBD-NoBoot,LCD-Flicker
222 MBD,LCD             No Post
333 MBD                 No Power

I tried aggregate(`Std Issue` ~ DPS, df, function(x) toString(unique(x))), but it gives Std Issue as:

No Post, NoBoot, Flicker
No Post
No Power

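This does not meet my requirement: when every commodity under a DPS has the same issue, the issue should appear once, but when the issues differ, each one should be prefixed with its commodity (Comodity-Std Issue). Any suggestions for solving this kind of problem would be very helpful and appreciated.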
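A minimal base-R sketch of that rule, assuming the data sits in a data frame `df` whose issue column is named `Std.Issue` (the name `read.table()` gives a `Std Issue` header). `collapse_group` is a hypothetical helper name, not part of any package:

```r
df <- data.frame(
  DPS       = c(111, 111, 111, 222, 222, 333),
  Comodity  = c("Hard drive", "MBD", "LCD", "MBD", "LCD", "MBD"),
  Std.Issue = c("No Post", "NoBoot", "Flicker", "No Post", "No Post", "No power"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse all rows of one DPS into a single row
collapse_group <- function(g) {
  issue <- if (length(unique(g$Std.Issue)) == 1) {
    g$Std.Issue[1]                                             # one shared issue: report it once
  } else {
    paste(g$Comodity, g$Std.Issue, sep = "-", collapse = ",")  # otherwise prefix each with its commodity
  }
  data.frame(DPS = g$DPS[1],
             Comodity = paste(g$Comodity, collapse = ","),
             `Std Issue` = issue,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# split() gives one data frame per DPS; rbind the collapsed rows back together
result <- do.call(rbind, lapply(split(df, df$DPS), collapse_group))
rownames(result) <- NULL
print(result)
```

Testing for exactly one distinct issue, rather than comparing against a fixed count, keeps the rule correct for groups of any size.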

Tags: r

3 answers

Answer (1 vote)

You can do this with the data.table package:

  library(data.table)
  setDT(dt)                                              # convert dt to a data.table in place
  dt[, Std_Issue := paste0(Comodity, "-", Std.Issue)]    # per-row "commodity-issue" label
  dt[, list(Comodity    = paste(Comodity, collapse = ","),
            `Std Issue` = paste(Std_Issue, collapse = ",")), by = DPS]

Output:

   DPS           Comodity                                 Std Issue
1: 111 Hard drive,MBD,LCD     Hard drive-No Post,MBD-NoBoot,LCD-Flicker
2: 222            MBD,LCD                   MBD-No Post,LCD-No Post
3: 333                MBD                              MBD-No power

Input data:

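# Comma-separated so values containing spaces (e.g. "Hard drive") stay in one field;
# the header "Std Issue" is read in as the column Std.Issue
dt <- read.table(text = "DPS,Comodity,Std Issue
111,Hard drive,No Post
111,MBD,NoBoot
111,LCD,Flicker
222,MBD,No Post
222,LCD,No Post
333,MBD,No power", header = TRUE, sep = ",")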
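The code in this answer refers to `Std.Issue` rather than `Std Issue` because `read.table()` (like `data.frame()`) defaults to `check.names = TRUE`, which passes header names through `make.names()` and turns the space into a dot. A small sketch showing both behaviours; the two-row text `txt` is just an illustration:

```r
# make.names() replaces characters that are not valid in R names with "."
print(make.names("Std Issue"))

txt <- "DPS,Comodity,Std Issue
111,Hard drive,No Post"

# Default check.names = TRUE: the column becomes Std.Issue
df1 <- read.table(text = txt, header = TRUE, sep = ",")
print(names(df1))

# check.names = FALSE keeps the name as written; it then has to be backquoted in code
df2 <- read.table(text = txt, header = TRUE, sep = ",", check.names = FALSE)
print(names(df2))
print(df2$`Std Issue`)
```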

EDIT:

You can achieve this without a for loop:

library(data.table)
setDT(dt)[, Std_Issue := paste0(Comodity, "-", Std.Issue)]
# Exactly one distinct issue in the group: show it once; otherwise join the commodity-issue labels
dt[, list(Std_issue = if (uniqueN(Std.Issue) == 1) Std.Issue[1] else paste(Std_Issue, collapse = ","),
          Commodity = paste(Comodity, collapse = ",")), by = DPS]

   DPS                                 Std_issue          Commodity
1: 111 Hard drive-No Post,MBD-NoBoot,LCD-Flicker Hard drive,MBD,LCD
2: 222                                   No Post            MBD,LCD
3: 333                                  No power                MBD

Answer (0 votes)

We can use dplyr to apply the same function to both columns, i.e.

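library(dplyr)
df %>%
  group_by(DPS) %>%
  # across() replaces the deprecated summarise_all(funs(...)) form; the grouping column is skipped
  summarise(across(everything(), ~ toString(unique(.x))))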

This gives:

# A tibble: 3 x 3
    DPS Comodity             Std_Issue               
  <int> <chr>                <chr>                   
1   111 Hard_drive, MBD, LCD No_Post, NoBoot, Flicker
2   222 MBD, LCD             No_Post                 
3   333 MBD                  No_power
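For comparison, the same per-column collapse can be written in base R with `aggregate()`. Like the dplyr version (and the question's original attempt), it deduplicates each column on its own, so the link between a commodity and its issue is lost. The data frame below assumes the question's data with the `Std.Issue` name that `read.table()` produces:

```r
df <- data.frame(
  DPS       = c(111, 111, 111, 222, 222, 333),
  Comodity  = c("Hard drive", "MBD", "LCD", "MBD", "LCD", "MBD"),
  Std.Issue = c("No Post", "NoBoot", "Flicker", "No Post", "No Post", "No power"),
  stringsAsFactors = FALSE
)

# Apply toString(unique(x)) to every non-grouping column, one row per DPS
out <- aggregate(df[c("Comodity", "Std.Issue")],
                 by = list(DPS = df$DPS),
                 FUN = function(x) toString(unique(x)))
print(out)
```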

Answer (0 votes)

Finally, I found a solution:

test_df <- data.frame(DPS = c(111, 111, 111, 222, 222, 333),
                      comodity = c("HDD", "MBD", "LCD", "MBD", "LCD", "MBD"),
                      stdIss = c("No Post", "No Boot", "Flicker", "No Post", "No Post", "No Power"),
                      stringsAsFactors = FALSE)

# One entry per DPS: all commodities, and the distinct issues, as comma-separated strings
A <- tapply(test_df$comodity, test_df$DPS, FUN = function(x) toString(x))
B <- tapply(test_df$stdIss, test_df$DPS, FUN = function(x) toString(unique(x)))
C <- data.frame(comodity = as.vector(A), `Std Issue` = as.vector(B),
                row.names = names(A), check.names = FALSE, stringsAsFactors = FALSE)

C$new <- NA
for (i in seq_len(nrow(C))) {
  # Work from the raw rows of this DPS so each commodity stays paired with its own issue
  grp <- test_df[test_df$DPS == rownames(C)[i], ]
  if (length(unique(grp$stdIss)) > 1) {
    # Several different issues: prefix each issue with its commodity
    C$new[i] <- paste(grp$comodity, grp$stdIss, sep = "-", collapse = ",")
  } else {
    # All rows share one issue: report it once
    C$new[i] <- C$`Std Issue`[i]
  }
}
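The loop above can also be avoided. A vectorized base-R sketch, using the same `test_df` data: `ave()` counts the distinct issues per DPS on every row, each row gets its final label, and `tapply()` collapses the labels per DPS. One difference from the loop: because `unique()` is applied to the labels, two identical commodity-issue pairs within one DPS would be merged into one entry.

```r
test_df <- data.frame(DPS = c(111, 111, 111, 222, 222, 333),
                      comodity = c("HDD", "MBD", "LCD", "MBD", "LCD", "MBD"),
                      stdIss = c("No Post", "No Boot", "Flicker", "No Post", "No Post", "No Power"),
                      stringsAsFactors = FALSE)

# Number of distinct issues in each row's DPS group, repeated on every row of that group
n_issues <- ave(seq_along(test_df$stdIss), test_df$DPS,
                FUN = function(i) length(unique(test_df$stdIss[i])))

# Per-row label: the bare issue if the group shares one issue, else "commodity-issue"
test_df$label <- ifelse(n_issues == 1,
                        test_df$stdIss,
                        paste(test_df$comodity, test_df$stdIss, sep = "-"))

# Collapse per DPS; unique() reduces a shared issue to a single entry
comm_by_dps  <- tapply(test_df$comodity, test_df$DPS, paste, collapse = ",")
issue_by_dps <- tapply(test_df$label, test_df$DPS, function(x) paste(unique(x), collapse = ","))

res <- data.frame(DPS = as.numeric(names(comm_by_dps)),
                  comodity = as.vector(comm_by_dps),
                  `Std Issue` = as.vector(issue_by_dps),
                  check.names = FALSE, stringsAsFactors = FALSE)
print(res)
```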