Here is the data:
DPS Comodity Std Issue
111 Hard drive No Post
111 MBD NoBoot
111 LCD Flicker
222 MBD No Post
222 LCD No Post
333 MBD No power
I need to get it into the following format:
DPS Comodity Std Issue
111 Hard drive,MBD,LCD Hard drive-No Post,MBD-NoBoot,LCD-Flicker
222 MBD,LCD No Post
333 MBD No Power
I tried aggregate(`Std Issue` ~ DPS, df, function(x) toString(unique(x))),
but it gives Std Issue as
No Post,No Boot, Flicker
No Post
No Power
This does not meet my requirement. Any suggestions for solving this kind of problem would be very useful and appreciated.
You can use the data.table package to do this:
> library(data.table)
> setDT(dt)[,Std_Issue:=paste0(Comodity,"-",Std.Issue)]
> setDT(dt)[, list(Comodity = paste(Comodity, collapse=","),
`Std Issue` = paste(Std_Issue, collapse=",")), by = DPS]
Output:
DPS Comodity Std Issue
1: 111 Hard drive,MBD,LCD Hard drive-No Post,MBD-NoBoot,LCD-Flicker
2: 222 MBD,LCD MBD-No Post,LCD-No Post
3: 333 MBD MBD-No power
Input data:
dt <- read.table(text="DPS Comodity Std Issue
111 Hard drive No Post
111 MBD NoBoot
111 LCD Flicker
222 MBD No Post
222 LCD No Post
333 MBD No power",header=T,sep="\t")
Edit:
You can achieve this without a for loop:
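> library(stringr)  # provides str_split()
> setDT(dt)[,Std_Issue:=paste0(Comodity,"-",Std.Issue)]
> # groups with fewer than 3 distinct issues show only the unique issues; the others show Comodity-Issue pairs
> setDT(dt)[, list(Std_issue = ifelse(length(unlist(unique(lapply(str_split(Std_Issue,"-"),function(x)x[2]))))<3,paste(unique(`Std.Issue`), collapse=","),paste(Std_Issue, collapse=",")),Commodity=paste(Comodity, collapse=",")), by=DPS]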
DPS Std_issue Commodity
1: 111 Hard drive-No Post,MBD-NoBoot,LCD-Flicker Hard drive,MBD,LCD
2: 222 No Post MBD,LCD
3: 333 No power MBD
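The condition above is tuned to this particular data set (fewer than 3 distinct issues). If the intended rule is "show only the issue when every row in a DPS group has the same issue, otherwise show Comodity-Issue pairs", which is my reading of the expected output and not something the question states outright, a minimal sketch without stringr could look like this. The column names (Comodity, Std.Issue) are assumed to be the ones read.table produces from the input above.

```r
library(data.table)

# Same rows as the input data, built directly so the example is self-contained
dt <- data.table(
  DPS       = c(111, 111, 111, 222, 222, 333),
  Comodity  = c("Hard drive", "MBD", "LCD", "MBD", "LCD", "MBD"),
  Std.Issue = c("No Post", "NoBoot", "Flicker", "No Post", "No Post", "No power")
)

res <- dt[, .(
  Comodity  = paste(Comodity, collapse = ","),
  # Assumed rule: a single distinct issue is shown once,
  # otherwise every row becomes a "Comodity-Issue" pair
  Std_issue = if (uniqueN(Std.Issue) == 1L) {
    Std.Issue[1L]
  } else {
    paste(Comodity, Std.Issue, sep = "-", collapse = ",")
  }
), by = DPS]

print(res)
```

Using uniqueN() on the original column avoids building the pasted string first and then splitting it apart again.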
We can use dplyr to apply this to both columns, i.e.
library(dplyr)
df %>%
group_by(DPS) %>%
summarise_all(funs(toString(unique(.))))
which gives,
# A tibble: 3 x 3
    DPS Comodity              Std_Issue
  <int> <chr>                 <chr>
1   111 Hard_drive, MBD, LCD  No_Post, NoBoot, Flicker
2   222 MBD, LCD              No_Post
3   333 MBD                   No_power
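Note that funs() and summarise_all() are deprecated or superseded in newer dplyr releases. A rough equivalent with across(), assuming dplyr 1.0 or later and a data frame df with the column names shown in the output, might be:

```r
library(dplyr)

# Assumed input: same rows as the question, with underscore-style names
df <- data.frame(
  DPS       = c(111L, 111L, 111L, 222L, 222L, 333L),
  Comodity  = c("Hard_drive", "MBD", "LCD", "MBD", "LCD", "MBD"),
  Std_Issue = c("No_Post", "NoBoot", "Flicker", "No_Post", "No_Post", "No_power"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(DPS) %>%
  # across(everything()) covers all non-grouping columns
  summarise(across(everything(), ~ toString(unique(.x))))

print(res)
```

Like the original, this only collapses the unique values per column; it does not build the Comodity-Issue pairs asked for in the question.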
Finally, I found a solution:
test_df <- data.frame(DPS=c(111,111,111,222,222,333),comodity =c("HDD","MBD","LCD","MBD","LCD","MBD"),stdIss=c("No Post","No Boot","Flicker","No Post","No Post","No Power"))
# Collapse the commodities and the unique issues for each DPS
A <- data.frame(tapply(test_df$comodity,test_df$DPS,FUN = function(x){toString(x)}))
B <- data.frame(tapply(test_df$stdIss,test_df$DPS,FUN=function(x){toString(unique(x))}))
C <- data.frame(A,B)
colnames(C)[1] <- "comodity"
colnames(C)[2] <- "Std Issue"
# toString() joins with ", ", so split on the same separator
C$comodity <- strsplit(as.character(C$comodity), split = ", ")
C$`Std Issue` <- strsplit(as.character(C$`Std Issue`), split = ", ")
C$new <- NA
for(i in 1:nrow(C)){
  if(length(C$`Std Issue`[[i]])>1){
    D <- character(0)  # reset for every group so no stale pairs carry over
    for(j in 1:length(C$`Std Issue`[[i]]))
    {
      D[j] <- paste(C$comodity[[i]][j],C$`Std Issue`[[i]][j],sep = "-")
    }
    C$new[i] <- paste(D,collapse = ",")
  }
  else
  {
    # only one distinct issue in this group: keep it as is
    C$new[i] <- C$`Std Issue`[[i]]
  }
}
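For comparison, here is a shorter base R sketch of the same idea using split() and lapply() instead of nested loops. The helper name summarise_group is made up for this example, and the rule it applies (a single distinct issue is shown once, otherwise comodity-issue pairs) is my assumption about what the loop is meant to do.

```r
test_df <- data.frame(
  DPS      = c(111, 111, 111, 222, 222, 333),
  comodity = c("HDD", "MBD", "LCD", "MBD", "LCD", "MBD"),
  stdIss   = c("No Post", "No Boot", "Flicker", "No Post", "No Post", "No Power"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: builds one summary row for the rows of a single DPS
summarise_group <- function(g) {
  issues <- unique(g$stdIss)
  data.frame(
    DPS = g$DPS[1],
    comodity = paste(g$comodity, collapse = ","),
    `Std Issue` = if (length(issues) == 1) {
      issues
    } else {
      paste(g$comodity, g$stdIss, sep = "-", collapse = ",")
    },
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

# Split by DPS, summarise each piece, and bind the rows back together
res <- do.call(rbind, lapply(split(test_df, test_df$DPS), summarise_group))
rownames(res) <- NULL
print(res)
```

Because the pairs are built from the original rows rather than from the already-deduplicated issue list, a group with repeated issues still pairs every comodity with its own issue.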