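I'm currently working on a data project where I need to condense the data so that names are not repeated in the data frame. The problem is that some rows of data are duplicated. I've attached a sample data frame below: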
df_have <- data.frame(
Name = c("Maya", "Maya", "Maya", "Sierra", "Sophia", "Sophia", "Sophia",
"Sophia", "Cecilia", "Cecilia"),
ID = c(24, 56, 24, 54, 12, 12, 15, 24, 12, 11)
)
Here is the desired data frame:
df_want <- data.frame(
Name = c("Maya", "Sierra", "Sophia", "Cecilia"),
ID1 = c(24, 54, 12, 12),
ID2 = c(56, 0, 15, 11),
ID3 = c(0, 0, 24, 0)
)
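I previously posted a question very similar to this one. Based on that, the transformation I'm currently applying to the data is as follows: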
ids |>
mutate(idno = row_number(), .by = Name) |>
pivot_wider(
values_from = ID,
names_from = idno,
values_fill = 0,
names_prefix = "ID"
)
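However, this does not exclude duplicate values. I am using R to transform the data. The only option in pivot_wider that I'm familiar with for handling duplicates applies only to column names, not to the entries themselves. I have also tried the duplicated command, but that removed all duplicates, not just the ones I wanted removed. Thanks in advance for your help.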
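Your comment about using distinct is confusing and doesn't seem to make sense, because it produces exactly the data you say you want. Is your example not fully representative of your problem?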
library(dplyr)  # distinct(), mutate(), tibble()
library(tidyr)  # pivot_wider()
df_have <- data.frame(
Name = c("Maya", "Maya", "Maya", "Sierra", "Sophia", "Sophia", "Sophia",
"Sophia", "Cecilia", "Cecilia"),
ID = c(24, 56, 24, 54, 12, 12, 15, 24, 12, 11)
)
# And here is the desired data frame:
(df_want <- tibble(
Name = c("Maya", "Sierra", "Sophia", "Cecilia"),
ID1 = c(24, 54, 12, 12),
ID2 = c(56, 0, 15, 11),
ID3 = c(0, 0, 24, 0)
))
# I previously posted a question very similar to this. From that, the transformation I am currently performing on the data is as follows:
df_result <- df_have |> distinct() |>
mutate(idno = row_number(), .by = Name) |>
pivot_wider(
values_from = ID,
names_from = idno,
values_fill = 0,
names_prefix = "ID"
)
identical(df_want,df_result)
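The question does not show how duplicated was applied, so the following is only a guess at what went wrong. This minimal base R sketch rebuilds the sample data and compares calling duplicated() on the ID column alone with calling it on the whole data frame. The names by_id and by_row are made up for illustration.

```r
df_have <- data.frame(
  Name = c("Maya", "Maya", "Maya", "Sierra", "Sophia", "Sophia", "Sophia",
           "Sophia", "Cecilia", "Cecilia"),
  ID = c(24, 56, 24, 54, 12, 12, 15, 24, 12, 11)
)

# Assumption: the earlier attempt looked only at the ID column.
# An ID is then dropped as soon as it repeats anywhere, even under another name.
by_id <- df_have[!duplicated(df_have$ID), ]

# Checking whole rows drops only repeated Name/ID pairs,
# which is the same filtering that distinct() performs.
by_row <- df_have[!duplicated(df_have), ]

nrow(by_id)   # 6
nrow(by_row)  # 8
```

With the ID-only check, Sophia loses her 24 and Cecilia loses her 12 because those values already appeared earlier in the column. The row-wise check keeps them.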
Alternatively, a quick loop will work:
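# Drop exact duplicate rows first; the loop only numbers rows and would otherwise keep repeats
df_have <- unique(df_have)
# Number the IDs within each run of identical names (assumes rows are grouped by Name)
df_have$ID_names <- 1
for (i in 2:nrow(df_have)) {
  if (df_have[i, 1] != df_have[i - 1, 1]) {
    df_have[i, 3] <- 1
  } else {
    df_have[i, 3] <- df_have[i - 1, 3] + 1
  }
}
df_have$ID_names <- paste("ID", df_have$ID_names, sep = "")
library(tidyr)
# spread() is superseded by pivot_wider() but still works
df_want <- spread(df_have, key = ID_names, value = ID)
df_want[is.na(df_want)] <- 0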
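The loop above relies on rows for the same name being adjacent. As a loop-free sketch (not the method from this answer), the counter can be built with base R's ave(), which numbers rows within each name regardless of row order. The names df_dedup, ID_name and df_wide are made up for illustration.

```r
library(tidyr)

df_have <- data.frame(
  Name = c("Maya", "Maya", "Maya", "Sierra", "Sophia", "Sophia", "Sophia",
           "Sophia", "Cecilia", "Cecilia"),
  ID = c(24, 56, 24, 54, 12, 12, 15, 24, 12, 11)
)

# Remove repeated Name/ID pairs before numbering
df_dedup <- unique(df_have)

# ave() applies seq_along() within each Name group, giving 1, 2, 3, ... per name
df_dedup$ID_name <- paste0(
  "ID",
  ave(seq_along(df_dedup$Name), df_dedup$Name, FUN = seq_along)
)

# One column per counter value; names with fewer IDs are padded with 0
df_wide <- pivot_wider(
  df_dedup,
  names_from = ID_name,
  values_from = ID,
  values_fill = 0
)
df_wide
```

Because the numbering is done per group, this sketch should give the same counters even if the rows of df_have are shuffled, as long as the relative order within each name is what you want.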