Here is a toy example based on a dataset with more than 30 million rows.
df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), item = c("Job", "DOB", "organization", "info", "Job",
"DOB", "organization", "info", "Job", "DOB", "organization",
"info"), value = c("Assistant", "27395", "ABC", "Inspire others",
"Project manager", "27395", "CDE", "Inspire others", "Project manager",
"27395", "CDE", "Inspire others")), class = "data.frame", row.names = c(NA,
-12L))
I want to create a table with one row per id, but I have run into a problem.
df %>% pivot_wider(names_from = item, values_from = value)
The code above gives me the following result:
# A tibble: 1 x 5
id Job DOB organization info
<int> <list> <list> <list> <list>
1 1 <chr [3]> <chr [3]> <chr [3]> <chr [3]>
So I tried to collapse them into text with values_fn = list(value = paste), but got the following error:
Error in `$<-.data.frame`(`*tmp*`, "val", value = c("Assistant", "Project manager", : replacement has 12 rows, data has 4
What is the best way to handle this, given that the duplicated entries may be a mix of character and numeric values?
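We need a sequence column to make the rows unique, and then use pivot_wider. Here rowid(item) numbers the repeated occurrences of each item (1, 2, 3), so every id/rn/item combination identifies exactly one value. Note that the result has one row per repeat rather than one row per id.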
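library(dplyr)
library(data.table)  # provides rowid()
library(tidyr)
df %>%
  # number the repeats of each item so that rows become unique
  mutate(rn = rowid(item)) %>%
  pivot_wider(names_from = item, values_from = value) %>%
  # drop the helper column once the data is wide
  select(-rn)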
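If exactly one row per id is required, one possible alternative is to collapse the duplicates inside pivot_wider itself. The sketch below is not part of the answer above; it assumes a summary function that returns a single string is acceptable, and the "; " separator and the name one_row are arbitrary choices made for illustration. The original attempt with plain paste probably failed because paste() without collapse returns one string per input element instead of a single value.

```r
library(dplyr)
library(tidyr)

# Toy data from the question
df <- data.frame(
  id = rep(1L, 12),
  item = rep(c("Job", "DOB", "organization", "info"), times = 3),
  value = c("Assistant", "27395", "ABC", "Inspire others",
            "Project manager", "27395", "CDE", "Inspire others",
            "Project manager", "27395", "CDE", "Inspire others"),
  stringsAsFactors = FALSE
)

# Collapse the distinct values of each id/item cell into one string.
# The "; " separator is an assumption; use whatever suits the data.
one_row <- df %>%
  pivot_wider(
    names_from = item,
    values_from = value,
    values_fn = list(value = function(x) paste(unique(x), collapse = "; "))
  )

one_row
```

Because value is already a character column, numeric-looking entries such as the DOB are treated as text, so mixed content is not a problem for this approach. On 30 million rows the extra unique() and paste() calls may be noticeably slower than the rowid approach.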