Here is a sample from the NHANES prescription medication data:
SEQN RXDUSE RXDDRGID RXDCOUNT
1 93703 No NA
2 93704 No NA
3 93705 Yes d03740 3
4 93705 Yes d04532 3
5 93705 Yes d00325 3
6 93706 No NA
7 93707 No NA
8 93708 Yes d00689 3
9 93708 Yes d03821 3
10 93708 Yes d00746 3
I'm trying to reshape it into this format:
seqn rxduse rxdcount rxddrgid_1 rxddrgid_2 rxddrgid_3 rxddrgid_4
93703 1 2 d00262 d04113 . .
93705 1 4 d00262 d04538 d00746 d03182
I tried this:
staxpiv <- stax2 %>%
  group_by(SEQN) %>%
  mutate(id = 1:n()) %>%
  ungroup() %>%
  pivot_wider(values_from = RXDDRGID,
              names_from = RXDUSE,
              names_prefix = 'id_')
This is the output:
SEQN RXDCOUNT id id_No id_Yes id_Refused `id_Don't know`
<dbl> <dbl> <int> <fct> <fct> <fct> <fct>
1 93703 NA 1 "" NA NA NA
2 93704 NA 1 "" NA NA NA
3 93705 3 1 NA d03740 NA NA
4 93705 3 2 NA d04532 NA NA
5 93705 3 3 NA d00325 NA NA
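Some SEQN numbers still appear more than once, and I don't know how to fix that.
You're close. One quick point: the values in your example output can't be produced from the sample input you posted, but here is one approach based on that sample input. Note that this assumes the value in RXDCOUNT is the total number of drugs (rows with RXDUSE == "Yes") recorded for each SEQN: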
library(dplyr)
library(tidyr)
# Sample data
stax2 <- read.table(text = "SEQN RXDUSE RXDDRGID RXDCOUNT
93703 No NA NA
93704 No NA NA
93705 Yes d03740 3
93705 Yes d04532 3
93705 Yes d00325 3
93706 No NA NA
93707 No NA NA
93708 Yes d00689 3
93708 Yes d03821 3
93708 Yes d00746 3", header = TRUE)
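# Keep only respondents who reported a drug, number each drug within SEQN,
# then spread the drug IDs into one column per position
staxpiv <- stax2 %>%
  filter(RXDUSE == "Yes") %>%
  group_by(SEQN) %>%
  mutate(id = row_number()) %>%
  pivot_wider(values_from = RXDDRGID,
              names_from = id,
              names_prefix = 'rxddrgid_')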
# # A tibble: 2 × 6
# # Groups: SEQN [2]
# SEQN RXDUSE RXDCOUNT rxddrgid_1 rxddrgid_2 rxddrgid_3
# <int> <chr> <int> <chr> <chr> <chr>
# 1 93705 Yes 3 d03740 d04532 d00325
# 2 93708 Yes 3 d00689 d03821 d00746
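The original attempt kept duplicate SEQN rows because names_from = RXDUSE created one column per RXDUSE level, while id stayed behind as an identifier column, so every SEQN/id pair remained its own row. If respondents who answered "No" should also stay in the result, one possible variation is to skip the filter() step and pivot on id directly. The sketch below assumes RXDUSE and RXDCOUNT are constant within each SEQN, as they are in the sample data; the names staxpiv_all and mismatch are introduced here for illustration only. It also shows one way to check the RXDCOUNT assumption mentioned above.

```r
library(dplyr)
library(tidyr)

# Same sample data as above
stax2 <- read.table(text = "SEQN RXDUSE RXDDRGID RXDCOUNT
93703 No NA NA
93704 No NA NA
93705 Yes d03740 3
93705 Yes d04532 3
93705 Yes d00325 3
93706 No NA NA
93707 No NA NA
93708 Yes d00689 3
93708 Yes d03821 3
93708 Yes d00746 3", header = TRUE)

# Check the assumption: for "Yes" respondents, the number of rows per SEQN
# should equal RXDCOUNT. Any rows left in `mismatch` violate it.
mismatch <- stax2 %>%
  filter(RXDUSE == "Yes") %>%
  count(SEQN, RXDCOUNT) %>%
  filter(n != RXDCOUNT)

# Pivot without filtering, so "No" respondents are kept as one row each
# with NA in every rxddrgid_* column
staxpiv_all <- stax2 %>%
  group_by(SEQN) %>%
  mutate(id = row_number()) %>%
  ungroup() %>%
  pivot_wider(values_from = RXDDRGID,
              names_from = id,
              names_prefix = "rxddrgid_")

print(mismatch)
print(staxpiv_all)
```

With the sample data this should give one row per SEQN. On the full NHANES file it would be worth confirming that mismatch is empty before relying on RXDCOUNT.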