I have the following data:
# Creating the dataframe
df <- data.frame(
  patient_id = c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6),
  dob = c("6/16/1926", "6/16/1926", "6/16/1926", "12/6/1935", "12/6/1935", "5/18/1938", "5/18/1938", "7/18/1944", "2/3/1949", "2/3/1949", "11/27/1960", "11/27/1960", "11/27/1960"),
  sex = c("Female", "Female", "Female", "Female", "Female", "Male", "Male", "Male", "Female", "Female", "Male", "Male", "Male"),
  race = c("Black or African American", NA, "White", NA, "White", "Asian", "White", "White", "Other", "White", NA, "White", "White")
)
# Displaying the dataframe
print(df)
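Some patients have inconsistent entries in the race column. If a patient has 2 or more entries and one of them is NA, I need to replace the NA with the first non-NA value. If a patient has 2 or more non-NA entries that differ, I need to replace all of that patient's entries with "Mixed Race". How can I do this with the tidyverse in R?
I have tried: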
# Replace NA race values with the other race value if available
df <- df %>%
  group_by(patient_id) %>%
  mutate(
    race = ifelse(
      any(!is.na(race) & race != ""),
      ifelse(all(is.na(race) | race == ""), NA, first(na.omit(race))),
      race)
  )
# Update the race column to "Mixed Race" only if multiple races are found for the same patient
df <- df %>%
  group_by(patient_id) %>%
  mutate(
    race = ifelse(
      n_distinct(race) > 1,
      "Mixed Race",
      race)
  )
The first replaces all values with "White", and the second replaces all values with "Mixed Race".
I have also tried:
# Update the table to replace NA values in the race column
patients_updated <- df %>%
  group_by(patient_id) %>%
  mutate(race = ifelse(any(!is.na(race)), first(na.omit(race)), race))
and
# Replace NA values in race column with corresponding non-NA race value for each patient
df <- df %>%
  group_by(patient_id) %>%
  mutate(race = ifelse(any(!is.na(race)), na.omit(race), race))
But I get the same result.
Your approach to flagging Mixed Race will work as long as you deal with the NA values first.
library(tidyverse)
df |>
  group_by(patient_id) |>
  fill(race, .direction = 'downup') |>
  mutate(
    race = ifelse(
      n_distinct(race) > 1,
      "Mixed Race",
      race)
  )
#> # A tibble: 13 × 4
#> # Groups: patient_id [6]
#> patient_id dob sex race
#> <dbl> <chr> <chr> <chr>
#> 1 1 6/16/1926 Female Mixed Race
#> 2 1 6/16/1926 Female Mixed Race
#> 3 1 6/16/1926 Female Mixed Race
#> 4 2 12/6/1935 Female White
#> 5 2 12/6/1935 Female White
#> 6 3 5/18/1938 Male Mixed Race
#> 7 3 5/18/1938 Male Mixed Race
#> 8 4 7/18/1944 Male White
#> 9 5 2/3/1949 Female Mixed Race
#> 10 5 2/3/1949 Female Mixed Race
#> 11 6 11/27/1960 Male White
#> 12 6 11/27/1960 Male White
#> 13 6 11/27/1960 Male White
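As for why the order matters: by default n_distinct() counts NA as a value of its own, so a patient with one NA and one White entry would be flagged as Mixed Race unless the NA is filled first. The following is only a minimal sketch of that behaviour, together with a one-step variant that uses the na.rm argument of n_distinct() instead of fill(). The names df_small and result are made up for illustration, and df_small is just a two-patient subset of the question's data.

```r
library(dplyr)

# NA is counted as a distinct value unless na.rm = TRUE
n_distinct(c(NA, "White"))                # 2
n_distinct(c(NA, "White"), na.rm = TRUE)  # 1

# Illustrative subset of the question's data (hypothetical name: df_small)
df_small <- data.frame(
  patient_id = c(1, 1, 1, 2, 2),
  race = c("Black or African American", NA, "White", NA, "White")
)

result <- df_small |>
  group_by(patient_id) |>
  mutate(
    # One condition per group, so plain if/else is enough here
    race = if (n_distinct(race, na.rm = TRUE) > 1) {
      "Mixed Race"
    } else {
      race[!is.na(race)][1]  # first non-NA value, or NA if there is none
    }
  ) |>
  ungroup()
```

In this sketch a patient whose entries are all NA stays NA, and the single value returned per group is recycled across that patient's rows.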
If you want one row per patient_id, follow it with distinct()
df |>
  group_by(patient_id) |>
  fill(race, .direction = 'downup') |>
  mutate(
    race = ifelse(
      n_distinct(race) > 1,
      "Mixed Race",
      race)
  ) |>
  distinct()
#> # A tibble: 6 × 4
#> # Groups: patient_id [6]
#> patient_id dob sex race
#> <dbl> <chr> <chr> <chr>
#> 1 1 6/16/1926 Female Mixed Race
#> 2 2 12/6/1935 Female White
#> 3 3 5/18/1938 Male Mixed Race
#> 4 4 7/18/1944 Male White
#> 5 5 2/3/1949 Female Mixed Race
#> 6 6 11/27/1960 Male White
Created on 2024-05-08 with reprex v2.0.2
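One caveat: distinct() with no arguments compares every column, so it only collapses to one row per patient when dob and sex agree within each patient, as they do in the data above. If that cannot be guaranteed, one possible option (a sketch, not part of the reprex above) is to name the key column and keep the first row for each patient. The data frame df_dupes and the result names are made up for illustration.

```r
library(dplyr)

# Hypothetical data: patient 1 has inconsistent sex entries
df_dupes <- data.frame(
  patient_id = c(1, 1, 2),
  sex = c("Female", "F", "Male"),
  race = c("Mixed Race", "Mixed Race", "White")
)

# Compares all columns, so both rows for patient 1 survive
all_cols <- df_dupes |> distinct()

# Compares patient_id only and keeps the first row of each patient
one_per_patient <- df_dupes |> distinct(patient_id, .keep_all = TRUE)
```

Keeping the first row silently discards the conflicting values, so it may be worth inspecting such patients before collapsing them.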