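I have a dataset with columns from HHID to hum_aid, and I would like to create a column, desired, based on the condition that the top 2 ranked columns come from remittances, govt_aid and hum_aid, i.e. the top 2 columns should be at least one of those 3 columns, as shown in the data frame below: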
df <- structure(list(HHID = c(1, 2, 3, 4, 5, 6, 7, 8, 9), empl = c(1000,
1750, 480, 630, 1200, 1000, 1700, 500, 300), agric = c(750, 650,
400, 400, 0, 750, 0, 0, 750), remittances = c(0, 200, 1500, 1250,
0, 500, 700, 900, 1200), govt_aid = c(0, 0, 1200, 1000, 0, 0,
0, 900, 1300), loans = c(0, 350, 300, 0, 0, 1200, 500, 250, 500
), assets = c(500, 0, 0, 0, 700, 0, 0, 500, 150), hum_aid = c(0,
0, 800, 1200, 0, 0, NA, 1400, 1500), desired = c(0, 0, 1, 1,
0, 0, 0, 1, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-9L))
PS: I would prefer a solution that does not involve reshaping. Thanks in advance!
Here is a relatively simple approach:
library(dplyr)
library(purrr)  # for map_lgl()

df |>
  # work on one row (household) at a time
  rowwise() |>
  # d is TRUE when more than one of remittances, govt_aid and hum_aid
  # equals one of the row's two largest values (sort() drops NA)
  mutate(d = sum(map_lgl(c(remittances, govt_aid, hum_aid), ~ .x %in% tail(sort(c_across(empl:hum_aid)), 2))) > 1) |>
  ungroup()
Output:
# A tibble: 9 × 10
   HHID  empl agric remittances govt_aid loans assets hum_aid desired d
  <dbl> <dbl> <dbl>       <dbl>    <dbl> <dbl>  <dbl>   <dbl>   <dbl> <lgl>
1     1  1000   750           0        0     0    500       0       0 FALSE
2     2  1750   650         200        0   350      0       0       0 FALSE
3     3   480   400        1500     1200   300      0     800       1 TRUE
4     4   630   400        1250     1000     0      0    1200       1 TRUE
5     5  1200     0           0        0     0    700       0       0 FALSE
6     6  1000   750         500        0  1200      0       0       0 FALSE
7     7  1700     0         700        0   500      0      NA       0 FALSE
8     8   500     0         900      900   250    500    1400       1 TRUE
9     9   300   750        1200     1300   500    150    1500       1 TRUE
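To make the row-wise check easier to follow, here is a minimal base R sketch that repeats the same logic on two single rows. The helper name top2_check is hypothetical (it is not part of the answer above), and the values are copied from rows 8 and 7 of df in the question.

```r
# Hypothetical helper (not part of the answer): repeats the row-wise check
# on a single named numeric vector.
top2_check <- function(row, aid_cols = c("remittances", "govt_aid", "hum_aid")) {
  # two largest values in the row; sort() drops NA by default
  top2 <- tail(sort(row), 2)
  # which aid columns equal one of those two values (NA %in% ... is FALSE)
  hits <- row[aid_cols] %in% top2
  # TRUE only when more than one aid column is a hit
  sum(hits) > 1
}

# Values copied from rows 8 and 7 of df in the question
row8 <- c(empl = 500, agric = 0, remittances = 900, govt_aid = 900,
          loans = 250, assets = 500, hum_aid = 1400)
row7 <- c(empl = 1700, agric = 0, remittances = 700, govt_aid = 0,
          loans = 500, assets = 0, hum_aid = NA)

top2_check(row8)  # TRUE: 900, 900 and 1400 all match the top two values
top2_check(row7)  # FALSE: only remittances (700) matches; hum_aid is NA
```

Row 8 shows how ties behave: because the comparison is on values, both 900s count as hits even though only two values make up the top two.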
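A solution that computes all rows at once. I assume you want desired = 1 if any of the mentioned columns ranks 1-2. Note that rank() is applied down each column here, so each value is ranked against the other households' values in that column.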
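library(dplyr)

# For each of the three columns, flag the households whose value ranks
# 1st or 2nd within that column, then count the flags per row.
# na.rm = TRUE keeps the NA in hum_aid (row 7) from turning the sum into NA.
ranks <- df |>
  transmute(across(c(remittances, govt_aid, hum_aid),
                   ~ rank(-.x, ties.method = "min") <= 2)) |>
  rowSums(na.rm = TRUE)

df |>
  mutate(desired2 = +(ranks > 0))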
# # A tibble: 9 × 10
#    HHID  empl agric remittances govt_aid loans assets hum_aid desired desired2
#   <dbl> <dbl> <dbl>       <dbl>    <dbl> <dbl>  <dbl>   <dbl>   <dbl>    <int>
# 1     1  1000   750           0        0     0    500       0       0        0
# 2     2  1750   650         200        0   350      0       0       0        0
# 3     3   480   400        1500     1200   300      0     800       1        1
# 4     4   630   400        1250     1000     0      0    1200       1        1
# 5     5  1200     0           0        0     0    700       0       0        0
# 6     6  1000   750         500        0  1200      0       0       0        0
# 7     7  1700     0         700        0   500      0      NA       0        0
# 8     8   500     0         900      900   250    500    1400       1        1
# 9     9   300   750        1200     1300   500    150    1500       1        1
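Since hum_aid has an NA in row 7, it may help to see how rank() and rowSums() treat missing values. The sketch below uses the hum_aid values copied from the question; the flags matrix is a made-up one-row illustration, not data from df.

```r
# hum_aid copied from df in the question (row 7 is NA)
hum_aid <- c(0, 0, 800, 1200, 0, 0, NA, 1400, 1500)

# Rank from largest to smallest; tied values share the smallest rank,
# and NA keeps rank NA (the default na.last = "keep")
r <- rank(-hum_aid, ties.method = "min")
in_top2 <- r <= 2
in_top2
# FALSE FALSE FALSE FALSE FALSE FALSE NA TRUE TRUE

# Illustrative one-row matrix of flags: a single NA makes the plain
# row sum NA, while na.rm = TRUE skips it
flags <- cbind(remittances = FALSE, govt_aid = FALSE, hum_aid = NA)
rowSums(flags)                # NA
rowSums(flags, na.rm = TRUE)  # 0
```

Whether a missing value should count as "not in the top 2" (as na.rm = TRUE effectively does) or leave the result as NA is a judgment call that depends on the analysis.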