Create a new variable using a lookup table


I want to create a new variable using a lookup table. The data frame looks like this:

  id    sex     age length
   1    Female  1   45
   2    Female  2   54
   3    Female  3   56
   4    Female  4   60
   5    Female  5   60
   6    Female  6   61
   7    Female  7   63
   8    Male    1   55
   9    Male    2   54
   10   Male    3   58
   11   Male    4   61
   12   Male    5   65
   13   Male    6   63
   14   Male    7   65
   15   Male    8   67
   16   Male    9   68
   17   Male    10  69

And the lookup table looks like this:

sex    age  length
Female  1   50
Female  2   53
Female  3   56
Female  4   58
Female  5   60
Female  6   61
Female  7   63
Male    1   50
Male    2   54
Male    3   57
Male    4   60
Male    5   62
Male    6   63
Male    7   65
Male    8   66
Male    9   67
Male    10  69

I want to create a new variable, growth.rate, with two levels, "Normal" and "Low", so that the final data frame looks like this:

id   sex   age  length  growth.rate
1   Female  1   45  Low
2   Female  2   54  Normal
3   Female  3   56  Low
4   Female  4   60  Normal
5   Female  5   60  Low
6   Female  6   61  Low
7   Female  7   63  Low
8   Male    1   55  Normal
9   Male    2   54  Low
10  Male    3   58  Normal
11  Male    4   61  Normal
12  Male    5   65  Normal
13  Male    6   63  Low
14  Male    7   65  Low
15  Male    8   67  Normal
16  Male    9   68  Normal
17  Male    10  69  Low

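In this example, id 1 has a growth.rate of "Low" because its length is less than the lookup-table value for a 1-year-old female.

In contrast, id 2 has a growth.rate of "Normal" because her length is greater than the lookup-table value for a 2-year-old female.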

I tried to adapt the solution from this question, without success: "Getting contextstack overflow error - too many nested ifelse statements within for loop?"

Thank you very much for your help.

r lookup sapply
2 Answers

0 votes

If we do a left_join between the first dataset and the lookup dataset by 'sex' and 'age', we get two 'length' columns (length.x and length.y). We can then compare those columns and create a new column with ifelse or case_when:

library(dplyr)
left_join(df1, lookup, by = c('sex', 'age')) %>%
  transmute(id, sex, age,
            growth.rate = case_when(length.x <= length.y ~ "Low", TRUE ~ "Normal"),
            length = length.x)
#    id    sex age growth.rate length
#1    1 Female   1         Low     45
#2    2 Female   2      Normal     54
#3    3 Female   3         Low     56
#4    4 Female   4      Normal     60
#5    5 Female   5         Low     60
#6    6 Female   6         Low     61
#7    7 Female   7         Low     63
#8    8   Male   1      Normal     55
#9    9   Male   2         Low     54
#10  10   Male   3      Normal     58
#11  11   Male   4      Normal     61
#12  12   Male   5      Normal     65
#13  13   Male   6         Low     63
#14  14   Male   7         Low     65
#15  15   Male   8      Normal     67
#16  16   Male   9      Normal     68
#17  17   Male  10         Low     69

With data.table, this can be made more compact:

library(data.table)
# fcase takes condition/value pairs; the fallback value must be passed as default =
setDT(df1)[lookup, growth.rate := fcase(length <= i.length, "Low",
           default = "Normal"), on = .(sex, age)]

Or with indexing:

setDT(df1)[lookup, growth.rate :=
       c("Normal", "Low")[1 + (length <= i.length)], on = .(sex, age)]

Data

df1 <- structure(list(id = 1:17,
    sex = c("Female", "Female", "Female", "Female", "Female", "Female", "Female",
            "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male"),
    age = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L),
    length = c(45L, 54L, 56L, 60L, 60L, 61L, 63L, 55L, 54L, 58L, 61L, 65L, 63L,
               65L, 67L, 68L, 69L)),
    class = "data.frame", row.names = c(NA, -17L))

lookup <- structure(list(
    sex = c("Female", "Female", "Female", "Female", "Female", "Female", "Female",
            "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male"),
    age = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L),
    length = c(50L, 53L, 56L, 58L, 60L, 61L, 63L, 50L, 54L, 57L, 60L, 62L, 63L,
               65L, 66L, 67L, 69L)),
    class = "data.frame", row.names = c(NA, -17L))

0 votes

In base R, we can use merge to merge the two data frames by sex and age, and create a new column by checking the condition with ifelse.

You can then remove the columns that are not needed.
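The code for this base R answer did not survive on this page, so the following is only a sketch of what the merge plus ifelse approach might look like, not the answerer's original code. It assumes the default merge suffixes (length.x for the data, length.y for the lookup), and the names merged and result are hypothetical. The data frames are rebuilt inline from the tables in the question so the block runs on its own.

```r
# Rebuild the example data from the question (same values as the tables above)
df1 <- data.frame(
  id = 1:17,
  sex = rep(c("Female", "Male"), times = c(7, 10)),
  age = c(1:7, 1:10),
  length = c(45, 54, 56, 60, 60, 61, 63, 55, 54, 58, 61, 65, 63, 65, 67, 68, 69)
)
lookup <- data.frame(
  sex = rep(c("Female", "Male"), times = c(7, 10)),
  age = c(1:7, 1:10),
  length = c(50, 53, 56, 58, 60, 61, 63, 50, 54, 57, 60, 62, 63, 65, 66, 67, 69)
)

# Join on sex and age; the clashing 'length' columns become length.x and length.y
merged <- merge(df1, lookup, by = c("sex", "age"))

# "Normal" only when the measured length exceeds the lookup value, otherwise "Low"
merged$growth.rate <- ifelse(merged$length.x > merged$length.y, "Normal", "Low")

# Restore the original row order and keep only the columns that are needed
result <- merged[order(merged$id), c("id", "sex", "age", "length.x", "growth.rate")]
names(result)[names(result) == "length.x"] <- "length"
rownames(result) <- NULL
result
```

Note that merge drops rows of df1 that have no match in lookup by default; passing all.x = TRUE would keep them, with growth.rate set to NA.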
