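I am working with a dataset in R and trying to compute the number of children for each woman, based on her relationship to the household head. The dataset includes the variables household ID, individual ID, relationship to the household head, age, gender, and income.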
HouseholdID IndividualID Relationshiptothehouseholdhead Age Gender Income
<dbl> <dbl> <chr> <dbl> <chr> <dbl>
1 1 1 C 80 male 150
2 1 2 D 81 female 120
3 1 3 A 60 male 630
4 1 4 B 59 female 500
5 1 5 E3 35 male 380
6 1 6 F3 30 female 220
7 1 7 E5 33 female 170
8 1 8 F5 30 male 160
9 1 9 G32 20 female 290
10 1 10 G51 15 female 200
11 1 11 G52 12 female 100
12 1 12 G55 8 male 80
13 2 1 A 58 male 380
14 2 2 B 55 female 220
15 2 3 E1 35 male 170
16 2 4 F1 37 female 160
17 2 5 E2 33 male 290
18 2 6 F2 30 female 110
19 2 7 G21 17 female 210
20 2 8 G22 15 female 750
21 2 9 G23 12 female 350
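The table above contains the following variables:
HouseholdID: a unique identifier for each household.
IndividualID: a number assigned to each person within a household, unique within that household.
Relationshiptothehouseholdhead: a code (a letter, optionally followed by digits) describing the individual's relationship to the household head.
Age: the individual's age.
Gender: the individual's gender, recorded as "male" or "female".
Income: the individual's income.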
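From the data in Table 1 (above), I would like to generate a dataset like Table 2 (below), subject to the following requirement:
Note that the number of children is determined by the largest number that follows the letter, not simply by counting the observations in the data. For example, in household 1, the individual with ID 4 should be treated as having 5 children, not 2.
The result should look like this: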
HouseholdID IndividualID Age Gender Income Numofkids
1 2 81 female 120 1
1 4 59 female 500 5
1 6 30 female 220 2
1 7 33 female 170 3
1 9 20 female 290 0
1 10 15 female 200 0
1 11 12 female 100 0
2 2 55 female 220 2
2 4 37 female 160 0
2 6 30 female 110 3
2 7 17 female 210 0
2 8 15 female 750 0
2 9 12 female 350 0
Here is the data:
data = structure(list(HouseholdID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2), IndividualID = c(1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9), Relationshiptothehouseholdhead = c("C",
"D", "A", "B", "E3", "F3", "E5", "F5", "G32", "G51", "G52", "G55",
"A", "B", "E1", "F1", "E2", "F2", "G21", "G22", "G23"), Age = c(80,
81, 60, 59, 35, 30, 33, 30, 20, 15, 12, 8, 58, 55, 35, 37, 33,
30, 17, 15, 12), Gender = c("male", "female", "male", "female",
"male", "female", "female", "male", "female", "female", "female",
"male", "male", "female", "male", "female", "male", "female",
"female", "female", "female"), Income = c(150, 120, 630, 500,
380, 220, 170, 160, 290, 200, 100, 80, 380, 220, 170, 160, 290,
110, 210, 750, 350)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -21L))
Thanks!
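So you want to detect the highest "E" in each household? Assuming this "E" is the last child born to the male head and his female spouse?
Maybe you can achieve this using regular expressions.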
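A minimal base R sketch of the regular-expression idea, not a definitive solution. It relies on a reading of the codes that the question does not spell out, so treat all of the following as assumptions: A is the head and B the head's spouse; C and D are the head's parents (counted as having one child when a head is present); E<k> is the head's k-th child and F<k> that child's spouse; G<k><j> is the j-th child of couple k; both k and j are single digits. The names `hh` and `count_kids` are made up for this illustration.

```r
# Rebuild the question's data as a plain data frame.
hh <- data.frame(
  HouseholdID  = rep(c(1, 2), times = c(12, 9)),
  IndividualID = c(1:12, 1:9),
  Relationshiptothehouseholdhead = c(
    "C", "D", "A", "B", "E3", "F3", "E5", "F5", "G32", "G51", "G52", "G55",
    "A", "B", "E1", "F1", "E2", "F2", "G21", "G22", "G23"),
  Age = c(80, 81, 60, 59, 35, 30, 33, 30, 20, 15, 12, 8,
          58, 55, 35, 37, 33, 30, 17, 15, 12),
  Gender = c("male", "female", "male", "female", "male", "female", "female",
             "male", "female", "female", "female", "male", "male", "female",
             "male", "female", "male", "female", "female", "female", "female"),
  Income = c(150, 120, 630, 500, 380, 220, 170, 160, 290, 200, 100, 80,
             380, 220, 170, 160, 290, 110, 210, 750, 350),
  stringsAsFactors = FALSE
)

# Hypothetical helper: kids per person, given the codes of ONE household.
count_kids <- function(rel) {
  letter <- substr(rel, 1, 1)          # leading letter of the code
  digits <- sub("^[A-Z]", "", rel)     # whatever follows it ("" if nothing)

  # Largest index among the head's children (E codes); 0 if there are none.
  e_idx <- as.integer(digits[letter == "E"])
  max_e <- if (length(e_idx) > 0) max(e_idx) else 0L

  # Grandchildren G<k><j>: k = index of the parent couple, j = birth order.
  g        <- digits[letter == "G"]
  g_parent <- substr(g, 1, 1)
  g_order  <- as.integer(substr(g, 2, 2))

  vapply(seq_along(rel), function(i) {
    switch(letter[i],
      C = , D = as.integer(any(letter == "A")),  # assumed: the head is their child
      A = , B = max_e,
      E = , F = {
        j <- g_order[g_parent == digits[i]]
        if (length(j) > 0) max(j) else 0L
      },
      0L)                                        # G and anything else: no kids
  }, integer(1))
}

# Apply the helper household by household, then keep the women only.
hh$Numofkids <- ave(
  seq_len(nrow(hh)), hh$HouseholdID,
  FUN = function(idx) count_kids(hh$Relationshiptothehouseholdhead[idx])
)
result <- hh[hh$Gender == "female",
             c("HouseholdID", "IndividualID", "Age", "Gender", "Income", "Numofkids")]
print(result, row.names = FALSE)
```

Under the "largest number after the letter" rule this gives 5 for individual 7 in household 1 (because of G55), while the expected table in the question lists 3, so one of the two probably needs adjusting. The other rows match the expected counts.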