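I have a dataset called CSES (Comparative Study of Electoral Systems), in which each row corresponds to an individual (one interview in a public opinion poll), across many countries and many different years.
I need to create a variable that identifies the ideology of the party each person voted for, as perceived by that person.
However, the dataset identifies each party's perceived ideology (like many other variables) by the letters A, B, C, etc. Then, when identifying WHICH PARTY each person voted for, it uses a unique numeric code, and the letters do not line up across years (i.e., the same party may have a different letter in different years; and of course it is never the same party across countries, since each country has its own parties).
Dummy data to help clarify, reproduce, and write code:
Let's say: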
country <- c(1,1,1,1,2,2,2,2,3,3,3,3)
year <- c(2000,2000,2004,2004,2002,2002,2004,2008,2000,2000,2000,2000)
party_A_number <- c(11,11,12,12,21,21,22,23,31,31,31,31)
party_B_number <- c(12,12,11,11,22,22,21,22,32,32,32,32)
party_C_number <- c(13,13,13,13,23,23,23,21,33,33,33,33)
party_voted <- c(12,13,12,11,21,24,23,22,31,32,33,31)
ideology_party_A <- floor(runif(12, min = 1, max = 10))
ideology_party_B <- floor(runif(12, min = 1, max = 10))
ideology_party_C <- floor(runif(12, min = 1, max = 10))
Let's call the variable I want to create "ideology_voted".
I need something like this:
IF party_A_number == party_voted, THEN ideology_voted = ideology_party_A
IF party_B_number == party_voted, THEN ideology_voted = ideology_party_B
IF party_C_number == party_voted, THEN ideology_voted = ideology_party_C
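The real dataset has 9 letters (A to I), since each country has up to 9 main parties, plus dozens of countries and election years. So I would like code that can iterate over the letters A-I instead of "if voted for party A, then ...; if voted for party B, then ...".
However, even when I try the longer, repetitive code (one transformation per party letter, which would give me 8 lines of code), I run into trouble.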
library(tidyverse)
df <- tibble(
  country = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  year = c(2000, 2000, 2004, 2004, 2002, 2002, 2004, 2008, 2000, 2000, 2000, 2000),
  party_A_number = c(11, 11, 12, 12, 21, 21, 22, 23, 31, 31, 31, 31),
  party_B_number = c(12, 12, 11, 11, 22, 22, 21, 22, 32, 32, 32, 32),
  party_C_number = c(13, 13, 13, 13, 23, 23, 23, 21, 33, 33, 33, 33),
  party_voted = c(12, 13, 12, 11, 21, 24, 23, 22, 31, 32, 33, 31),
  ideology_party_A = floor(runif(12, min = 1, max = 10)),
  ideology_party_B = floor(runif(12, min = 1, max = 10)),
  ideology_party_C = floor(runif(12, min = 1, max = 10))
)
> df
# A tibble: 12 x 9
   country  year party_A_number party_B_number party_C_number party_voted ideology_party_A ideology_party_B
     <dbl> <dbl>          <dbl>          <dbl>          <dbl>       <dbl>            <dbl>            <dbl>
 1       1  2000             11             12             13          12                9                3
 2       1  2000             11             12             13          13                2                6
 3       1  2004             12             11             13          12                3                8
 4       1  2004             12             11             13          11                7                8
 5       2  2002             21             22             23          21                2                7
 6       2  2002             21             22             23          24                8                2
 7       2  2004             22             21             23          23                1                7
 8       2  2008             23             22             21          22                7                7
 9       3  2000             31             32             33          31                4                3
10       3  2000             31             32             33          32                7                5
11       3  2000             31             32             33          33                1                6
12       3  2000             31             32             33          31                2                1
# ... with 1 more variable: ideology_party_C <dbl>
It looks like you are after case_when() for the conditional logic:
ideology_voted <- df %>% transmute(
  ideology_voted = case_when(
    party_A_number == party_voted ~ ideology_party_A,
    party_B_number == party_voted ~ ideology_party_B,
    party_C_number == party_voted ~ ideology_party_C,
    # no letter matched: falls back to the raw party code (see row 6, which shows 24)
    TRUE ~ party_voted
  )
)
> ideology_voted
# A tibble: 12 x 1
   ideology_voted
            <dbl>
 1              3
 2              7
 3              3
 4              8
 5              2
 6             24
 7              8
 8              7
 9              4
10              5
11              6
12              2
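Row 6 comes out as 24 because that respondent's party code matches none of the lettered parties, so the fallback returns the party code itself rather than an ideology score. If a missing value is preferable in that case, one possible variant is sketched below. The toy tibble and its values are made up for illustration (they are not the random draws shown above), and only two party letters are used to keep it short.

```r
library(dplyr)

# Small deterministic stand-in for df; the values are invented for illustration
toy <- tibble(
  party_A_number   = c(11, 11, 21),
  party_B_number   = c(12, 12, 22),
  party_voted      = c(12, 11, 24),  # 24 matches neither party A nor party B
  ideology_party_A = c(9, 2, 8),
  ideology_party_B = c(3, 6, 2)
)

toy_result <- toy %>%
  mutate(
    ideology_voted = case_when(
      party_A_number == party_voted ~ ideology_party_A,
      party_B_number == party_voted ~ ideology_party_B,
      TRUE ~ NA_real_  # no match: record a missing value, not the party code
    )
  )

toy_result$ideology_voted
```

Using NA_real_ keeps every branch of case_when() the same type (double), and the unmatched rows can then be found with is.na().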
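Note that case_when() checks its conditions in order, so the first condition that is TRUE is the one used (in case more than one happens to be true for a row).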
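For the real data with letters A to I, writing one case_when() line per letter gets repetitive. Here is a rough sketch of looping over the letters instead, assuming the real columns follow the same party_<letter>_number / ideology_party_<letter> naming as the dummy data. The toy data frame, its values, and the helper names (party_letters, matched, is_match) are made up for illustration.

```r
# Deterministic toy data using the question's column naming (values invented)
toy <- data.frame(
  party_A_number   = c(11, 12, 21, 23),
  party_B_number   = c(12, 11, 22, 22),
  party_C_number   = c(13, 13, 23, 21),
  party_voted      = c(12, 11, 24, 21),
  ideology_party_A = c(9, 3, 8, 7),
  ideology_party_B = c(3, 8, 2, 7),
  ideology_party_C = c(5, 4, 6, 1)
)

party_letters <- c("A", "B", "C")  # for nine parties this would be LETTERS[1:9]

ideology_voted <- rep(NA_real_, nrow(toy))  # stays NA when no letter matches
matched <- rep(FALSE, nrow(toy))            # rows that already found their party

for (l in party_letters) {
  codes    <- toy[[paste0("party_", l, "_number")]]
  ideology <- toy[[paste0("ideology_party_", l)]]
  # match only rows not matched by an earlier letter; NAs never count as a match
  is_match <- !matched & !is.na(codes) & !is.na(toy$party_voted) &
    codes == toy$party_voted
  ideology_voted[is_match] <- ideology[is_match]
  matched <- matched | is_match
}

toy$ideology_voted <- ideology_voted
toy$ideology_voted
```

Like case_when(), this keeps the first matching letter for each row, and rows whose party code matches no letter (the third row here) are left as NA.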