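I have a data frame, "alldata", in which the cases are organizations and the columns are dummy variables named in the form Country_year, e.g. Algeria_1954. A "1" on such a dummy means the organization was present in the given country in that year, e.g. in Algeria in 1954. I want to use another df, "colonies_df", as a lookup table for creating new dummy variables. It has a column "country" containing country names (e.g. "Algeria"), plus columns indicating whether a country belonged to a colonial empire in a given year, named in the form col..empire.year, e.g. col..france.1954. I want to create new dummies in alldata named colpresence..[empire].[year], which equal 1 when an alldata case (an organization) scores 1 on a Country_year dummy and that country scores 1 on col..empire.year for the specific empire and year. I have been trying to write a function in R, also with the help of ChatGPT, but without success. I would really appreciate help.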
These are the variables in colonies_df, with all the empires and years I want to use for my dummy variables:
> head(colonies_df)
# A tibble: 6 x 41
country col..belgium.19~ col..belgium.19~ col..belgium.19~ col..belgium.20~
<chr> <dbl> <dbl> <dbl> <dbl>
1 Afghan~ 0 0 0 0
2 Albania 0 0 0 0
3 Algeria 0 0 0 0
4 Andorra 0 0 0 0
5 Angola 0 0 0 0
6 Antigu~ 0 0 0 0
# ... with 36 more variables: col..belgium.2017 <dbl>, col..britain.1954 <dbl>,
# col..britain.1970 <dbl>, col..britain.1988 <dbl>, col..britain.2003 <dbl>,
# col..britain.2017 <dbl>, col..france.1954 <dbl>, col..france.1970 <dbl>,
# col..france.1988 <dbl>, col..france.2003 <dbl>, col..france.2017 <dbl>,
# col..germany.1954 <dbl>, col..germany.1970 <dbl>, col..germany.1988 <dbl>,
# col..germany.2003 <dbl>, col..germany.2017 <dbl>, col..italy.1954 <dbl>,
# col..italy.1970 <dbl>, col..italy.1988 <dbl>, col..italy.2003 <dbl>,
# col..italy.2017 <dbl>, col..netherlands.1954 <dbl>, col..netherlands.1970 <dbl>,
# col..netherlands.1988 <dbl>, col..netherlands.2003 <dbl>,
# col..netherlands.2017 <dbl>, col..portugal.1954 <dbl>, col..portugal.1970 <dbl>,
# col..portugal.1988 <dbl>, col..portugal.2003 <dbl>, col..portugal.2017 <dbl>,
# col..spain.1954 <dbl>, col..spain.1970 <dbl>, col..spain.1988 <dbl>,
# col..spain.2003 <dbl>, col..spain.2017 <dbl>
I tried code like this:
# Define a function to create new dummy columns for presence of colonies in an empire for a specific year
create_col_presence <- function(row, colonies_df, empire, year) {
# Check if the country has presence in the specific year in the empire
country <- row$country
if (row[[paste0(country, "_", year)]] == 1 && colonies_df[[paste0("col.", empire, ".", year)]] == 1) {
return(1) # Set presence to 1 if conditions are met
} else {
return(0) # Set presence to 0 otherwise
}
}
# Iterate over each row in alldata and generate new dummy columns for each empire and year
for (empire in c("belgium", "britain", "france", "germany", "italy", "netherlands", "portugal", "spain")) {
for (year in c(1954, 1970, 1988, 2003, 2017)) {
# Define the name of the new dummy column
new_col_name <- paste0("colpresence_", empire, "_", year)
# Create the new dummy column using the defined function
alldata[[new_col_name]] <- sapply(1:nrow(alldata), function(i) create_col_presence(alldata[i, ], colonies_df, empire, year))
}
}
It produces this error:
> # Define a function to create new dummy columns for presence of colonies in an empire for a specific year
> create_col_presence <- function(row, colonies_df, empire, year) {
+ # Check if the country has presence in the specific year in the empire
+ country <- row$country
+ if (row[[paste0(country, "_", year)]] == 1 && colonies_df[[paste0("col.", empire, ".", year)]] == 1) {
+ return(1) # Set presence to 1 if conditions are met
+ } else {
+ return(0) # Set presence to 0 otherwise
+ }
+ }
>
> # Iterate over each row in alldata and generate new dummy columns for each empire and year
> for (empire in c("belgium", "britain", "france", "germany", "italy", "netherlands", "portugal", "spain")) {
+ for (year in c(1954, 1970, 1988, 2003, 2017)) {
+ # Define the name of the new dummy column
+ new_col_name <- paste0("colpresence_", empire, "_", year)
+
+ # Create the new dummy column using the defined function
+ alldata[[new_col_name]] <- sapply(1:nrow(alldata), function(i) create_col_presence(alldata[i, ], colonies_df, empire, year))
+ }
+ }
Error in if (row[[paste0(country, "_", year)]] == 1 && colonies_df[[paste0("col.", :
missing value where TRUE/FALSE needed
I don't have access to an R installation at the moment, but have you tried putting print statements inside the create_col_presence function?
create_col_presence <- function(row, colonies_df, empire, year) {
# Check if the country has presence in the specific year in the empire
country <- row$country
print(country)
print(paste0(country, "_", year))
print(paste0("col.", empire, ".", year))
print(row[[paste0(country, "_", year)]])
print(colonies_df[[paste0("col.", empire, ".", year)]])
if (row[[paste0(country, "_", year)]] == 1 && colonies_df[[paste0("col.", empire, ".", year)]] == 1) {
return(1) # Set presence to 1 if conditions are met
} else {
return(0) # Set presence to 0 otherwise
}
}
I think you will find that one of them does not look the way you expect.
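For what it's worth, here is a minimal sketch of a vectorised alternative, not tested against your real data. Three things in the posted code look suspicious. First, alldata is described as holding only Country_year dummies, so row$country is probably NULL. Second, the lookup name is built with "col." (one dot) while the columns are called "col..france.1954" (two dots). Third, colonies_df[[...]] returns the whole column rather than a single country's value. Any of these can leave the if() condition as NA or with the wrong length. The sketch assumes the country names in colonies_df$country match the prefixes of the Country_year columns exactly. The function name add_col_presence and the toy data are made up for illustration.

```r
# Toy stand-ins for the real data; every value here is invented for illustration.
alldata <- data.frame(
  org          = c("org_a", "org_b", "org_c"),
  Algeria_1954 = c(1, 0, 0),
  Angola_1954  = c(0, 1, 0),
  Albania_1954 = c(0, 0, 1)
)
colonies_df <- data.frame(
  country            = c("Algeria", "Angola", "Albania"),
  col..france.1954   = c(1, 0, 0),
  col..portugal.1954 = c(0, 1, 0)
)

# Hypothetical helper: adds one colpresence..<empire>.<year> dummy per
# empire/year combination that has a lookup column in colonies_df.
add_col_presence <- function(alldata, colonies_df, empires, years) {
  for (empire in empires) {
    for (year in years) {
      lookup_col <- paste0("col..", empire, ".", year)  # note the two dots
      new_col    <- paste0("colpresence..", empire, ".", year)
      if (!lookup_col %in% names(colonies_df)) next

      # Countries flagged as part of this empire in this year (NA counts as no)
      in_empire <- colonies_df$country[colonies_df[[lookup_col]] %in% 1]

      # Country_year dummies for those countries that actually exist in alldata
      dummy_cols <- intersect(paste0(in_empire, "_", year), names(alldata))

      if (length(dummy_cols) == 0) {
        alldata[[new_col]] <- 0L
      } else {
        # 1 if the organization is present in at least one such country
        present <- rowSums(alldata[, dummy_cols, drop = FALSE] == 1,
                           na.rm = TRUE) > 0
        alldata[[new_col]] <- as.integer(present)
      }
    }
  }
  alldata
}

result <- add_col_presence(alldata, colonies_df,
                           empires = c("france", "portugal"),
                           years   = 1954)
print(result[, c("org", "colpresence..france.1954", "colpresence..portugal.1954")])
```

On the real data the call would presumably pass all eight empire names and the five years from your loop. If some country names contain spaces or are spelled differently in the two data frames, the intersect() step will silently skip them, so it is worth checking which Country_year columns never get matched.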