Using a data frame as a lookup table to create dummy variables based on another df's column names [R code]


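I have a data frame "alldata" in which the cases are organizations and the columns are dummy variables named Country_year, e.g. Algeria_1954. A "1" on such a dummy means the organization was present in the given country in that year, e.g. in Algeria in 1954. I want to treat another df, "colonies_df", as a lookup table for creating dummy variables. It has a column "country" containing country names (e.g. "Algeria"), plus columns named col..empire.year, e.g. col..france.1954, that indicate whether a country was part of a colonial empire in a given year. I want to create new dummies in alldata named colpresence..[empire].[year]: an alldata case (organization) should score 1 if it scores 1 on a Country_year dummy and that country scores 1 on col..empire.year for the specific empire and year. I have been trying to write a function in R, also with the help of ChatGPT, but without success. I would really appreciate help.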

These are the variables in colonies_df, with all the empires and years I want to use for my dummy variables:
> head(colonies_df)
# A tibble: 6 x 41
  country col..belgium.19~ col..belgium.19~ col..belgium.19~ col..belgium.20~
  <chr>              <dbl>            <dbl>            <dbl>            <dbl>
1 Afghan~                0                0                0                0
2 Albania                0                0                0                0
3 Algeria                0                0                0                0
4 Andorra                0                0                0                0
5 Angola                 0                0                0                0
6 Antigu~                0                0                0                0
# ... with 36 more variables: col..belgium.2017 <dbl>, col..britain.1954 <dbl>,
#   col..britain.1970 <dbl>, col..britain.1988 <dbl>, col..britain.2003 <dbl>,
#   col..britain.2017 <dbl>, col..france.1954 <dbl>, col..france.1970 <dbl>,
#   col..france.1988 <dbl>, col..france.2003 <dbl>, col..france.2017 <dbl>,
#   col..germany.1954 <dbl>, col..germany.1970 <dbl>, col..germany.1988 <dbl>,
#   col..germany.2003 <dbl>, col..germany.2017 <dbl>, col..italy.1954 <dbl>,
#   col..italy.1970 <dbl>, col..italy.1988 <dbl>, col..italy.2003 <dbl>,
#   col..italy.2017 <dbl>, col..netherlands.1954 <dbl>, col..netherlands.1970 <dbl>,
#   col..netherlands.1988 <dbl>, col..netherlands.2003 <dbl>,
#   col..netherlands.2017 <dbl>, col..portugal.1954 <dbl>, col..portugal.1970 <dbl>,
#   col..portugal.1988 <dbl>, col..portugal.2003 <dbl>, col..portugal.2017 <dbl>,
#   col..spain.1954 <dbl>, col..spain.1970 <dbl>, col..spain.1988 <dbl>,
#   col..spain.2003 <dbl>, col..spain.2017 <dbl>

I tried code like this:


# Define a function to create new dummy columns for presence of colonies in an empire for a specific year
create_col_presence <- function(row, colonies_df, empire, year) {
  # Check if the country has presence in the specific year in the empire
  country <- row$country
  if (row[[paste0(country, "_", year)]] == 1 && colonies_df[[paste0("col.", empire, ".", year)]] == 1) {
    return(1)  # Set presence to 1 if conditions are met
  } else {
    return(0)  # Set presence to 0 otherwise
  }
}

# Iterate over each row in alldata and generate new dummy columns for each empire and year
for (empire in c("belgium", "britain", "france", "germany", "italy", "netherlands", "portugal", "spain")) {
  for (year in c(1954, 1970, 1988, 2003, 2017)) {
    # Define the name of the new dummy column
    new_col_name <- paste0("colpresence_", empire, "_", year)
    
    # Create the new dummy column using the defined function
    alldata[[new_col_name]] <- sapply(1:nrow(alldata), function(i) create_col_presence(alldata[i, ], colonies_df, empire, year))
  }
}

It produces this error:

Error in if (row[[paste0(country, "_", year)]] == 1 && colonies_df[[paste0("col.",  : 
  missing value where TRUE/FALSE needed
r dataframe lookup
1 Answer

I don't have access to an R installation at the moment. Have you tried putting print statements inside the create_col_presence function?

 create_col_presence <- function(row, colonies_df, empire, year) {
   # Check if the country has presence in the specific year in the empire
   country <- row$country
   print(country)
   print(paste0(country, "_", year))
   print(paste0("col.", empire, ".", year))
   print(row[[paste0(country, "_", year)]])
   print(colonies_df[[paste0("col.", empire, ".", year)]])
   if (row[[paste0(country, "_", year)]] == 1 && colonies_df[[paste0("col.", empire, ".", year)]] == 1) {
     return(1)  # Set presence to 1 if conditions are met
   } else {
     return(0)  # Set presence to 0 otherwise
   }
 }
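
As a rough illustration of what those prints would probably reveal, here is a small sketch. It assumes alldata has no "country" column (the question only mentions Country_year dummies), and the one-row data frame and two-row lookup table are made-up stand-ins, not the real data. Under those assumptions every lookup in the condition comes back NULL, so the if() has nothing to test.

```r
# Hypothetical stand-in for alldata[i, ]: note there is no `country` column.
row <- data.frame(org = "org_a", Algeria_1954 = 1)

# Hypothetical two-row version of the lookup table.
colonies_df <- data.frame(
  country = c("Albania", "Algeria"),
  col..france.1954 = c(0, 1)
)

country <- row$country             # NULL, because the column does not exist
key <- paste0(country, "_", 1954)  # "_1954" rather than "Algeria_1954"
value <- row[[key]]                # NULL, because no column has that name

# The code builds "col.france.1954" (one dot after "col"), but the real
# column is "col..france.1954" (two dots), so this is NULL as well.
lookup <- colonies_df[[paste0("col.", "france", ".", 1954)]]

cond <- value == 1                 # comparing NULL gives logical(0)

# Capture whatever error the original condition raises instead of stopping.
msg <- tryCatch(
  if (value == 1 && lookup == 1) 1 else 0,
  error = function(e) conditionMessage(e)
)

print(key)
print(length(cond))
print(msg)
```

Even with the column name corrected, colonies_df[["col..france.1954"]] returns the whole column rather than the value for one country, so the comparison would still not be a single TRUE/FALSE.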

I think you'll find that one of them doesn't look the way you expect.
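
For what it's worth, here is a minimal vectorised sketch in base R of the lookup described in the question. The toy data and the function name add_colpresence are hypothetical. It also assumes that the country names in colonies_df match the Country part of alldata's column names exactly. For each empire and year it collects the countries flagged 1 in colonies_df, then marks an organization 1 if it has a 1 on any of the matching Country_year dummies.

```r
# Toy stand-ins for the real data (hypothetical values).
alldata <- data.frame(
  org = c("org_a", "org_b", "org_c"),
  Algeria_1954 = c(1, 0, 0),
  Angola_1954  = c(0, 1, 0),
  Albania_1954 = c(0, 0, 1)
)
colonies_df <- data.frame(
  country = c("Albania", "Algeria", "Angola"),
  col..france.1954   = c(0, 1, 0),
  col..portugal.1954 = c(0, 0, 1)
)

# Hypothetical helper: adds one colpresence..[empire].[year] dummy per
# empire/year combination that has a matching column in colonies_df.
add_colpresence <- function(alldata, colonies_df, empires, years) {
  for (empire in empires) {
    for (year in years) {
      lookup_col <- paste0("col..", empire, ".", year)  # two dots after "col"
      if (!lookup_col %in% names(colonies_df)) next

      # Countries that belonged to this empire in this year.
      in_empire <- colonies_df$country[colonies_df[[lookup_col]] %in% 1]

      # Keep only the Country_year dummies that actually exist in alldata.
      dummy_cols <- intersect(paste0(in_empire, "_", year), names(alldata))

      new_col <- paste0("colpresence..", empire, ".", year)
      if (length(dummy_cols) == 0) {
        alldata[[new_col]] <- rep(0L, nrow(alldata))
      } else {
        # Count, per organization, how many of those dummies equal 1.
        hits <- rowSums(alldata[, dummy_cols, drop = FALSE] == 1, na.rm = TRUE)
        alldata[[new_col]] <- as.integer(hits > 0)
      }
    }
  }
  alldata
}

result <- add_colpresence(alldata, colonies_df, c("france", "portugal"), 1954)
print(result[, c("org", "colpresence..france.1954", "colpresence..portugal.1954")])
```

On the real data the call would take all eight empires and five years. Country names that are spelled differently in the two data frames would be skipped silently, so it is worth checking which Country_year columns fail to match.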
