Merging two datasets from different indexing platforms (same variables with different headers)

Question (votes: 0, answers: 1)

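This may be a silly question, but I'm having trouble merging two datasets into a new file. I want to keep the structure of Excel file A (which I'll call BehaviouralEconomicsTourism.xlsx) and add the information from the matching variables in scopus.csv (the same variables, but spelled differently). In other words, I'd like the scopus.csv records to appear as new rows under those shared variables. Does anyone have a suggestion? I tried the code below (I tried a few other variants earlier but didn't save them), but it doesn't do what I intended.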

# Install and load necessary packages if not already installed
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
if (!requireNamespace("readxl", quietly = TRUE)) install.packages("readxl")
if (!requireNamespace("writexl", quietly = TRUE)) install.packages("writexl")
if (!requireNamespace("janitor", quietly = TRUE)) install.packages("janitor")

library(tidyverse)
library(readxl)
library(writexl)
library(janitor)

# Load datasets from Sheet2 in BehaviouralEconomicsTourism.xlsx
behavioral_economics_df <- read_excel("/Users/s5320381/Downloads/BehaviouralEconomicsTourism.xlsx", sheet = "Sheet2")
scopus_df <- read.csv("/Users/s5320381/Downloads/scopus.csv")

# Clean column names
scopus_df <- janitor::clean_names(scopus_df)
behavioral_economics_df <- janitor::clean_names(behavioral_economics_df)

# Mapping columns between the two datasets
column_mapping <- c(
  "authors" = "authors",
  "author_full_names" = "author_full_names",
  "article_title" = "title",
  "publication_year" = "year",
  "volume" = "volume",
  "issue" = "issue",
  "publisher" = "publisher",
  "source_title" = "source_title",
  "author_keywords" = "author_keywords",
  "abstract" = "abstract",
  "issn" = "issn",
  "isbn" = "isbn",
  "start_page" = "page_start",
  "end_page" = "page_end",
  "doi" = "doi",
  "language" = "language_of_original_document",
  "doi_link" = "link",
  "affiliations" = "affiliations"
)

# Convert relevant columns to character in scopus_df
scopus_df <- scopus_df %>%
  mutate(across(c(volume, issue, start_page, end_page), as.character))

# Convert relevant columns to double in behavioral_economics_df
behavioral_economics_df <- behavioral_economics_df %>%
  mutate(across(c(volume, issue, start_page, end_page), as.double))

# Merge datasets based on common columns
merged_df <- bind_rows(behavioral_economics_df, scopus_df %>% select(column_mapping, references))

# Save the merged dataset to a new Excel file
writexl::write_xlsx(merged_df, "/Users/s5320381/Downloads/test201223.xlsx")

I am trying to merge two datasets that contain the same variables under different headers.

r tidyverse rstudio janitor
1 answer
0 votes
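  1. You can't set the same columns (volume, issue, start_page, end_page) to different types (character in one dataset, double in the other) before row-binding them.
  2. You also never define what references is when you merge the datasets. Does removing it change anything? If not, you can overwrite the names of each dataset as follows (assuming their columns are in exactly the same order):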
library(tidyverse)
library(readxl)
library(writexl)
library(janitor)

# Load datasets from Sheet2 in BehaviouralEconomicsTourism.xlsx
behavioral_economics_df <- read_excel("/Users/s5320381/Downloads/BehaviouralEconomicsTourism.xlsx", sheet = "Sheet2")
scopus_df <- read.csv("/Users/s5320381/Downloads/scopus.csv")

column_names <- c(
  "authors",
  "author_full_names",
  "article_title",
  "publication_year",
  "volume",
  "issue",
  "publisher",
  "source_title",
  "author_keywords",
  "abstract",
  "issn",
  "isbn",
  "start_page",
  "end_page",
  "doi",
  "language",
  "doi_link",
  "affiliations"
)

# Rename columns in each dataset
names(scopus_df) <- column_names
names(behavioral_economics_df) <- column_names

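# Coerce the type-sensitive columns to one common type in each dataset first,
# because bind_rows() cannot combine a character column with a double column
num_cols <- c("volume", "issue", "start_page", "end_page")
merged_df <- bind_rows(
  mutate(behavioral_economics_df, across(all_of(num_cols), as.numeric)),
  mutate(select(scopus_df, all_of(column_names)), across(all_of(num_cols), as.numeric))
)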
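
If the two files do not share the same column order, overwriting names() by position will mislabel columns. A possible alternative is to rename by name with a named vector of the form new_name = "old_name", passed through all_of(). The sketch below uses small made-up data frames in place of the real files (the values and the cited_by column are hypothetical), and it coerces to character rather than numeric on the assumption that page fields may hold values such as "e45", which as.numeric() would turn into NA.

```r
library(dplyr)

# Toy stand-ins for the two exports (hypothetical values, not the asker's data)
behavioral_economics_df <- tibble(
  article_title    = "Nudging tourists",
  publication_year = 2021,
  volume           = 12,
  start_page       = 101
)
scopus_df <- data.frame(
  title      = "Loss aversion in travel",
  year       = 2020L,
  volume     = "7",
  page_start = "e45",
  cited_by   = 3L   # extra column that the Excel structure does not have
)

# Named vector: names are the target (Excel) headers, values are the Scopus headers
column_mapping <- c(
  article_title    = "title",
  publication_year = "year",
  volume           = "volume",
  start_page       = "page_start"
)

# Keep only the mapped Scopus columns and rename them to the Excel headers
scopus_renamed <- scopus_df %>% select(all_of(column_mapping))

# Bring the type-sensitive columns to one common type in BOTH frames before binding
type_cols <- c("volume", "start_page")
merged_df <- bind_rows(
  behavioral_economics_df %>% mutate(across(all_of(type_cols), as.character)),
  scopus_renamed %>% mutate(across(all_of(type_cols), as.character))
)

print(merged_df)
```

With the real files, the same mapping would be applied after janitor::clean_names(), and any column of scopus.csv that is not listed in the mapping is dropped by select().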