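This may be a silly question, but I'm having trouble merging two datasets into a new file. I want the result to keep the structure of Excel A (which I'll call BehaviouralEconomicsTourism.xlsx) and to include the information from scopus.csv, which holds the same variables under differently spelled names. In other words, I'd like the scopus.csv records to appear as new rows, with their values placed under the matching variables. Does anyone have any suggestions? I tried the code below (I tried a few other variations earlier but didn't save them), but it doesn't do what I intended.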
# Install and load necessary packages if not already installed
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
if (!requireNamespace("readxl", quietly = TRUE)) install.packages("readxl")
if (!requireNamespace("writexl", quietly = TRUE)) install.packages("writexl")
if (!requireNamespace("janitor", quietly = TRUE)) install.packages("janitor")
library(tidyverse)
library(readxl)
library(writexl)
library(janitor)
# Load datasets from Sheet2 in BehaviouralEconomicsTourism.xlsx
behavioral_economics_df <- read_excel("/Users/s5320381/Downloads/BehaviouralEconomicsTourism.xlsx", sheet = "Sheet2")
scopus_df <- read.csv("/Users/s5320381/Downloads/scopus.csv")
# Clean column names
scopus_df <- janitor::clean_names(scopus_df)
behavioral_economics_df <- janitor::clean_names(behavioral_economics_df)
# Mapping columns between the two datasets
column_mapping <- c(
"authors" = "authors",
"author_full_names" = "author_full_names",
"article_title" = "title",
"publication_year" = "year",
"volume" = "volume",
"issue" = "issue",
"publisher" = "publisher",
"source_title" = "source_title",
"author_keywords" = "author_keywords",
"abstract" = "abstract",
"issn" = "issn",
"isbn" = "isbn",
"start_page" = "page_start",
"end_page" = "page_end",
"doi" = "doi",
"language" = "language_of_original_document",
"doi_link" = "link",
"affiliations" = "affiliations"
)
# Convert relevant columns to character in scopus_df
scopus_df <- scopus_df %>%
mutate(across(c(volume, issue, start_page, end_page), as.character))
# Convert relevant columns to double in behavioral_economics_df
behavioral_economics_df <- behavioral_economics_df %>%
mutate(across(c(volume, issue, start_page, end_page), as.double))
# Merge datasets based on common columns
merged_df <- bind_rows(behavioral_economics_df, scopus_df %>% select(column_mapping, references))
# Save the merged dataset to a new Excel file
writexl::write_xlsx(merged_df, "/Users/s5320381/Downloads/test201223.xlsx")
I'm trying to merge two datasets that contain the same variables under different headers.
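It isn't clear why volume, issue, start_page and end_page are set to different types (character in one dataset and double in the other); bind_rows() cannot combine a character column with a double column of the same name. Also, what is references? Does removing it change anything? If not, you can overwrite the names of each dataset as follows (assuming both have exactly the same columns in exactly the same order):
library(tidyverse)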
library(readxl)
library(writexl)
library(janitor)
# Load datasets from Sheet2 in BehaviouralEconomicsTourism.xlsx
behavioral_economics_df <- read_excel("/Users/s5320381/Downloads/BehaviouralEconomicsTourism.xlsx", sheet = "Sheet2")
scopus_df <- read.csv("/Users/s5320381/Downloads/scopus.csv")
column_names <- c(
"authors",
"author_full_names",
"article_title",
"publication_year",
"volume",
"issue",
"publisher",
"source_title",
"author_keywords",
"abstract",
"issn",
"isbn",
"start_page",
"end_page",
"doi",
"language",
"doi_link",
"affiliations"
)
# Rename columns in each dataset
names(scopus_df) <- column_names
names(behavioral_economics_df) <- column_names
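# Convert the volume and page columns to one type in both datasets first;
# bind_rows() fails when a column is double in one and character in the other.
# Note that as.numeric() turns non-numeric entries into NA.
num_cols <- c("volume", "issue", "start_page", "end_page")
behavioral_economics_df <- mutate(behavioral_economics_df, across(all_of(num_cols), as.numeric))
scopus_df <- mutate(scopus_df, across(all_of(num_cols), as.numeric))
# Stack the rows of both datasets, which now share the same column names
merged_df <- bind_rows(behavioral_economics_df, scopus_df)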
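If the two files do not have their columns in the same order, or scopus.csv has extra columns such as references, renaming by position will not work. A minimal sketch of an alternative, renaming by name with the new = old mapping from the question, is shown here. The two toy data frames and their values are made up for illustration and only stand in for the real files; converting everything to character before binding is an assumption chosen to avoid type clashes and to keep page identifiers that are not plain numbers.

```r
library(dplyr)

# Toy stand-ins for the two files (hypothetical values, not the real data)
behavioral_economics_df <- tibble(
  article_title    = "Paper A",
  publication_year = 2020,
  start_page       = 10
)
scopus_df <- tibble(
  title      = "Paper B",
  year       = 2021L,
  page_start = "e15",
  references = "not needed in the merged file"
)

# new_name = "old_name", the same direction as column_mapping in the question
column_mapping <- c(
  article_title    = "title",
  publication_year = "year",
  start_page       = "page_start"
)

# all_of() with a named vector keeps only the mapped columns and renames them
scopus_renamed <- scopus_df %>%
  select(all_of(column_mapping))

# Convert every column to character so bind_rows() never sees mixed types
merged_df <- bind_rows(
  behavioral_economics_df %>% mutate(across(everything(), as.character)),
  scopus_renamed %>% mutate(across(everything(), as.character))
)
```

With the real data you would list all 18 pairs in column_mapping. Columns that should be numeric in the final file can be converted back after binding, for example with as.numeric() on the specific columns.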