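I am creating a new data frame by binding together two data frames obtained from two different GitHub repositories. Both datasets have a Date column. When I run this on my computer everything works fine, and I can bind the data frames together with either rbind() or bind_rows(). Another user ran the same code and got a different result. Specifically, the Date column gets split: the dates from the first data frame are in the first column (called Date), while the dates from the second data frame end up at the end of the data frame, in a new column (which I never created) called X.U.FEFF.Date.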
library(dplyr)
library(RCurl)
setwd(dir = "YOUR_WORKING_DIRECTORY")
#####===== FIRST DATAFRAME =====#####
cases <- read.csv(text = getURL(url = "https://raw.githubusercontent.com/openZH/covid_19/master/COVID19_Cases_Cantons_CH_total.csv"),
                  header = TRUE,
                  stringsAsFactors = FALSE,
                  na.strings = c("", "NA"),
                  encoding = "UTF-8")
# Remove the rows for Switzerland as a whole (CH) and for Liechtenstein (FL)
cases <- subset(x = cases,
                !is.element(el = canton,
                            set = c("CH", "FL")),
                select = c("date",
                           "canton",
                           "tested_pos"))
names(cases)[1] <- "Date"
# Reshape to wide format (one column per canton) to match the layout of cases2
cases <- reshape(data = cases,
                 idvar = "Date",
                 timevar = "canton",
                 v.names = "tested_pos",
                 direction = "wide")
# Drop the "tested_pos." prefix that reshape() adds to the new column names
names(cases) <- gsub(pattern = "tested_pos.",
                     replacement = "",
                     x = names(cases),
                     fixed = TRUE)
cases[is.na(cases)] <- 0
cases <- cases[order(cases$Date,
                     decreasing = FALSE), ]
#####===== SECOND DATAFRAME =====#####
cases2 <- read.csv(text = getURL(url = "https://raw.githubusercontent.com/daenuprobst/covid19-cases-switzerland/master/covid19_cases_switzerland.csv"),
                   header = TRUE,
                   stringsAsFactors = FALSE,
                   na.strings = c("", "NA"),
                   encoding = "UTF-8")
# Remove the column with total daily cases for Switzerland
cases2 <- subset(x = cases2,
                 select = -c(CH))
# Bind the first 7 rows of cases on top of cases2
cases_tot <- bind_rows(cases[1:7, ],
                       cases2)
write.csv(x = cases_tot,
          file = paste0(getwd(),
                        "/cases_tot.csv"),
          row.names = FALSE,
          quote = FALSE)
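For the other user, rbind() simply fails, while bind_rows() produces the output shown in this image. I don't know how to fix this, because I cannot reproduce it on my computer. Any ideas about what is causing it? Thanks a lot.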
Change read.csv() to read_csv() (from the readr package) for more robust CSV parsing!
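A likely cause, consistent with the X.U.FEFF.Date name, is that the second CSV begins with a UTF-8 byte order mark (BOM). Since R 3.0.0, readLines() and scan() (and hence read.csv()) discard a UTF-8 BOM when R runs in a UTF-8 locale, as on most macOS and Linux setups. In a non-UTF-8 locale (for example Windows with R versions before 4.2), the BOM probably stays attached to the first header, and make.names() turns it into X.U.FEFF.Date, so bind_rows() sees two different column names. The sketch below is a base-R illustration under those assumptions: it writes a tiny temporary CSV with a BOM (not the real GitHub data) and reads it with fileEncoding = "UTF-8-BOM", which strips the mark if present. It also defines a hypothetical helper, strip_bom_name(), that cleans an already-mangled first column name.

```r
# Build a small CSV whose first bytes are a UTF-8 byte order mark (EF BB BF).
# The file contents are made up for illustration.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Date,AG\n2020-03-01,1\n"), con)
close(con)

# fileEncoding = "UTF-8-BOM" removes the BOM before the header is parsed,
# so the first column is named "Date" regardless of locale.
fixed <- read.csv(tmp, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
print(names(fixed))  # [1] "Date" "AG"

# Hypothetical fallback helper: strip common BOM remnants from the first
# column name after reading ("\ufeff" itself, "X.U.FEFF." as produced by
# make.names(), or "\u00ef.." from a Latin-1 reading of the BOM bytes).
strip_bom_name <- function(df) {
  names(df)[1] <- sub("^(\ufeff|X\\.U\\.FEFF\\.|\u00ef\\.\\.)", "", names(df)[1])
  df
}

unlink(tmp)
```

In the question's code the CSV is read through text = getURL(...), so there is no file connection for fileEncoding to act on; in that setup, calling strip_bom_name() on cases2 right after reading is one way to make the Date columns line up. readr's read_csv(), suggested in the answer, also skips a leading BOM, which is likely why switching to it helps.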