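I have a data frame named Recovered_df that shows the number of recovered COVID-19 cases in different countries on specific dates. Its dimensions are 185 x 106: 185 countries/regions and 106 columns covering the dates from 2020.01.22 to 2020.05.05.
I want to reshape the data frame so that it has only 3 columns: Country, Date, and RecoveredNumbers for each country/region on each date. I wrote this: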
Recovered_df <- gather(Recovered_df,Date,Recovered, 2020.01.22:2020.05.05)
view(Recovered_df)
and I get this error:
Recovered_df <- gather(Recovered_df,Date,Recovered, 2020.01.22:2020.05.05)
Error: unexpected numeric constant in "Recovered_df <- gather(Recovered_df,Date,Recovered, 2020.01.22
Putting the range in quotes ('2020.01.22:2020.05.05') does not work either and also gives an error:
Recovered_df <- gather(Recovered_df,Date,Recovered, '2020.01.22:2020.05.05')
>Can't subset columns that don't exist.
x The column `2020.01.22:2020.05.05` doesn't exist
How can I do this using gather() from tidyr? I am a beginner and need help.
Here is sample code I created for confirmed (infected) cases. You can use it and adapt it for recovered cases:
# Uses
# - pipes
# - filter to filter rows
# - select to filter columns
# - melt to transform columns to rows, typically to present multiple columns in same graph
# - mutate to add new columns, reorder columns and rename columns (select may be needed to remove some columns after mutate)
# - aggregate with the option to aggregate multiple unnamed columns
rm(list=ls())
library("tidyverse")
library("readxl")
library("lubridate")
library("reshape2")
library("dplyr")
library("data.table")
scaling_factor <- 1000
Covid.US <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv")
Covid.Global <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")
States.Data <- function (x) {
  Data <- Covid.US %>%
    filter(Province_State == state.name[grep(x, state.abb)]) %>%
    select(Province_State, ends_with("20")) %>%
    aggregate(. ~ Province_State, ., FUN = sum) %>%
    reshape2::melt() %>%
    filter(!value == 0) %>%
    mutate(Area = Province_State,
           Date = mdy(variable),
           Count = value / scaling_factor
    ) %>%
    select(-c(variable, value, Province_State)) %>%
    filter(Date >= "2020-03-15")
  return(Data)
}
Country.Data <- function (x) {
  Data <- Covid.Global %>%
    filter(toupper(`Country/Region`) %like% toupper(x)) %>%
    select(`Country/Region`, ends_with("20")) %>%
    reshape2::melt() %>%
    filter(!value == 0) %>%
    mutate(Area = `Country/Region`,
           Date = mdy(variable),
           Count = value / scaling_factor
    ) %>%
    select(-c(variable, value, `Country/Region`)) %>%
    filter(Date >= "2020-03-15")
  return(Data)
}
#States.comp <- data.frame(Area = str, Date = date, Count = double)
States <- c("NY", "NJ")
#States <- select.list(sort(state.abb), multiple = T)
States.comp <- bind_rows(lapply(States, States.Data))
Countries <- c("India", "Italy", "Spain", "Germany", "Brazil")
Country.comp <- bind_rows(lapply(Countries, Country.Data))
final.data = rbind(States.comp, Country.comp)
comp.graph <- ggplot(data = final.data, aes(x = Date, y = Count, colour = factor(Area))) +
  geom_line(size = 5) +
  ylab("Count 1K")
comp.graph
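For the Recovered_df in the question specifically, the two errors most likely come from how the column names are written. A bare 2020.01.22 is parsed by R as a number followed by more digits, and a quoted '2020.01.22:2020.05.05' is treated as one single column name. Wrapping each non-syntactic name in backticks should let gather() see a range of columns. The sketch below uses a small made-up data frame as a stand-in for the real 185 x 106 one, and it assumes that the first column is called Country and that the date columns are literally named 2020.01.22 and so on. If the file was read with read.csv() and default settings, the names may instead be X2020.01.22, so check names(Recovered_df) first.

```r
library(tidyr)

# Hypothetical toy stand-in for Recovered_df (values are invented).
# check.names = FALSE keeps the date-like column names exactly as written.
Recovered_df <- data.frame(
  Country = c("A", "B"),
  `2020.01.22` = c(0, 1),
  `2020.01.23` = c(2, 3),
  `2020.01.24` = c(4, 5),
  check.names = FALSE
)

# Backticks mark each name as a column name, so `first`:`last` is a column range.
Recovered_long <- gather(Recovered_df, Date, Recovered,
                         `2020.01.22`:`2020.01.24`)

# The same reshape with pivot_longer(), which tidyr recommends over gather():
# keep Country as the identifier and stack every other column.
Recovered_long2 <- pivot_longer(Recovered_df,
                                cols = -Country,
                                names_to = "Date",
                                values_to = "Recovered")

# The Date column is still text at this point; convert it to a real Date.
Recovered_long$Date <- as.Date(Recovered_long$Date, format = "%Y.%m.%d")

head(Recovered_long)
```

On the real data the range would be `2020.01.22`:`2020.05.05`. The pivot_longer() form avoids typing the first and last date at all, which helps when new date columns are added later.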