How to use tidyr's gather() to turn a wide data frame into a longer one [closed]

Question (votes: 0, answers: 1)

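I have a data frame called Recovered_df that shows the number of recovered COVID-19 cases in different countries on specific dates. Its dimensions are 185 x 106, i.e. 185 different countries/regions and 106 date columns, from 2020.01.22 to 2020.05.05.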

[Image: how the data frame looks now]

I want to convert the data frame so that it has only 3 columns: Country, Date, and RecoveredNumbers for each date in each country/region. I wrote this:

Recovered_df <- gather(Recovered_df,Date,Recovered, 2020.01.22:2020.05.05)
view(Recovered_df)

and I get this error:

Recovered_df <- gather(Recovered_df,Date,Recovered, 2020.01.22:2020.05.05)
Error: unexpected numeric constant in "Recovered_df <- gather(Recovered_df,Date,Recovered, 2020.01.22

Putting the range in quotes ('2020.01.22:2020.05.05') does not work either and produces an error:

Recovered_df <- gather(Recovered_df,Date,Recovered, '2020.01.22:2020.05.05')

>Can't subset columns that don't exist.
x The column `2020.01.22:2020.05.05` doesn't exist

How can I do this with gather() from tidyr? I am a beginner and need help.

r tidyr data-cleaning
1 answer
0 votes

Here is sample code I wrote for confirmed (infected) cases. You can use it and adapt it for recovered cases:

# Uses
# - pipes
# - filter to filter rows
# - select to filter columns
# - melt to transform columns to rows, typically to present multiple columns in same graph
# - mutate to add new columns, reorder columns and rename columns (select may be needed to remove some columns after mutate)
# - aggregate with the option to aggregate multiple unnamed columns

rm(list=ls())

library("tidyverse")
library("readxl")
library("lubridate")
library("reshape2")
library("dplyr")
library("data.table")


scaling_factor <- 1000

Covid.US <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv")
Covid.Global <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")

States.Data <- function (x) {

  Data <- Covid.US %>%
    filter(Province_State == state.name[grep(x,state.abb)] ) %>%
    select(Province_State, ends_with("20")) %>%
    aggregate(. ~ Province_State, ., FUN=sum) %>%
    reshape2::melt() %>%
    filter(value != 0) %>%
    mutate(Area = Province_State,
           Date = mdy(variable),
           Count = value/scaling_factor
    ) %>%
    select(-c(variable, value, Province_State)) %>%
    filter(Date >= "2020-03-15")
  return(Data)
}

Country.Data <- function (x) {

  Data <- Covid.Global %>%
    filter(toupper(`Country/Region`) %like% toupper(x)) %>%
    select(`Country/Region`, ends_with("20")) %>%
    reshape2::melt() %>%
    filter(value != 0) %>%
    mutate(Area = `Country/Region`,
           Date = mdy(variable),
           Count = value/scaling_factor
    ) %>%
    select(-c(variable, value, `Country/Region`)) %>%
    filter(Date >= "2020-03-15")
  return(Data)
}

#States.comp <- data.frame(Area = str, Date = date, Count = double)

States <- c("NY", "NJ")
#States <- select.list(sort(state.abb), multiple = T)
States.comp <- bind_rows(lapply(States, States.Data))



Countries <- c("India", "Italy", "Spain", "Germany", "Brazil")
Country.comp <- bind_rows(lapply(Countries, Country.Data))

final.data <- rbind(States.comp, Country.comp)

comp.graph <- ggplot(data = final.data, aes(x = Date, y = Count, colour = factor(Area))) +
          geom_line(size=5) +
          ylab("Count 1K")
comp.graph
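
For the Recovered_df in the question specifically, the two errors have a simple cause. Unquoted, 2020.01.22 is not a valid number or name in R (it has two decimal points), so the parser stops with "unexpected numeric constant". In single quotes, the whole string '2020.01.22:2020.05.05' is treated as one column name, which does not exist. Wrapping each column name in backticks lets gather() read the range as first column to last column. The following is only a minimal sketch: the small data frame is a made-up stand-in, and it assumes the country column is called Country and the date columns are named exactly like 2020.01.22 (if the data was read with read.csv(), they may instead look like X2020.01.22).

```r
library(tidyr)

# Hypothetical stand-in for Recovered_df: one row per country, one column per date.
# check.names = FALSE keeps the date-like column names exactly as written.
Recovered_df <- data.frame(
  Country = c("A", "B"),
  `2020.01.22` = c(0, 1),
  `2020.01.23` = c(2, 3),
  `2020.01.24` = c(4, 5),
  check.names = FALSE
)

# Backticks make R treat each date as a column name rather than a number,
# so first:last selects every column between the two (inclusive).
Recovered_long <- gather(Recovered_df, Date, Recovered,
                         `2020.01.22`:`2020.01.24`)

# Equivalent here: gather every column except the country column.
Recovered_long2 <- gather(Recovered_df, Date, Recovered, -Country)

# The Date column comes back as text; convert it to a real Date if needed.
Recovered_long$Date <- as.Date(Recovered_long$Date, format = "%Y.%m.%d")

# pivot_longer() is the newer tidyr replacement for gather().
Recovered_long3 <- pivot_longer(Recovered_df, cols = -Country,
                                names_to = "Date", values_to = "Recovered")
```

With the real data, the range would be `2020.01.22`:`2020.05.05`, or simply -Country (using whatever the country column is actually called) to avoid typing the dates at all.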