How to plot a line chart for time series analysis in R


I want to plot a line chart in R showing the number of tweets against the date and time at which they were posted.

library(ggplot2)
df1 <- structure(list(Date = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Label = c("2020-03-12", 
            "2020-03-13"), class = "factor"), Time = structure(c(1L, 1L, 2L, 
            3L, 4L, 5L), .Label = c("00:00:00Z", "00:00:01Z", "00:10:04Z", 
            "00:25:12Z", "01:00:02Z"), class = "factor"), Text = structure(c(5L, 
            3L, 6L, 4L, 2L, 1L), .Label = c("Government launch new policy to different sector improvement", 
            "Launches of rocket", "Premium policy get activate by company abc", 
            "Technology makes trend", "The images of demonstrations and gatherings", 
            "Weather forecasting by xyz"), class = "factor")), class = "data.frame", row.names = c(NA, 
            -6L))
ggplot(df1, aes(x = Date, y = text(count)) + geom_line(aes(color = variable), size = 1)

I tried the code above to produce the plot I want, but it gives an error. The dataset is a CSV file that looks like this:

Date         Time                     Text
2020-03-12   00:00:00Z                The images of demonstrations and gatherings
2020-03-12   00:00:00Z                Premium policy get activate by company abc
2020-03-12   00:00:01Z                Weather forecasting by xyz 
2020-03-12   00:10:04Z                Technology makes trend
2020-03-12   00:25:12Z                Launches of rocket 
2020-03-12   01:00:02Z                Government launch new policy to different sector improvement

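I have a dataset covering nearly 15 days and want a line chart of the number of tweets (the Text column) so I can see how tweet volume trends across different times and dates.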

r ggplot2 visualization linechart
Answer 1 (0 votes)
df1 <- structure(list(Date = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Label = c("3/12/2020", 
            "3/13/2020"), class = "factor"), Time = structure(c(1L, 1L, 2L, 
            3L, 4L, 5L), .Label = c("00:00:00Z", "00:00:01Z", "00:10:04Z", 
            "00:25:12Z", "01:00:02Z"), class = "factor"), Text = structure(c(5L, 
            3L, 6L, 4L, 2L, 1L), .Label = c("Government launch new policy to different sector", 
            "Launches of rocket", "Premium policy get activate by company abc", 
            "Technology makes trend", "The images of demonstrations and gatherings", 
            "Weather forecasting by xyz"), class = "factor"), X = structure(c(1L, 
            1L, 1L, 1L, 1L, 2L), .Label = c("", "improvement"), class = "factor")), class = "data.frame", row.names = c(NA, 
            -6L))                                                      

Create the dataset df1 as shown above, then run the following to get the hourly plot you want.

library(tidyverse)
library(lubridate)

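df1 %>% 
  mutate(Time = hms(sub("Z$", "", Time)),  # drop the trailing 'Z', then parse HH:MM:SS into a Period
         Date = mdy(Date),                 # parse m/d/Y text into a Date
         hour = hour(Time)) %>%            # extract the hour of day
  count(hour) %>%                          # number of tweets per hour
  ggplot(aes(hour, n, group = 1)) + geom_line() + geom_point()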
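
The `count()` step is what turns one row per tweet into one row per time bucket with its frequency, which `ggplot()` can then map to `y`; the question's data has no such count column. A minimal sketch of that idea, counting per day instead of per hour, with a small made-up data frame (the names `tweets`, `daily`, and `n_tweets` are hypothetical, not from the question):

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per tweet
tweets <- data.frame(
  Date = c("2020-03-12", "2020-03-12", "2020-03-13"),
  Text = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# One row per date, with the number of tweets on that date
daily <- tweets %>% count(Date, name = "n_tweets")

# Convert Date to a real Date so the x axis is continuous and the line connects the points
p <- ggplot(daily, aes(x = as.Date(Date), y = n_tweets)) +
  geom_line() +
  geom_point()
```

Calling `print(p)` draws the chart. Note that counting by `hour` alone, as in the pipeline above, pools the same hour from different dates into one bucket.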

Answer 2 (0 votes)

Is this what you are after?


library(dplyr)
library(lubridate)
library(stringr)
library(ggplot2)
library(patchwork) # provides the `/` operator used to stack p1 and p2 at the end

Answer using your data

This shows the data wrangling.


# your data; 
df1 <- structure(list(Date = structure(c(1L, 1L, 2L, 1L, 1L, 1L), 
                                       .Label = c("2020-03-12","2020-03-13"), 
                                       class = "factor"), 
                      Time = structure(c(1L, 1L, 2L,3L, 4L, 5L), 
                                       .Label = c("00:00:00Z", "00:00:01Z", "00:10:04Z","00:25:12Z", "01:00:02Z"),
                                       class = "factor"),
                      Text = structure(c(5L,3L, 6L, 4L, 2L, 1L),
                                       .Label = c("Government launch new policy to different sector improvement", "Launches of rocket",
                                                  "Premium policy get activate by company abc", "Technology makes trend",
                                                  "The images of demonstrations and gatherings", "Weather forecasting by xyz"), class = "factor")),
                 class = "data.frame", row.names = c(NA,-6L))

# data wrangle
df2 <- 
  df1 %>% 
  # change all variables from factor to character
  mutate(across(everything(), as.character)) %>%
  mutate(Time = str_remove(Time, "Z$"), # remove the trailing 'Z' from Time values
         dt = ymd_hms(paste(Date, Time, sep = " ")), # parse the text into a date-time using lubridate::ymd_hms
         dt = ceiling_date(dt, unit = "hour")) %>% # round up to the next hour boundary; kept as a separate step for clarity
  group_by(dt) %>%  
  summarise(nr_tweets = n())
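
The choice of rounding function decides which hourly bucket a tweet lands in: the pipeline above uses `ceiling_date()`, while the larger example further down uses `round_date()`. A small sketch comparing lubridate's three rounding helpers on one timestamp taken from the question's data (the variable name `x` is just for illustration):

```r
library(lubridate)

x <- ymd_hms("2020-03-12 00:25:12")  # parsed as UTC by default

down    <- floor_date(x, unit = "hour")    # 2020-03-12 00:00:00 UTC, start of the hour
up      <- ceiling_date(x, unit = "hour")  # 2020-03-12 01:00:00 UTC, next hour boundary
nearest <- round_date(x, unit = "hour")    # 2020-03-12 00:00:00 UTC, closest boundary
```

With `floor_date()` each bucket is labelled by the hour in which the tweet was posted, which is often the easiest label to read on the x axis.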

# plot

p1 <- ggplot(df2, aes(dt, nr_tweets))+
        geom_line()+
        scale_x_datetime(date_breaks = "1 day", date_labels = "%d/%m")+
        ggtitle("Data from question `df1`")


Answer using a larger made-up dataset

tib <- tibble(dt = sample(seq(ISOdate(2020,05,01), ISOdate(2020,05,15), by = "sec"), 10000, replace = TRUE),
             text = sample(c(letters[1:26], LETTERS[1:26]), 10000, replace = TRUE))


tib1 <- 
  tib %>% 
  mutate(dt = round_date(dt, unit="hour"))%>% 
  group_by(dt) %>%  
  summarise(nr_tweets = n())
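
With real data, some hours may have no tweets at all; `group_by()` and `summarise()` then produce no row for those hours, and `geom_line()` draws straight across the gap. One possible way to make such hours show up as zero is `tidyr::complete()`. This is a sketch under the assumption that zero-filling is wanted; the `counts` and `filled` objects are hypothetical and not part of the original answer:

```r
library(dplyr)
library(tidyr)

# Hypothetical hourly counts with a two-hour gap (01:00 and 02:00 are missing)
counts <- tibble(
  dt = as.POSIXct(c("2020-05-01 00:00:00", "2020-05-01 03:00:00"), tz = "UTC"),
  nr_tweets = c(4L, 2L)
)

# Add a row for every hour between the first and last timestamp,
# filling hours that had no tweets with a count of 0
filled <- counts %>%
  complete(dt = seq(min(dt), max(dt), by = "hour"),
           fill = list(nr_tweets = 0L))
```

The same `complete()` call could be appended to the `tib1` pipeline before plotting.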


p2 <- ggplot(tib1, aes(dt, nr_tweets))+
        geom_line()+
        scale_x_datetime(date_breaks = "1 day", date_labels = "%d/%m")+
        ggtitle("Result using `tib` data made up to answer the question")
  

p1/p2

Created on 2020-05-13 by the reprex package (v0.3.0)
