Computing a sentiment indicator over custom time periods instead of quarter/month (dplyr)


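I have a reddit dataset in which each row represents one reddit post, and I have a sentiment score for every post by a given username. I also have a variable that captures the average sentiment across all posts written by the same username.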

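I am trying to create a sentiment indicator tied to the timeline of a minimum wage policy, and I want to classify each username's sentiment into three periods: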

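1. Before the policy was announced, i.e. before "2021-03-01"
2. After the announcement but before implementation, i.e. from "2021-03-01" up to (but not including) "2021-09-01"
3. After the policy was implemented, i.e. from "2021-09-01" onward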

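I have been able to compute each username's sentiment by month or by quarter, as shown below, but I would like to compute it for the specific policy periods above, and I am not sure how to do that.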

Load packages

library(tidyverse)
library(lubridate)
library(zoo)

Print a sample of the data for selected columns

dput(df[1:5,c(3,4,21, 22, 23)])

Output:

structure(list(date = structure(c(15149, 15150, 15150, 15150, 
15150), class = "Date"), username = c("ax", "aa", 
"cartman", "abc", "aff"
), quarter_yr = c("2011 Q2", "2011 Q2", "2011 Q2", "2011 Q2", 
"2011 Q2"), sentiment_score = c("0", "-1", "1", "-1", "-1"), 
    avg_sentiment = c(0.0666666666666667, -0.777777777777778, 
    1, -1, -1)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), groups = structure(list(username = c("ax", 
"cartman", "abc", "aff"), .rows = structure(list(5L, 4L, 1L, 2L, 3L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), .drop = TRUE))

Create the quarter/year variable

# Add the quarter label to the post-level data (df), which is summarised below
df <- df %>% 
  mutate(date = ymd(date),
         quarter_yr = paste(year(date), quarters(date)))

Compute the average sentiment score for each username across all of their observations/posts:

sentiment_df <- df %>% 
  group_by(username, quarter_yr) %>% 
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)))

Quarterly sentiment by username:

dput(sentiment_df[1:2,c(1,8)])

Output:

structure(list(username = c("cartman","aa"
), `2014 Q2` = c(NA_real_, NA_real_)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L), groups = structure(list(
    username = c("cartman","aa"), .rows = structure(list(
        1L, 2L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))
r dplyr lubridate
2 Answers

0 votes
# Classify each post by policy phase. case_when() checks conditions in order,
# so the second branch only sees dates on or after 2021-03-01.
df <- df %>% 
  mutate(date = ymd(date),
         quarter_yr = paste(year(date), quarters(date)),
         phase = case_when(date < ymd(20210301) ~ "1 Before announcement",
                           date < ymd(20210901) ~ "2 Before implementation",
                           TRUE ~ "3 After implementation"))

# Average sentiment per username within each phase
sentiment_df <- df %>% 
  group_by(username, phase) %>% 
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)))
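
As a hedged aside, the same three-way split can also be sketched with base R's findInterval() instead of case_when(), which keeps the cut-off dates in one vector. The toy data frame, its values, and the names toy, cutoffs, and phase_labels are made up for illustration; only the two cut-off dates and the column names come from the question.

```r
library(dplyr)

# Hypothetical toy posts; only the cut-off dates come from the question.
toy <- tibble(
  username = c("aa", "aa", "aa", "bb", "bb", "bb"),
  date = as.Date(c("2021-02-28", "2021-03-01", "2021-09-01",
                   "2021-01-15", "2021-08-31", "2021-12-01")),
  sentiment_score = c("1", "-1", "0", "-1", "1", "1")
)

cutoffs <- as.Date(c("2021-03-01", "2021-09-01"))
phase_labels <- c("1 Before announcement",
                  "2 Before implementation",
                  "3 After implementation")

# findInterval() returns 0, 1 or 2 depending on how many cut-offs a date has
# reached (intervals are closed on the left), so adding 1 indexes the labels.
toy_phase <- toy %>%
  mutate(phase = phase_labels[
    findInterval(as.numeric(date), as.numeric(cutoffs)) + 1
  ])

# Average sentiment per username within each phase
phase_avg <- toy_phase %>%
  group_by(username, phase) %>%
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)),
            .groups = "drop")
```

A post dated exactly on a cut-off falls into the later phase here, which matches the strict `<` comparisons in the case_when() version.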

0 votes

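It looks like you just need to use mutate() with case_when() to create a new variable and then group by that new variable. Here is my attempt. Is this what you are after?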

library(dplyr)
library(lubridate)
library(zoo)
sentiment_df <- structure(list(
  date = structure(c(15149, 15150, 15150, 15150, 15150), class = "Date"),
  username = c("ax", "aa", "cartman", "abc", "aff"),
  quarter_yr = c("2011 Q2", "2011 Q2", "2011 Q2", "2011 Q2", "2011 Q2"),
  sentiment_score = c("0", "-1", "1", "-1", "-1"),
  avg_sentiment = c(0.0666666666666667, -0.777777777777778, 1, -1, -1)),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -5L),
  # The posted group keys listed only four usernames for five rows; "aa" is
  # restored and the row indices re-matched so this is a valid grouped_df.
  groups = structure(list(
    username = c("aa", "abc", "aff", "ax", "cartman"),
    .rows = structure(list(2L, 4L, 5L, 1L, 3L), ptype = integer(0),
                      class = c("vctrs_list_of", "vctrs_vctr", "list"))),
    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L), .drop = TRUE))
sentiment_df <- sentiment_df %>% 
  mutate(date = ymd(date),
         quarter_yr = paste(year(date), quarters(date)),
         implementation_period = case_when(
           date < as.Date("2021-03-01") ~ "Before",
           date >= as.Date("2021-03-01") & date < as.Date("2021-09-01") ~ "Pre_Implementation",
           TRUE ~ "After"))

sentiment_df <- sentiment_df %>% 
  group_by(username, implementation_period) %>% 
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)))

A quick note: the sample data you provided only contains dates in the "Before" period, but I think this should work on the full dataset.
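
Since the posted sample only covers the "Before" period, here is a rough check on made-up data that spans all three periods, followed by a reshape to one column per period (similar in shape to the wide quarterly table in the question). The toy rows and the names toy, toy_summary, toy_wide, and n_posts are assumptions for illustration, not part of the original data.

```r
library(dplyr)
library(tidyr)

# Hypothetical posts spanning all three policy periods
toy <- tibble(
  username = c("aa", "aa", "aa", "aa", "bb", "bb"),
  date = as.Date(c("2021-01-10", "2021-02-20", "2021-05-05",
                   "2021-10-01", "2021-04-01", "2021-11-11")),
  sentiment_score = c("1", "-1", "1", "0", "-1", "1")
)

# Same classification as above, then one row per username and period
toy_summary <- toy %>%
  mutate(implementation_period = case_when(
    date < as.Date("2021-03-01") ~ "Before",
    date < as.Date("2021-09-01") ~ "Pre_Implementation",
    TRUE ~ "After"
  )) %>%
  group_by(username, implementation_period) %>%
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)),
            n_posts = n(),
            .groups = "drop")

# Optional: one column per period, one row per username
toy_wide <- toy_summary %>%
  select(-n_posts) %>%
  pivot_wider(names_from = implementation_period,
              values_from = avg_sentiment)
```

In the wide table, a username with no posts in a given period (here "bb" in "Before") gets NA for that column, so it is worth keeping n_posts around in the long table if you need to know how many posts each average is based on.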
