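I have a Reddit dataset in which each row represents one Reddit post, and I have a sentiment score for every post by a given username. I also have a variable that captures the average sentiment across all posts written by the same username.
I am trying to build a sentiment measure tied to the timeline of a minimum-wage policy, and I want to classify each username's sentiment into three periods:
1- Before the policy was announced, say on "2021-03-01"
2- After the policy was announced but before it was implemented, i.e. after "2021-03-01" but before "2021-09-01"
3- After the policy was implemented, on "2021-09-01"
I have been able to compute each username's sentiment by month or by quarter, as shown below, but I would like to compute each username's sentiment according to the specific policy timeline above, and I am not sure how to do that.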
library(tidyverse)
library(lubridate)
library(zoo)
dput(df[1:5,c(3,4,21, 22, 23)])
Output:
structure(list(date = structure(c(15149, 15150, 15150, 15150,
15150), class = "Date"), username = c("ax", "aa",
"cartman", "abc", "aff"
), quarter_yr = c("2011 Q2", "2011 Q2", "2011 Q2", "2011 Q2",
"2011 Q2"), sentiment_score = c("0", "-1", "1", "-1", "-1"),
avg_sentiment = c(0.0666666666666667, -0.777777777777778,
1, -1, -1)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), groups = structure(list(username = c("ax",
"cartman", "abc", "aff"), .rows = structure(list(5L, 4L, 1L, 2L, 3L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), .drop = TRUE))
# Derive a "YYYY Qn" label for each post
df <- df %>%
  mutate(date = ymd(date),
         quarter_yr = paste(year(date), quarters(date)))
# Average sentiment per username and quarter
sentiment_df <-
  df %>% group_by(username, quarter_yr) %>% summarise(avg_sentiment = mean(as.numeric(sentiment_score)))
dput(sentiment_df[1:2,c(1,8)])
Output:
structure(list(username = c("cartman","aa"
), `2014 Q2` = c(NA_real_, NA_real_)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L), groups = structure(list(
username = c("cartman","aa"), .rows = structure(list(
1L, 2L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))
# Label each post with its policy phase; case_when() checks conditions in order
df <- df %>%
  mutate(date = ymd(date),
         quarter_yr = paste(year(date), quarters(date)),
         phase = case_when(date < ymd(20210301) ~ "1 Before announcement",
                           date < ymd(20210901) ~ "2 Before implementation",
                           TRUE ~ "3 After implementation"))
# Average sentiment per username and phase
sentiment_df <-
  df %>%
  group_by(username, phase) %>%
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)))
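It looks like you just need to use mutate() and case_when() to create a new variable and then group by that new variable. Here is my attempt. Is this what you are after?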
library(dplyr)
library(lubridate)
library(zoo)
sentiment_df<-structure(list(date = structure(c(15149, 15150, 15150, 15150,
15150), class = "Date"), username = c("ax", "aa",
"cartman", "abc", "aff"
), quarter_yr = c("2011 Q2", "2011 Q2", "2011 Q2", "2011 Q2",
"2011 Q2"), sentiment_score = c("0", "-1", "1", "-1", "-1"),
avg_sentiment = c(0.0666666666666667, -0.777777777777778,
1, -1, -1)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), groups = structure(list(username = c("ax",
"cartman", "abc", "aff"), .rows = structure(list(5L, 4L, 1L, 2L, 3L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), .drop = TRUE))
sentiment_df <- sentiment_df %>%
  mutate(date = ymd(date),
         quarter_yr = paste(year(date), quarters(date)),
         implementation_period = case_when(
           date < as.Date("2021-03-01") ~ "Before",
           date >= as.Date("2021-03-01") & date < as.Date("2021-09-01") ~ "Pre_Implementation",
           TRUE ~ "After"))
sentiment_df <-
  sentiment_df %>%
  group_by(username, implementation_period) %>%
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)))
A quick note: the data you provided contains only "Before" dates, but I think this should work on the full dataset.
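Since the sample rows all fall in the "Before" period, a small check on made-up data may help confirm how the cutoffs behave. This is only a sketch: the tibble `toy`, its usernames, and its scores are hypothetical and not taken from the original dataset; the cutoff dates and the character-typed `sentiment_score` follow the question.

```r
library(dplyr)

# Hypothetical posts placed on both sides of each cutoff date
toy <- tibble(
  date = as.Date(c("2021-02-28", "2021-03-01", "2021-08-31",
                   "2021-09-01", "2021-01-15", "2021-10-10")),
  username = c("u1", "u1", "u1", "u1", "u2", "u2"),
  sentiment_score = c("1", "-1", "1", "0", "-1", "1")  # character, as in the question
)

phase_sentiment <- toy %>%
  mutate(
    # Conditions are tried in order, so the second one needs no lower bound
    implementation_period = case_when(
      date < as.Date("2021-03-01") ~ "Before",
      date < as.Date("2021-09-01") ~ "Pre_Implementation",
      TRUE ~ "After"
    )
  ) %>%
  group_by(username, implementation_period) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(avg_sentiment = mean(as.numeric(sentiment_score)), .groups = "drop")

print(phase_sentiment)
```

With these comparisons, a post dated exactly "2021-03-01" lands in "Pre_Implementation" and one dated exactly "2021-09-01" lands in "After". If the boundary days should belong to the earlier period instead, switching `<` to `<=` would be the change to make.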