Create a variable in R based on another variable

Question

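Some context: I have a Tableau dashboard that I use to share student survey data with course instructors. I only share data once an instructor has received at least 10 responses, and they cannot see any responses beyond those 10 until another 10 have been collected. This is for confidentiality purposes.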

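I'm building a data pipeline in R and want to encode the logic above as a single variable called "ReleasedToDashboard" that is "Yes" when an instructor has enough responses to release and "No" when there are not enough. I don't know where to start and would really appreciate some help!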

Variables it depends on:
SurveyEndDate: date variable; when the student completed the survey
CourseName: string variable; the name of the course

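More specifically, I need to create a computed variable in R called "ReleasedToDashboard" that is "Yes" once there are 10 rows with the same "CourseName" value, counted in chronological order of "SurveyEndDate". Responses beyond the last complete block of 10 for a course (including every response while the course has fewer than 10) should be "No" until the next multiple of 10 (e.g. 20) is reached, at which point they become "Yes". This pattern should repeat for every multiple of 10.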
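A minimal sketch of the rule described above, assuming dplyr and a small hypothetical data set (courses "A" and "B" and the helper column `released_upto` are illustrative names, not from the post): within each course, a response is released when its chronological position is at or below the last complete multiple of 10 for that course.

```r
library(dplyr)

# Hypothetical toy data: course "A" has 12 responses, course "B" has 7
toy <- tibble(
  CourseName    = c(rep("A", 12), rep("B", 7)),
  SurveyEndDate = as.Date("2024-01-01") + c(0:11, 0:6)
)

toy <- toy %>%
  arrange(CourseName, SurveyEndDate) %>%
  group_by(CourseName) %>%
  mutate(
    count = row_number(),                  # chronological position within the course
    released_upto = floor(n() / 10) * 10,  # last complete block of 10 for this course
    ReleasedToDashboard = if_else(count <= released_upto, "Yes", "No")
  ) %>%
  ungroup() %>%
  select(-count, -released_upto)
```

Here course A gets "Yes" on its first 10 rows and "No" on rows 11 and 12, while course B is "No" throughout. `count <= floor(n() / 10) * 10` is the same test as `ceiling(count / 10) * 10 <= n()`: the block of 10 that a response belongs to must be complete.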

[Image: example resulting ReleasedToDashboard variable]

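Please let me know if you have any questions. This is my first time posting here, and it has been a while since I last used R. Thanks in advance for your help!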

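I tried the code below, but it only assigns "Yes" to every 10th observation (each exact multiple of 10) of a given "CourseName" value, ordered by "SurveyEndDate". Observations that are not a multiple of 10 should also get "Yes" as long as the count for that "CourseName" has reached the nearest multiple of 10 above them. For example, for an observation at a point in time (based on SurveyEndDate) where 4 responses have been completed, "ReleasedToDashboard" should be "Yes" as long as that "CourseName" has at least 10 observations/responses; the same goes for 16 and 20, or 31 and 40. Here "count" means the number of rows for a particular "CourseName" value.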
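The arithmetic behind the examples above can be checked in base R. This is a sketch with a hypothetical helper `released(count, n)`, where `n` is assumed to be the total number of responses the course has so far; it also shows why the attempted condition only fires on exact multiples of 10.

```r
# Hypothetical helper: is the response at position `count` (1-based, in
# SurveyEndDate order) visible when the course has `n` responses in total?
released <- function(count, n) {
  next_mult_10 <- ceiling(count / 10) * 10  # end of the block of 10 this response belongs to
  ifelse(next_mult_10 <= n, "Yes", "No")
}

ceiling(c(4, 16, 31) / 10) * 10
# [1] 10 20 40

released(4, n = 9)    # "No": the first block of 10 is not complete yet
released(4, n = 10)   # "Yes"
released(16, n = 19)  # "No"
released(16, n = 20)  # "Yes"
released(31, n = 40)  # "Yes"

# The attempted condition compares next_mult_10 with count itself,
# so it is only TRUE when count is an exact multiple of 10
count <- 1:12
ifelse(ceiling(count / 10) * 10 <= count, "Yes", "No")
# "No" for 1 to 9, "Yes" for 10, "No" for 11 and 12
```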

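library(dplyr)
library(tidyr)

DF <- DF %>%
  arrange(CourseName, SurveyEndDate) %>%
  group_by(CourseName) %>%
  mutate(
    count = row_number(),                        # position of the response within its course
    nearest_mult_10 = ceiling(count / 10) * 10,  # next multiple of 10 at or above count
    # Compares against count itself, so this is only TRUE when count is a multiple of 10
    ReleasedToDashboard = ifelse(nearest_mult_10 <= count, "Yes", "No")
  ) %>%
  ungroup() %>%
  select(-count, -nearest_mult_10)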


structure(list(ReleasedToDashboard = c("Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", 
"No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", 
"No", "No"), CourseName = c("Anatomy", "Anatomy", "Anatomy", 
"Anatomy", "Anatomy", "Anatomy", "Anatomy", "Anatomy", "Anatomy", 
"Anatomy", "Anatomy", "Anatomy", "Anatomy", "Anatomy", "Anatomy", 
"Anatomy", "Anatomy", "Anatomy", "Anatomy", "Anatomy", "Anatomy", 
"Biology", "Biology", "Biology", "Biology", "Biology", "Chemistry", 
"Chemistry", "Chemistry", "Chemistry", "Chemistry", "Chemistry", 
"Chemistry", "Chemistry", "Chemistry", "Chemistry", "Chemistry", 
"Physics", "Physics", "Physics", "Psychology", "Psychology", 
"Psychology", "Psychology", "Psychology", "Psychology", "Psychology", 
"Psychology", "Psychology", "Psychology", "Psychology", "Psychology", 
"Psychology"), SurveyEndDate = structure(c(1705449600, 1705536000, 
1705622400, 1705708800, 1706659200, 1706745600, 1706832000, 1707955200, 
1708041600, 1708128000, 1708214400, 1708300800, 1708387200, 1708473600, 
1708560000, 1708646400, 1708732800, 1708819200, 1708905600, 1708992000, 
1709078400, 1706054400, 1706140800, 1706227200, 1706313600, 1706400000, 
1705795200, 1705881600, 1706140800, 1706227200, 1706572800, 1706918400, 
1707004800, 1707264000, 1707350400, 1707436800, 1707523200, 1705363200, 
1705968000, 1706054400, 1705017600, 1705104000, 1705190400, 1705276800, 
1706313600, 1706400000, 1706486400, 1707091200, 1707177600, 1707609600, 
1707696000, 1707782400, 1707868800), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-53L))
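To check a candidate rule against the expected column in the sample above, a base-R sketch is enough. It reconstructs only the per-course row counts of the sample (21, 5, 11, 3 and 13 rows), which suffices because the sample rows are already in SurveyEndDate order within each course; `course`, `computed` and `expected` are illustrative names.

```r
# Per-course row counts taken from the sample; rows are already in date order
course <- rep(c("Anatomy", "Biology", "Chemistry", "Physics", "Psychology"),
              times = c(21, 5, 11, 3, 13))

count <- ave(seq_along(course), course, FUN = seq_along)  # position within course
n     <- ave(seq_along(course), course, FUN = length)     # responses per course

# Released if the position is inside the last complete block of 10
computed <- ifelse(count <= floor(n / 10) * 10, "Yes", "No")

# The expected ReleasedToDashboard column from the sample, in run-length form
expected <- c(rep("Yes", 20), "No",             # Anatomy (21)
              rep("No", 5),                     # Biology (5)
              rep("Yes", 10), "No",             # Chemistry (11)
              rep("No", 3),                     # Physics (3)
              rep("Yes", 10), rep("No", 3))     # Psychology (13)

identical(computed, expected)
# [1] TRUE
```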
Tags: r, date, if-statement, conditional-statements
1 Answer

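I got there with the code below! It's not best practice, but I added a replace_na cleanup step at the bottom, because I believe `tenth_acr_yes[nearest_mult_10] == "Yes"` returns NA when there is no "Yes" to refer to: nearest_mult_10 points past the last row of the course, so it is indexing a value that doesn't exist. I wasn't sure how to handle that in code, so I simply set NA to "No". I'm open to improvements if anyone has them! Thanks!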
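The NA described here follows from R's indexing rules: indexing an atomic vector past its length returns NA, comparing NA with "Yes" is NA as well, and `ifelse()` passes an NA test through as NA. A small base-R illustration with a hypothetical 3-row group `x`:

```r
x <- c("No", "No", "Yes")  # hypothetical group with only 3 rows

x[5]                       # index past the end of the vector
# [1] NA
x[5] == "Yes"
# [1] NA
ifelse(x[5] == "Yes", "Yes", "No")
# [1] NA

# Comparing the block end with the group size avoids the NA altogether
ifelse(5 <= length(x), "Yes", "No")
# [1] "No"
```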

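library(dplyr)
library(tidyr)

acrs.survey.except.med <- acrs.survey.except.med %>%
  arrange(acr, EndDate) %>%
  group_by(acr) %>%
  mutate(
    count = row_number(),                        # position within the course (acr)
    nearest_mult_10 = ceiling(count / 10) * 10,  # end of this response's block of 10
    tenth_acr_yes = ifelse(nearest_mult_10 <= count, "Yes", "No"),  # "Yes" only on every 10th row
    # tenth_acr_yes[nearest_mult_10] is "Yes" when the block of 10 is complete and NA when
    # nearest_mult_10 is past the last row of the group; count <= nearest_mult_10 is always TRUE
    ReleasedToDashboard = ifelse(tenth_acr_yes[nearest_mult_10] == "Yes" & count <= nearest_mult_10, "Yes", "No")
  ) %>%
  ungroup() %>%
  select(-tenth_acr_yes, -count, -nearest_mult_10)

# Incomplete blocks come out as NA above; recode them to "No"
acrs.survey.except.med$ReleasedToDashboard <- replace_na(acrs.survey.except.med$ReleasedToDashboard, "No")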
© www.soinside.com 2019 - 2024. All rights reserved.