Counting with sqldf based on multiple conditions

Question

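Hi all, I am writing an SQL query in R using sqldf and seem to have hit a roadblock. I have a table with an ID column, two date columns, and a grouping column.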

AlertDate  AppointmentDate  ID  Branch
01/01/20   04/01/20         1   W1
01/01/20   09/01/20         1   W1
08/01/20   09/01/20         1   W2
01/01/20   23/01/20         1   W1

The query I am writing is

sqldf('select Branch,count(ID) from df where AlertDate <= AppointmentDate 
and AppointmentDate <AlertDate+7 group by Branch')

With this query, the result I get is

Branch Count
W1      1
W2      1

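That is the correct answer for the query as written. What I want to achieve is this: when the second condition (AppointmentDate < AlertDate + 7) is false, the row should be counted in a later group based on the dates instead of being dropped from the count. For example, if the alert date is 01/01/20 and the appointment date is 23/01/20, the row should be counted in W4, computed as ceil((AppointmentDate - AlertDate) / 7). In the end, I want the result to be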

Branch  Count
W1      1
W2      2
W4      1

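The second row should be counted in W2 and the fourth row in W4, rather than being dropped. I am trying to achieve this in SQL using sqldf in R. Any solution in R or SQL would work for me.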

Output of dput(test):

structure(list(AlertDate = structure(c(18262, 18262, 18269, 18262), class = "Date"), AppointmentDate = structure(c(18265, 18270,18270, 18284), class = 
"Date"), ID = c(1, 1, 1, 1), Branch = c("W1","W1", "W2", "W1")), class = c("spec_tbl_df", "tbl_df", "tbl","data.frame"), row.names = c(NA, -4L), problems = 
structure(list( row = 4L, col = "Branch", expected = "", actual = "embedded null", 
file = "'C:/Users/FRssarin/Desktop/test.txt'"), row.names = c(NA,-1L), class = c("tbl_df", "tbl", "data.frame")), spec = structure(list(  cols = list(AlertDate = 
structure(list(format = "%d/%m/%y"), class = c("collector_date", 
"collector")), AppointmentDate = structure(list(format = "%d/%m/%y"), class = c("collector_date",  "collector")), ID = structure(list(), class = c("collector_double", "collector")), Branch = structure(list(), class = 
c("collector_character",  "collector"))), default = structure(list(), class = c("collector_guess",  "collector")), skip = 1), class = "col_spec"))

Answer 1

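Assuming what is needed is:

1. Add a column nextBranch. It is not defined in the question, so we assume it is the smallest branch greater than the current branch.
2. Using the result of (1), take nextBranch when AppointmentDate > AlertDate + 7 (falling back to Branch if nextBranch is null) and Branch otherwise, then count the rows for each revised branch where AppointmentDate > AlertDate.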

Code:

library(sqldf)
library(tibble)

sqldf("select 
    case 
      when AppointmentDate > AlertDate + 7 then coalesce(nextBranch, Branch)
      else Branch
    end as Branch,
    count(*) as 'Count'
  from (select a.*, min(b.Branch) nextBranch
    from df a
    left join df b
    on b.Branch > a.Branch
    group by a.rowid)
  where AlertDate < AppointmentDate
  group by 1")

giving:

  Branch Count
1     W1     1
2     W2     2

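Note

The dput output in the question was changed after this answer was written, and in any case it is inconsistent with the expected output: the expected output contains W4, but the revised dput output has no W4. We have therefore used the original dput output shown in the question.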

df <-
structure(list(AlertDate = structure(c(18262, 18262, 18269), class = "Date"), 
AppointmentDate = structure(c(18265, 18270, 18270), class = "Date"), 
ID = c(1, 1, 1), Branch = c("W1", "W1", "W2")), class = c("spec_tbl_df","tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), problems = structure(list(
row = 3L, col = "Branch", expected = "", actual = "embedded null", 
file = "'C:/Users/FRssarin/Desktop/test.txt'"), row.names = c(NA,-1L),
 class = c("tbl_df", "tbl", "data.frame")), spec = structure(list(
cols = list(AlertDate = structure(list(format = "%d/%m/%y"), class = c("collector_date","collector")), AppointmentDate = structure(list(format = "%d/%m/%y"), class = c("collector_date","collector")), ID = structure(list(), class = c("collector_double", "collector")), Branch = structure(list(), class = c("collector_character", "collector"))), default = structure(list(), class = c("collector_guess", "collector")), skip = 1), class = "col_spec"))

giving:

> library(tibble)
> df
# A tibble: 3 x 4
  AlertDate  AppointmentDate    ID Branch
  <date>     <date>          <dbl> <chr> 
1 2020-01-01 2020-01-04          1 W1    
2 2020-01-01 2020-01-09          1 W1    
3 2020-01-08 2020-01-09          1 W2
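
For comparison, the week-bucket rule stated in the question, ceil((AppointmentDate - AlertDate) / 7), can also be written directly in sqldf. The following is a minimal sketch under some assumptions, not the method used in the answer above: the alias NewBranch is a name chosen here for illustration, the four-row data frame is rebuilt by hand from the question's table, and the ceiling is emulated with integer truncation of (gap + 6) / 7 so the query does not depend on a ceil() function being available in the SQLite build.

```r
library(sqldf)

# Sample data rebuilt by hand from the four rows shown in the question
df <- data.frame(
  AlertDate       = as.Date(c("2020-01-01", "2020-01-01", "2020-01-08", "2020-01-01")),
  AppointmentDate = as.Date(c("2020-01-04", "2020-01-09", "2020-01-09", "2020-01-23")),
  ID              = c(1, 1, 1, 1),
  Branch          = c("W1", "W1", "W2", "W1"),
  stringsAsFactors = FALSE
)

# sqldf hands Date columns to SQLite as day counts, so subtraction yields days.
# For whole-day gaps, (gap + 6) / 7 truncated to an integer equals ceil(gap / 7).
res <- sqldf("
  select
    case
      when AppointmentDate - AlertDate >= 7
        then 'W' || cast((AppointmentDate - AlertDate + 6) / 7 as integer)
      else Branch
    end as NewBranch,
    count(*) as 'Count'
  from df
  where AlertDate <= AppointmentDate
  group by NewBranch
  order by NewBranch
")
print(res)
```

On the question's four rows, the gaps are 3, 8, 1 and 22 days, so the rows land in W1, W2, W2 and W4 respectively.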

Answer 2

Here is one approach using data.table.

df <- structure(list(AlertDate = structure(c(18262, 18262, 18269, 18262), class = "Date"), AppointmentDate = structure(c(18265, 18270,18270, 18284), class = 
                                                                                                                     "Date"), ID = c(1, 1, 1, 1), Branch = c("W1","W1", "W2", "W1")), class = c("spec_tbl_df", "tbl_df", "tbl","data.frame"), row.names = c(NA, -4L), problems = 
              structure(list( row = 4L, col = "Branch", expected = "", actual = "embedded null", 
                              file = "'C:/Users/FRssarin/Desktop/test.txt'"), row.names = c(NA,-1L), class = c("tbl_df", "tbl", "data.frame")), spec = structure(list(  cols = list(AlertDate = 
                                                                                                                                                                                      structure(list(format = "%d/%m/%y"), class = c("collector_date",

I convert it to a data.table and create a new column that implements your logic. After that, you only need the first condition when grouping on the table.

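library(data.table)

df <- data.table(df)
# Gaps of 7+ days go to week bucket ceiling(gap / 7); shorter gaps keep their Branch
df[, new_branch := ifelse(as.numeric(AppointmentDate - AlertDate) >= 7,
        paste0("W", ceiling(as.numeric(AppointmentDate - AlertDate) / 7)), Branch)]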

Here is the resulting table

    AlertDate AppointmentDate ID Branch new_branch newcol1
1: 2020-01-01      2020-01-04  1     W1         W1       1
2: 2020-01-01      2020-01-09  1     W1         W2       1
3: 2020-01-08      2020-01-09  1     W2         W2       1
4: 2020-01-01      2020-01-23  1     W1         W4       1
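
The grouping step mentioned above is not shown in the answer. A minimal sketch of it follows, assuming the four rows from the question's table (rebuilt by hand here) and using helper names gap and counts that are introduced only for illustration.

```r
library(data.table)

# Same four rows as the question's sample data, rebuilt by hand
df <- data.table(
  AlertDate       = as.Date(c("2020-01-01", "2020-01-01", "2020-01-08", "2020-01-01")),
  AppointmentDate = as.Date(c("2020-01-04", "2020-01-09", "2020-01-09", "2020-01-23")),
  ID              = c(1, 1, 1, 1),
  Branch          = c("W1", "W1", "W2", "W1")
)

# Day gap between alert and appointment
df[, gap := as.numeric(AppointmentDate - AlertDate)]

# Rows with a gap of 7+ days move to week bucket ceiling(gap / 7)
df[, new_branch := ifelse(gap >= 7, paste0("W", ceiling(gap / 7)), Branch)]

# Apply only the first condition, then count rows per revised branch
counts <- df[AlertDate <= AppointmentDate, .(Count = .N), by = new_branch][order(new_branch)]
print(counts)
```

With this data the counts come out as W1 = 1, W2 = 2 and W4 = 1, which matches the result requested in the question.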
