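Hi everyone, I'm writing an SQL query in R using sqldf and seem to have hit a roadblock. I have a table with an ID column, two date columns, and a column to group by.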
AlertDate AppointmentDate ID Branch
01/01/20 04/01/20 1 W1
01/01/20 09/01/20 1 W1
08/01/20 09/01/20 1 W2
01/01/20 23/01/20 1 W1
The query I'm writing is
sqldf('select Branch,count(ID) from df where AlertDate <= AppointmentDate
and AppointmentDate <AlertDate+7 group by Branch')
With this query, the result I get is
Branch Count
W1 1
W2 1
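which is the correct answer for the query as written. What I want to change is what happens when the second condition (AppointmentDate < AlertDate + 7) is false. Instead of dropping the row from the count, it should be counted in a later group based on the dates. For example, if the alert date is 01/01/20 and the appointment date is 23/01/20, it should be counted in W4, i.e. ceil((AppointmentDate - AlertDate) / 7). In the end, I want the result to be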
Branch Count
W1 1
W2 2
W4 1
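The second row should be counted in W2 and the fourth row in W4, rather than being discarded. I'm trying to achieve this in SQL using sqldf in R. Any possible solution in R or SQL works for me.
Output of dput(test)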
structure(list(AlertDate = structure(c(18262, 18262, 18269, 18262), class = "Date"), AppointmentDate = structure(c(18265, 18270,18270, 18284), class =
"Date"), ID = c(1, 1, 1, 1), Branch = c("W1","W1", "W2", "W1")), class = c("spec_tbl_df", "tbl_df", "tbl","data.frame"), row.names = c(NA, -4L), problems =
structure(list( row = 4L, col = "Branch", expected = "", actual = "embedded null",
file = "'C:/Users/FRssarin/Desktop/test.txt'"), row.names = c(NA,-1L), class = c("tbl_df", "tbl", "data.frame")), spec = structure(list( cols = list(AlertDate =
structure(list(format = "%d/%m/%y"), class = c("collector_date",
"collector")), AppointmentDate = structure(list(format = "%d/%m/%y"), class = c("collector_date", "collector")), ID = structure(list(), class = c("collector_double", "collector")), Branch = structure(list(), class =
c("collector_character", "collector"))), default = structure(list(), class = c("collector_guess", "collector")), skip = 1), class = "col_spec"))
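We assume that what is needed is a nextBranch for each row. It is not defined in the question, so we assume it is the smallest Branch greater than the current Branch. When AppointmentDate > AlertDate + 7, use nextBranch (or Branch if nextBranch is null), and count the rows of each revised Branch where AppointmentDate > AlertDate. Code: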
library(sqldf)
library(tibble)
sqldf("select
case
when AppointmentDate > AlertDate + 7 then coalesce(nextBranch, Branch)
else Branch
end as Branch,
count(*) as 'Count'
from (select a.*, min(b.Branch) nextBranch
from df a
left join df b
on b.Branch > a.Branch
group by a.rowid)
where AlertDate < AppointmentDate
group by 1")
giving:
Branch Count
1 W1 1
2 W2 2
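The dput output in the question was changed after this answer was posted, and in any case it is inconsistent with the expected output: the expected output has W4, but the revised dput output contains no W4. We have therefore used the original dput output shown in the question.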
df <-
structure(list(AlertDate = structure(c(18262, 18262, 18269), class = "Date"),
AppointmentDate = structure(c(18265, 18270, 18270), class = "Date"),
ID = c(1, 1, 1), Branch = c("W1", "W1", "W2")), class = c("spec_tbl_df","tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), problems = structure(list(
row = 3L, col = "Branch", expected = "", actual = "embedded null",
file = "'C:/Users/FRssarin/Desktop/test.txt'"), row.names = c(NA,-1L),
class = c("tbl_df", "tbl", "data.frame")), spec = structure(list(
cols = list(AlertDate = structure(list(format = "%d/%m/%y"), class = c("collector_date","collector")), AppointmentDate = structure(list(format = "%d/%m/%y"), class = c("collector_date","collector")), ID = structure(list(), class = c("collector_double", "collector")), Branch = structure(list(), class = c("collector_character", "collector"))), default = structure(list(), class = c("collector_guess", "collector")), skip = 1), class = "col_spec"))
giving:
> library(tibble)
> df
# A tibble: 3 x 4
AlertDate AppointmentDate ID Branch
<date> <date> <dbl> <chr>
1 2020-01-01 2020-01-04 1 W1
2 2020-01-01 2020-01-09 1 W1
3 2020-01-08 2020-01-09 1 W2
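For comparison, the week rule stated in the question, ceil((AppointmentDate - AlertDate) / 7), can also be written directly in SQL. This is only a sketch under a few assumptions: it uses the four-row data from the question rebuilt as a plain data.frame, it relies on sqldf passing Date columns to SQLite as day counts (the query above relies on the same thing when it computes AlertDate + 7), and it replaces ceil with the integer-arithmetic equivalent (days + 6) / 7 so it does not depend on SQLite's optional math functions. The name `res` is mine.

```r
library(sqldf)

# The four rows from the question, as a plain data.frame
df <- data.frame(
  AlertDate       = as.Date(c("2020-01-01", "2020-01-01", "2020-01-08", "2020-01-01")),
  AppointmentDate = as.Date(c("2020-01-04", "2020-01-09", "2020-01-09", "2020-01-23")),
  ID              = c(1, 1, 1, 1),
  Branch          = c("W1", "W1", "W2", "W1"),
  stringsAsFactors = FALSE
)

# Rows with a gap of 7 or more days get a week label built from the gap;
# (days + 6) / 7 in integer arithmetic equals ceil(days / 7) for days >= 1.
res <- sqldf("
  select
    case
      when AppointmentDate - AlertDate >= 7
        then 'W' || ((cast(AppointmentDate - AlertDate as integer) + 6) / 7)
      else Branch
    end as Branch,
    count(*) as Count
  from df
  where AlertDate <= AppointmentDate
  group by 1
  order by 1")

print(res)
```

Note that this labels late rows purely by the length of the gap, as the formula in the question does, and not relative to the row's original Branch.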
Here is one way using data.table
df <- structure(list(AlertDate = structure(c(18262, 18262, 18269, 18262), class = "Date"), AppointmentDate = structure(c(18265, 18270,18270, 18284), class =
"Date"), ID = c(1, 1, 1, 1), Branch = c("W1","W1", "W2", "W1")), class = c("spec_tbl_df", "tbl_df", "tbl","data.frame"), row.names = c(NA, -4L), problems =
structure(list( row = 4L, col = "Branch", expected = "", actual = "embedded null",
file = "'C:/Users/FRssarin/Desktop/test.txt'"), row.names = c(NA,-1L), class = c("tbl_df", "tbl", "data.frame")), spec = structure(list( cols = list(AlertDate =
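structure(list(format = "%d/%m/%y"), class = c("collector_date",
"collector")), AppointmentDate = structure(list(format = "%d/%m/%y"), class = c("collector_date", "collector")), ID = structure(list(), class = c("collector_double", "collector")), Branch = structure(list(), class =
c("collector_character", "collector"))), default = structure(list(), class = c("collector_guess", "collector")), skip = 1), class = "col_spec"))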
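I'm converting it to a data.table and creating a new column for your logic. After that, you can group the table using only the first condition.
library(data.table)
df <- data.table(df)
# Rows with a gap of 7 or more days are relabelled by week number; the rest keep their Branch
df[, new_branch := ifelse(as.numeric(AppointmentDate - AlertDate) >= 7,
                          paste0("W", ceiling(as.numeric(AppointmentDate - AlertDate) / 7)),
                          Branch)]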
Here is the resulting table
AlertDate AppointmentDate ID Branch new_branch newcol1
1: 2020-01-01 2020-01-04 1 W1 W1 1
2: 2020-01-01 2020-01-09 1 W1 W2 1
3: 2020-01-08 2020-01-09 1 W2 W2 1
4: 2020-01-01 2020-01-23 1 W1 W4 1
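To get from this table to the counts the question asks for, the new column can be grouped on while keeping only the first condition as a filter. A minimal sketch, assuming the four rows from the question rebuilt as a plain data.frame; the name `counts` is mine:

```r
library(data.table)

# The four rows from the question
df <- data.table(
  AlertDate       = as.Date(c("2020-01-01", "2020-01-01", "2020-01-08", "2020-01-01")),
  AppointmentDate = as.Date(c("2020-01-04", "2020-01-09", "2020-01-09", "2020-01-23")),
  ID              = c(1, 1, 1, 1),
  Branch          = c("W1", "W1", "W2", "W1")
)

# Same relabelling as above: a gap of 7+ days becomes "W<week number>"
df[, new_branch := ifelse(as.numeric(AppointmentDate - AlertDate) >= 7,
                          paste0("W", ceiling(as.numeric(AppointmentDate - AlertDate) / 7)),
                          Branch)]

# Keep only the first condition from the original query, then count per new branch
counts <- df[AlertDate <= AppointmentDate, .(Count = .N), by = new_branch][order(new_branch)]
print(counts)
```

With this data, `counts` has one row each for W1, W2 and W4 with counts 1, 2 and 1, which matches the expected output in the question.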