Is there a way to use sqldf GROUP BY in R to return results by rolling period?

Question (votes: 0, answers: 1)

I have a data set in which respondents can answer multiple times in each month.

structure(list(Month = c("Jan 2016", "Jan 2016", "Feb 2016", 
"Feb 2016", "Mar 2016", "Apr 2016", "May 2016", "Jun 2016", "Jun 2016", 
"Jul 2016", "Aug 2016", "Aug 2016", "Sep 2016", "Sep 2016", "Oct 2016", 
"Nov 2016", "Dec 2016", "Dec 2016", "Jan 2016", "Feb 2016", "Feb 2016", 
"Feb 2016", "Mar 2016", "Mar 2016", "Apr 2016", "May 2016", "May 2016", 
"Jun 2016", "Jun 2016", "Jul 2016", "Aug 2016", "Aug 2016", "Oct 2016", 
"Oct 2016", "Dec 2016", "Mar 2016", "Mar 2016", "Apr 2016", "Apr 2016", 
"May 2016", "Jun 2016", "Aug 2016", "Sep 2016", "Jan 2016", "Jan 2016", 
"Feb 2016", "Feb 2016", "Feb 2016", "Feb 2016", "Feb 2016"), 
    PhysicianID = c(4263, 4263, 4263, 4263, 4263, 4263, 4263, 
    4263, 4263, 4263, 4263, 4263, 4263, 4263, 4263, 4263, 4263, 
    4263, 4278, 4278, 4278, 4278, 4278, 4278, 4278, 4278, 4278, 
    4278, 4278, 4278, 4278, 4278, 4278, 4278, 4278, 4282, 4282, 
    4282, 4282, 4282, 4282, 4282, 4282, 4309, 4309, 4309, 4309, 
    4309, 4309, 4309)), row.names = c(NA, -50L), class = c("tbl_df", 
"tbl", "data.frame"))

I need the number of unique respondents over rolling 3-month periods. Getting the result for each individual month is not a problem.

sqldf("SELECT Month,COUNT(distinct(PhysicianID)) FROM Data_for_R GROUP BY Month")
      Month COUNT(distinct(PhysicianID))
1  Apr 2016                            3
2  Aug 2016                            3
3  Dec 2016                            2
4  Feb 2016                            3
5  Jan 2016                            3
6  Jul 2016                            2
7  Jun 2016                            3
8  Mar 2016                            3
9  May 2016                            3
10 Nov 2016                            1
11 Oct 2016                            2
12 Sep 2016                            2

What I need is a way to return results that look more like

1 Jan 2016 to March 2016              xxx
2 Feb 2016 to April 2016              xxx
3 March 2016 to May 2016              xxx
etc...
r group-by sqldf
1 Answer
0 votes

将yearmonth转换为yearmon类(发送至SQL时,会出现Jan的year+0,Feb的112等),然后将3个尾月的数据匹配的月份进行左联接,并考虑到浮点近似。对当前月份进行分组,并进行计数。只保留有三个月的行--你可能想要,也可能不想要。 name__class方法将指定的类分配给后缀为__的变量名和类名。

library(sqldf)
library(zoo)

DFR <- transform(Data_for_R, Month = as.yearmon(Month, "%b %Y"))
Mos <- data.frame(Month = seq(min(DFR$Month), max(DFR$Month), 1/12))

sqldf("select 
    min(b.Month) From__yearmon, 
    max(b.Month) To__yearmon, 
    count(distinct b.PhysicianID) Num
  from 
    Mos a 
    left join DFR b 
      on b.Month between a.Month - 2./12.- 0.001 and a.Month + 0.001
  group by a.Month
  having count(distinct b.Month) = 3
", method = "name__class")

giving

       From       To Num
1  Jan 2016 Mar 2016   4
2  Feb 2016 Apr 2016   4
3  Mar 2016 May 2016   3
4  Apr 2016 Jun 2016   3
5  May 2016 Jul 2016   3
6  Jun 2016 Aug 2016   3
7  Jul 2016 Sep 2016   3
8  Aug 2016 Oct 2016   3
9  Sep 2016 Nov 2016   3
10 Oct 2016 Dec 2016   2

Update

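Updated to handle the case of missing months. For the sample data in the question there are no missing months, so the answer is the same as before.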
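For readers who want to cross-check rolling counts of this kind without SQL, here is a minimal sketch that is not part of the answer. It assumes the zoo package and an English locale, and it uses a small hypothetical data frame, toy, rather than the question's data. Months are turned into an integer index so that no floating point tolerance is needed. Unlike the having clause in the query, it reports every window inside the observed range, even one that contains a month with no responses.

```r
library(zoo)

# Hypothetical toy data (not the question's data set)
toy <- data.frame(
  Month = c("Jan 2016", "Jan 2016", "Feb 2016", "Mar 2016", "Apr 2016", "Apr 2016"),
  PhysicianID = c(1, 2, 1, 3, 3, 3)
)

# Integer month index (12 * year + month - 1): exact comparisons, no tolerance
idx <- round(12 * as.numeric(as.yearmon(toy$Month, "%b %Y")))

# Each window ends at month e and starts two months earlier
ends <- seq(min(idx) + 2, max(idx))

rolling <- data.frame(
  From = format(as.yearmon((ends - 2) / 12)),
  To   = format(as.yearmon(ends / 12)),
  Num  = sapply(ends, function(e) {
    # distinct respondents seen in months e - 2 through e
    length(unique(toy$PhysicianID[idx >= e - 2 & idx <= e]))
  })
)
rolling
# Num is 3 for the window ending Mar 2016 and 2 for the window ending Apr 2016
```

The same idea applies to Data_for_R by replacing toy with it; whether to drop windows that include an empty month is a separate choice, which the query makes with its having clause.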
