Getting a moving average in PostgreSQL when the data has gaps

Question  Votes: 0  Answers: 2

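I have a table called answers with the columns created_at and response, where response is an integer: 0 (for 'no'), 1 (for 'yes'), or 2 (for 'don't know'). I want a moving average of the response value for each day, filtering out the 2s and only taking the previous 30 days into account. I know you can use ROWS BETWEEN 29 PRECEDING AND CURRENT ROW, but that only works if there is data for every day, and in my case there may be no data for a week or more.

My current query looks like this: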

SELECT answers.created_at, answers.response,
    AVG(answers.response)
      OVER(ORDER BY answers.created_at::date ROWS 
        BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_average
  FROM answers
  WHERE answers.user_id = 'insert_user_id'
    AND (answers.response = 0 OR answers.response = 1)
  GROUP BY answers.created_at, answers.response
  ORDER BY answers.created_at::date

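But this returns an average based on the preceding rows, not the preceding days: if a user answered 1 on 2018-3-30 and 0 on 2018-5-15, the rolling average for 2018-5-15 comes out as 0.5 rather than 0. How can I write a query that only takes responses created in the past 30 days into account for the rolling average?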
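The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the asker's actual data: the inline answers rows are hypothetical and contain only the two responses mentioned in the question.

```sql
-- Hypothetical sample data: the two responses mentioned in the question.
WITH answers(created_at, response) AS (
    VALUES (DATE '2018-03-30', 1),
           (DATE '2018-05-15', 0)
)
SELECT created_at,
       response,
       -- ROWS counts physical rows, not days: the frame for 2018-05-15
       -- still contains the 2018-03-30 row, which is 46 days older.
       AVG(response) OVER (ORDER BY created_at
                           ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_average
FROM answers
ORDER BY created_at;
-- rolling_average is 1 for 2018-03-30 and 0.5 for 2018-05-15
```

Because the frame is defined in rows, any gap in the dates silently widens the time span it covers.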

sql postgresql window-functions moving-average
2 Answers
0
votes

Try something like this:

SELECT *
FROM (
    SELECT d.created_at,
           d.response,
           AVG(d.response) OVER (ORDER BY d.created_at::date
                                 ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_average
    FROM (
        SELECT COALESCE(a.created_at, d.dates) AS created_at,
               response,
               a.user_id
        FROM (SELECT generate_series('2018-01-01'::date, '2018-05-31'::date, '1day'::interval)::date AS dates) d
        LEFT JOIN (SELECT *
                   FROM answers
                   WHERE answers.user_id = 'insert_user_id'
                     AND (answers.response = 0 OR answers.response = 1)) a
               ON d.dates = a.created_at::date
    ) d
    GROUP BY d.created_at, d.response
) agg
WHERE agg.response IS NOT NULL
ORDER BY agg.created_at::date

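  • generate_series creates a list of dates; you have to set sensible bounds yourself
  • this list of dates is LEFT JOINed with the pre-filtered answers
  • the joined result is used for the rolling average calculation
  • after that I select only the records that have a response, and I get: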

created_at | response | rolling_average
2018-03-30 | 1        | 1.00000000000000000000
2018-05-15 | 0        | 0.00000000000000000000
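The hard-coded bounds in generate_series could instead be derived from the data. The following is a rough sketch of that idea, not the query above: the CTE names (filtered, bounds, days, daily) are made up for illustration, and it assumes one output row per calendar day, so several answers on the same day are pooled before averaging.

```sql
WITH filtered AS (
    -- Keep only yes/no answers for one user, reduced to calendar days.
    SELECT created_at::date AS day, response
    FROM answers
    WHERE user_id = 'insert_user_id'
      AND response IN (0, 1)
),
bounds AS (
    -- First and last day that actually have an answer.
    SELECT MIN(day) AS first_day, MAX(day) AS last_day
    FROM filtered
),
days AS (
    -- One row per calendar day between the bounds, gaps included.
    SELECT g::date AS day
    FROM bounds, generate_series(first_day, last_day, INTERVAL '1 day') AS g
),
daily AS (
    -- Per-day totals; days without answers get a NULL sum and a count of 0.
    SELECT d.day,
           SUM(f.response)   AS response_sum,
           COUNT(f.response) AS response_count
    FROM days d
    LEFT JOIN filtered f ON f.day = d.day
    GROUP BY d.day
)
SELECT day,
       -- 30 rows now really are 30 consecutive days, so ROWS is safe here.
       SUM(response_sum) OVER w
           / NULLIF(SUM(response_count) OVER w, 0) AS rolling_average
FROM daily
WINDOW w AS (ORDER BY day ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
ORDER BY day;
```

This variant returns a row for every day in the range, with a NULL average when the 30-day window holds no answers. To keep only days that have a response, as the query above does, the final SELECT would need to be wrapped in a subquery and filtered afterwards, because a WHERE clause at the same level would run before the window is evaluated.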


0
votes

Starting with Postgres 11, you can do this:

SELECT created_at, 
       response,
       AVG(response) OVER (ORDER BY created_at 
                           RANGE BETWEEN '29 day' PRECEDING AND current row) AS rolling_average 
FROM answers
WHERE user_id = 1
  AND response in (0,1)
ORDER BY created_at;
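To see how the RANGE frame behaves across a gap, here is a small self-contained sketch. The inline rows and their times of day are hypothetical: the 2018-03-30 and 2018-05-15 dates come from the question, and the 2018-05-14 row is added only to show two answers falling inside the same window.

```sql
-- Hypothetical sample data; requires PostgreSQL 11 or later.
WITH answers(created_at, response) AS (
    VALUES (TIMESTAMP '2018-03-30 09:00', 1),
           (TIMESTAMP '2018-05-14 18:30', 1),
           (TIMESTAMP '2018-05-15 12:00', 0)
)
SELECT created_at,
       response,
       -- RANGE compares ORDER BY values, so the frame holds every row whose
       -- created_at lies within the 29 days before the current row.
       AVG(response) OVER (ORDER BY created_at
                           RANGE BETWEEN INTERVAL '29 days' PRECEDING
                                     AND CURRENT ROW) AS rolling_average
FROM answers
ORDER BY created_at;
-- rolling_average is 1, 1 and 0.5: the 2018-03-30 row is outside both later frames
```

Note that with a timestamp column the window reaches back 29 days from the exact timestamp of each row rather than from the start of its calendar day. If calendar-day boundaries matter, ordering by created_at::date in the window is one option worth testing.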