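I have a table named answers with the columns created_at and response, where response is an integer: 0 (meaning 'no'), 1 (meaning 'yes'), or 2 (meaning 'don't know'). I want to get a moving average of the response values per day, filtering out the 2s and considering only the previous 30 days. I know you can use ROWS BETWEEN 29 PRECEDING AND CURRENT ROW, but that only works if you have data for every day, and in my case there may be no data for a week or longer.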
My current query looks like this:
SELECT answers.created_at, answers.response,
AVG(answers.response)
OVER(ORDER BY answers.created_at::date ROWS
BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_average
FROM answers
WHERE answers.user_id = 'insert_user_id'
AND (answers.response = 0 OR answers.response = 1)
GROUP BY answers.created_at, answers.response
ORDER BY answers.created_at::date
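But this returns an average based on the preceding rows rather than the preceding days. If a user answered 1 on 2018-3-30 and 0 on 2018-5-15, the rolling average for 2018-5-15 would be 0.5 instead of 0. How can I write a query whose rolling average only considers responses created within the past 30 days?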
Try something like this:
SELECT * FROM (
SELECT
d.created_at, d.response,
Avg(d.response) OVER(ORDER BY d.created_at::date rows BETWEEN 29 PRECEDING AND CURRENT row) AS rolling_average
FROM (
SELECT
COALESCE(a.created_at, d.dates) AS created_at, response, a.user_id
FROM
(SELECT generate_series('2018-01-01'::date, '2018-05-31'::date, '1day'::interval)::date AS dates) d
LEFT JOIN
(SELECT * FROM answers WHERE answers.user_id = 'insert_user_id' AND ( answers.response = 0 OR answers.response = 1)) a
ON d.dates = a.created_at::date
) d
GROUP BY d.created_at, d.response
) agg WHERE agg.response IS NOT NULL
ORDER BY agg.created_at::date
created_at | response | rolling_average
2018-03-30 | 1 | 1.00000000000000000000
2018-05-15 | 0 | 0.00000000000000000000
Starting with Postgres 11, you can use a RANGE frame with an interval offset:
SELECT created_at,
response,
AVG(response) OVER (ORDER BY created_at
RANGE BETWEEN '29 day' PRECEDING AND current row) AS rolling_average
FROM answers
WHERE user_id = 1
AND response in (0,1)
ORDER BY created_at;
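
To sanity-check the difference between a row-based frame and a date-based frame outside the database, here is a minimal Python sketch. It is not Postgres itself, just an approximation of the two frame semantics; the sample data mirrors the two answers from the question, and the helper names rows_average and range_average are made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical sample data mirroring the question: (created_at, response),
# already filtered to responses 0/1 and sorted by date.
answers = [
    (date(2018, 3, 30), 1),
    (date(2018, 5, 15), 0),
]

def rows_average(rows, n=30):
    """Average over the current row and up to n-1 preceding rows,
    roughly what ROWS BETWEEN 29 PRECEDING AND CURRENT ROW does."""
    out = []
    for i, (day, _) in enumerate(rows):
        window = [resp for _, resp in rows[max(0, i - n + 1): i + 1]]
        out.append((day, sum(window) / len(window)))
    return out

def range_average(rows, days=29):
    """Average over rows dated within `days` days before the current row's
    date (inclusive), roughly what a RANGE frame with a 29-day offset does."""
    out = []
    for day, _ in rows:
        start = day - timedelta(days=days)
        window = [resp for d, resp in rows if start <= d <= day]
        out.append((day, sum(window) / len(window)))
    return out

rows_result = rows_average(answers)
range_result = range_average(answers)

print(rows_result)   # row-based: the March answer still counts in May
print(range_result)  # date-based: only answers from the last 30 days count
```

With this data the row-based version gives 0.5 for 2018-05-15, because the previous row is included no matter how old it is, while the date-based version gives 0, which is the result the question asks for.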