Moving average in SQLite

Question (votes: 2, answers: 2)

I want to compute a moving average over the data in an SQLite table. I found several ways to do this in MySQL, but none that work in SQLite.

In SQL, I think something like this should do it (although I could not try it):

SELECT date, value, 
avg(value) OVER (ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) as MovingAverageWindow7
FROM t ORDER BY date;

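However, I see two drawbacks:

  • This does not seem to work in SQLite
  • If the dates in the preceding/following rows are not consecutive, the moving average is computed over a wider window than I actually want, because the frame is based only on the number of surrounding rows. A condition on the date should therefore be added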

What I actually want is the average of "value" for each date over +/- 3 days (weekly moving average) or +/- 15 days (monthly moving average).
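For readers on a newer SQLite: window functions were added in SQLite 3.25.0, and RANGE frames with a numeric offset in 3.28.0. The following is a minimal sketch, assuming such a build is what Python's sqlite3 module links against. It orders by julianday(date) so that the frame covers +/- 3 days instead of +/- 3 rows. Only the first six rows of the question's sample data are loaded, and the function name window_moving_average is made up for illustration.

```python
import sqlite3

# First six rows of the question's sample data.
ROWS = [
    ("2018-02-01", 8), ("2018-02-02", 2), ("2018-02-05", 5),
    ("2018-02-06", 4), ("2018-02-07", 1), ("2018-02-10", 6),
]

# julianday() converts the date to a number of days, so the RANGE frame
# below is measured in days, not in rows.
WINDOW_SQL = """
SELECT date, value,
       avg(value) OVER (
           ORDER BY julianday(date)
           RANGE BETWEEN 3 PRECEDING AND 3 FOLLOWING
       ) AS mavg
FROM t
ORDER BY date
"""


def window_moving_average(rows):
    """Return (date, value, mavg) tuples, or None if SQLite is too old."""
    if sqlite3.sqlite_version_info < (3, 28, 0):
        return None  # RANGE with a numeric offset is not available
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (date DATE, value INTEGER)")
    con.executemany("INSERT INTO t (date, value) VALUES (?, ?)", rows)
    result = con.execute(WINDOW_SQL).fetchall()
    con.close()
    return result


result = window_moving_average(ROWS)
if result is not None:
    for row in result:
        print(row)
```

For a monthly window the two offsets would become 15. On an older SQLite the function returns None rather than failing, so a join-based query is still needed there.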

Here is a sample dataset:

CREATE TABLE t ( date DATE, value INTEGER );

INSERT INTO t (date, value) VALUES ('2018-02-01', 8);
INSERT INTO t (date, value) VALUES ('2018-02-02', 2);
INSERT INTO t (date, value) VALUES ('2018-02-05', 5);
INSERT INTO t (date, value) VALUES ('2018-02-06', 4);
INSERT INTO t (date, value) VALUES ('2018-02-07', 1);
INSERT INTO t (date, value) VALUES ('2018-02-10', 6);
INSERT INTO t (date, value) VALUES ('2018-02-11', 0);
INSERT INTO t (date, value) VALUES ('2018-02-12', 2);
INSERT INTO t (date, value) VALUES ('2018-02-13', 1);
INSERT INTO t (date, value) VALUES ('2018-02-14', 3);
INSERT INTO t (date, value) VALUES ('2018-02-15', 11);
INSERT INTO t (date, value) VALUES ('2018-02-18', 4);
INSERT INTO t (date, value) VALUES ('2018-02-20', 1);
INSERT INTO t (date, value) VALUES ('2018-02-21', 5);
INSERT INTO t (date, value) VALUES ('2018-02-28', 10);
INSERT INTO t (date, value) VALUES ('2018-03-02', 6);
INSERT INTO t (date, value) VALUES ('2018-03-03', 7);
INSERT INTO t (date, value) VALUES ('2018-03-04', 3);
INSERT INTO t (date, value) VALUES ('2018-03-08', 5);
INSERT INTO t (date, value) VALUES ('2018-03-09', 6);
INSERT INTO t (date, value) VALUES ('2018-03-15', 1);
INSERT INTO t (date, value) VALUES ('2018-03-16', 3);
INSERT INTO t (date, value) VALUES ('2018-03-25', 5);
INSERT INTO t (date, value) VALUES ('2018-03-31', 1);
sqlite moving-average
2 answers
3
votes

I think I actually found a solution:

SELECT date, value, 
  (SELECT AVG(value) FROM t t2 
   WHERE datetime(t1.date, '-3 days') <= datetime(t2.date) AND datetime(t1.date, '+3 days') >= datetime(t2.date)
   ) AS MAVG
FROM t t1
GROUP BY strftime('%Y-%m-%d', date); 

I don't know whether it is the most efficient way, but it seems to work.
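One possible variation, offered as a sketch rather than a measured improvement: because the dates are stored as ISO 'YYYY-MM-DD' text, they can be compared directly against date(t1.date, '-3 days') and date(t1.date, '+3 days') without wrapping the column in datetime(). That leaves the date column bare, which gives SQLite the chance to use an index on it. The index name idx_t_date is an assumption, and only the first six rows of the sample data are loaded.

```python
import sqlite3

# First six rows of the question's sample data.
ROWS = [
    ("2018-02-01", 8), ("2018-02-02", 2), ("2018-02-05", 5),
    ("2018-02-06", 4), ("2018-02-07", 1), ("2018-02-10", 6),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (date DATE, value INTEGER)")
con.executemany("INSERT INTO t (date, value) VALUES (?, ?)", ROWS)
# Assumption: an index on the date column (the name is hypothetical).
con.execute("CREATE INDEX idx_t_date ON t (date)")

# The column t2.date is compared as plain text; this relies on every date
# being stored in the same 'YYYY-MM-DD' format.
SQL = """
SELECT t1.date, t1.value,
       (SELECT avg(t2.value)
        FROM t AS t2
        WHERE t2.date BETWEEN date(t1.date, '-3 days')
                          AND date(t1.date, '+3 days')) AS mavg
FROM t AS t1
ORDER BY t1.date
"""

mavg = con.execute(SQL).fetchall()
for row in mavg:
    print(row)

# EXPLAIN QUERY PLAN shows whether the index is actually used.
for step in con.execute("EXPLAIN QUERY PLAN " + SQL):
    print(step)
```

Whether this is faster on a real table depends on the data and the SQLite version, so it is worth checking the query plan and timing it before relying on it.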

Edit: applied to a real database containing 20,000 rows, the weekly moving average over two parameters takes about 1 minute to compute.

I see two options:

  • There is a more efficient way to compute it in SQLite
  • I compute the moving average in Python after extracting the data from SQLite
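The second option could look roughly like the sketch below: fetch the rows sorted by date, then slide a +/- 3 day window over them with two indices, so each row enters and leaves the running total once. The function name moving_average and its half_window_days parameter are assumptions for illustration, and only the first six rows of the sample data are loaded.

```python
import sqlite3
from datetime import date, timedelta

# First six rows of the question's sample data.
ROWS = [
    ("2018-02-01", 8), ("2018-02-02", 2), ("2018-02-05", 5),
    ("2018-02-06", 4), ("2018-02-07", 1), ("2018-02-10", 6),
]


def moving_average(rows, half_window_days=3):
    """Average value over +/- half_window_days around each row's date.

    rows must be (iso_date, value) pairs sorted by date.
    """
    parsed = [(date.fromisoformat(d), v) for d, v in rows]
    span = timedelta(days=half_window_days)
    result = []
    lo = hi = 0  # the current window is parsed[lo:hi]
    total = 0    # running sum of the values inside the window
    for day, _ in parsed:
        # Extend the right edge to include rows up to day + span.
        while hi < len(parsed) and parsed[hi][0] <= day + span:
            total += parsed[hi][1]
            hi += 1
        # Shrink the left edge to drop rows before day - span.
        while parsed[lo][0] < day - span:
            total -= parsed[lo][1]
            lo += 1
        result.append((day.isoformat(), total / (hi - lo)))
    return result


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (date DATE, value INTEGER)")
con.executemany("INSERT INTO t (date, value) VALUES (?, ?)", ROWS)
rows = con.execute("SELECT date, value FROM t ORDER BY date").fetchall()

averages = moving_average(rows)
for day, avg in averages:
    print(day, round(avg, 2))
```

Calling moving_average(rows, half_window_days=15) would give the monthly variant.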

0
votes

One approach is to create an intermediate table that maps each date to the groups it belongs to.

CREATE TABLE groups (date DATE, daygroup DATE);
INSERT INTO groups 
  SELECT date, strftime('%Y-%m-%d', datetime(date, '-1 days')) AS daygroup
  FROM t;  
INSERT INTO groups 
  SELECT date, strftime('%Y-%m-%d', datetime(date, '-2 days')) AS daygroup
  FROM t;  
INSERT INTO groups 
  SELECT date, strftime('%Y-%m-%d', datetime(date, '-3 days')) AS daygroup
  FROM t;  
INSERT INTO groups 
  SELECT date, strftime('%Y-%m-%d', datetime(date, '+1 days')) AS daygroup
  FROM t;  
INSERT INTO groups 
  SELECT date, strftime('%Y-%m-%d', datetime(date, '+2 days')) AS daygroup
  FROM t;  
INSERT INTO groups 
  SELECT date, strftime('%Y-%m-%d', datetime(date, '+3 days')) AS daygroup
  FROM t;  
INSERT INTO groups 
  SELECT date, date AS daygroup FROM t;

For example, you get:

SELECT * FROM groups WHERE date = '2018-02-05'

    date        daygroup
    2018-02-05  2018-02-04
    2018-02-05  2018-02-03
    2018-02-05  2018-02-02
    2018-02-05  2018-02-06
    2018-02-05  2018-02-07
    2018-02-05  2018-02-08
    2018-02-05  2018-02-05

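This means that '2018-02-05' belongs to the groups '2018-02-02' through '2018-02-08'. If a date belongs to a group, its value is included in the computation of that group's moving average.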
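The seven INSERT statements could also be collapsed into one. The sketch below generates the offsets -3..3 with a recursive common table expression and cross-joins them with t, so the window size becomes a single parameter. The CTE name offsets and the parameter name h are assumptions, the table name is double-quoted because GROUPS is also an SQLite keyword in newer versions, and only the first six rows of the sample data are loaded.

```python
import sqlite3

# First six rows of the question's sample data.
ROWS = [
    ("2018-02-01", 8), ("2018-02-02", 2), ("2018-02-05", 5),
    ("2018-02-06", 4), ("2018-02-07", 1), ("2018-02-10", 6),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (date DATE, value INTEGER)")
con.executemany("INSERT INTO t (date, value) VALUES (?, ?)", ROWS)
con.execute('CREATE TABLE "groups" (date DATE, daygroup DATE)')

half_window = 3  # days on each side; 15 for the monthly average

# offsets yields the integers -h .. +h; each one shifts every date once.
con.execute(
    """
    WITH RECURSIVE offsets(n) AS (
        SELECT -:h
        UNION ALL
        SELECT n + 1 FROM offsets WHERE n < :h
    )
    INSERT INTO "groups" (date, daygroup)
    SELECT t.date, date(t.date, n || ' days')
    FROM t CROSS JOIN offsets
    """,
    {"h": half_window},
)

members = [
    row[0]
    for row in con.execute(
        'SELECT daygroup FROM "groups" WHERE date = ? ORDER BY daygroup',
        ("2018-02-05",),
    )
]
total_rows = con.execute('SELECT count(*) FROM "groups"').fetchone()[0]
print(members)
print(total_rows)
```

The table still ends up with (2 * half_window + 1) rows per original row, so this changes how the table is filled, not how large it gets.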

With this, computing the moving average becomes simple:

SELECT
  d.date, d.value, c.ma
FROM
  t AS d
INNER JOIN 
  (SELECT 
    b.daygroup,
    avg(a.value) AS ma
  FROM 
    t AS a 
  INNER JOIN
    groups AS b
  ON a.date = b.date
  GROUP BY b.daygroup) AS c
ON
  d.date = c.daygroup

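Note that the intermediate table has 7 times as many rows as the original table, and that it grows in proportion to the window size. This should be acceptable unless your table is much larger.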

I also tested with 20,000 rows. On my laptop, the insert queries took 1.5 seconds and the select query took 0.5 seconds.

ADDED, perhaps better.

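An alternative that needs no intermediate table. The query below joins the table with itself, allowing a lag of up to 3 days in either direction, and then takes the average.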

SELECT
  t1.date, avg(t2.value) AS MVG
FROM 
  t AS t1
INNER JOIN
  t AS t2
ON
  datetime(t1.date, '-3 days') <= datetime(t2.date) 
  AND 
  datetime(t1.date, '+3 days') >= datetime(t2.date)
GROUP BY
  t1.date
;
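As a sanity check, the self-join and the intermediate-table approach can be run side by side. This is only a sketch on the first six rows of the sample data. The groups table is filled in a Python loop instead of seven hand-written INSERT statements, its name is double-quoted because GROUPS is also an SQLite keyword in newer versions, and an ORDER BY is added to both queries so the results can be compared row by row.

```python
import sqlite3

# First six rows of the question's sample data.
ROWS = [
    ("2018-02-01", 8), ("2018-02-02", 2), ("2018-02-05", 5),
    ("2018-02-06", 4), ("2018-02-07", 1), ("2018-02-10", 6),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (date DATE, value INTEGER)")
con.executemany("INSERT INTO t (date, value) VALUES (?, ?)", ROWS)

# Self-join: pair every row with all rows at most 3 days away.
self_join = con.execute("""
SELECT t1.date, avg(t2.value) AS mvg
FROM t AS t1
INNER JOIN t AS t2
   ON datetime(t1.date, '-3 days') <= datetime(t2.date)
  AND datetime(t1.date, '+3 days') >= datetime(t2.date)
GROUP BY t1.date
ORDER BY t1.date
""").fetchall()

# Intermediate table: one row per (date, shifted date) for offsets -3..3.
con.execute('CREATE TABLE "groups" (date DATE, daygroup DATE)')
for offset in range(-3, 4):
    con.execute(
        'INSERT INTO "groups" SELECT date, date(date, ?) FROM t',
        (f"{offset:+d} days",),
    )

via_groups = con.execute("""
SELECT d.date, c.ma
FROM t AS d
INNER JOIN (
    SELECT b.daygroup, avg(a.value) AS ma
    FROM t AS a
    INNER JOIN "groups" AS b ON a.date = b.date
    GROUP BY b.daygroup
) AS c ON d.date = c.daygroup
ORDER BY d.date
""").fetchall()

for (day, mvg), (_, ma) in zip(self_join, via_groups):
    print(day, mvg, ma)
```

For '2018-02-05' the window contains the values 2, 5, 4 and 1, so both queries give 3.0.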