我有一组指数值随着时间的推移而增加和减少。我希望确定价值上升和价值下降的时间段。数据如下所示:
我尝试按范围划分值,我绝对不认为我做得对。这是我写的查询,充其量只是给我订购日期
SELECT
date,
MAX(index) OVER (PARTITION BY MAX(CAST(index AS numeric))
ORDER BY
date)
FROM (
SELECT
(value1 - value2) AS index,
date
FROM
`project.dataset.table` )
GROUP BY
date,
index
ORDER BY
date
我的最终目的是,当我被要求获得最小值时,我想要实现一个类似于此类的查询,同样也适用于最大值
Row | date | minimas
-------------------------------------
1 | 2017-02-04 | 0.3149100257069409
2 | 2017-12-05 | 0.5784622847441183
处理相邻的重复值很棘手。您没有指定希望如何处理这些内容。如果您只想要第一个这样的值,那么过滤工作:
对于当地最小值:
SELECT Row, date, f0 AS minimal
FROM (SELECT t.*,
LEAD(f0) OVER (ORDER BY DATE) as f0_lead
FROM (SELECT t.*,
LAG(f0) OVER (ORDER BY date) AS f0_lag
FROM `project.dataset.table` t
) t
WHERE f0_lag IS NULL or f0_lag <> f0
) t
WHERE (f0 < f0_lag or f0_lag is null) and
(f0 < f0_lead or f0_lead is null);
或者,如果您愿意,可以简化比较:
SELECT Row, date, f0 AS minimal
FROM (SELECT t.*,
LEAD(f0) OVER (ORDER BY DATE) as f0_lead
FROM (SELECT t.*,
LAG(f0) OVER (ORDER BY date) AS f0_lag
FROM t
) t
WHERE f0_lag IS NULL or f0 < f0_lag
) t
WHERE f0 < f0_lead or f0_lead is null;
当<
s改为>
s时,局部最大值可以遵循相同的逻辑。
Here是一个db <>小提琴(使用Postgres,但这没关系)。
编辑:
连续返回所有最小值/最大值更具挑战性。以下适用于BigQuery:
WITH t AS (
SELECT 1 AS Row, '2017-01-19' AS date, 0.3904 AS f0 UNION ALL
SELECT 2, '2017-02-04', 0.3149 UNION ALL
SELECT 2.5, '2017-02-05', 0.3149 UNION ALL
SELECT 3, '2017-03-24', 0.3302 UNION ALL
SELECT 4, '2017-04-09', 0.5339 UNION ALL
SELECT 5, '2017-05-11', 0.7753 UNION ALL
SELECT 6, '2017-05-27', 0.8539 UNION ALL
SELECT 7, '2017-09-16', 0.8803 UNION ALL
SELECT 7.5, '2017-09-17', 0.8803 UNION ALL
SELECT 7.7, '2017-09-18', 0.8803 UNION ALL
SELECT 8, '2017-10-02', 0.8570 UNION ALL
SELECT 9, '2017-11-03', 0.7744 UNION ALL
SELECT 10, '2017-11-19', 0.6092 UNION ALL
SELECT 11, '2017-12-05', 0.5785
)
SELECT t.*
FROM (SELECT t.*,
MAX(f0_lag) OVER (PARTITION BY grp) as grp_f0_lag,
MAX(f0_lead) OVER (PARTITION BY grp) as grp_f0_lead
FROM (SELECT t.*,
COUNTIF(f0_lag <> f0) OVER (ORDER BY DATE) as grp,
LEAD(f0) OVER (ORDER BY DATE) as f0_lead
FROM (SELECT t.*,
LAG(f0) OVER (ORDER BY date) AS f0_lag
FROM t
) t
) t
) t
WHERE (f0 < grp_f0_lag or grp_f0_lag is null) and
(f0 < grp_f0_lead or grp_f0_lead is null) ;
基本上,这是识别相邻值的组。然后它通过组传播最大lag()
和lead()
值(最大值,你想传播最小值)。
然后将整个组视为一个单元并在结果集中处理。
以下是BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(prev, next),
CASE
WHEN prev < next THEN 'min'
WHEN prev > next THEN 'max'
WHEN prev IS NULL THEN 'start'
WHEN next IS NULL THEN 'finish'
END extremum
FROM (
SELECT *,
SIGN(index - LAG(index) OVER(ORDER BY DAY)) prev,
SIGN(LEAD(index) OVER(ORDER BY DAY) - index) next
FROM `project.dataset.table`
)
WHERE IFNULL(prev != next, TRUE)
您可以使用问题中的示例数据进行测试,使用上面的示例,如下例所示
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2017-01-19' day, 0.39 index UNION ALL
SELECT '2017-02-04', 0.31 UNION ALL
SELECT '2017-03-24', 0.33 UNION ALL
SELECT '2017-04-09', 0.53 UNION ALL
SELECT '2017-05-11', 0.77 UNION ALL
SELECT '2017-05-27', 0.85 UNION ALL
SELECT '2017-09-16', 0.88 UNION ALL
SELECT '2017-10-02', 0.85 UNION ALL
SELECT '2017-11-03', 0.77 UNION ALL
SELECT '2017-11-19', 0.61 UNION ALL
SELECT '2017-12-05', 0.57
)
SELECT * EXCEPT(prev, next),
CASE
WHEN prev < next THEN 'min'
WHEN prev > next THEN 'max'
WHEN prev IS NULL THEN 'start'
WHEN next IS NULL THEN 'finish'
END extremum
FROM (
SELECT *,
SIGN(index - LAG(index) OVER(ORDER BY DAY)) prev,
SIGN(LEAD(index) OVER(ORDER BY DAY) - index) next
FROM `project.dataset.table`
)
WHERE IFNULL(prev != next, TRUE)
-- ORDER BY day
结果
Row day index extremum
1 2017-01-19 0.39 start
2 2017-02-04 0.31 min
3 2017-09-16 0.88 max
4 2017-12-05 0.57 finish
我们可以将局部最小值定义为x轴上的时间点,其中前后的响应值都大于最小点的值。如果两端都有端点,则只需要一个值。我们可以尝试在这里使用LEAD
和LAG
函数:
SELECT Row, date, f0 AS minimal
FROM
(
SELECT Row, date, f0,
LAG(f0, 1, f0 + 0.1) OVER (ORDER BY date) AS f0_lag,
LEAD(f0, 1, f0 + 0.1) OVER (ORDER BY date) AS f0_lead
FROM project.dataset.table
) t
WHERE f0 < f0_lag AND f0 < f0_lead;
这是使用您的样本数据的demo in SQL Server。由于我的答案是基于SQL Server,因为我无法访问BigQuery,您可能需要稍微调整我使用的语法。