PostgreSQL: backfill data if no row exists at the start time

Question (votes: 0, answers: 1)

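Suppose I have a table with 3 columns: Id, Value, and DateTime. I need to find the min, max, first, and last value for every 1-minute window between a given start time and end time. If no row exists at the start time, I need to backfill the start with the last value before it, and likewise use the last known value for any window that has no rows.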

For example, assume the following table, simplified to show only the time:


    Id Time    Value
    1 10:10:05 3
    1 10:11:06 4
    1 10:13:13 5
    1 10:13:19 9
    1 10:13:32 8
    1 10:14:35 2 

Suppose I want the results from start time 10:12:00 to end time 10:14:00.

For the 12-13 minute window, this would give:

start value = 4 (backfilled from the last value preceding the window)
end value = 4
min value = 4
max value = 4
minute start = 10:12:00
minute end = 10:13:00

For the 13-14 minute window, it would give:

start value = 4 (backfilled from the last value preceding the window)
end value = 8
min value = 4
max value = 9
minute start = 10:13:00
minute end = 10:14:00

Basically, every window starts from the last value of the previous window.

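Is there a query that can do this? If the query above is too complex, could we at least write one that simply returns the values between the given start and end times and, when no row exists at the start time, also returns the last value before the start time as the start row? Basically, I always need a value at the start time.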

For example, if I ask for the values between start time 10:12:00 and end time 10:14:00, it would give:


    1 10:12:00 4 (back fill from last value)
    1 10:13:13 5
    1 10:13:19 9
    1 10:13:32 8

The rest I will do programmatically.

sql postgresql time-series
1 Answer
0 votes

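Result:

The easiest approach is probably to build an overview of which values apply to each minute, then group by minute to get the min/max, and pick the start and end values by ordering the rows within each minute (ascending for the start, descending for the end).

Building that overview has a few pitfalls, so I have tried to explain in SQL comments why each step is taken.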

Set up the table:

I named the table from the question "main", since no name was given.

-- INIT table
CREATE TABLE main (
  id integer,
  "time" timestamp without time zone,
  value integer
);

-- Fill table
INSERT INTO main values 
    (1,'2024-01-01 10:10:05', 3),
    (1,'2024-01-01 10:11:06', 4),
    (1,'2024-01-01 10:13:13', 5),
    (1,'2024-01-01 10:13:19', 9),
    (1,'2024-01-01 10:13:32', 8),
    (1,'2024-01-01 10:14:35', 2);
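
Before any backfill is added, the grouping idea described above can be sketched on the raw rows alone. This is only an illustration, not part of the final query: it assumes the "main" table created above, and the column aliases start_value and end_value are names chosen here for the sketch.

```sql
-- Sketch only: per-minute min/max/first/last over the raw rows, without backfill.
select
    id,
    date_trunc('minute', "time") as minute_start,
    min(value) as min,
    max(value) as max,
    -- first value in the minute: order ascending by time and take element 1
    (array_agg(value order by "time" asc))[1]  as start_value,
    -- last value in the minute: order descending by time and take element 1
    (array_agg(value order by "time" desc))[1] as end_value
from main
group by id, date_trunc('minute', "time")
order by id, minute_start;
```

On the sample data this produces no row for minute 10:12, and minute 10:13 starts at 5 instead of the backfilled 4, which is what the steps below address.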

Build the overview of results:

-- overview: a CTE listing the time span each value covers, from its own time
-- until the time of the next record for the same id (aliased "end").
-- Both ends are also truncated to the minute for use in the later steps.

with overview as (
    SELECT 
        -- id is included so rows from different ids that share a timestamp are kept
        distinct on (a.id, a.time) 
        a.id, 
        a.time, 
        b.time as "end", 
        a.value, 
        date_trunc('minute', a.time) as minute_start, 
        date_trunc('minute', b.time) as minute_end 
    FROM 
        main a 
    left join 
        main b 
    on 
        a."time"<b."time" and 
        a.id = b.id 
    order by 
        a.id, a.time, b.time asc
    ),

-- overview2: the records that apply to each minute. This is every original record from main,
-- plus a backfill copy of any record whose span runs into a later minute. Such a record appears
-- once at its original time (it counts in its own minute) and once at the truncated minute of
-- the next record (it also provides the start value for that minute).

overview2 as (
    select 
        id, 
        date_trunc('minute', "end") as time, 
        date_trunc('minute', "end") as minute, 
        value, 
        true as backfill 
    from 
        overview 
    where 
        minute_start <> minute_end
    UNION ALL
    select 
        id, 
        time, 
        date_trunc('minute', time) as minute, 
        value, 
        false as backfill 
    from 
        overview
    ),  

-- overview3: minutes that no record falls into are still missing. For example, minute 12 has no rows.
-- Even though minute 13 gets its start value from minute 11 in overview2, minute 12 still has no
-- backfilled start value. Those minutes are filled here by generating a list of all minutes,
-- finding the ones missing from overview2, joining them to main on rows with an earlier time,
-- and picking the row with the latest time (via distinct on with time ordered descending).
-- Note: this join does not match on id, so as written this step assumes a single id.

overview3 as (
    select 
        * 
    from 
        overview2 
    UNION ALL (
        Select 
            distinct on (a.missingminute) 
            c.id, 
            a.missingminute as time, 
            a.missingminute as minute, 
            c.value, 
            true as backfill 
        from (
            SELECT 
                date_trunc('minute', time.time) as missingminute
            FROM 
                generate_series((select min(minute) from overview2),(select max(minute) from overview2),'1 minute'::interval) time 
            left join (
                select distinct 
                    minute 
                from 
                    overview2
                ) b 
            on 
                date_trunc('minute', time) = b.minute 
            where 
                b.minute isnull
            ) a 
        left join 
            main c 
        on 
            a.missingminute > c.time 
        order by 
            a.missingminute, 
            c.time desc
        ) 
    order by 
        time
    )

-- Now that the full overview exists, it only remains to compute the wanted values and place them in a result table.
select 
    t1.id, 
    t1.minute as minute_start, 
    t1.minute + interval '1 minute' as minute_end, 
    t1.backfill as start_backfill,
    t1.start, 
    t2.end, 
    -- min and max are coalesced with the start value: a minute with no original rows only has a start value, so min = max = start. If min or max is null, use start.
    coalesce(t3.min, t1.start) as min, 
    coalesce(t3.max, t1.start) as max 
from 
    (select distinct on (id, minute) id, minute, value as start, backfill from overview3 order by id, minute, time asc) t1 
left join
    (select distinct on (id, minute) id, minute, value as end from overview3 order by id, minute, time desc) t2 on t1.id = t2.id and t1.minute = t2.minute 
left join
    (select id, minute, min(value) min, max(value) max from overview2 group by id,minute) t3 on t1.id = t3.id and t1.minute = t3.minute
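
For the simpler fallback asked about in the question (the raw values between start and end, plus a backfilled row at the start time when none exists), a minimal sketch could look like the following. It assumes the same "main" table; the params CTE, its range_start and range_end columns, and the backfill flag are hypothetical names introduced for illustration, and the end time is treated as exclusive.

```sql
-- Hypothetical parameters: the requested range, hard-coded here for illustration.
with params as (
    select timestamp '2024-01-01 10:12:00' as range_start,
           timestamp '2024-01-01 10:14:00' as range_end
)
-- Backfilled row: the last value before range_start per id,
-- emitted only when that id has no row exactly at range_start.
select * from (
    select distinct on (m.id)
        m.id,
        p.range_start as "time",
        m.value,
        true as backfill
    from main m
    cross join params p
    where m."time" < p.range_start
      and not exists (
          select 1 from main x
          where x.id = m.id and x."time" = p.range_start
      )
    order by m.id, m."time" desc
) first_row
union all
-- The actual rows inside the requested range.
select m.id, m."time", m.value, false as backfill
from main m
cross join params p
where m."time" >= p.range_start
  and m."time" < p.range_end
order by id, "time";
```

With the sample data this should return the backfilled row (1, 10:12:00, 4) followed by the three rows from minute 10:13, in line with the expected output listed in the question.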