I'm not sure of the right terminology for this question, so I'll break it down.
I have a table like this:
date_time | a | b | c
The last 4 rows:
15/10/2013 11:45:00 | null | 'timtim' | 'fred'
15/10/2013 13:00:00 | 'tune' | 'reco' | null
16/10/2013 12:00:00 | 'abc' | null | null
16/10/2013 13:00:00 | null | 'died' | null
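How can I get the last record, but with null values ignored and replaced by the value from an earlier record?
For the example I've given, the row returned would be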
16/10/2013 13:00:00 | 'abc' | 'died' | 'fred'
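As you can see, if the value of a column is null, it goes back to the last record that has a value for that column and uses that value.
This should be possible, I just can't figure it out. So far I've only come up with: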
select
last_value(a) over w a
from test
WINDOW w AS (
partition by a
ORDER BY ts asc
range between current row and unbounded following
);
But this only works for a single column...
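The "last row" and the sort order need to be defined unambiguously. There is no natural order in a set (or a table). I am assuming ORDER BY ts, where ts is the timestamp column. If ts is not UNIQUE, we need to add a tiebreaker to ORDER BY to make the sort order deterministic. The primary key serves nicely.
To get the result for every row: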
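A minimal sketch of such a tiebreaker, assuming an integer primary key named id. The table tbl_demo and the id column are hypothetical and only serve this illustration; the question's table shows no key column.

```sql
-- Hypothetical demo table: two rows share the same timestamp.
CREATE TEMP TABLE tbl_demo (
   id int PRIMARY KEY        -- assumed primary key, used as tiebreaker
 , ts timestamp NOT NULL
 , a  text
);

INSERT INTO tbl_demo (id, ts, a) VALUES
   (1, '2013-10-16 13:00:00', 'first')
 , (2, '2013-10-16 13:00:00', 'second');

-- ORDER BY ts alone could return either row.
-- Adding id makes the "last row" deterministic: the row with id = 2.
SELECT id, ts, a
FROM   tbl_demo
ORDER  BY ts DESC, id DESC
LIMIT  1;
```

The same tiebreaker would have to be added to every ORDER BY ts in the queries of this answer, including the ones inside OVER clauses.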
SELECT ts
, max(a) OVER (PARTITION BY grp_a) AS a
, max(b) OVER (PARTITION BY grp_b) AS b
, max(c) OVER (PARTITION BY grp_c) AS c
FROM (
SELECT *
, count(a) OVER (ORDER BY ts) AS grp_a
, count(b) OVER (ORDER BY ts) AS grp_b
, count(c) OVER (ORDER BY ts) AS grp_c
FROM tbl
) sub;
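The aggregate function count() ignores NULL values when counting. Used as an aggregate window function, it computes the running count of a column according to the default window frame, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. NULL values don't increase the count, so these rows fall into the same peer group as the last non-null value. max() or min() can then easily extract the single non-null value per group.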
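To make the grouping visible, here is a small sketch that prints the running count next to column a, using the four rows from the question as an inline VALUES list (so it does not depend on any existing table):

```sql
-- Running count of non-null values in a, ordered by ts.
-- Rows with a NULL in a keep the count of the previous non-null row.
SELECT ts, a
     , count(a) OVER (ORDER BY ts) AS grp_a
FROM  (
   VALUES
     (timestamp '2013-10-15 11:45:00', NULL::text)
   , (timestamp '2013-10-15 13:00:00', 'tune')
   , (timestamp '2013-10-16 12:00:00', 'abc')
   , (timestamp '2013-10-16 13:00:00', NULL)
   ) AS t(ts, a)
ORDER  BY ts;
-- grp_a comes out as 0, 1, 2, 2:
-- the last row (a IS NULL) shares group 2 with 'abc'.
```

The first row ends up in group 0, which contains no non-null value at all, so max(a) over that group stays NULL. There is simply no earlier value to fall back on.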
To get only the last row:
WITH cte AS (
SELECT *
, count(a) OVER w AS grp_a
, count(b) OVER w AS grp_b
, count(c) OVER w AS grp_c
FROM tbl
WINDOW w AS (ORDER BY ts)
)
SELECT ts
, max(a) OVER (PARTITION BY grp_a) AS a
, max(b) OVER (PARTITION BY grp_b) AS b
, max(c) OVER (PARTITION BY grp_c) AS c
FROM cte
ORDER BY ts DESC
LIMIT 1;
Alternatively, look up the last non-null value of each column directly:
SELECT ts
, COALESCE(a, (SELECT a FROM tbl WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS a
, COALESCE(b, (SELECT b FROM tbl WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS b
, COALESCE(c, (SELECT c FROM tbl WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS c
FROM tbl
ORDER BY ts DESC
LIMIT 1;
Or:
SELECT (SELECT ts FROM tbl ORDER BY ts DESC LIMIT 1) AS ts
, (SELECT a FROM tbl WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1) AS a
, (SELECT b FROM tbl WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1) AS b
, (SELECT c FROM tbl WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1) AS c
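While this should be decently fast, if performance is your paramount requirement, consider a plpgsql function: start with the last row and loop in descending order until you have a non-null value for every required column.
Here I create an aggregate function that collects the columns into arrays. Then it is just a matter of removing the NULLs and picking the last element from each array.
Sample data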
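The loop described above could be sketched roughly as follows. This is an illustration, not a tuned implementation: the function name f_last_non_null is made up, and it assumes a table tbl with a NOT NULL timestamp column ts and text columns a, b, c, as in the queries above.

```sql
-- Hypothetical function; assumes tbl(ts timestamp NOT NULL, a text, b text, c text).
CREATE OR REPLACE FUNCTION f_last_non_null(
   OUT ts timestamp, OUT a text, OUT b text, OUT c text)
  LANGUAGE plpgsql STABLE AS
$func$
DECLARE
   r record;
BEGIN
   -- Column references are table-qualified (t.ts, t.a, ...) so they
   -- cannot be confused with the OUT parameters of the same name.
   FOR r IN
      SELECT t.ts, t.a, t.b, t.c
      FROM   tbl t
      ORDER  BY t.ts DESC
   LOOP
      ts := COALESCE(ts, r.ts);   -- set once, from the latest row
      a  := COALESCE(a, r.a);     -- keep the first non-null value found
      b  := COALESCE(b, r.b);
      c  := COALESCE(c, r.c);

      -- Stop reading as soon as every column has a value.
      EXIT WHEN a IS NOT NULL AND b IS NOT NULL AND c IS NOT NULL;
   END LOOP;
END
$func$;

-- Usage:
SELECT * FROM f_last_non_null();
```

If NULL values are rare near the end of the table, the loop stops after a handful of rows, and an index on ts lets it avoid sorting the whole table.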
CREATE TABLE T (
date_time timestamp,
a text,
b text,
c text
);
INSERT INTO T VALUES ('2013-10-15 11:45:00', NULL, 'timtim', 'fred'),
('2013-10-15 13:00:00', 'tune', 'reco', NULL ),
('2013-10-16 12:00:00', 'abc', NULL, NULL ),
('2013-10-16 13:00:00', NULL, 'died', NULL );
Solution
CREATE AGGREGATE array_accum (anyelement)
(
sfunc = array_append,
stype = anyarray,
initcond = '{}'
);
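WITH latest_nonull AS (
    SELECT MAX(date_time) AS MaxDateTime,
           -- ORDER BY inside each aggregate call fixes the element order;
           -- a query-level ORDER BY on the ungrouped column date_time
           -- is rejected in an aggregate query
           array_remove(array_accum(a ORDER BY date_time), NULL) AS A,
           array_remove(array_accum(b ORDER BY date_time), NULL) AS B,
           array_remove(array_accum(c ORDER BY date_time), NULL) AS C
    FROM T
)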
SELECT MaxDateTime, A[array_upper(A, 1)], B[array_upper(B,1)], C[array_upper(C,1)]
FROM latest_nonull;
Result
maxdatetime | a | b | c
---------------------+-----+------+------
2013-10-16 13:00:00 | abc | died | fred
(1 row)
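The same idea can probably be expressed without a custom aggregate by using the built-in array_agg(). The sketch below assumes PostgreSQL 9.4 or later (for the FILTER clause) and the sample table T from this answer. It sorts each array in descending order, so the wanted value is the first element rather than the last.

```sql
-- Built-in array_agg() instead of a user-defined aggregate.
-- FILTER drops the NULLs before aggregation; ORDER BY ... DESC puts
-- the most recent non-null value at position 1.
SELECT max(date_time) AS max_date_time
     , (array_agg(a ORDER BY date_time DESC) FILTER (WHERE a IS NOT NULL))[1] AS a
     , (array_agg(b ORDER BY date_time DESC) FILTER (WHERE b IS NOT NULL))[1] AS b
     , (array_agg(c ORDER BY date_time DESC) FILTER (WHERE c IS NOT NULL))[1] AS c
FROM   T;
```

If a column has no non-null value at all, array_agg() returns NULL and the subscript yields NULL as well, so the query still returns one row.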
This should work, but keep in mind that it is an ugly solution
select * from
(select dt from
(select rank() over (order by ctid desc) idx, dt
from sometable ) cx
where idx = 1) dtz,
(
select a from
(select rank() over (order by ctid desc) idx, a
from sometable where a is not null ) ax
where idx = 1) az,
(
select b from
(select rank() over (order by ctid desc) idx, b
from sometable where b is not null ) bx
where idx = 1) bz,
(
select c from
(select rank() over (order by ctid desc) idx, c
from sometable where c is not null ) cx
where idx = 1) cz
See it on SQL Fiddle: http://sqlfiddle.com/#!15/d5940/40
The result will be
DT                             | A   | B    | C
October, 16 2013 00:00:00+0000 | abc | died | fred