So, I am using GreenPlum to work with a large table named purchases that has more than 4 million rows. Here is a sample of the table:
userId | purchaseTime | timeDiff
------------------------------------------
17 | 2016-02-01 11:01:02 |
17 | 2016-02-01 13:24:58 |
17 | 2016-02-01 21:12:36 |
67 | 2016-02-01 17:04:49 |
84 | 2016-02-01 16:13:20 |
94 | 2016-02-01 05:46:13 |
94 | 2016-02-01 21:33:19 |
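The sample is sorted by user ID and purchase time to make my goal easier to follow.
My goal is to update this table so that each row holds the time difference between that purchase and the same user's previous purchase,
so that it looks like this: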
userId | purchaseTime | timeDiff
------------------------------------------
17 | 2016-02-01 11:01:02 | NULL
17 | 2016-02-01 13:24:58 | 2:23:56
17 | 2016-02-01 21:12:36 | 7:47:38
67 | 2016-02-01 17:04:49 | NULL
84 | 2016-02-01 16:13:20 | NULL
94 | 2016-02-01 05:46:13 | NULL
94 | 2016-02-01 21:33:19 | 15:47:06
One of your answers helped me. Now I need to perform the update, but I get a syntax error when I run it:
WITH tmp_table AS
(
SELECT userId ,
purchaseTime ,
purchaseTime - LAG(purchaseTime )
OVER (PARTITION BY userId ORDER BY purchaseTime) AS timeDiff
FROM purchases
)
UPDATE purchases SET timeDiff = tmp_table.timeDiff
FROM tmp_table
WHERE userId = tmp_table.userId
AND purchaseTime = tmp_table.purchaseTime;
Can anyone help me update my table?
You can use the lag window function to find the previous purchase time, and then simply subtract the two:
SELECT userId,
purchaseTime,
purchaseTime -
LAG(purchaseTime) OVER
(PARTITION BY userId ORDER BY purchaseTime) AS timeDiff
FROM purchases
So, building on @mureinik's query, to perform the update you have to do the following:
UPDATE purchases
SET timeDiff = tmp_table.timeDiff
FROM (SELECT userId, purchaseTime ,
(EXTRACT(epoch FROM purchaseTime - LAG(purchaseTime) OVER
(PARTITION BY userId ORDER BY purchaseTime))/60)::integer AS timeDiff
FROM purchases) AS tmp_table
WHERE purchases.userId = tmp_table.userId
AND purchases.purchaseTime = tmp_table.purchaseTime;
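The desired output in the question shows values such as 2:23:56, which suggests an interval rather than a count of minutes. As a rough sketch only, assuming that timeDiff is declared as an interval column and that (userId, purchaseTime) identifies a row uniquely (neither is stated in the question), the same update could keep the raw difference:

```sql
-- Sketch: store the interval itself instead of rounded minutes.
-- Assumes purchases.timeDiff has type interval and that no user has
-- two purchases with exactly the same purchaseTime.
UPDATE purchases
SET timeDiff = tmp_table.timeDiff
FROM (SELECT userId,
             purchaseTime,
             -- NULL for each user's first purchase, since LAG has no previous row
             purchaseTime - LAG(purchaseTime) OVER
                 (PARTITION BY userId ORDER BY purchaseTime) AS timeDiff
      FROM purchases) AS tmp_table
WHERE purchases.userId = tmp_table.userId
  AND purchases.purchaseTime = tmp_table.purchaseTime;
```

Rows are matched on userId and purchaseTime because those are the only columns that already hold values before the update runs.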
In the update, EXTRACT(epoch FROM ...) returns the number of seconds in the interval. If you want minutes instead, divide the result by 60, and finally, if you want to round it, just cast it to integer.
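To make the arithmetic concrete, here is a small standalone query that applies the same steps to the first two purchases of userId 17 from the sample data. The column aliases are only illustrative.

```sql
-- Worked example: 13:24:58 minus 11:01:02 is an interval of 2:23:56.
SELECT
    -- seconds in the interval: 2*3600 + 23*60 + 56 = 8636
    EXTRACT(epoch FROM TIMESTAMP '2016-02-01 13:24:58'
                     - TIMESTAMP '2016-02-01 11:01:02') AS diff_seconds,
    -- minutes as a fraction: 8636 / 60 is roughly 143.93
    EXTRACT(epoch FROM TIMESTAMP '2016-02-01 13:24:58'
                     - TIMESTAMP '2016-02-01 11:01:02') / 60 AS diff_minutes,
    -- the cast to integer rounds to the nearest whole minute: 144
    (EXTRACT(epoch FROM TIMESTAMP '2016-02-01 13:24:58'
                      - TIMESTAMP '2016-02-01 11:01:02') / 60)::integer AS diff_minutes_rounded;
```

Note that the cast rounds rather than truncates, so 143.93 becomes 144 and not 143.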