I have a dataset that looks like this:
doc date value
2345 201902 470942
2345 201903 470044
2345 201904 470
2345 201905 35000 ...
And I want to turn it into this:
doc date value value_1m value_2m value_3m
2345 201905 35000 470 470044 470942
2345 201904 470 470044 470942 ...
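As you can see, the new columns value_1m, value_2m and value_3m hold the values from the preceding months (201904, 201903, 201902, and so on).
I tried using CASE WHEN, but my "date" variable is a number and there is a row for every month, so I can't do it that way.
I'm new to this forum, so please forgive me if this isn't very clear. Thanks in advance.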
In Impala, you can try something like this.
Your data table:
+---------------------+-----------------------+-----------------------+
| doc_date_value.doc  | doc_date_value.cdate  | doc_date_value.value  |
+---------------------+-----------------------+-----------------------+
| 2345                | 201902                | 470942                |
| 2345                | 201903                | 470044                |
| 2345                | 201904                | 470                   |
| 2345                | 201905                | 35000                 |
+---------------------+-----------------------+-----------------------+
The query, which chains several subqueries that each use a window function:
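-- Rows are sorted newest first, so LEAD(..., 1, 0) reads the previous month (0 if none).
WITH t AS (
  -- value_1m: the value from one month earlier
  SELECT doc, cdate, value,
         LEAD(value, 1, 0) OVER (PARTITION BY doc ORDER BY cdate DESC) AS value_1m
  FROM doc_date_value
),
t2 AS (
  -- value_2m: value_1m shifted by one more row
  SELECT doc, cdate, value, value_1m,
         LEAD(value_1m, 1, 0) OVER (PARTITION BY doc ORDER BY cdate DESC) AS value_2m
  FROM t
)
-- value_3m: value_2m shifted by one more row
SELECT doc, cdate, value, value_1m, value_2m,
       LEAD(value_2m, 1, 0) OVER (PARTITION BY doc ORDER BY cdate DESC) AS value_3m
FROM t2
ORDER BY doc, cdate DESC;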
Expected output:
+------+--------+--------+----------+----------+----------+
| doc | cdate | value | value_1m | value_2m | value_3m |
+------+--------+--------+----------+----------+----------+
| 2345 | 201905 | 35000 | 470 | 470044 | 470942 |
| 2345 | 201904 | 470 | 470044 | 470942 | 0 |
| 2345 | 201903 | 470044 | 470942 | 0 | 0 |
| 2345 | 201902 | 470942 | 0 | 0 | 0 |
+------+--------+--------+----------+----------+----------+
You can run this query on Impala or Hive.
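If you want to check the idea without a cluster, here is a minimal sketch that runs against an in-memory SQLite database from Python instead of Impala. It is a variation rather than the query above: it uses LEAD with offsets 1, 2 and 3 in a single pass instead of chaining subqueries. It assumes SQLite 3.25 or newer (for window functions) and that every doc has one row per month with no gaps, since LEAD counts rows, not calendar months. The table name and sample rows are copied from the question; everything else is for illustration only.

```python
import sqlite3

# SQLite stands in for Impala/Hive here; this is an assumption for local testing only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_date_value (doc INTEGER, cdate INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO doc_date_value VALUES (?, ?, ?)",
    [
        (2345, 201902, 470942),
        (2345, 201903, 470044),
        (2345, 201904, 470),
        (2345, 201905, 35000),
    ],
)

# One pass: LEAD(value, n, 0) looks n rows ahead in newest-first order,
# i.e. n months back when no month is missing; 0 is the default when no row exists.
QUERY = """
SELECT doc, cdate, value,
       LEAD(value, 1, 0) OVER (PARTITION BY doc ORDER BY cdate DESC) AS value_1m,
       LEAD(value, 2, 0) OVER (PARTITION BY doc ORDER BY cdate DESC) AS value_2m,
       LEAD(value, 3, 0) OVER (PARTITION BY doc ORDER BY cdate DESC) AS value_3m
FROM doc_date_value
ORDER BY doc, cdate DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The rows printed should match the expected output table above. If some months can be missing for a doc, a row-offset approach like this would pull the wrong month, and a self-join on the computed previous month would be safer.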