Question:
My table is structured as follows:
+----+------+-------------------+-------+
| id | code | size              | count |
+----+------+-------------------+-------+
| 1  | CA   | TwentyToFortyNine | 65    |
| 1  | CA   | FiveToNinteen     | 385   |
| 1  | CA   | OneToFour         | 492   |
| 1  | DK   | OneToFour         | 38    |
| 1  | DK   | TwentyToFortyNine | 1     |
| 2  | CA   | FiveToNinteen     | 389   |
| 2  | CA   | OneToFour         | 494   |
| 2  | DK   | FiveToNinteen     | 10    |
| 2  | DK   | OneToFour         | 38    |
+----+------+-------------------+-------+
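The distinct values in the size column are OneToFour, FiveToNinteen, and TwentyToFortyNine. However, not every id has rows for all of these values. For example, id: 2 has no data for size: TwentyToFortyNine.
Goal:
For each id, I want to generate new rows labeled code: TOTAL that list every distinct value in the size column together with the corresponding summed count. A size that an id has no data for should still appear, with a count of 0.
Example output: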
+----+-------+-------------------+-------+
| id | code  | size              | count |
+----+-------+-------------------+-------+
| 1  | TOTAL | OneToFour         | 530   |
| 1  | TOTAL | FiveToNinteen     | 385   |
| 1  | TOTAL | TwentyToFortyNine | 66    |
| 1  | CA    | TwentyToFortyNine | 65    |
| 1  | CA    | FiveToNinteen     | 385   |
| 1  | CA    | OneToFour         | 492   |
| 1  | DK    | OneToFour         | 38    |
| 1  | DK    | TwentyToFortyNine | 1     |
| 2  | TOTAL | OneToFour         | 532   |
| 2  | TOTAL | FiveToNinteen     | 399   |
| 2  | TOTAL | TwentyToFortyNine | 0     |
| 2  | CA    | FiveToNinteen     | 389   |
| 2  | CA    | OneToFour         | 494   |
| 2  | DK    | FiveToNinteen     | 10    |
| 2  | DK    | OneToFour         | 38    |
+----+-------+-------------------+-------+
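Can this be done with AWS Athena? If so, could you show me how? Or do I need to use Pandas for this?
I finally figured it out. In case anyone else comes looking for an answer, this may help: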
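WITH all_sizes AS (
    -- Per-(id, code, size) totals from the raw table
    SELECT id, code, size, SUM(count) AS total_count
    FROM your_table_name
    GROUP BY id, code, size
),
id_size_grid AS (
    -- Pair every id with every distinct size so missing sizes still get a TOTAL row
    SELECT i.id, s.size
    FROM (SELECT DISTINCT id FROM all_sizes) i
    CROSS JOIN (SELECT DISTINCT size FROM all_sizes) s
),
all_combinations AS (
    -- TOTAL rows: sum over all codes, 0 when the id has no rows for that size
    SELECT g.id, 'TOTAL' AS code, g.size, COALESCE(SUM(a.total_count), 0) AS count
    FROM id_size_grid g
    LEFT JOIN all_sizes a ON a.id = g.id AND a.size = g.size
    GROUP BY g.id, g.size
    UNION ALL
    -- Original per-code rows, unchanged
    SELECT id, code, size, total_count AS count
    FROM all_sizes
)
SELECT * FROM all_combinations
-- List the TOTAL rows first within each id
ORDER BY id, CASE WHEN code = 'TOTAL' THEN 0 ELSE 1 END, code, size;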
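The expected output includes a TOTAL row with count 0 for id 2 and TwentyToFortyNine, which only appears if every id is paired with every distinct size before summing. Here is a minimal sketch for checking that idea locally against the sample rows from the question. It uses Python's built-in sqlite3 module purely as a stand-in for Athena (an assumption; the two SQL dialects differ in places, though the constructs used here are common to both), and the names `grid`, `TOTALS_SQL`, and `total_rows` are made up for illustration.

```python
import sqlite3

# Sample rows copied from the question: (id, code, size, count)
ROWS = [
    (1, "CA", "TwentyToFortyNine", 65),
    (1, "CA", "FiveToNinteen", 385),
    (1, "CA", "OneToFour", 492),
    (1, "DK", "OneToFour", 38),
    (1, "DK", "TwentyToFortyNine", 1),
    (2, "CA", "FiveToNinteen", 389),
    (2, "CA", "OneToFour", 494),
    (2, "DK", "FiveToNinteen", 10),
    (2, "DK", "OneToFour", 38),
]

# Only the TOTAL rows are computed here; `grid` is a hypothetical CTE name.
TOTALS_SQL = """
WITH grid AS (
    -- every id paired with every distinct size
    SELECT i.id, s.size
    FROM (SELECT DISTINCT id FROM your_table_name) AS i
    CROSS JOIN (SELECT DISTINCT size FROM your_table_name) AS s
)
SELECT g.id, 'TOTAL' AS code, g.size, COALESCE(SUM(t.count), 0) AS count
FROM grid AS g
LEFT JOIN your_table_name AS t ON t.id = g.id AND t.size = g.size
GROUP BY g.id, g.size
ORDER BY g.id, g.size
"""


def total_rows(rows):
    """Load rows into an in-memory SQLite table and return the TOTAL rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE your_table_name "
            "(id INTEGER, code TEXT, size TEXT, count INTEGER)"
        )
        conn.executemany("INSERT INTO your_table_name VALUES (?, ?, ?, ?)", rows)
        return conn.execute(TOTALS_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in total_rows(ROWS):
        print(row)
```

The LEFT JOIN keeps grid rows that have no matching data, and COALESCE turns the resulting NULL sum into 0. The per-code rows can then be appended with UNION ALL, as in the Athena query.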