I have an AWS S3 data lake containing Parquet files, structured as follows:
s3://bucket/device/table_x/year=2000/month=01/day=02/xyz.parquet
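My goal is to query the data with AWS Athena and display it in Grafana dashboards. My challenge is that, to build dynamic panels for any time period while still taking advantage of my partitions, I need a way to restrict the data to the relevant time period in the WHERE clause. It has to work across years, months, and days, without constructing a different SQL statement depending on the query.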
My best attempt so far is the query below, which should work, but it is complex. Is there a recommended best practice for this kind of statement?
SELECT
    Count(a1) as AVG_a1
FROM
    tbl_11111111_a
WHERE
    (
        -- Same year, same month
        (year = 'START_YEAR' AND month = 'START_MONTH' AND day BETWEEN 'START_DAY' AND 'END_DAY')
        OR
        -- Same year, different months
        (year = 'START_YEAR' AND month = 'START_MONTH' AND day >= 'START_DAY')
        OR
        (year = 'START_YEAR' AND month > 'START_MONTH' AND month < 'END_MONTH' AND day BETWEEN '01' AND '31')
        OR
        (year = 'START_YEAR' AND month = 'END_MONTH' AND day <= 'END_DAY')
        OR
        -- Different years
        (year > 'START_YEAR' AND year < 'END_YEAR')
        OR
        (year = 'END_YEAR' AND month < 'END_MONTH' AND day BETWEEN '01' AND '31')
        OR
        (year = 'END_YEAR' AND month = 'END_MONTH' AND day <= 'END_DAY')
    )
    AND
    t BETWEEN TIMESTAMP 'START_YEAR-START_MONTH-START_DAY 00:00:00' AND TIMESTAMP 'END_YEAR-END_MONTH-END_DAY 00:00:00'
I think you could also use a CASE expression to restrict the data to the time period specified by the START_YEAR, START_MONTH, START_DAY, END_YEAR, END_MONTH, and END_DAY parameters.
SELECT
    Count(a1) as AVG_a1
FROM
    tbl_11111111_a
WHERE
    CASE
        WHEN year = START_YEAR AND month = START_MONTH AND day BETWEEN START_DAY AND END_DAY THEN 1
        WHEN year = START_YEAR AND month = START_MONTH AND day >= START_DAY THEN 1
        WHEN year = START_YEAR AND month > START_MONTH AND month < END_MONTH AND day BETWEEN '01' AND '31' THEN 1
        WHEN year = START_YEAR AND month = END_MONTH AND day <= END_DAY THEN 1
        WHEN year > START_YEAR AND year < END_YEAR THEN 1
        WHEN year = END_YEAR AND month < END_MONTH AND day BETWEEN '01' AND '31' THEN 1
        WHEN year = END_YEAR AND month = END_MONTH AND day <= END_DAY THEN 1
        ELSE 0
    END = 1
    AND
    -- The TIMESTAMP keyword only accepts a string literal, so the bounds are built with CAST
    t BETWEEN CAST(CONCAT(START_YEAR, '-', START_MONTH, '-', START_DAY, ' 00:00:00') AS TIMESTAMP)
        AND CAST(CONCAT(END_YEAR, '-', END_MONTH, '-', END_DAY, ' 00:00:00') AS TIMESTAMP)
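A possibly simpler alternative, offered as a sketch rather than something verified on Athena: if year, month, and day are zero-padded strings (as the quoted '01' and '31' suggest), the three partition columns can be concatenated into a single 'YYYY-MM-DD' string and compared against the start and end dates with one BETWEEN. Zero-padded date strings sort in date order, so one comparison covers every case, including the months after START_MONTH in the start year of a multi-year range (for example December 2020 in a range from 2020-11-15 to 2022-02-10), which the month > START_MONTH AND month < END_MONTH branch appears not to match. Whether Athena still prunes partitions for this expression is an assumption worth checking against the data-scanned figure of a test query. The script below uses Python's standard sqlite3 module as a local stand-in for Athena; the table name parts and the sample rows are made up for illustration.

```python
import sqlite3

# Range filter on the concatenated partition columns. The || operator is
# also valid string concatenation in Athena (Trino/Presto) SQL.
RANGE_FILTER = """
SELECT year, month, day
FROM parts
WHERE (year || '-' || month || '-' || day) BETWEEN ? AND ?
ORDER BY year, month, day
"""


def matching_partitions(rows, start_date, end_date):
    """Return the (year, month, day) partitions inside [start_date, end_date]."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table holding one row per partition; all columns are strings.
    conn.execute("CREATE TABLE parts (year TEXT, month TEXT, day TEXT)")
    conn.executemany("INSERT INTO parts VALUES (?, ?, ?)", rows)
    result = conn.execute(RANGE_FILTER, (start_date, end_date)).fetchall()
    conn.close()
    return result


# Made-up partition values spanning a year boundary.
SAMPLE_PARTITIONS = [
    ("2020", "11", "14"),
    ("2020", "11", "15"),
    ("2020", "12", "31"),
    ("2021", "06", "01"),
    ("2022", "02", "10"),
    ("2022", "02", "11"),
]

if __name__ == "__main__":
    for part in matching_partitions(SAMPLE_PARTITIONS, "2020-11-15", "2022-02-10"):
        print(part)
```

In the real query the same parenthesized expression would replace the whole CASE block, with the two bound values taken from the dashboard's time range, and the filter on t would stay as it is to trim rows inside the first and last day.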
P.S. I hope this helps. Please let me know whether it works for you.