SQL: Querying year/month/day-partitioned data for a time period

Problem description

I have an AWS S3 data lake of Parquet files with the following structure:

s3://bucket/device/table_x/year=2000/month=01/day=02/xyz.parquet
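
For readers less familiar with this layout: folders named year=…/month=…/day=… are usually exposed to Athena as Hive-style partition columns. A minimal sketch of such a table definition follows. The table and column names are borrowed from the query further down, while the column types and the mapping of that table to this S3 prefix are assumptions, not something stated in the question.

```sql
-- Sketch only: column types and the table-to-prefix mapping are assumptions.
CREATE EXTERNAL TABLE tbl_11111111_a (
    a1 double,      -- assumed type; a1 is the column counted in the query
    t  timestamp    -- assumed type; t is compared against TIMESTAMP literals
)
PARTITIONED BY (
    year  string,   -- from 'year=2000'
    month string,   -- from 'month=01' (zero-padded)
    day   string    -- from 'day=02' (zero-padded)
)
STORED AS PARQUET
LOCATION 's3://bucket/device/table_x/';

-- Register the existing year=/month=/day= folders as partitions.
MSCK REPAIR TABLE tbl_11111111_a;
```

With string partition columns like these, the values compare as text, which is why the zero padding in '01' and '02' matters for any range filter.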

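My goal is to query the data with AWS Athena and display it in Grafana dashboards. My challenge is that, to create dynamic panels for any time period while still taking advantage of my partitions, I need a way to restrict the data to the relevant period in my WHERE clause. It has to work across years, months, and days, without having to build a different SQL statement depending on the queried range.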

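My best attempt so far is the query below, which should work, but it is complicated. Is there a recommended best practice for a statement like this?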

SELECT
    Count(a1) as AVG_a1                 
FROM
    tbl_11111111_a
WHERE
    (
        -- Same year, same month
        (year = 'START_YEAR' AND month = 'START_MONTH' AND day BETWEEN 'START_DAY' AND 'END_DAY')
        OR
        -- Same year, different months
        (year = 'START_YEAR' AND month = 'START_MONTH' AND day >= 'START_DAY')
        OR
        (year = 'START_YEAR' AND month > 'START_MONTH' AND month < 'END_MONTH' AND day BETWEEN '01' AND '31')
        OR
        (year = 'START_YEAR' AND month = 'END_MONTH' AND day <= 'END_DAY')
        OR
        -- Different years
        (year > 'START_YEAR' AND year < 'END_YEAR')
        OR
        (year = 'END_YEAR' AND month < 'END_MONTH' AND day BETWEEN '01' AND '31')
        OR
        (year = 'END_YEAR' AND month = 'END_MONTH' AND day <= 'END_DAY')
    )
    AND
    t BETWEEN TIMESTAMP 'START_YEAR-START_MONTH-START_DAY 00:00:00' AND TIMESTAMP 'END_YEAR-END_MONTH-END_DAY 00:00:00'
sql hive
1 Answer

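"My best attempt so far is the query below, which should work, but it is complicated. Is there a recommended best practice for a statement like this?"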

I think you could also use a CASE expression to filter the data to the period specified by the START_YEAR, START_MONTH, START_DAY, END_YEAR, END_MONTH, and END_DAY parameters.

SELECT
    Count(a1) as AVG_a1                 
FROM
    tbl_11111111_a
WHERE
    CASE
        -- Placeholders are quoted so they compare as strings, like '01' and '31'
        WHEN year = 'START_YEAR' AND month = 'START_MONTH' AND day BETWEEN 'START_DAY' AND 'END_DAY' THEN 1
        WHEN year = 'START_YEAR' AND month = 'START_MONTH' AND day >= 'START_DAY' THEN 1
        WHEN year = 'START_YEAR' AND month > 'START_MONTH' AND month < 'END_MONTH' AND day BETWEEN '01' AND '31' THEN 1
        WHEN year = 'START_YEAR' AND month = 'END_MONTH' AND day <= 'END_DAY' THEN 1
        WHEN year > 'START_YEAR' AND year < 'END_YEAR' THEN 1
        WHEN year = 'END_YEAR' AND month < 'END_MONTH' AND day BETWEEN '01' AND '31' THEN 1
        WHEN year = 'END_YEAR' AND month = 'END_MONTH' AND day <= 'END_DAY' THEN 1
        ELSE 0
    END = 1
    AND
    -- A TIMESTAMP literal cannot wrap a function call, so cast the concatenated string instead
    t BETWEEN CAST(CONCAT('START_YEAR', '-', 'START_MONTH', '-', 'START_DAY', ' 00:00:00') AS TIMESTAMP) AND CAST(CONCAT('END_YEAR', '-', 'END_MONTH', '-', 'END_DAY', ' 00:00:00') AS TIMESTAMP)
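
A simpler alternative to enumerating the year/month/day cases, offered as a sketch rather than a tested Athena query: if year, month, and day are zero-padded strings (as the S3 path suggests), the three partition columns can be joined into one sortable key and compared with a single BETWEEN. START_DATE and END_DATE are hypothetical placeholders in YYYY-MM-DD form and are not part of the original question. This also sidesteps a gap in the branch-based filters above: for a range that crosses a year boundary, such as 2020-11-15 to 2022-03-10, the branch with month > START_MONTH AND month < END_MONTH matches nothing, so December 2020 is skipped.

```sql
-- Sketch only: assumes year, month and day are zero-padded string partition
-- columns ('2000', '01', '02'). START_DATE and END_DATE are hypothetical
-- placeholders in 'YYYY-MM-DD' form, e.g. '2020-11-15'.
SELECT
    COUNT(a1) AS count_a1
FROM
    tbl_11111111_a
WHERE
    -- One comparison over the combined partition key replaces the
    -- per-year/per-month branches; zero padding makes string order
    -- match calendar order.
    CONCAT(year, '-', month, '-', day) BETWEEN 'START_DATE' AND 'END_DATE'
    AND
    -- Same row-level bounds as the original query.
    t BETWEEN TIMESTAMP 'START_DATE 00:00:00' AND TIMESTAMP 'END_DATE 00:00:00'
```

Whether Athena still prunes partitions when the predicate is an expression over the partition columns is worth confirming by comparing the "data scanned" figure for a narrow and a wide date range.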

P.S.

Hope all goes well. Please let me know whether it works for you.
