Get the number of milliseconds in a localised date, accounting for daylight saving time

Question (votes: 1, answers: 2)

I have data in Google BigQuery that looks like this:


sample_date_time_UTC     time_zone       milliseconds_between_samples
--------                 ---------       ----------------------------
2019-03-31 01:06:03 UTC  Europe/Paris    60000
2019-03-31 01:16:03 UTC  Europe/Paris    60000
...

Data samples are expected at regular intervals, given by the value of the milliseconds_between_samples field.

time_zone is a string representing a Google Cloud Supported Timezone Value.


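I then check the ratio of the actual number of samples to the expected number for each day, over any range of days (expressed as local dates in the given time_zone):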

with data as 
  ( 
    select 
      -- convert sample_date_time_UTC to equivalent local datetime for the timezone
      DATETIME(sample_date_time_UTC,time_zone) as localised_sample_date_time, 
      milliseconds_between_samples 
    from  `mytable` 
    where sample_date_time_UTC between '2019-03-31 00:00:00.000000+01:00' and '2019-04-01 00:00:00.000000+02:00'
  ) 

select date(localised_sample_date_time) as localised_date, count(*)/(86400000/avg(milliseconds_between_samples)) as ratio_of_daily_sample_count_to_expected 
from data 
group by localised_date 
order by localised_date 

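The problem is that this has a bug: I have hard-coded the expected number of milliseconds in a day as 86400000. That is wrong, because on the day daylight saving time starts in the specified time_zone (Europe/Paris), the day is 1 hour shorter. On the day daylight saving time ends, the day is 1 hour longer.

So the query above is incorrect. It queries data for 31 March 2019 in the Europe/Paris time zone, which is the day daylight saving time starts in that zone. The number of milliseconds in that day should be 82800000.

Within the query, how can I get the correct number of milliseconds for a given localised_date?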

Update:

I tried this to see what it returns:

select DATETIME_DIFF(DATETIME('2019-04-01 00:00:00.000000+02:00', 'Europe/Paris'), DATETIME('2019-03-31 00:00:00.000000+01:00', 'Europe/Paris'), MILLISECOND)

That didn't help - I got 86400000.

datetime google-bigquery dst
2 Answers
Votes: 1

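You can get a difference between the two timestamps (in milliseconds) by removing the +01:00 and +02:00. Note that the strings are then read as UTC timestamps and converted to Europe/Paris local datetimes, so the result is 90000000, which is not the number of milliseconds that actually elapsed.

You can do something like this to get the number of milliseconds in the day: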

select 86400000 + (86400000 - DATETIME_DIFF(DATETIME('2019-04-01 00:00:00.000000', 'Europe/Paris'), DATETIME('2019-03-31 00:00:00.000000', 'Europe/Paris'), MILLISECOND))
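The DATETIME values in the question's update are civil (wall-clock) times, so DATETIME_DIFF between two local midnights always comes out as 24 hours. As a sketch of a more direct route, assuming the same two literals are read as TIMESTAMP values instead, TIMESTAMP_DIFF compares the absolute instants and needs no correction term:

```sql
-- Assumption: the same two literals as in the question's update, read as TIMESTAMPs
select timestamp_diff(
  timestamp '2019-04-01 00:00:00+02:00',  -- local midnight ending the day   = 2019-03-31 22:00:00 UTC
  timestamp '2019-03-31 00:00:00+01:00',  -- local midnight starting the day = 2019-03-30 23:00:00 UTC
  MILLISECOND) as millis_in_day           -- 23 hours = 82800000
```

This still requires knowing the UTC offset at each end of the day, so it is only useful as a check for one known date.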

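Votes: 1

Thanks to @Juta for the tip about using UTC times for the calculation. Since I group each day's data by localised date, I found that I can calculate the number of milliseconds in each day by getting the start and end datetimes (in UTC) of my "localised" date, using the following logic: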

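-- get UTC start datetime for localised date
datetime(timestamp(localised_date, time_zone)) as utc_start_datetime,
-- get UTC end datetime for localised date
datetime(timestamp(date_add(localised_date, interval 1 day), time_zone)) as utc_end_datetime

-- this then gives the milliseconds for that localised date:
datetime_diff(utc_end_datetime, utc_start_datetime, MILLISECOND)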

So my full query becomes:

with daily_sample_count as (
  with data as 
    ( 
      select 
        -- get the date in the local timezone, for sample_date_time_UTC
        DATE(sample_date_time_UTC, time_zone) as localised_date, 
        -- keep time_zone so the day boundaries can be converted back to UTC
        time_zone,
        milliseconds_between_samples 
      from `mytable` 
      where sample_date_time_UTC between '2019-03-31 00:00:00.000000+01:00' and '2019-04-01 00:00:00.000000+02:00'
    ) 

  select
    localised_date,
    count(*) as daily_record_count,
    avg(milliseconds_between_samples) as daily_avg_millis_between_samples,
    -- UTC datetimes of local midnight at the start and end of the localised date
    datetime(timestamp(localised_date, time_zone)) as utc_start_datetime,
    datetime(timestamp(date_add(localised_date, interval 1 day), time_zone)) as utc_end_datetime
  from data 
  group by localised_date, time_zone
)

select
  localised_date,
  -- apply calculation for ratio_of_daily_sample_count_to_expected
  -- based on the actual vs expected number of samples for the day
  -- no. of milliseconds in the day changes, when transitioning in/out of daylight saving - so we calculate milliseconds in the day
  daily_record_count/(datetime_diff(utc_end_datetime, utc_start_datetime, MILLISECOND)/daily_avg_millis_between_samples) as ratio_of_daily_sample_count_to_expected
from
  daily_sample_count
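
As a quick sanity check of the day-length calculation used above, the following self-contained sketch computes the milliseconds in a few Europe/Paris dates without touching `mytable`. The `days` rows are illustrative values chosen here, not data from the question, and TIMESTAMP_DIFF is applied directly to the two local-midnight timestamps instead of converting them to UTC datetimes first; the result should be the same.

```sql
-- Assumption: hypothetical inline rows standing in for grouped (localised_date, time_zone) pairs
with days as (
  select date '2019-03-30' as localised_date, 'Europe/Paris' as time_zone union all
  select date '2019-03-31', 'Europe/Paris' union all   -- daylight saving time starts
  select date '2019-10-27', 'Europe/Paris'             -- daylight saving time ends
)
select
  localised_date,
  timestamp_diff(
    timestamp(date_add(localised_date, interval 1 day), time_zone),  -- next local midnight
    timestamp(localised_date, time_zone),                            -- this local midnight
    MILLISECOND) as millis_in_day
from days
order by localised_date
-- 2019-03-30 -> 86400000 (24 hours)
-- 2019-03-31 -> 82800000 (23 hours)
-- 2019-10-27 -> 90000000 (25 hours)
```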