I have a dataset whose dates are stored in the database as UTC, but the actual time zone differs from row to row.
mydat <- data.frame(
time_stamp=c("2022-08-01 05:00:00 UTC","2022-08-01 17:00:00 UTC","2022-08-02 22:30:00 UTC","2022-08-04 05:00:00 UTC","2022-08-05 02:00:00 UTC"),
timezone=c("America/Chicago","America/New_York","America/Los_Angeles","America/Denver","America/New_York")
)
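I would like to apply the time zone to the UTC timestamps stored in the whole column.
I looked at the with_tz function in the lubridate package, but I can't work out how to reference the "timezone" column rather than hard-coding a value.
If I try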
with_tz(mydat$time_stamp, tzone = mydat$timezone)
I get the following error
Error in as.POSIXlt.POSIXct(x, tz) : invalid 'tz' value
However, if I try
mydat$time_stamp2 <- with_tz(mydat$time_stamp,"America/New_York")
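this produces a new column without any problem. How can I do this using only the values in the column?
The following should do what you want: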
mydat <- data.frame(time_stamp=c("2022-08-01 05:00:00 UTC",
                                 "2022-08-01 17:00:00 UTC",
                                 "2022-08-02 22:30:00 UTC",
                                 "2022-08-04 05:00:00 UTC",
                                 "2022-08-05 02:00:00 UTC"),
                    timezone=c("America/Chicago", "America/New_York",
                               "America/Los_Angeles", "America/Denver",
                               "America/New_York"))
mydat$utc <- anytime::utctime(mydat$time_stamp, tz="UTC")
mydat$format <- ""
for (i in seq_len(nrow(mydat)))
    mydat[i, "format"] <- strftime(mydat[i,"utc"],
                                   "%Y-%m-%d %H:%M:%S",
                                   tz=mydat[i,"timezone"])
> mydat
time_stamp timezone utc format
1 2022-08-01 05:00:00 UTC America/Chicago 2022-08-01 05:00:00 2022-08-01 00:00:00
2 2022-08-01 17:00:00 UTC America/New_York 2022-08-01 17:00:00 2022-08-01 13:00:00
3 2022-08-02 22:30:00 UTC America/Los_Angeles 2022-08-02 22:30:00 2022-08-02 15:30:00
4 2022-08-04 05:00:00 UTC America/Denver 2022-08-04 05:00:00 2022-08-03 23:00:00
5 2022-08-05 02:00:00 UTC America/New_York 2022-08-05 02:00:00 2022-08-04 22:00:00
>
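We first parse your data as UTC; I once wrote a helper function for this in my anytime package (there are other options, but this is how I do it...). Then we need to format from the given (numeric!!) UTC representation into the given time zone. We need a loop because the tz argument of strftime() is not vectorized.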
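Because format() accepts a whole vector of times but only a single tz string, one possible variation is to loop over the distinct zones instead of over the rows. The following is a minimal base R sketch, not the method used in the answer above. It assumes every input string ends in " UTC", and the column name local is made up for illustration.

```r
mydat <- data.frame(
  time_stamp = c("2022-08-01 05:00:00 UTC", "2022-08-01 17:00:00 UTC",
                 "2022-08-02 22:30:00 UTC", "2022-08-04 05:00:00 UTC",
                 "2022-08-05 02:00:00 UTC"),
  timezone = c("America/Chicago", "America/New_York",
               "America/Los_Angeles", "America/Denver",
               "America/New_York")
)

# Assumption: the strings always carry a trailing " UTC" suffix, which we drop
# before parsing the remainder as a UTC time point.
utc <- as.POSIXct(sub(" UTC$", "", mydat$time_stamp),
                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Hypothetical output column holding the local wall-clock time as text.
mydat$local <- NA_character_

# format() takes one tz at a time, so handle each distinct zone in one call.
for (z in unique(mydat$timezone)) {
  idx <- mydat$timezone == z
  mydat$local[idx] <- format(utc[idx], "%Y-%m-%d %H:%M:%S", tz = z)
}

mydat[, c("timezone", "local")]
```

With many rows and few distinct zones this makes far fewer calls to format() than a per-row loop, while the result is still a character column rather than a date-time column.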
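Dirk gave a nice answer that uses (mostly) base R tools, if that is a requirement for you. I would also like to add an answer that uses the clock package, which I develop, because it does not require working row by row on the data frame. clock has a function named sys_time_info() that retrieves low-level information about a UTC time point in a specific time zone. It is one of the few functions where a vectorized zone argument makes sense (which is what you need here), and it returns the offset from UTC, which is useful for converting to a "local" time.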
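As others have mentioned, you won't be able to construct a date-time vector that stores multiple time zones, but this is still useful if you only need to see the local time in each of those zones.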
library(clock)
mydat <- data.frame(
time_stamp=c("2022-08-01 05:00:00 UTC","2022-08-01 17:00:00 UTC","2022-08-02 22:30:00 UTC","2022-08-04 05:00:00 UTC","2022-08-05 02:00:00 UTC"),
timezone=c("America/Chicago","America/New_York","America/Los_Angeles","America/Denver","America/New_York")
)
# Parse into a "sys-time" type, which can be thought of as a UTC time point
mydat$time_stamp <- sys_time_parse(mydat$time_stamp, format = "%Y-%m-%d %H:%M:%S")
mydat
#> time_stamp timezone
#> 1 2022-08-01T05:00:00 America/Chicago
#> 2 2022-08-01T17:00:00 America/New_York
#> 3 2022-08-02T22:30:00 America/Los_Angeles
#> 4 2022-08-04T05:00:00 America/Denver
#> 5 2022-08-05T02:00:00 America/New_York
# "Low level" information about DST, the time zone abbreviation,
# and offset from UTC in that zone. This is one of the few functions where
# it makes sense to have a vectorized `zone` argument.
info <- sys_time_info(mydat$time_stamp, mydat$timezone)
info
#> begin end offset dst abbreviation
#> 1 2022-03-13T08:00:00 2022-11-06T07:00:00 -18000 TRUE CDT
#> 2 2022-03-13T07:00:00 2022-11-06T06:00:00 -14400 TRUE EDT
#> 3 2022-03-13T10:00:00 2022-11-06T09:00:00 -25200 TRUE PDT
#> 4 2022-03-13T09:00:00 2022-11-06T08:00:00 -21600 TRUE MDT
#> 5 2022-03-13T07:00:00 2022-11-06T06:00:00 -14400 TRUE EDT
# Add the offset to the sys-time and then convert to a character column
# (these times don't really represent sys-time anymore since they are now localized)
mydat$localized <- as.character(mydat$time_stamp + info$offset)
mydat
#> time_stamp timezone localized
#> 1 2022-08-01T05:00:00 America/Chicago 2022-08-01T00:00:00
#> 2 2022-08-01T17:00:00 America/New_York 2022-08-01T13:00:00
#> 3 2022-08-02T22:30:00 America/Los_Angeles 2022-08-02T15:30:00
#> 4 2022-08-04T05:00:00 America/Denver 2022-08-03T23:00:00
#> 5 2022-08-05T02:00:00 America/New_York 2022-08-04T22:00:00
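The offset arithmetic in the last step can be checked by hand in base R for the first row. This is only an illustrative sketch: the offset value is copied from the sys_time_info() output above rather than computed, and the variable names are made up.

```r
# First row: 05:00 UTC, to be viewed in America/Chicago.
utc_first <- as.POSIXct("2022-08-01 05:00:00", tz = "UTC")

# Offset in seconds for CDT, copied from the `offset` column shown above.
offset_first <- -18000

# Shift the UTC time point by the offset and print it as if it were UTC;
# the result is the local wall-clock time, no longer a true UTC instant.
local_first <- format(utc_first + offset_first, "%Y-%m-%d %H:%M:%S", tz = "UTC")
local_first
```

An offset of -18000 seconds is minus five hours, so 05:00 UTC becomes midnight local time, which agrees with the first value in the localized column.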