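I have a list in which every element is a data frame with the same column names, and one of the columns is of class Interval (from the lubridate package). I want to bind all the individual data frames in the list into a single data frame. Unfortunately, both rbind and bind_rows coerce the interval column to numeric, and I get the following warning.
Warning message:
1: In bind_rows_(x, .id) :
Vectorizing 'Interval' elements may not preserve their attributes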
library(dplyr)
library(lubridate)
# Create a sample list of length 2 (the actual list has length ~18,000)
test <- list(BGC119AP01 = structure(list(participant_code = "BGC119AP01",
interval_1 = new("Interval", .Data = 34128000, start = structure(1479427200, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), tzone = "UTC")), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -1L), groups = structure(list(
participant_code = "BGC119AP01", .rows = list(1L)), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"), .drop = FALSE)),
BGC119AP02 = structure(list(participant_code = "BGC119AP02",
interval_1 = new("Interval", .Data = 34128000, start = structure(1479427200, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), tzone = "UTC")), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -1L), groups = structure(list(
participant_code = "BGC119AP02", .rows = list(1L)), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"), .drop = FALSE)))
# Attempt to bind the rows; both calls end in the above warning
do.call(rbind, test)
do.call(bind_rows, test)
Output. Note that interval_1 has been coerced to double and has lost its attributes:
# A tibble: 2 x 2
# Groups: participant_code [2]
participant_code interval_1
<chr> <dbl>
1 BGC119AP01 34128000
2 BGC119AP02 34128000
Warning messages:
1: In bind_rows_(x, .id) :
Vectorizing 'Interval' elements may not preserve their attributes
2: In bind_rows_(x, .id) :
Vectorizing 'Interval' elements may not preserve their attributes
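To make the loss concrete, here is a minimal sketch of what the coerced double represents. It assumes the same dates as the sample data; the names `iv`, `len`, `start` and `rebuilt` are only illustrative. The number that survives is the interval length in seconds, so the start time has to be kept separately if the interval is to be rebuilt.

```r
library(lubridate)

# An interval matching the sample data: 2016-11-18 UTC to 2017-12-18 UTC
iv <- interval(ymd("2016-11-18", tz = "UTC"), ymd("2017-12-18", tz = "UTC"))

# The length in seconds is all that remains after coercion to double
len <- int_length(iv)

# The start instant lives in a separate slot and is lost by the coercion
start <- int_start(iv)

# Rebuilding the interval needs both pieces
rebuilt <- interval(start, start + len)
```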
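Presumably this happens because a column of class Interval is not an atomic vector. I know I could work around it by keeping the original start and end dates and creating the interval column after binding the rows, but I would like a solution that lets me bind all the individual data frames in the list while preserving the integrity of the Interval column, and that scales to 18,000 rows. Many thanks in advance.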
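For reference, the workaround mentioned above could look roughly like the sketch below. It assumes each data frame carries plain POSIXct `start` and `end` columns instead of the Interval column; those column names, and the objects `pieces` and `bound`, are hypothetical and not part of the original data.

```r
library(lubridate)

# Hypothetical sample data: start/end columns instead of an Interval column
pieces <- list(
  BGC119AP01 = data.frame(
    participant_code = "BGC119AP01",
    start = ymd("2016-11-18", tz = "UTC"),
    end = ymd("2017-12-18", tz = "UTC"),
    stringsAsFactors = FALSE
  ),
  BGC119AP02 = data.frame(
    participant_code = "BGC119AP02",
    start = ymd("2016-11-18", tz = "UTC"),
    end = ymd("2017-12-18", tz = "UTC"),
    stringsAsFactors = FALSE
  )
)

# Bind first: only character and POSIXct columns are involved at this point
bound <- do.call(rbind, pieces)

# Create the Interval column once, after binding
bound$interval_1 <- interval(bound$start, bound$end)
```

Because the interval is built in a single vectorised call after binding, the cost of this step should not depend on how many data frames were in the list.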
There is a hint: when you run do.call(rbind, test) with dplyr loaded and get the warning:
Warning messages:
1: In bind_rows_(x, .id) :
Vectorizing 'Interval' elements may not preserve their attributes
dplyr::bind_rows() is actually being called rather than base::rbind(), and the Interval attributes are dropped. This seems to happen when the objects are tibbles (class tbl or tbl_df).
You can avoid this by calling rbind.data.frame() directly:
do.call(rbind.data.frame, test)
# A tibble: 2 x 2
# Groups: participant_code [1]
participant_code interval_1
* <chr> <Interval>
1 BGC119AP01 2016-11-18 UTC--2017-12-18 UTC
2 BGC119AP02 2016-11-18 UTC--2017-12-18 UTC