Expanding a column in R using data.table


I am trying to find an efficient way to reshape my data into "long" format using data.table. Consider the following example:


library(data.table)

# Create a data.table with obfuscated data but the same structure
fake_dt <- data.table(
  street_address = paste0("Street ", 1:6),
  city = c("CityA", "CityB", "CityC", "CityD", "CityE", "CityF"),
  region = c("RA", "RB", "RC", "RD", "RE", "RF"),
  date_range_start = rep(as.IDate("2023-01-02"), 6),
  date_range_end = rep(as.IDate("2023-01-08"), 6),
  visits_by_day = rep("[7,7,7,6,0,6,5]", 6),  # one string of 7 daily counts, repeated for all 6 rows
  location_name = rep("locationX", 6)
)

head(fake_dt)

   street_address   city region date_range_start date_range_end   visits_by_day
           <char> <char> <char>           <IDat>         <IDat>          <char>
1:       Street 1  CityA     RA       2023-01-02     2023-01-08 [7,7,7,6,0,6,5]
2:       Street 2  CityB     RB       2023-01-02     2023-01-08 [7,7,7,6,0,6,5]
3:       Street 3  CityC     RC       2023-01-02     2023-01-08 [7,7,7,6,0,6,5]
4:       Street 4  CityD     RD       2023-01-02     2023-01-08 [7,7,7,6,0,6,5]
5:       Street 5  CityE     RE       2023-01-02     2023-01-08 [7,7,7,6,0,6,5]
6:       Street 6  CityF     RF       2023-01-02     2023-01-08 [7,7,7,6,0,6,5]
   location_name
          <char>
1:     locationX
2:     locationX
3:     locationX
4:     locationX
5:     locationX
6:     locationX

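Note that date_range_start and date_range_end always span exactly one week. visits_by_day holds the values for 2023-01-02 through 2023-01-08: the first value corresponds to 2023-01-02, the second to 2023-01-03, and so on. I want the final dataset at the day level rather than the week level, so each observation should expand to 7 rows (keeping the days that had no visitors). For example, the first row should expand to: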

   street_address   city region       date visits location_name
           <char> <char> <char>     <char>  <num>        <char>
1:       Street 1  CityA     RA 2023-01-02      7     locationX
2:       Street 1  CityA     RA 2023-01-03      7     locationX
3:       Street 1  CityA     RA 2023-01-04      7     locationX
4:       Street 1  CityA     RA 2023-01-05      6     locationX
5:       Street 1  CityA     RA 2023-01-06      0     locationX
6:       Street 1  CityA     RA 2023-01-07      6     locationX
7:       Street 1  CityA     RA 2023-01-08      5     locationX

Thanks!

r data.table
1 Answer

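Use the following code. It groups by the columns that identify each row, generates the seven dates with seq(), and parses the visits_by_day string with reticulate::py_eval(), which evaluates it as a Python list. To keep location_name in the result, add it to the grouping columns.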

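# One group per original row: build the 7 dates and parse the bracketed string.
# reticulate::py_eval() needs the reticulate package and a Python installation.
fake_dt[, .(date = seq(date_range_start, by = 1, length.out = 7),
            visits = reticulate::py_eval(visits_by_day)),
          by = .(street_address, city, region)]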

    street_address  city region       date visits
 1:       Street 1 CityA     RA 2023-01-02      7
 2:       Street 1 CityA     RA 2023-01-03      7
 3:       Street 1 CityA     RA 2023-01-04      7
 4:       Street 1 CityA     RA 2023-01-05      6
 5:       Street 1 CityA     RA 2023-01-06      0
 6:       Street 1 CityA     RA 2023-01-07      6
 7:       Street 1 CityA     RA 2023-01-08      5
 8:       Street 2 CityB     RB 2023-01-02      7
 9:       Street 2 CityB     RB 2023-01-03      7
10:       Street 2 CityB     RB 2023-01-04      7
11:       Street 2 CityB     RB 2023-01-05      6
12:       Street 2 CityB     RB 2023-01-06      0
13:       Street 2 CityB     RB 2023-01-07      6
14:       Street 2 CityB     RB 2023-01-08      5
15:       Street 3 CityC     RC 2023-01-02      7
16:       Street 3 CityC     RC 2023-01-03      7
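
If installing Python only to parse a bracketed list is undesirable, the same reshaping can probably be done with base R string functions and data.table alone. The following is a minimal sketch, assuming every visits_by_day value is a bracketed, comma-separated list of integers. The names visits_list, n, idx and long_dt are illustrative and not part of the original answer, and the example data is a shortened two-row version of fake_dt with a made-up second row.

```r
library(data.table)

# Shortened example data with the same structure as the question
# (the second visits_by_day string is invented for illustration).
fake_dt <- data.table(
  street_address   = paste0("Street ", 1:2),
  city             = c("CityA", "CityB"),
  region           = c("RA", "RB"),
  date_range_start = rep(as.IDate("2023-01-02"), 2),
  date_range_end   = rep(as.IDate("2023-01-08"), 2),
  visits_by_day    = c("[7,7,7,6,0,6,5]", "[1,2,3,4,5,6,7]"),
  location_name    = rep("locationX", 2)
)

# Strip the square brackets, split on commas, and convert to integers.
# The result is a list with one integer vector per original row.
visits_list <- lapply(
  strsplit(gsub("\\[|\\]", "", fake_dt$visits_by_day), ",", fixed = TRUE),
  as.integer
)

# Number of daily values per row (7 here, but not hard-coded).
n <- lengths(visits_list)

# Repeat each row index once per daily value.
idx <- rep(seq_len(nrow(fake_dt)), n)

# Build the long table in one vectorised step:
# sequence(n) yields 1..n[1], 1..n[2], ..., so subtracting 1 gives day offsets.
long_dt <- fake_dt[idx, .(
  street_address, city, region,
  date   = date_range_start + sequence(n) - 1L,
  visits = unlist(visits_list),
  location_name
)]

print(long_dt)
```

Because this avoids grouping row by row, it does not rely on street_address, city and region uniquely identifying each observation, and it should scale better on large tables. It also keeps location_name, which the question's expected output includes.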