In a data.table, replace certain values with other values from the same column, based on several other columns, without a loop


I have a large data.table, part of which looks like this (there are more columns and thousands of rows):

                stop_id   path changed_event_status changed_time
                  <i64> <char>               <char>        <i64>
 1: 4398037956893976209      S                 <NA>   2405071040
 2: 1500925206899141237     RT                 <NA>   2405071041
 3: 2333532852925690131      S                 <NA>   2405071105
 4: 4636036529075799544     TÜ                 <NA>   2405071044
 5: 4680830034956468939      S                 <NA>   2405071046
 6: 7584560746915960683      S                    c   2405071049  <- 1a: replace 2405071049
 7: 2333532852925690131     RT                 <NA>   2405071116
 8: 4747322524233582527      S                 <NA>   2405071100  <- 1b: with 2405071100
 9:  285273127640529713      S                 <NA>   2405071103
10: 6134967434625106066      S                 <NA>   2405071101
11: 3684003552999415659     RT                 <NA>   2405071103  <- 2b: with 2405071103
12: 7584560746915960683     RT                    c   2405071058  <- 2a: replace 2405071058
13: 4680830034956468939     TÜ                 <NA>   2405071103
14: 8123621717351038368      S                 <NA>   2405071113
15: 8702942397103782624     TÜ                 <NA>   2405071114
16: 6134967434625106066     TÜ                 <NA>   2405071114
17: 4138386908727054325      S                 <NA>   2405071115
18:  285273127640529713     RT                 <NA>   2405071123
19: 2445758245483744446      S                 <NA>   2405071119
20: 8153934371487726263     TÜ                 <NA>   2405071132
21: 4138386908727054325     RT                 <NA>   2405071126
22:  310332233182112225      S                 <NA>   2405071127
                stop_id   path changed_event_status changed_time

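For each row where `changed_event_status == "c"`, I need to find, among all rows with the same `path`, the smallest value in the `changed_time` column that is equal to or greater than the value in the current row. For example, I need to replace `2405071049` in row 6 with `2405071100` from row 8, and `2405071058` in row 12 with `2405071103` from row 11.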

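I can do this with a `foreach()` loop: subset by `path`, sort by `changed_time`, then take the first value that is equal to or greater than the current row's. But I would like to know whether there is a faster solution that avoids the loop.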

I have looked at several related questions, but could not adapt them to my specific case.

Sample data:

library(data.table)
bahn <- fread("
stop_id path changed_event_status changed_time
4398037956893976209 S NA 2405071040
1500925206899141237 RT NA 2405071041
2333532852925690131 S NA 2405071105
4636036529075799544 TÜ NA 2405071044
4680830034956468939 S NA 2405071046
7584560746915960683 S c 2405071049
2333532852925690131 RT NA 2405071116
4747322524233582527 S NA 2405071100
285273127640529713 S NA 2405071103
6134967434625106066 S NA 2405071101
3684003552999415659 RT NA 2405071103
7584560746915960683 RT c 2405071058
4680830034956468939 TÜ NA 2405071103
8123621717351038368 S NA 2405071113
8702942397103782624 TÜ NA 2405071114
6134967434625106066 TÜ NA 2405071114
4138386908727054325 S NA 2405071115
285273127640529713 RT NA 2405071123
2445758245483744446 S NA 2405071119
8153934371487726263 TÜ NA 2405071132
4138386908727054325 RT NA 2405071126
310332233182112225 S NA 2405071127
")

Loop solution:

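for (i in seq_len(nrow(bahn))) {
    # Only rows flagged "c" get a new changed_time
    if (!is.na(bahn[i, changed_event_status]) && bahn[i, changed_event_status] == "c") {
        # Candidates: unflagged rows on the same path, at or after this row's time
        candidates <- bahn[
            is.na(changed_event_status) &
            changed_time >= bahn[i, changed_time] &
            path == bahn[i, path]
        ]$changed_time
        # Smallest candidate (NA if there is none)
        bahn[i, ]$changed_time <- sort(candidates)[1]
    }
}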
Tags: r, data.table

1 answer

If you don't mind reordering the table:

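# Sort by path and time; the third key decides the order of rows
# that tie on path and changed_time
setorder(bahn, path, changed_time, -changed_event_status)[
  # Blank out the times of the flagged rows ...
  changed_event_status == "c", changed_time := NA
]
# ... then fill each gap with the next non-missing value below it
# ("nocb" = next observation carried backward)
setnafill(bahn, "nocb", cols = "changed_time")
bahn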
#> Index: <changed_event_status>
#>                 stop_id   path changed_event_status changed_time
#>                   <i64> <char>               <char>        <i64>
#>  1: 1500925206899141237     RT                 <NA>   2405071041
#>  2: 7584560746915960683     RT                    c   2405071103
#>  3: 3684003552999415659     RT                 <NA>   2405071103
#>  4: 2333532852925690131     RT                 <NA>   2405071116
#>  5:  285273127640529713     RT                 <NA>   2405071123
#>  6: 4138386908727054325     RT                 <NA>   2405071126
#>  7: 4398037956893976209      S                 <NA>   2405071040
#>  8: 4680830034956468939      S                 <NA>   2405071046
#>  9: 7584560746915960683      S                    c   2405071100
#> 10: 4747322524233582527      S                 <NA>   2405071100
#> 11: 6134967434625106066      S                 <NA>   2405071101
#> 12:  285273127640529713      S                 <NA>   2405071103
#> 13: 2333532852925690131      S                 <NA>   2405071105
#> 14: 8123621717351038368      S                 <NA>   2405071113
#> 15: 4138386908727054325      S                 <NA>   2405071115
#> 16: 2445758245483744446      S                 <NA>   2405071119
#> 17:  310332233182112225      S                 <NA>   2405071127
#> 18: 4636036529075799544     TÜ                 <NA>   2405071044
#> 19: 4680830034956468939     TÜ                 <NA>   2405071103
#> 20: 8702942397103782624     TÜ                 <NA>   2405071114
#> 21: 6134967434625106066     TÜ                 <NA>   2405071114
#> 22: 8153934371487726263     TÜ                 <NA>   2405071132
#>                 stop_id   path changed_event_status changed_time
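
If the original row order has to be kept, a rolling join might be an alternative. The following is only a sketch on a small toy table: the names `dt`, `status`, `time`, `ref`, `flagged` and `matched` and all values are made up for illustration, and the times are plain integers. Whether `roll` behaves the same on the `integer64` column in `bahn` is an assumption that would need checking on the real data.

```r
library(data.table)

# Toy table (hypothetical values, plain integers instead of integer64)
dt <- data.table(
  path   = c("S", "S", "S", "RT", "RT", "RT"),
  status = c(NA, "c", NA, NA, "c", NA),
  time   = c(1105L, 1049L, 1100L, 1103L, 1058L, 1116L)
)

# Reference rows: only unflagged rows may donate a time
ref <- dt[is.na(status)]

# Row numbers of the flagged rows, in the original order
flagged <- dt[status == "c", which = TRUE]

# Rolling join: for each flagged row, look up the same path in ref and
# roll backwards (roll = -Inf) to the nearest time that is >= its own.
# x.time is the matched time taken from ref; it is NA when nothing matches.
matched <- ref[dt[flagged], on = .(path, time), roll = -Inf,
               mult = "first", x.time]

# Write the matched times back by reference; no rows are reordered
set(dt, i = flagged, j = "time", value = matched)
dt
```

Because the join is restricted to rows with the same `path`, a flagged row with no later unflagged row on its path should end up as `NA`, as in the loop, whereas a fill over the whole sorted column would take the value from the first row of the next path.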