Add rows for skipped stages in a process

Question (votes: 2, answers: 2)

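I want to take a data frame in R and expand it based on the values in two columns, V1 and V2. In short, I have stages S1 to S6, stored as strings.

For every row with a gap between its two stages, I need to add rows for the skipped stages. In the data frames below, if a row contains 'S 3' and 'S 3', nothing needs to be done. Likewise, if a row contains 'S 3' and 'S 4', nothing needs to be done.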

Example 1

Input:

----------------------------------
|Var1               | V1   | V2  |    
----------------------------------
|0060a00000fUbAnAAK |'S 2' |'S 5'|
----------------------------------

Output:

----------------------------------
|Var1               | V1   | V2  |    
----------------------------------
|0060a00000fUbAnAAK |'S 2' |'S 3'|
----------------------------------
|0060a00000fUbAnAAK |'S 3' |'S 4'|
----------------------------------
|0060a00000fUbAnAAK |'S 4' |'S 5'|
----------------------------------

Example 2

Input:

----------------------------------
|Var1               | V1   | V2  |    
----------------------------------
|0060a00000fUbAnAAK |'S 5' |'S 3'|
----------------------------------

Output:

----------------------------------
|Var1               | V1   | V2  |    
----------------------------------
|0060a00000fUbAnAAK |'S 5' |'S 4'|
----------------------------------
|0060a00000fUbAnAAK |'S 4' |'S 3'|
----------------------------------
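
For reproducibility, both examples can be written as a small data frame, and the rule above (a row needs extra rows only when its two stages are more than one step apart) can be checked directly. This is a minimal base R sketch; the names `df`, `stage_num` and `gap`, and the two extra no-gap rows, are assumptions added for illustration.

```r
# Hypothetical input: the two example rows plus two rows that need no change.
df <- data.frame(
  Var1 = rep("0060a00000fUbAnAAK", 4),
  V1   = c("S 2", "S 5", "S 3", "S 3"),
  V2   = c("S 5", "S 3", "S 3", "S 4"),
  stringsAsFactors = FALSE
)

# Pull the numeric part out of labels such as "S 2".
stage_num <- function(x) as.integer(gsub("[^0-9]", "", x))

# A row has a gap when its stages are more than one step apart,
# in either direction.
df$gap <- abs(stage_num(df$V2) - stage_num(df$V1)) > 1
df
```

Only the rows flagged in `gap` would need to be expanded; the others stay as they are.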
r dataframe analytics
2 answers
0 votes

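An idea using tidyverse is to convert to long format, separate the number from the S, and complete the sequence. Once we have that, we paste the columns back together (as S values) and convert back to wide format. Finally, we take the lag of V1 and remove the NAs, i.e.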

library(tidyverse)

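df %>% 
 # reshape to long format: one row per (Var1, column, stage label)
 gather(var, val, -1) %>% 
 # split a label such as "S 2" into the letter and the stage number
 separate(val, into = c('char', 'number'), sep = ' ') %>% 
 mutate(number = as.numeric(number)) %>% 
 # fill in every stage between the smallest and the largest number
 complete(nesting(var, Var1, char), number = full_seq(min(number):max(number), 1)) %>%
 unite('V1_2', c('char', 'number'), sep = ' ') %>% 
 # index the rows within each column so spread() can line them up
 group_by(var) %>% 
 mutate(new = row_number()) %>% 
 spread(var, V1_2) %>% 
 # shift V1 down one step so that each row reads "from, to"
 mutate(V1 = lag(V1)) %>% 
 na.omit() %>% 
 select(-new)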

which gives,

# A tibble: 3 x 3
   Var1  V1    V2   
  <chr> <chr> <chr>
1 xxx   S 2   S 3  
2 xxx   S 3   S 4  
3 xxx   S 4   S 5 
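
The pipeline above appears to assume a single row whose stages increase. As a hedged alternative, the sketch below expands each row on its own with dplyr and tidyr, so decreasing stages and rows without a gap are covered too. The helper `expand_stages`, the sample IDs and the intermediate columns `from`, `to` and `steps` are assumptions, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample: increasing, decreasing, and same-stage rows.
df <- tibble(
  Var1 = c("id_1", "id_2", "id_3"),
  V1   = c("S 2", "S 5", "S 3"),
  V2   = c("S 5", "S 3", "S 3")
)

# Build the consecutive (from, to) pairs between two stage numbers.
expand_stages <- function(from, to) {
  s <- seq(from, to)                 # counts up or down as needed
  if (length(s) == 1L) s <- c(s, s)  # same stage twice: keep the row as is
  tibble(V1 = paste("S", head(s, -1)), V2 = paste("S", tail(s, -1)))
}

result <- df %>%
  mutate(
    from  = as.integer(gsub("[^0-9]", "", V1)),
    to    = as.integer(gsub("[^0-9]", "", V2)),
    steps = Map(expand_stages, from, to)   # one small tibble per input row
  ) %>%
  select(Var1, steps) %>%
  unnest(steps)

result
```

Because every row is expanded independently, rows that share a `Var1` value are not merged into one sequence.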

0 votes

updated answer

This update also handles decreasing stages.

Sample data

library(data.table)
DT <- fread("Var1               | V1   | V2
  0060a00000fUbAnAAK |S 2 |S 5
  0060a00000fUbAnAAK_ |S 5 |S 3")

#                   Var1  V1  V2
# 1:  0060a00000fUbAnAAK S 2 S 5
# 2: 0060a00000fUbAnAAK_ S 5 S 3

# determine the direction of the stages: "desc" when V2 < V1, otherwise "asc"
DT[ as.numeric( gsub("[^0-9]", "", V2 ) ) < as.numeric( gsub("[^0-9]", "", V1 ) ), order := "desc" ]
DT[ is.na( order) , order := "asc" ]
# melt DT to long format
DT <- melt( DT, id.vars = c("Var1","order"), value.name = "stage")
# convert stage to numeric
DT[, `:=`(stage = as.numeric( gsub("[^0-9]", "", stage)))]
# create new stages based on the minimum and maximum stage per Var1 value;
# ascending and descending stages need different sequences, so build each and bind the rows together
rbind(
  DT[order == "asc", .( V1 = paste0( "S ", min(stage): (max(stage) - 1 ) ), 
                        V2 = paste0( "S ", (min(stage)+1):max(stage) ) ), by = .(Var1)],
  DT[order == "desc", .( V1 = paste0( "S ", max(stage): (min(stage) + 1 ) ), 
                         V2 = paste0( "S ", (max(stage)-1):min(stage) ) ), by = .(Var1)]
)

which yields

#                   Var1  V1  V2
# 1:  0060a00000fUbAnAAK S 2 S 3
# 2:  0060a00000fUbAnAAK S 3 S 4
# 3:  0060a00000fUbAnAAK S 4 S 5
# 4: 0060a00000fUbAnAAK_ S 5 S 4
# 5: 0060a00000fUbAnAAK_ S 4 S 3
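
One edge case is worth noting: in R, `a:b` counts down when `a > b`, so for a row where V1 and V2 are the same stage, `min(stage):(max(stage) - 1)` evaluates to, for example, `3:2` rather than an empty sequence. Grouping by `Var1` also pools rows that share an ID. A possible workaround, sketched below, is to expand row by row with `seq()`; the `row`, `from` and `to` columns and the sample IDs are assumptions for illustration.

```r
library(data.table)

# Hypothetical sample: increasing, decreasing, and same-stage rows.
DT <- data.table(
  Var1 = c("id_1", "id_2", "id_3"),
  V1   = c("S 2", "S 5", "S 3"),
  V2   = c("S 5", "S 3", "S 3")
)

# Numeric stage for each column, plus a row index to group by.
DT[, `:=`(
  from = as.integer(gsub("[^0-9]", "", V1)),
  to   = as.integer(gsub("[^0-9]", "", V2)),
  row  = .I
)]

# Expand each row separately; seq() handles both directions.
res <- DT[, {
  s <- seq(from, to)
  if (length(s) == 1L) s <- c(s, s)  # same stage twice: keep the row as is
  list(Var1 = Var1,
       V1   = paste("S", head(s, -1)),
       V2   = paste("S", tail(s, -1)))
}, by = row]
res[, row := NULL]
res
```

Grouping by the row index instead of `Var1` keeps repeated IDs apart, at the cost of one group per input row.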

previous answer

`data.table` solution

**sample data**

    library(data.table)
    DT <- fread("Var1               | V1   | V2
      0060a00000fUbAnAAK |S 2 |S 5")

**code**

    #melt DT to long format
    DT <- melt( DT, id.vars = "Var1", value.name = "stage")
    #get stage as numeric and clean up unwanted columns
    DT[, `:=`(variable = NULL, stage = as.numeric( gsub("[^0-9]", "", stage)))]
    #create new stages based on minimum and maximum stage per Var1-value
    DT[, .( V1 = paste0( "S ", min(stage):(max(stage)-1) ),
            V2 = paste0( "S ", (min(stage)+1):max(stage) ) ), by = .(Var1)][]

**output**

    #                  Var1  V1  V2
    # 1: 0060a00000fUbAnAAK S 2 S 3
    # 2: 0060a00000fUbAnAAK S 3 S 4
    # 3: 0060a00000fUbAnAAK S 4 S 5