Truncating and merging values in two character vectors

Question  Votes: -1  Answers: 1

I have a character vector V1:

V1 <- c("377 Peninsula St. Ogden,UT",
        "8532 West Lyme St. Chesterfield, VA",
        "43 E. Hilltop Street Hilliard,OH",
        "95 Newcastle St. Hendersonville,NC",
        "7276 Rose St. Greenville,NC")

and another vector V2:

V2 <- c(84404,23832,43026,28792,27834)

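Now I have these conditions:

1) Break each item in V1 at the 24th character:

a) If the 24th character is a comma, break the string there and add the remainder to the corresponding element of V2. For example, V1 contains "377 Peninsula St. Ogden,UT", which has a comma at the 24th index, so we need to split it into "377 Peninsula St. Ogden" and "UT" (note that the comma itself is dropped). V1 then keeps the "377 Peninsula St. Ogden" part and the remainder is prepended to the corresponding PIN in V2, so 84404 in V2 becomes "UT 84404".

b) If the 24th character is neither a comma nor whitespace, find the last whitespace before the comma in V1; V1 keeps everything up to that index and the rest goes to V2. For example, V1 contains "8532 West Lyme St. Chesterfield, VA", which has "t" at the 24th index, so we need to break it at the whitespace after "St.". V1 then keeps "8532 West Lyme St." and V2 gets "Chesterfield, VA 23832".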


At the end of the operation, we should have:

V1 <- c("377 Peninsula St. Ogden","8532 West Lyme St.",...)
V2 <- c("UT 84404","Chesterfield, VA 23832")

Edit:

I tried the following on V1 to find out whether the 24th character is a comma:

unlist(lapply(lapply(V1, function(z){substr(z,24,24)}),function(y){y==","}))

which returns:

TRUE FALSE FALSE FALSE FALSE
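
Because `substr` is already vectorized over a character vector, the check above can probably be written without the nested `lapply` calls. A minimal sketch using the same V1 (the names `char24` and `is_comma` are only illustrative):

```r
V1 <- c("377 Peninsula St. Ogden,UT",
        "8532 West Lyme St. Chesterfield, VA",
        "43 E. Hilltop Street Hilliard,OH",
        "95 Newcastle St. Hendersonville,NC",
        "7276 Rose St. Greenville,NC")

# substr() works element-wise, so one call covers the whole vector
char24 <- substr(V1, 24, 24)
char24
# [1] "," "t" "l" "s" "e"

# Comparison is element-wise too, giving one logical per address
is_comma <- char24 == ","
is_comma
# [1]  TRUE FALSE FALSE FALSE FALSE
```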

Now that I have solved part of the problem, I need a way to apply the formatting logic based on the result above.

That is, I want to do:

unlist(lapply(lapply(V1, function(z){substr(z,24,24)}),function(y){if(y==","){something1} else if(y==" "){something2}else {something3}}))

where something1/2/3 correspond to 1a and 1b above. I need to know how to write this logic.

r vector sapply mapply
1 Answer

1 vote

Consider a vectorized approach using ifelse, substr, and regexpr (i.e., no apply loops):

newV1 <- ifelse(substr(V1, 24, 24) == ",",         # CONDITIONALLY CHECK 24TH CHARACTER
                substr(V1, 1, regexpr(",", V1)-1), # EXTRACT UNTIL 24TH CHARACTER
                substr(V1, 1, 
                       regexpr(" (?=[^ ]+$)", 
                               substr(V1, 1, 24), 
                               perl=TRUE)-1)     # EXTRACT UNTIL LAST SPACE BEFORE 24TH CHAR
                )
newV1
# [1] "377 Peninsula St. Ogden" "8532 West Lyme St."     
# [3] "43 E. Hilltop Street"    "95 Newcastle St."       
# [5] "7276 Rose St."        
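
To see why the regular expression picks the right break point: ` (?=[^ ]+$)` matches a space only when everything after it, up to the end of the string, is non-space, which makes it the last space. Running it on the first 24 characters rather than the whole string limits the search to the part before the cut-off. A small sketch on a single element (the names `x`, `head24`, and `pos` are only illustrative):

```r
x <- "8532 West Lyme St. Chesterfield, VA"
head24 <- substr(x, 1, 24)      # "8532 West Lyme St. Chest"

# The lookahead (?=...) requires perl = TRUE
pos <- regexpr(" (?=[^ ]+$)", head24, perl = TRUE)
as.integer(pos)                 # 19, the space after "St."

substr(x, 1, pos - 1)           # "8532 West Lyme St."
substr(x, pos + 1, nchar(x))    # "Chesterfield, VA"
```

Note that `regexpr` returns -1 when nothing matches, for instance if the 24th character is itself a space, so that case would need separate handling.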

newV2 <- paste(ifelse(substr(V1, 24, 24) == ",",   # CONDITIONALLY CHECK 24TH CHARACTER
               substr(V1, regexpr(",", V1)+1, 
                      nchar(V1)),                  # EXTRACT AFTER 24TH CHARACTER
               substr(V1, 
                      regexpr(" (?=[^ ]+$)", 
                              substr(V1, 1, 24), 
                              perl=TRUE)+1, 
                      nchar(V1))),               # EXTRACT AFTER LAST SPACE BEFORE 24TH CHAR
               V2)                               # PASTE V2 VECTOR ELEMENTWISE
newV2
# [1] "UT 84404"                "Chesterfield, VA 23832" 
# [3] "Hilliard,OH 43026"       "Hendersonville,NC 28792"
# [5] "Greenville,NC 27834"   
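
If both results are needed together, the same logic could be wrapped in a small helper. This is only a sketch: the function name `split_at` and its `pos` argument are hypothetical, and it assumes that a space at position 24 (the "something2" case, which the question does not spell out) should be treated as the split point. It uses `[^ ]*` instead of `[^ ]+` so that a space sitting exactly at the cut-off is also matched.

```r
V1 <- c("377 Peninsula St. Ogden,UT",
        "8532 West Lyme St. Chesterfield, VA",
        "43 E. Hilltop Street Hilliard,OH",
        "95 Newcastle St. Hendersonville,NC",
        "7276 Rose St. Greenville,NC")
V2 <- c(84404, 23832, 43026, 28792, 27834)

# Hypothetical helper, not part of the original answer
split_at <- function(addr, zip, pos = 24) {
  ch     <- substr(addr, pos, pos)   # character at the cut-off position
  prefix <- substr(addr, 1, pos)     # search window for the last space
  # Last space within the first `pos` characters (may be at `pos` itself)
  last_sp <- as.integer(regexpr(" (?=[^ ]*$)", prefix, perl = TRUE))
  # Split at the comma in case (a), otherwise at the last space
  cut_at <- ifelse(ch == ",", pos, last_sp)
  list(V1 = substr(addr, 1, cut_at - 1),
       V2 = paste(substr(addr, cut_at + 1, nchar(addr)), zip))
}

res <- split_at(V1, V2)
res$V1
res$V2
```

Addresses with no space in the first 24 characters are not handled here; `regexpr` returns -1 for them, so the V1 part comes back empty.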

