Extracting a string in R that itself contains a quoted string

Question (votes: 0, answers: 1)
ex02ChildrenInverse <- function(sentence) {
  # assertString() is provided by the checkmate package
  assertString(sentence)
  matches <- regmatches(
    sentence,
    regexec('^(.*?) is the (father|mother) of "(.*?)"', sentence))[[1]]
  parent <- matches[[2]]
  male <- matches[[3]] == "father"
  child <- matches[[4]]
  child <- gsub('".*"', '', matches[4])
  return(list(parent = parent, male = male, child = child))
}

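This is my code. My problem is that I want to output the child's name even when the name itself contains double quotes. For example:

Input: 'Gudrun is the mother of "Rosamunde ("Rosi")".'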

My output:

$parent
[1] "Gudrun"

$male
[1] FALSE

$child
[1] "Rosamunde ("

But I want:

$parent
[1] "Gudrun"

$male
[1] FALSE

$child
[1] "Rosamunde ("Rosi")"

I tried my code, but it does not work the way I want.

I would like to change the line child <- gsub(.......)

r regex string gsub
1 Answer
0 votes

A fresh-code approach is to use gsub and grepl, rather than regmatches, to get the relevant information:

freshCode <- function(sentence) {
  parent <- gsub("(\\w+).*", "\\1", sentence)
  male <- grepl("father", sentence)
  child <- gsub("\\.", "", substring(sentence, regexpr('"', sentence) + 1))
  list(parent = parent, male = male, child = child)
}

freshCode('Gudrun is the mother of "Rosamunde ("Rosi")".')

# $parent
# [1] "Gudrun"
# 
# $male
# [1] FALSE
# 
# $child
# [1] "Rosamunde (\"Rosi\")\""

# Note: the backslashes above are only escapes in the printed form; they are not part of the string:
# > cat(freshCode('Gudrun is the mother of "Rosamunde ("Rosi")".')[[3]])
# Rosamunde ("Rosi")"
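
The difference between the printed form and the actual contents can be checked on a smaller string. This is a minimal sketch; the variable name x is only illustrative and does not appear in the code above.

```r
# Illustrative string containing two double quotes
x <- 'a "b"'

print(x)      # printed form escapes the inner quotes: [1] "a \"b\""
cat(x, "\n")  # raw characters, no backslashes: a "b"
nchar(x)      # 5: the backslashes are not counted because they are not stored
```

So print() and auto-printing of a list show escapes, while cat() and nchar() reflect what the string really holds.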

Or, with a small modification to your existing code:

ex02ChildrenInverse <- function(sentence) {
  matches <- regmatches(
    sentence,
    regexec('^(.*?) is the (father|mother) of "(.*?)"', sentence))[[1]]
  parent <- matches[[2]]
  male <- matches[[3]] == "father"
  child <- gsub("\\.", "", substring(sentence, regexpr('"', sentence) + 1))
  
  return(list(parent = parent, male = male, child = child))
}

This returns the same output as above.
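
Both versions keep the closing quote at the end of child, as the cat() output shows. If that is not wanted, one possible variant is to stay with regexec but make the last group greedy and anchor it at the end of the string. The lazy (.*?)" in the original pattern stops at the first quote it meets, which is why the name was cut off at "Rosamunde (". The sketch below assumes the sentence always ends with the closing quote, optionally followed by a period. The function name ex02ChildrenInverseGreedy is hypothetical, and perl = TRUE is a choice made here, not something the code above uses.

```r
# Hypothetical variant: assumes the sentence ends with '"' or '".'
ex02ChildrenInverseGreedy <- function(sentence) {
  stopifnot(is.character(sentence), length(sentence) == 1)

  # Greedy (.*) runs to the LAST quote, which must sit at the end of the
  # sentence (optionally followed by a period).
  pattern <- '^(.*?) is the (father|mother) of "(.*)"\\.?$'
  matches <- regmatches(sentence, regexec(pattern, sentence, perl = TRUE))[[1]]

  # regmatches() returns character(0) when the pattern does not match
  if (length(matches) == 0) {
    stop("sentence does not have the expected form")
  }

  list(
    parent = matches[[2]],
    male   = matches[[3]] == "father",
    child  = matches[[4]]
  )
}

res <- ex02ChildrenInverseGreedy('Gudrun is the mother of "Rosamunde ("Rosi")".')
cat(res$child, "\n")  # Rosamunde ("Rosi")
```

Unlike removing every period with gsub, this also leaves periods inside the name untouched, at the cost of failing loudly on sentences that do not follow the assumed form.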
