Extracting a string in R that itself contains a quoted string

Question

ex02ChildrenInverse <- function(sentence) {
  assertString(sentence)
  matches <- regmatches(
    sentence,
    regexec('^(.*?) is the (father|mother) of "(.*?)"', sentence))[[1]]
  parent <- matches[[2]]
  male <- matches[[3]] == "father"
  child <- matches[[4]]
  child <- gsub('".*"', '', matches[4])
  return(list(parent = parent, male = male, child = child))
}

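This is my code. My problem is that I want to output the child's name even when the name itself contains double quotes. For example:

Input: 'Gudrun is the mother of "Rosamunde ("Rosi")".'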

My output:

$parent
[1] "Gudrun"

$male
[1] FALSE

$child
[1] "Rosamunde ("

But I want:

$parent
[1] "Gudrun"

$male
[1] FALSE

$child
[1] "Rosamunde ("Rosi")"

I tried my code, but it does not work the way I want.

I would like to change the line child <- gsub(.......)

Answer 1

A fresh approach is to use gsub and grepl, rather than regmatches, to pull out the required information:

freshCode <- function(sentence) {
  parent <- gsub("(\\w+).*", "\\1", sentence)
  male <- grepl("father", sentence)
  child <- gsub("\\.", "", substring(sentence, regexpr('"', sentence) + 1))
  list(parent = parent, male = male, child = child)
}

freshCode('Gudrun is the mother of "Rosamunde ("Rosi")".')

# $parent
# [1] "Gudrun"
# 
# $male
# [1] FALSE
# 
# $child
# [1] "Rosamunde (\"Rosi\")\""

# Note: the backslashes above are only escapes in the printed form; cat() shows the actual string:
# > cat(freshCode('Gudrun is the mother of "Rosamunde ("Rosi")".')[[3]])
# Rosamunde ("Rosi")"

Or modify your existing code slightly:

ex02ChildrenInverse <- function(sentence) {
  matches <- regmatches(
    sentence,
    regexec('^(.*?) is the (father|mother) of "(.*?)"', sentence))[[1]]
  parent <- matches[[2]]
  male <- matches[[3]] == "father"
  child <- gsub("\\.", "", substring(sentence, regexpr('"', sentence) + 1))
  
  return(list(parent = parent, male = male, child = child))
}

This returns the same output as above.
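Both versions above leave the closing double quote at the end of child. If that is not wanted, one possible alternative is to keep the regmatches approach and make the last capture group greedy, so that it runs up to the final double quote in the sentence instead of stopping at the first one. The sketch below is not part of the original answer: the function name extractChildGreedy is hypothetical, base-R stopifnot stands in for assertString, and perl = TRUE assumes R 3.3.0 or later, where regexec accepts that argument.

```r
# Hypothetical variant of the asker's function; the name is chosen for illustration.
extractChildGreedy <- function(sentence) {
  # Base-R input check, used here in place of checkmate::assertString.
  stopifnot(is.character(sentence), length(sentence) == 1L)

  # The first group stays lazy so the parent stops at the first " is the ...".
  # The last group is greedy, so it extends to the LAST double quote and
  # keeps any inner quotes inside the captured child name.
  matches <- regmatches(
    sentence,
    regexec('^(.*?) is the (father|mother) of "(.*)"', sentence, perl = TRUE))[[1]]

  if (length(matches) == 0L) {
    stop("sentence does not match the expected pattern")
  }

  list(
    parent = matches[[2]],
    male   = matches[[3]] == "father",
    child  = matches[[4]]
  )
}

result <- extractChildGreedy('Gudrun is the mother of "Rosamunde ("Rosi")".')
cat(result$child, "\n")
# Rosamunde ("Rosi")
```

Because the greedy group always ends at the last double quote, this sketch assumes each sentence describes exactly one child and that nothing quoted follows the child's name.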
