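I am working with strings that use two separators, "*" and "|", as in the following string:
"3*4|2*7.4|8*3.2"
The number before "*" is the frequency, and the float or integer after "*" is the value. The frequency*value pairs are separated by "|".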
So from "3*4|2*7.4|8*3.2" I want to get the following vector:
"4","4","4","7.4","7.4","3.2","3.2","3.2","3.2","3.2","3.2","3.2","3.2"
I came up with the following code. It runs without errors or warnings, but the result is not what I expected:
library(magrittr)
strsplit("3*4|2*7.4|8*3.2", "[*|]") %>%  # split into a vector using two different separator characters
  unlist %>%  # strsplit returns a list, so unlist it
  mapply(FUN = rep,
         x = .[seq(from = 2, to = length(.), by = 2)],      # even positions: the values
         times = .[seq(from = 1, to = length(.), by = 2)],  # odd positions: the counts (rep() also accepts times as a string)
         USE.NAMES = FALSE) %>%
  unlist  # mapply returns a list, so unlist it
[1] "4" "4" "4" "7.4" "7.4" "7.4" "7.4" "3.2" "3.2" "4" "4" "4" "4" "4" "4" "4" "7.4" "7.4" "7.4" "7.4" "7.4" "7.4" "7.4" "7.4" "3.2" "3.2" "3.2"
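As you can see, something strange is happening. "4" is repeated three times, which is correct, but "7.4" is repeated four times (wrong), and so on.
What is going on here?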
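What seems to be happening: magrittr documents that when `.` appears only inside nested calls (here `.[...]` and `length(.)`), `%>%` still inserts the left-hand side as the first argument. The full six-element vector therefore becomes an extra unnamed argument to `mapply()`, which then iterates over all six tokens (recycling the three values and three counts), and each token reaches `rep()` as an extra positional argument that overrides the intended count. That matches the output above (counts 3, 4, 2, 7, 8, 3). Wrapping the right-hand side in braces stops the automatic insertion. A minimal sketch, assuming the magrittr package is installed; `SIMPLIFY = FALSE` and the explicit `as.numeric()` are additions, not part of the original code:

```r
library(magrittr)

# Braces stop %>% from also inserting the piped vector as mapply()'s first
# argument, so `.` is used only where it is written explicitly.
result <- strsplit("3*4|2*7.4|8*3.2", "[*|]") %>%
  unlist() %>%
  {
    mapply(FUN = rep,
           x = .[seq(from = 2, to = length(.), by = 2)],                  # values (even positions)
           times = as.numeric(.[seq(from = 1, to = length(.), by = 2)]),  # counts (odd positions)
           SIMPLIFY = FALSE,  # always return a list, even if all counts are equal
           USE.NAMES = FALSE)
  } %>%
  unlist()

print(result)
```

Without `SIMPLIFY = FALSE`, equal counts would make `mapply()` return a matrix instead of a list.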
You can use lapply in two steps:
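library(magrittr)
"3*4|2*7.4|8*3.2" %>% strsplit("[|]") %>%  # split into "count*value" pairs
  unlist %>%
  strsplit("[*]") %>%                      # split each pair into count and value
  lapply(function(x) rep(x[2], x[1])) %>%  # repeat the value x[2] x[1] times
  unlist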
# [1] "4" "4" "4" "7.4" "7.4" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2"
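The same two-step split can also be written in base R without the magrittr pipe. This is a minimal sketch, not part of the original answer: `fixed = TRUE` treats "|" and "*" literally instead of as regular expressions, and the count is converted with `as.numeric()` explicitly rather than relying on `rep()` coercing the string:

```r
x <- "3*4|2*7.4|8*3.2"

# Step 1: split into "count*value" pairs; step 2: split each pair at "*".
pairs <- strsplit(strsplit(x, "|", fixed = TRUE)[[1]], "*", fixed = TRUE)

# Repeat each value (second element) by its count (first element), then flatten.
result <- unlist(lapply(pairs, function(p) rep(p[2], as.numeric(p[1]))))
print(result)
```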
You can replace "|" with newlines, read the data into a data frame, and pass it to rep():
do.call(rep,
read.delim(text = gsub("\\|", "\n", "3*4|2*7.4|8*3.2"),
sep = "*",
header = FALSE,
col.names = c("times", "x"))
)
[1] 4.0 4.0 4.0 7.4 7.4 3.2 3.2 3.2 3.2 3.2 3.2 3.2 3.2
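Note that this returns a numeric vector, whereas the question asks for strings. If the character form matters, one option (a sketch, not part of the original answer) is to read the value column as character via the `colClasses` argument of `read.delim`:

```r
x <- "3*4|2*7.4|8*3.2"

# One "count*value" pair per line, then parse as a two-column table.
df <- read.delim(text = gsub("|", "\n", x, fixed = TRUE),
                 sep = "*",
                 header = FALSE,
                 col.names = c("times", "x"),
                 colClasses = c("numeric", "character"))  # keep the values as strings

# do.call passes the columns as named arguments: rep(times = ..., x = ...)
result <- do.call(rep, df)
print(result)
```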
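1) The one-liner below matches the two numbers in each pair and passes them as separate arguments to an anonymous function written in formula notation, returning that function's output. The input x is from the question and is defined explicitly in the Note at the end.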
library(gsubfn)
strapply(x, "([0-9]+)\\*([0-9.]+)", n + x ~ rep(x, as.numeric(n)))[[1]]
## [1] "4" "4" "4" "7.4" "7.4" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2"
If we have a character vector of strings like x, it also works with the [[1]] removed; in that case it returns a list of results.
xx <- c(x, x)
strapply(xx, "([0-9]+)\\*([0-9.]+)", n + x ~ rep(x, as.numeric(n)))
## [[1]]
## [1] "4" "4" "4" "7.4" "7.4" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2"
##
## [[2]]
## [1] "4" "4" "4" "7.4" "7.4" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2"
2) Another approach is to extract the repeat counts and the values separately and pass each vector to rep.
library(gsubfn)
rep(strapplyc(x, "\\*([0-9.]+)")[[1]],
strapply(x, "(\\d+)\\*", as.numeric)[[1]])
## [1] "4" "4" "4" "7.4" "7.4" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2" "3.2"
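For comparison, the same idea (extract the counts and the values separately, then call rep once) can be sketched in base R with `sub()` instead of gsubfn. This is a swapped-in technique, not the answer's own code:

```r
x <- "3*4|2*7.4|8*3.2"

pairs  <- strsplit(x, "|", fixed = TRUE)[[1]]   # "3*4" "2*7.4" "8*3.2"
counts <- as.numeric(sub("\\*.*$", "", pairs))  # text before "*": 3 2 8
values <- sub("^.*\\*", "", pairs)              # text after "*": "4" "7.4" "3.2"

# rep() with a vector of counts repeats each value by its own count.
result <- rep(values, counts)
print(result)
```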
Note: The input used is:
x <- "3*4|2*7.4|8*3.2"