R: Sorting a vector of strings that contain both letters and numbers

Question  Votes: 2  Answers: 3

I have a vector of strings that contain both characters and numeric values. For example:

a=c("ILLUMINA:420:C2D7UACXX:1:1102:14591:91480","ILLUMINA:420:C2D7UACXX:1:1102:14592:3881","ILLUMINA:420:C2D7UACXX:1:1102:14592:37103","ILLUMINA:420:C2D7UACXX:1:1102:14592:37356")

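I want to sort the vector so that the character parts are ordered alphabetically and the numeric parts numerically. The strings always have the format "ILLUMINA:420:C2D7UACXX:1:<number>:<number>:<number>", so in practice the ordering only depends on the last three colon-separated numbers.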

I also tried mixedsort {gtools}, but the result was the same as with sort and sort.int, namely:

> mixedsort(a)
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103"
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881"

Clearly, the correct order should be:

[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881" 
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103" "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"

Is there a straightforward solution?

r sorting vector
3 Answers
3
votes

Edit: completely changed the solution after the OP's clarification

You can extract the last 3 fields and build a data.frame from them to order by:

dat = read.table(text=sub('.*:1:([0-9]+):([0-9]+):([0-9]+)','\\1|\\2|\\3',a),sep='|')
 dat
    V1    V2    V3
1 1102 14591 91480
2 1102 14592  3881
3 1102 14592 37103
4 1102 14592 37356

Then you order by the 3 columns:

 a[with(dat,order(V1,V2,V3))]
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881" 
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103" "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"
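The numeric columns matter because a character sort compares strings digit by digit rather than by value. A minimal sketch of the difference, using only the last field of the four example strings; `method = "radix"` is chosen here so that the character comparison does not depend on the locale:

```r
# Last field of the four example strings, kept as character strings
last_chr <- c("91480", "3881", "37103", "37356")

# Character sort compares digit by digit, so "37103" lands before "3881"
chr_sorted <- sort(last_chr, method = "radix")
print(chr_sorted)

# Numeric sort compares values, so 3881 comes first
num_sorted <- sort(as.numeric(last_chr))
print(num_sorted)
```

This is the same ordering difference that separates the sort output in the question from the expected result.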

1
votes

gtools::mixedsort does in fact work in your case:

> a=c("ILLUMINA:420:C2D7UACXX:1:1102:14591:91480",
      "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881",
      "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103",
      "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356")
> 
> mixedsort(a)
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480"
[2] "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881" 
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103"
[4] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"

I am using gtools_3.4.2 and R-3.2.0
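Since the question and this answer report different results from mixedsort, it may be worth checking what your own installation does. A rough sketch, assuming the expected order is the one given in the question; the `expected` variable is introduced here only for illustration, and the check is skipped if gtools is not installed:

```r
a <- c("ILLUMINA:420:C2D7UACXX:1:1102:14591:91480",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356")

# The order the question expects (a is already listed in that order)
expected <- a

if (requireNamespace("gtools", quietly = TRUE)) {
  # Report the installed version and whether mixedsort recovers the expected order
  print(packageVersion("gtools"))
  print(identical(gtools::mixedsort(rev(a)), expected))
} else {
  message("gtools is not installed; skipping the check")
}
```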


0
votes

Here is a faster solution:

library(data.table)
fields.list = strsplit(a,split=":")
# Fields 5 to 7 hold the three numbers to sort on
sort.dt = data.table(t(sapply(fields.list,function(x) as.numeric(c(x[5],x[6],x[7])))))
sorted.a = a[with(sort.dt,order(V1,V2,V3))]
> sorted.a
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881"  "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103"
[4] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"
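If you would rather avoid the data.table dependency, the same idea can be sketched in base R. The helper name `sort_by_fields` and its `fields` and `sep` arguments are made up for this example, and it assumes every string has at least seven colon-separated fields with numeric values in positions 5 to 7:

```r
# Hypothetical helper: sort strings by selected separator-delimited fields,
# comparing those fields as numbers instead of as text
sort_by_fields <- function(x, fields = 5:7, sep = ":") {
  parts <- strsplit(x, split = sep, fixed = TRUE)
  # Build one numeric key vector per requested field position
  keys <- lapply(fields, function(i) {
    as.numeric(vapply(parts, function(p) p[i], character(1)))
  })
  # order() uses the later keys to break ties in the earlier ones
  x[do.call(order, keys)]
}

a <- c("ILLUMINA:420:C2D7UACXX:1:1102:14591:91480",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356")

# Start from the reversed vector so the sort has work to do
sorted_a <- sort_by_fields(rev(a))
print(sorted_a)
```

Passing the keys through `do.call(order, keys)` keeps the number of sort fields flexible, so a different `fields` argument works without changing the function body.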