Sorting a list of strings by the last number in each string in R

Question (votes: 1, answers: 2)

I have the following list:

datalist <- c("20191107_1545_28.xlsx", "20191108_1520_95.xlsx", "20191108_1104_99.xlsx", "20200127_1505_28.xlsx", "20200124_1505_41B.xlsx", "20200122_1505_1.xlsx", "20191102_1520_102.xlsx")

I want to sort it by the last number first, and then by the first number (the date), so that it looks like this:

"20200122_1505_1.xlsx" "20191107_1545_28.xlsx" "20200127_1505_28.xlsx" "20200124_1505_41B.xlsx" "20191108_1520_95.xlsx" "20191108_1104_99.xlsx" "20191102_1520_102.xlsx"

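I have been experimenting with StrReverse so that I could sort the usual way, but unfortunately that of course reverses the digits as well. I tried splitting the strings first: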

split=str_split(datalist, "_")

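But I don't know how to continue from there. The number I want to sort by can have 1, 2, or 3 digits, and it may also be followed by a B (as in the example). Does anyone know how to solve this? Thanks in advance!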

r sorting character
2 Answers
1
votes

One stringr option could be:

library(stringr)
datalist[str_order(str_extract_all(datalist, "\\d+", simplify = TRUE)[, 3], numeric = TRUE)]

[1] "20200122_1505_1.xlsx"   "20191107_1545_28.xlsx"  "20200127_1505_28.xlsx" 
[4] "20200124_1505_41B.xlsx" "20191108_1520_95.xlsx"  "20191108_1104_99.xlsx" 
[7] "20191102_1520_102.xlsx"

Or a more flexible option:

datalist[str_order(sapply(str_extract_all(datalist, "\\d+"), tail, 1), numeric = TRUE)]
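What makes this second version more flexible, as far as I can tell, is that `tail(x, 1)` picks the last run of digits no matter how many runs a name contains, whereas `[, 3]` assumes exactly three. A minimal base R sketch of the same idea follows; the file names in `files` are made up for illustration and are not from the question.

```r
# Hypothetical file names with a varying number of digit groups
files <- c("20200122_1505_7.xlsx", "report_12.xlsx", "a_1_2_3_4.xlsx")

# gregexpr() locates every run of digits; regmatches() returns them as a list
digit_runs <- regmatches(files, gregexpr("[0-9]+", files))

# Keep only the last run from each name and convert it to a number
last_num <- as.numeric(vapply(digit_runs, function(x) tail(x, 1), character(1)))

# Reorder the names by that number
sorted_files <- files[order(last_num)]
sorted_files
```

Converting with `as.numeric()` before calling `order()` plays the same role as `numeric = TRUE` in `str_order()`: it keeps "12" from sorting before "4".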

If you really want to sort by more than one number, add dplyr:

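library(dplyr)
library(stringr)

# One column per run of digits (V1 = date, V2 = time, V3 = last number)
bind_cols(datalist = datalist, 
          as.data.frame(str_extract_all(datalist, "\\d+", simplify = TRUE))) %>%
  mutate_at(vars(starts_with("V")), ~ as.numeric(as.character(.))) %>%
  arrange(V3, V1)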

  datalist                     V1    V2    V3
  <chr>                     <dbl> <dbl> <dbl>
1 20200122_1505_1.xlsx   20200122  1505     1
2 20191107_1545_28.xlsx  20191107  1545    28
3 20200127_1505_28.xlsx  20200127  1505    28
4 20200124_1505_41B.xlsx 20200124  1505    41
5 20191108_1520_95.xlsx  20191108  1520    95
6 20191108_1104_99.xlsx  20191108  1104    99
7 20191102_1520_102.xlsx 20191102  1520   102
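For comparison, the same two-key sort can probably be done without any packages, continuing from the split on "_" that the question already tried. This is only a sketch under the assumption that every name has the form date_time_number plus an optional letter; the helper names `parts`, `date_key` and `last_key` are mine.

```r
datalist <- c("20191107_1545_28.xlsx", "20191108_1520_95.xlsx",
              "20191108_1104_99.xlsx", "20200127_1505_28.xlsx",
              "20200124_1505_41B.xlsx", "20200122_1505_1.xlsx",
              "20191102_1520_102.xlsx")

# Split each name on "_": piece 1 is the date, piece 3 holds the last number
parts <- strsplit(datalist, "_", fixed = TRUE)
date_key <- as.numeric(vapply(parts, `[`, character(1), 1))

# Keep only the leading digits of piece 3 ("41B.xlsx" becomes 41)
last_piece <- vapply(parts, `[`, character(1), 3)
last_key <- as.numeric(sub("^([0-9]+).*$", "\\1", last_piece))

# order() accepts several keys: last number first, the date breaks ties
sorted <- datalist[order(last_key, date_key)]
sorted
```

The two files ending in 28 come out with the 2019 date ahead of the 2020 date, which is the tie-breaking the question asks for.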

0
votes

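I think this does the trick. Note that it sorts by the actual numbers only and ignores letters. It is insensitive to a letter at the end of the last number, because that is what the data looks like, but the regular expression can be modified to fit whatever you need.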

library(data.table)
datalist <- c("20191107_1545_28.xlsx","20191108_1520_95.xlsx","20191108_1104_99.xlsx","20200127_1505_28.xlsx", "20200124_1505_41B.xlsx", "20200122_1505_1.xlsx", "20191102_1520_102.xlsx")


dt <- data.table('datalist' = datalist)
dt[, 'num1' := as.numeric(gsub(pattern = '(\\d{1,10})(_)(\\d{1,10})(_)(\\d{1,10})(.*)', x = datalist, replacement = '\\1'))]
dt[, 'num2' := as.numeric(gsub(pattern = '(\\d{1,10})(_)(\\d{1,10})(_)(\\d{1,10})(.*)', x = datalist, replacement = '\\3'))]
dt[, 'num3' := as.numeric(gsub(pattern = '(\\d{1,10})(_)(\\d{1,10})(_)(\\d{1,10})(.*)', x = datalist, replacement = '\\5'))]

setkey(dt, num3, num1)
print(dt$datalist)

Edit: forgot to cast to numeric. Fixed.
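If the trailing letter should matter after all, one possible adjustment is to capture it as a separate key and use it only to break ties. The sketch below assumes that a name without a letter should come before the same number with a letter, which the question does not specify; the names in `x` are invented for illustration.

```r
# Hypothetical names that share a number but differ in the letter suffix
x <- c("20200124_1505_41B.xlsx", "20200124_1505_41.xlsx",
       "20200124_1505_5A.xlsx", "20200124_1505_41A.xlsx")

# Capture the last number and an optional letter suffix before the extension
pattern <- "^.*_([0-9]+)([A-Za-z]*)\\.[^.]+$"
num <- as.numeric(sub(pattern, "\\1", x))
suffix <- sub(pattern, "\\2", x)

# Numeric key first; the empty suffix "" sorts before any letter
sorted_x <- x[order(num, suffix)]
sorted_x
```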
