I have the following list:
datalist <- c("20191107_1545_28.xlsx","20191108_1520_95.xlsx","20191108_1104_99.xlsx","20200127_1505_28.xlsx", "20200124_1505_41B.xlsx", "20200122_1505_1.xlsx", "20191102_1520_102.xlsx")
I want to sort by the last number, and then by the first number (the date), so that it looks like:
"20200122_1505_1.xlsx" "20191107_1545_28.xlsx" "20200127_1505_28.xlsx" "20200124_1505_41B.xlsx" "20191108_1520_95.xlsx" "20191108_1104_99.xlsx" "20191102_1520_102.xlsx"
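I've been playing around with StrReverse so that I could order the strings normally, but unfortunately it also reverses the digits of each number, of course. I tried splitting the strings first: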
split=str_split(datalist, "_")
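But I don't know how to continue from there. The number I want to order by can have 1, 2, or 3 digits and may also include a B (as in the example). Does anyone know how to solve this? Thanks in advance!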
One stringr option could be:
datalist[str_order(str_extract_all(datalist, "\\d+", simplify = TRUE)[, 3], numeric = TRUE)]
[1] "20200122_1505_1.xlsx" "20191107_1545_28.xlsx" "20200127_1505_28.xlsx"
[4] "20200124_1505_41B.xlsx" "20191108_1520_95.xlsx" "20191108_1104_99.xlsx"
[7] "20191102_1520_102.xlsx"
Or a more flexible option, which takes the last run of digits however many runs there are:
datalist[str_order(sapply(str_extract_all(datalist, "\\d+"), tail, 1), numeric = TRUE)]
If you really do want to order by more than one number, then with the addition of dplyr:
bind_cols(datalist = datalist,
as.data.frame(str_extract_all(datalist, "\\d+", simplify = TRUE))) %>%
mutate_at(vars(starts_with("V")), ~ as.numeric(as.character(.))) %>%
arrange(V3, V1)
  datalist                     V1    V2    V3
  <chr>                     <dbl> <dbl> <dbl>
1 20200122_1505_1.xlsx   20200122  1505     1
2 20191107_1545_28.xlsx  20191107  1545    28
3 20200127_1505_28.xlsx  20200127  1505    28
4 20200124_1505_41B.xlsx 20200124  1505    41
5 20191108_1520_95.xlsx  20191108  1520    95
6 20191108_1104_99.xlsx  20191108  1104    99
7 20191102_1520_102.xlsx 20191102  1520   102
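For comparison, the same two-key ordering can probably be done in base R alone, continuing from the split-on-underscore idea in the question. This is only a minimal sketch: the variable names (`parts`, `date_num`, `last_num`, `sorted`) are made up for illustration, and it assumes every name has exactly three underscore-separated fields with the last field starting with digits.

```r
datalist <- c("20191107_1545_28.xlsx", "20191108_1520_95.xlsx",
              "20191108_1104_99.xlsx", "20200127_1505_28.xlsx",
              "20200124_1505_41B.xlsx", "20200122_1505_1.xlsx",
              "20191102_1520_102.xlsx")

# Split each name on "_" and bind the pieces into a 7 x 3 character matrix
# (assumes exactly three fields per name)
parts <- do.call(rbind, strsplit(datalist, "_", fixed = TRUE))

# Field 1 is the date; the leading digits of field 3 are the last number
date_num <- as.numeric(parts[, 1])
last_num <- as.numeric(sub("^(\\d+).*$", "\\1", parts[, 3]))

# order() takes several keys: sort by last number, break ties by date
sorted <- datalist[order(last_num, date_num)]
print(sorted)
```

Passing both keys to `order()` avoids building a data frame when only the sorted vector is needed.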
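I think this does the trick. Note that it sorts only by the actual numbers and ignores letters. It is insensitive to a letter at the end of the last number, because that is what the data looks like, but the regex can be modified to accommodate whatever you need.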
library(data.table)
datalist <- c("20191107_1545_28.xlsx","20191108_1520_95.xlsx","20191108_1104_99.xlsx","20200127_1505_28.xlsx", "20200124_1505_41B.xlsx", "20200122_1505_1.xlsx", "20191102_1520_102.xlsx")
dt <- data.table('datalist' = datalist)
dt[, 'num1' := as.numeric(gsub(pattern = '(\\d{1,10})(_)(\\d{1,10})(_)(\\d{1,10})(.*)', x = datalist, replacement = '\\1'))]
dt[, 'num2' := as.numeric(gsub(pattern = '(\\d{1,10})(_)(\\d{1,10})(_)(\\d{1,10})(.*)', x = datalist, replacement = '\\3'))]
dt[, 'num3' := as.numeric(gsub(pattern = '(\\d{1,10})(_)(\\d{1,10})(_)(\\d{1,10})(.*)', x = datalist, replacement = '\\5'))]
setkey(dt, num3, num1)
print(dt$datalist)
Edit: forgot to cast to numeric. Fixed.
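As one example of adapting the regex, a letter suffix such as the B could be captured in its own group and used as a final tie-breaker. The sketch below uses base R rather than data.table to stay short. The `files` vector contains hypothetical names (only `20200124_1505_41B.xlsx` and `20200122_1505_1.xlsx` come from the question), and `pattern`, `suffix`, and `sorted` are illustrative names, not part of the answer above.

```r
# Hypothetical input: names that differ only in the letter after the last number
files <- c("20200124_1505_41B.xlsx", "20200124_1505_41.xlsx",
           "20200124_1505_41A.xlsx", "20200122_1505_1.xlsx")

# Groups: 1 = date, 2 = time, 3 = last number, 4 = optional letter suffix
pattern <- "^(\\d+)_(\\d+)_(\\d+)([A-Za-z]*)\\..*$"

date_num <- as.numeric(sub(pattern, "\\1", files))
last_num <- as.numeric(sub(pattern, "\\3", files))
suffix   <- sub(pattern, "\\4", files)  # "" when there is no letter

# Sort by last number, then date, then suffix (empty suffix sorts first)
sorted <- files[order(last_num, date_num, suffix)]
print(sorted)
```

Whether a bare number should sort before or after its lettered variants is a judgment call; here the empty suffix simply comes first.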