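I want to sort the columns of a large data frame (around 14,000 variables) according to a specific scheme.
The column names have the following structure (Condition_Sleepstage_Parameter_Electrode_Nightpart):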
[1] "Adapt_N2_negLengthLoc_C3_firstHour" "Adapt_N3_negLengthLoc_C3_firstHour"
[3] "Adapt_NREM_negLengthLoc_C3_firstHour" "Book_N2_negLengthLoc_C3_firstHour"
[5] "Book_N3_negLengthLoc_C3_firstHour" "Book_NREM_negLengthLoc_C3_firstHour"
R has the columns in purely alphabetical order, but I would like them arranged in a logical order based on the following system:
First, the variables should appear in blocks, one block per Parameter. (Order: "negLengthLoc", "posLength", "wholeLength", "negPeak", "nbnegPeaks", "initialMeannegSlope", "finalMeannegSlope", "initialMaxnegslope", "finalMaxnegslope", "posPeinit", "posPeinitMe", "nbposPeaks", "finalMeanposSlope", "initialMaxposSlope", "PeaktoPeak", "Number", "Density")
Within these blocks, the highest level of the hierarchy should be the Condition. (Order: "Adapt", "NoFilter", "Filter", "Book").
After that, the next level should be defined by the Electrode. (Order: "F3", "Fz", "F4", "C3", "Cz", "C4", "P3", "Pz", "P4", "O1", "O2").
After that comes Nightpart (order: "firstHour", "firstQuarter", "secondQuarter", "thirdQuarter", "fourthQuarter", "wholeNight"), and finally Sleepstage (order: "N2", "N3", "NREM").
The resulting order should look like this:
[1] "Adapt_N2_negLengthLoc_F3_firstHour" "Adapt_N3_negLengthLoc_F3_firstHour"
[3] "Adapt_NREM_negLengthLoc_F3_firstHour" "Adapt_N2_negLengthLoc_F3_firstQuarter"
[5] "Adapt_N3_negLengthLoc_F3_firstQuarter" "Adapt_NREM_negLengthLoc_F3_firstQuarter"
[7] "Adapt_N2_negLengthLoc_F3_secondQuarter" "Adapt_N3_negLengthLoc_F3_secondQuarter"
[9] "Adapt_NREM_negLengthLoc_F3_secondQuarter" "Adapt_N2_negLengthLoc_F3_thirdQuarter"
[11] "Adapt_N3_negLengthLoc_F3_thirdQuarter" "Adapt_NREM_negLengthLoc_F3_thirdQuarter"
[13] "Adapt_N2_negLengthLoc_F3_fourthQuarter" "Adapt_N3_negLengthLoc_F3_fourthQuarter"
[15] "Adapt_NREM_negLengthLoc_F3_fourthQuarter" "Adapt_N2_negLengthLoc_F3_wholeNight"
[17] "Adapt_N3_negLengthLoc_F3_wholeNight" "Adapt_NREM_negLengthLoc_F3_wholeNight"
[19] "Adapt_N2_negLengthLoc_Fz_firstHour" "Adapt_N3_negLengthLoc_Fz_firstHour"
...
I hope someone can help me. If anything is unclear, I am of course happy to provide more information!
Thanks in advance!
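Using the mtcars data as an example, the columns of a data frame can be reordered by creating a vector with the desired order and then using that vector in the column position of the [ form of the extract operator.
First, we'll use colnames() to extract the original order of the columns and print it.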
theNames <- colnames(mtcars)
theNames
> theNames
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
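Next, we'll move the integer-valued columns cyl, vs, am, gear, and carb to the left side of the data frame by creating a reorderedNames vector and using it with [. Note that hp is not listed in the vector, so it is dropped from the result.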
reorderedNames <- c("cyl" , "vs" , "am" , "gear" ,"carb","disp" ,
"drat", "wt" , "qsec", "mpg")
mtcars[,reorderedNames]
...and the first few rows of the output:
> mtcars[,reorderedNames]
cyl vs am gear carb disp drat wt qsec mpg
Mazda RX4 6 0 1 4 4 160.0 3.90 2.620 16.46 21.0
Mazda RX4 Wag 6 0 1 4 4 160.0 3.90 2.875 17.02 21.0
Datsun 710 4 1 1 4 1 108.0 3.85 2.320 18.61 22.8
Hornet 4 Drive 6 1 0 3 1 258.0 3.08 3.215 19.44 21.4
Hornet Sportabout 8 0 0 3 2 360.0 3.15 3.440 17.02 18.7
Valiant 6 1 0 3 1 225.0 2.76 3.460 20.22 18.1
Duster 360 8 0 0 3 4 360.0 3.21 3.570 15.84 14.3
Merc 240D 4 1 0 4 2 146.7 3.69 3.190 20.00 24.4
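Because reorderedNames does not mention hp, that column is missing from the output above. If the goal is only to move some columns to the front while keeping all the others, one possible sketch is to append the unlisted names with setdiff(). The names remainingNames, fullOrder, and reordered are introduced here for illustration only and are not part of the original answer.

```r
# Desired leading columns, as in the example above
reorderedNames <- c("cyl", "vs", "am", "gear", "carb",
                    "disp", "drat", "wt", "qsec", "mpg")

# Columns that are not mentioned in reorderedNames (illustrative name)
remainingNames <- setdiff(colnames(mtcars), reorderedNames)

# Listed columns first, then everything else in its original order
fullOrder <- c(reorderedNames, remainingNames)
reordered <- mtcars[, fullOrder]

colnames(reordered)
```

With this variant no column is lost, which matters more as the number of columns grows and a hand-written vector becomes easy to get wrong.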
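In the OP, the question refers to a data frame with a large number of columns. To scale this procedure up and automate the column ordering, there are at least two main approaches.
Use pivot_longer() to create a narrow-format tidy data set by splitting the column names into the desired grouping variables.
You have to split the column names into their component parts. This is done with str_split from the stringr package. It returns a list with one entry per column name, each entry being a character vector of that name's parts. To create new columns holding the individual parts, I use map_chr from the purrr package to pick the corresponding element for each column name. Then the names are arranged. To get the desired order, convert the character values to factor and specify the order with levels. The new order of the columns is given by the rowid column: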
# Example column names, deliberately not in the desired order
old_order <- data.frame(col_names = c("Adapt_N2_negLengthLoc_C3_firstHour", "Adapt_N3_negLengthLoc_C3_firstHour",
                                      "Adapt_NREM_negLengthLoc_C3_firstHour", "Book_N2_negLengthLoc_C3_firstHour",
                                      "Book_N3_negLengthLoc_C3_firstHour", "Book_NREM_negLengthLoc_C3_firstHour",
                                      "Adapt_N2_negLengthLoc_Fz_firstHour", "Adapt_N3_negLengthLoc_Fz_firstHour"))
library(dplyr)
library(stringr)
# One character vector of five parts per column name
splitted_names <- str_split(old_order$col_names, "_")
new_order <- old_order %>%
  tibble::rowid_to_column() %>%
  # Pull each part of the name into its own column
  mutate(Condition  = purrr::map_chr(splitted_names, `[`, 1),
         Sleepstage = purrr::map_chr(splitted_names, `[`, 2),
         Parameter  = purrr::map_chr(splitted_names, `[`, 3),
         Electrode  = purrr::map_chr(splitted_names, `[`, 4),
         Nightpart  = purrr::map_chr(splitted_names, `[`, 5)) %>%
  # Sort by Parameter, Condition, Electrode, Nightpart, Sleepstage;
  # the factor levels define the order within each key
  arrange(factor(Parameter, levels = c("negLengthLoc", "posLength", "wholeLength", "negPeak", "nbnegPeaks",
                                       "initialMeannegSlope", "finalMeannegSlope", "initialMaxnegslope",
                                       "finalMaxnegslope", "posPeak", "nbposPeaks", "initialMeannposSlope",
                                       "finalMeanposSlope", "initialMaxposSlope", "PeaktoPeak", "Number", "Density")),
          factor(Condition, levels = c("Adapt", "NoFilter", "Filter", "Book")),
          factor(Electrode, levels = c("F3", "Fz", "F4", "C3", "Cz", "C4", "P3", "Pz", "P4", "O1", "O2")),
          factor(Nightpart, levels = c("firstHour", "firstQuarter", "secondQuarter", "thirdQuarter", "fourthQuarter", "wholeNight")),
          factor(Sleepstage, levels = c("N2", "N3", "NREM"))) %>%
  # The original row ids, now in the desired order
  pull(rowid)
# Column names in the desired order
old_order$col_names[new_order]
[1] Adapt_N2_negLengthLoc_Fz_firstHour Adapt_N3_negLengthLoc_Fz_firstHour Adapt_N2_negLengthLoc_C3_firstHour
[4] Adapt_N3_negLengthLoc_C3_firstHour Adapt_NREM_negLengthLoc_C3_firstHour Book_N2_negLengthLoc_C3_firstHour
[7] Book_N3_negLengthLoc_C3_firstHour Book_NREM_negLengthLoc_C3_firstHour
8 Levels: Adapt_N2_negLengthLoc_C3_firstHour ... Book_NREM_negLengthLoc_C3_firstHour
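The answer above stops at the sorted vector of names. As a rough sketch of the remaining step, the same idea can be written in base R and applied directly to a data frame. The helper sort_col_names, the *_levels vectors, and the toy data frame are hypothetical names made up for this illustration. The sketch assumes every column name has exactly five parts separated by "_".

```r
# Level orders taken from the question (Parameter levels are passed in separately)
condition_levels  <- c("Adapt", "NoFilter", "Filter", "Book")
electrode_levels  <- c("F3", "Fz", "F4", "C3", "Cz", "C4", "P3", "Pz", "P4", "O1", "O2")
nightpart_levels  <- c("firstHour", "firstQuarter", "secondQuarter",
                       "thirdQuarter", "fourthQuarter", "wholeNight")
sleepstage_levels <- c("N2", "N3", "NREM")

# Hypothetical helper: returns col_names sorted by Parameter, Condition,
# Electrode, Nightpart, Sleepstage. Assumes five "_"-separated parts per name.
sort_col_names <- function(col_names, parameter_levels) {
  col_names <- as.character(col_names)
  # One row per column name, one matrix column per name part
  parts <- do.call(rbind, strsplit(col_names, "_", fixed = TRUE))
  # match() turns each part into its position in the desired order;
  # parts that match no level become NA and are sorted last by order()
  idx <- order(match(parts[, 3], parameter_levels),
               match(parts[, 1], condition_levels),
               match(parts[, 4], electrode_levels),
               match(parts[, 5], nightpart_levels),
               match(parts[, 2], sleepstage_levels))
  col_names[idx]
}

# Toy data frame with four columns in a scrambled order
toy_names <- c("Book_N2_negLengthLoc_C3_firstHour",
               "Adapt_NREM_negLengthLoc_C3_firstHour",
               "Adapt_N3_negLengthLoc_Fz_firstHour",
               "Adapt_N2_negLengthLoc_Fz_firstHour")
toy <- as.data.frame(matrix(0, nrow = 2, ncol = length(toy_names),
                            dimnames = list(NULL, toy_names)))

# Reorder the columns of the data frame itself
sorted <- toy[, sort_col_names(colnames(toy), parameter_levels = "negLengthLoc")]
colnames(sorted)
```

With the dplyr result above, the analogous step would presumably be df[, as.character(old_order$col_names[new_order])], where df stands for the actual data frame. The as.character() call guards against col_names being stored as a factor, as the "Levels" line in the output suggests.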