How to plot PCA results for combined experimental and control data to test whether they separate


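I have a dataset of 90 observations: 45 experimental and 45 control. For each observation I collected 10 distance variables (in meters). I want to run a principal component analysis (PCA) to see whether the experimental and control observations separate. This is the approach that was previously used on similar data.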

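I can run the PCA without any problems (following the tutorial here), and I can plot the experimental/control individuals with colors and ellipses, but I can't work out how to split and plot my variables by experimental/control. I have searched and overthought it to the point of brain fog. I am a *very* new R user, and much of the syntax is new and confusing to me and produces errors when I apply it.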

Am I going about this the wrong way? Is there a simple answer? Do I need to rearrange my data to achieve what I want?

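Currently, my data is arranged with the group name (experiment/control) and the variables as columns and the observations as rows, as shown below (small sample):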

       Group variable 1 variable 2 variable 3  variable 4 variable 5 variable 6 variable 7 variable 8
1        Ctrl   227.1758   76.33834  479.79328  900.431106   74.92103   78.69078 817.950938  853.15631
2        Ctrl   122.2748   82.85017  441.36049   94.211760   48.14546   42.43298 391.669754  397.64129
3        Ctrl   212.0073 1087.69218  310.09934  801.236328  762.45060  101.45367 148.865600  452.02212
4        Ctrl   165.2110  180.89114 1125.10707  287.599761   38.21226  110.05009 377.681321  178.84576
5        Ctrl   125.6233 1356.35936  752.14057    1.540822   17.06021  239.38640   4.906561  211.59177
6        Ctrl   240.0000  108.75317  126.99220  683.712745  139.54299  663.22566 274.265917 1225.14002
7        Ctrl   219.2393   17.81320  962.80249  744.238958  200.14079  455.19716 382.870502  937.11596
8        Ctrl   240.0000  751.95769  131.03213 1024.863454 1130.30304  136.19081 357.986240  863.35511
9        Ctrl   203.0863   80.83451  139.10481  770.567722  770.11240  212.89216  84.812646  131.88929
10 Experiment   192.0000  643.99000  729.90000  292.170000  129.04000  417.28000 366.020000  699.28000
11 Experiment   228.3302   62.68912  748.05168   12.536495   13.46899   63.25804  11.021662   62.62971
12 Experiment   226.3750  164.09029  131.15948  657.808968  387.28992  171.88133 656.338016  838.65025
13 Experiment   165.1418   75.74496 1400.75860 1729.237137  585.63204   65.72580  48.848643  688.00960
14 Experiment   222.7844  360.05409   51.39071 1019.845739 1018.10060  341.20432  31.046823  572.00411
15 Experiment   154.5468  533.66462  217.38821   74.902684  214.52490   76.90764  72.429564  236.32533
16 Experiment   130.0000 1173.69122  203.44864  684.127360  690.38973   51.80260  12.048432  383.40479
17 Experiment   213.1949   28.29785  843.76458  319.815834   24.22977  167.51248 302.743708  618.30222
18 Experiment   213.2566  530.85413  364.92104  425.524837   32.45679   28.45651  66.567557  427.69808
19 Experiment   145.9915  325.44247   65.40580  533.997851  100.40048  265.10440 553.048633  370.76282
   variable 9 variable 10
1   153.71433  632.975613
2    41.19583   48.973480
3   379.10343   20.407055
4   291.24420  716.283657
5  1621.15039 1169.221042
6   267.87993  302.452429
7   876.50519  807.668093
8   686.00076  146.134961
9  1392.94408  920.897862
10  800.95000  849.020000
11 1198.05713  932.001818
12 1100.65313  954.594713
13 1241.07884   67.022017
14  731.06865  178.861739
15  864.90849  112.641722
16  525.20077    1.332423
17  177.53370  672.354680
18  541.06775 1697.203881
19   68.16860  407.169531
r pca
1 Answer

Use autoplot (the method for prcomp objects is provided by ggfortify):

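library(ggplot2)
library(ggfortify)  # provides the autoplot() method for prcomp objects

# Run the PCA on the numeric columns only (drop the first column, Group)
res <- prcomp(df[,-1])

# Plot PC1 against PC2; data = df lets autoplot look up Group for colouring
# ('colour' is the spelling used in the ggfortify documentation)
autoplot(res, x = 1, y = 2, data = df, colour = 'Group')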
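
For readers who want to see what the plot is built from, here is a minimal base R sketch of the same idea without ggfortify. It runs on synthetic stand-in data, not the measurements from the question, and the names `df_demo`, `res_demo`, `scores` and `var_explained` are made up for illustration. It assumes the same layout as in the question: a `Group` column followed by numeric columns.

```r
# Synthetic stand-in data (NOT the asker's measurements): 10 rows per group,
# 4 distance-like variables, laid out like the data frame in the question.
set.seed(1)
df_demo <- data.frame(
  Group = rep(c("Ctrl", "Experiment"), each = 10),
  matrix(runif(80, min = 0, max = 1000), nrow = 20,
         dimnames = list(NULL, paste0("v", 1:4)))
)

# PCA on the numeric columns only; the Group column is dropped.
res_demo <- prcomp(df_demo[, -1])

# Proportion of total variance carried by each component.
var_explained <- res_demo$sdev^2 / sum(res_demo$sdev^2)

# Scores: one row per observation, in the same row order as df_demo,
# so the Group labels can simply be put back next to them.
scores <- data.frame(Group = df_demo$Group, res_demo$x)

# Scatter plot of PC1 against PC2, coloured by group.
grp <- factor(scores$Group)
plot(scores$PC1, scores$PC2, col = as.integer(grp), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
legend("topright", legend = levels(grp),
       col = seq_along(levels(grp)), pch = 19)
```

If the two groups separate, their points should form visibly distinct clusters along PC1 and/or PC2. With random data like this stand-in set, no separation is expected.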

Data

df <- structure(list(Group = c("Ctrl", "Ctrl", "Ctrl", "Ctrl", "Ctrl", 
"Ctrl", "Ctrl", "Ctrl", "Ctrl", "Experiment", "Experiment", "Experiment", 
"Experiment", "Experiment", "Experiment", "Experiment", "Experiment", 
"Experiment", "Experiment"), `variable 1` = c(227.1758, 122.2748, 
212.0073, 165.211, 125.6233, 240, 219.2393, 240, 203.0863, 192, 
228.3302, 226.375, 165.1418, 222.7844, 154.5468, 130, 213.1949, 
213.2566, 145.9915), `variable 2` = c(76.33834, 82.85017, 1087.69218, 
180.89114, 1356.35936, 108.75317, 17.8132, 751.95769, 80.83451, 
643.99, 62.68912, 164.09029, 75.74496, 360.05409, 533.66462, 
1173.69122, 28.29785, 530.85413, 325.44247), `variable 3` = c(479.79328, 
441.36049, 310.09934, 1125.10707, 752.14057, 126.9922, 962.80249, 
131.03213, 139.10481, 729.9, 748.05168, 131.15948, 1400.7586, 
51.39071, 217.38821, 203.44864, 843.76458, 364.92104, 65.4058
), `variable 4` = c(900.431106, 94.21176, 801.236328, 287.599761, 
1.540822, 683.712745, 744.238958, 1024.863454, 770.567722, 292.17, 
12.536495, 657.808968, 1729.237137, 1019.845739, 74.902684, 684.12736, 
319.815834, 425.524837, 533.997851), `variable 5` = c(74.92103, 
48.14546, 762.4506, 38.21226, 17.06021, 139.54299, 200.14079, 
1130.30304, 770.1124, 129.04, 13.46899, 387.28992, 585.63204, 
1018.1006, 214.5249, 690.38973, 24.22977, 32.45679, 100.40048
), `variable 6` = c(78.69078, 42.43298, 101.45367, 110.05009, 
239.3864, 663.22566, 455.19716, 136.19081, 212.89216, 417.28, 
63.25804, 171.88133, 65.7258, 341.20432, 76.90764, 51.8026, 167.51248, 
28.45651, 265.1044), `variable 7` = c(817.950938, 391.669754, 
148.8656, 377.681321, 4.906561, 274.265917, 382.870502, 357.98624, 
84.812646, 366.02, 11.021662, 656.338016, 48.848643, 31.046823, 
72.429564, 12.048432, 302.743708, 66.567557, 553.048633), `variable 8` = c(853.15631, 
397.64129, 452.02212, 178.84576, 211.59177, 1225.14002, 937.11596, 
863.35511, 131.88929, 699.28, 62.62971, 838.65025, 688.0096, 
572.00411, 236.32533, 383.40479, 618.30222, 427.69808, 370.76282
), `variable 9` = c(153.71433, 41.19583, 379.10343, 291.2442, 
1621.15039, 267.87993, 876.50519, 686.00076, 1392.94408, 800.95, 
1198.05713, 1100.65313, 1241.07884, 731.06865, 864.90849, 525.20077, 
177.5337, 541.06775, 68.1686), `variable 10` = c(632.975613, 
48.97348, 20.407055, 716.283657, 1169.221042, 302.452429, 807.668093, 
146.134961, 920.897862, 849.02, 932.001818, 954.594713, 67.022017, 
178.861739, 112.641722, 1.332423, 672.35468, 1697.203881, 407.169531
)), class = "data.frame", row.names = c(NA, -19L))
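
On the part of the question about splitting the variables by group: the PCA is fitted to all observations together, so the variable loadings are shared by both groups. Only the observations (the scores) carry a group label. As a rough sketch of how to inspect the variables, the example below prints the loadings and draws a base R biplot. It again uses synthetic stand-in data rather than the data above, and `df_demo`, `res_demo` and `loadings` are illustrative names.

```r
# Synthetic stand-in data (NOT the data above), same layout:
# a Group column followed by numeric columns.
set.seed(1)
df_demo <- data.frame(
  Group = rep(c("Ctrl", "Experiment"), each = 10),
  matrix(runif(80, min = 0, max = 1000), nrow = 20,
         dimnames = list(NULL, paste0("v", 1:4)))
)
res_demo <- prcomp(df_demo[, -1])

# Loadings: rows are the original variables, columns are the components.
# There is one set of loadings for the whole data set, not one per group.
loadings <- res_demo$rotation
print(round(loadings[, 1:2], 3))

# Biplot: observations are drawn as labels ("C" or "E", the first letter of
# the group name) and variables as arrows.
biplot(res_demo, xlabs = substr(df_demo$Group, 1, 1))
```

Variables whose arrows point toward where one group's labels cluster are the ones contributing most to any separation along those components.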