I want to use prcomp to run a PCA on a dataset whose first two columns are factors with repeated levels, followed by numeric vectors:
Genus1 Species1 6.320000 8.720000 6.420000
Genus2 Species2 8.430000 11.780000 4.490000
Genus2 Species2 8.310000 10.940000 4.180000
Genus3 Species3 9.290000 13.060000 5.990000
Genus3 Species3 8.960000 13.320000 6.360000
How do I get this dataset into the right format to run with prcomp, so that the PC scores come out in the same order as the original dataset?
Suppose your data is:
x = structure(list(V1 = structure(c(1L, 2L, 2L, 3L, 3L), .Label = c("Genus1",
"Genus2", "Genus3"), class = "factor"), V2 = structure(c(1L,
2L, 2L, 3L, 3L), .Label = c("Species1", "Species2", "Species3"
), class = "factor"), V3 = c(6.32, 8.43, 8.31, 9.29, 8.96), V4 = c(8.72,
11.78, 10.94, 13.06, 13.32), V5 = c(6.42, 4.49, 4.18, 5.99, 6.36
)), class = "data.frame", row.names = c(NA, -5L))
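You can't run a PCA on factors in any case, so pass only the numeric columns to prcomp and bind the factor columns back on afterwards. The scores in pca$x have one row per input row, in the same order as the input: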
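# Run the PCA on the numeric columns only (columns 3 to 5)
pca = prcomp(x[,3:5])
# Bind the two factor columns back onto the scores; row order is unchanged
pca_scores = cbind(x[,1:2],pca$x)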
pca_scores
      V1       V2        PC1        PC2          PC3
1 Genus1 Species1 -3.4571239  0.8812539  0.003197962
2 Genus2 Species2  0.2914003 -0.9790128 -0.165842662
3 Genus2 Species2 -0.4813849 -1.3641274  0.099844800
4 Genus3 Species3  1.8024971  0.5080058  0.199344981
5 Genus3 Species3  1.8446114  0.9538805 -0.136545080
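
If the numeric columns are not always in positions 3 to 5, the same idea can be written by picking columns by type instead of by index. This is only a sketch: the names is_num and manual are mine, not part of the question, and the data frame is rebuilt with data.frame() purely so the snippet runs on its own. The last step recomputes the scores by hand (centered data times the rotation matrix) to confirm that prcomp leaves the rows in their original order.

```r
# Same values as the data above, rebuilt so this snippet is self-contained
x <- data.frame(
  V1 = factor(c("Genus1", "Genus2", "Genus2", "Genus3", "Genus3")),
  V2 = factor(c("Species1", "Species2", "Species2", "Species3", "Species3")),
  V3 = c(6.32, 8.43, 8.31, 9.29, 8.96),
  V4 = c(8.72, 11.78, 10.94, 13.06, 13.32),
  V5 = c(6.42, 4.49, 4.18, 5.99, 6.36)
)

# Flag the numeric columns by type rather than by position
is_num <- vapply(x, is.numeric, logical(1))

# PCA on the numeric columns only (centered, not scaled: the prcomp defaults)
pca <- prcomp(x[, is_num])

# Put the non-numeric columns back in front of the scores
pca_scores <- cbind(x[, !is_num, drop = FALSE], pca$x)

# Row-order check: centered data times the rotation should reproduce pca$x
manual <- scale(x[, is_num], center = TRUE, scale = FALSE) %*% pca$rotation
stopifnot(isTRUE(all.equal(pca$x, manual, check.attributes = FALSE)))

pca_scores
```

Note that the sign of each PC is arbitrary, so the scores may come out mirrored on another platform or R version; the row order does not change.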