How do I write correct code to normalize a matrix of gene expression values?


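Before computing a PCA, I need to normalize my data. I have a matrix whose row names encode the disease group (0 = control, 1 = ulcerative colitis, 2 = Crohn's disease). The remaining data are gene expression values.

Here is my data: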

       data <- structure(c(5.54e-05, 5.58e-06, 9.74e-05, 1.33e-06, 1.29e-05, 
       7.22e-06, 0.000215899, 3.6e-06, 0.000146724, 1.53e-05, 0.000913187, 
       1.9e-06, 0.007421464, 0.000648006, 5.1e-06, 6.15e-06, 4.73e-06, 
       0.000119899, 0.000884487, 0.000850632, 0.000236607, 7.36e-06, 
       8.48e-06, 2.63e-05, 0.001368493, 1.12e-05, 0.000177568, 0.006338532, 
       0.006162866, 0.040695132, 0.013255055, 0.033086619, 0.074158811, 
       0.004967497, 0.01247423, 0.043201417, 0.011470285, 0.038447751, 
       0.018825124, 0.027701807, 0.063373762, 0.005374513, 0.048876252, 
       0.009959848, 0.004434078, 0.004176856, 0.015288913, 0.060226053, 
       0.05128922, 0.006557554, 0.017460326, 0.007684784, 0.002107577, 
       0.005773192, 0.076186393, 0.037631043, 0.052159393, 0.012179365, 
       0.047199766, 0.022458838, 0.030261613, 0.00626629, 0.028664896, 
       0.02285845, 0.02801855, 0.017681676, 0.040563592, 0.029791175, 
       0.034778056, 0.019318473, 0.011847912, 0.009614177, 0.064027542, 
       0.035334149, 0.041638955, 0.056015014, 0.03304865, 0.017660205, 
       0.030187166, 0.057919531, 0.029990489, 0.000112884, 0.000920886, 
       0.001081748, 0.000195159, 0.001678445, 0.000171612, 0.000191702, 
       0.000560035, 0.000384056, 0.000454783, 0.000723385, 0.000203897, 
       0.000973337, 0.000822171, 0.000620526, 0.000260769, 0.000214607, 
       0.002077443, 0.00065843, 0.000403672, 0.000378651, 0.000409306, 
       0.001722587, 0.000213785, 0.000176643, 0.002022878, 0.001886929, 
       0.053029236, 0.022594965, 0.011967636, 0.026851113, 0.03773798, 
       0.031356268, 0.10410326, 0.063265216, 0.018028454, 0.116038001, 
       0.00572817, 0.053635968, 0.059126941, 0.011835241, 0.004639624, 
       0.014302911, 0.082948853, 0.015202238, 0.021295431, 0.043342, 
       0.008153675, 0.015613747, 0.043289609, 0.048834321, 0.019144763, 
       0.059809871, 0.006990685, 0.04082966, 0.02986135, 0.061405171, 
       0.006142619, 0.009767602, 0.035427993, 0.03729329, 0.01309739, 
       0.00221718, 0.040211393, 0.006303841, 0.030146612, 0.032033879, 
       0.024590398, 0.077991721, 0.017215666, 0.014731147, 0.04802582, 
       0.03168714, 0.03244771, 0.032278613, 0.017301885, 0.013450667, 
       0.040207755, 0.042669615, 0.03456749, 0.034631319, 1.93e-05, 
       4.72e-06, 5.41e-05, 0, 1.91e-05, 9.33e-07, 5.98e-06, 0, 1.05e-06, 
       4.1e-07, 7.72e-05, 4.07e-07, 0.000585154, 0.000246992, 7.86e-06, 
       3.13e-06, 2.14e-06, 7.56e-06, 9.29e-05, 0.000116024, 5.51e-05, 
       7.79e-06, 6.65e-06, 2.06e-06, 0.000104342, 4.16e-06, 1.27e-05, 
       0.000197502, 0.00015135, 0.000107306, 6.54e-05, 0.000225564, 
       0.000142631, 0.000168873, 3.5e-05, 0.000365242, 0.000174254, 
       0.000339327, 8.7e-05, 0.000136679, 0.000156634, 0.000224181, 
       0.000205305, 8.87e-05, 0.000305774, 0.000133615, 0.00015118, 
       0.000107229, 0.000162579, 0.000152249, 6.88e-05, 0.000113864, 
       0.000249258, 0.00024256, 0.00079296, 0.007640951, 0.004937327, 
       0.000422361, 0.000953513, 0.000951187, 0.000671306, 0.001106406, 
       0.002606568, 0.003006867, 0.001911646, 0.00135411, 0.012461738, 
       0.000434917, 0.00237646, 0.007857561, 0.000436889, 0.00048816, 
       0.000348146, 0.000931449, 0.000323974, 0.004945321, 0.000693845, 
       0.000479572, 0.000843415, 0.001419675, 0.001547478, 8.16e-05, 
       6.63e-05, 0.000101583, 3.08e-05, 0.000147039, 5.13e-05, 0.000109479, 
       2.39e-05, 0.000225475, 4.28e-05, 0.000230785, 2.1e-05, 0.0001356, 
       0.000124173, 0.000245128, 0.000275446, 3.18e-05, 0.00017516, 
       0.000180192, 0.000246669, 0.000378708, 4.35e-05, 0.000267824, 
       7.2e-05, 7.65e-05, 8.79e-05, 0.000130026, 0.000111462, 3.17e-05, 
       0.000200096, 3.12e-06, 8.75e-05, 3.11e-06, 6.89e-06, 0.000165936, 
       5.98e-05, 0.000201355, 5.92e-06, 2.57e-05, 2.53e-05, 3.27e-05, 
       0.000137446, 0.000134402, 5.86e-07, 3.9e-05, 0.018886909, 0.050343466, 
       4.15e-05, 1.67e-05, 0.000172614, 4.95e-05, 1.27e-05, 9.85e-05, 
       4.28e-05, 0.002708402, 0.003215586, 0.00457116, 0.001713549, 
       0.024353184, 0.006660748, 0.003198887, 0.003094386, 0.004789163, 
       0.002816955, 0.021587313, 0.002084725, 0.00378062, 0.021751495, 
       0.009097143, 0.012216225, 0.001125765, 0.013043534, 0.005514773, 
       0.008323962, 0.026898764, 0.002149135, 0.008021623, 0.006673567, 
       0.005391139, 0.018578559, 0.013786297, 0.00080595, 0.001289505, 
       0.002451416, 0.000234107, 0.001694733, 0.000288175, 0.002357478, 
       0.000856129, 0.00159752, 0.000117538, 0.000166581, 0.000367288, 
       0.001039841, 0.001779528, 0.000438092, 0.001012515, 0.000529936, 
       0.003193086, 0.002562702, 0.00277401, 0.003013136, 0.001349197, 
       0.001646296, 0.001114222, 0.001207882, 0.002804949, 0.000366419
       ), .Dim = c(27L, 13L), .Dimnames = list(c("2", "0", "0", "0", 
      "1", "0", "0", "1", "1", "1", "2", "0", "0", "1", "2", "2", "1", 
      "2", "2", "2", "2", "1", "1", "2", "2", "0", "0"), c("Gene1", 
      "Gene2", "Gene3", "Gene4", "Gene5", "Gene6", "Gene7", "Gene8", 
      "Gene9", "Gene10", "Gene11", "Gene12", "Gene13")))

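The real dataset is much larger, with 194 rows and 600 genes. My data do contain zeros. I tried a log transformation using the caret package and qqnorm (adding a very small number to each data point, which I don't think is ideal). Neither approach made the data normally distributed, as judged by histograms of some columns, the Anderson-Darling test, or QQ plots.

I understand that, because my data contain zeros, I can try the Yeo-Johnson method through the caret or bestNormalize packages.

The code I have been trying so far is below. The data points are not transformed by YeoJohnson from the caret package, and there is no error message.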
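For reference, a minimal sketch of the log-plus-pseudocount attempt described above, followed by a per-gene normality check. The matrix `expr` is simulated and hypothetical, not the real data. The pseudocount (half the smallest non-zero value) is one arbitrary choice among many. Shapiro-Wilk is swapped in for Anderson-Darling only because `shapiro.test` ships with base R.

```r
# Hypothetical stand-in for the expression matrix:
# skewed, non-negative values with a few exact zeros.
set.seed(1)
expr <- matrix(rexp(27 * 4, rate = 100), nrow = 27,
               dimnames = list(NULL, paste0("Gene", 1:4)))
expr[sample(length(expr), 5)] <- 0

# Pseudocount: half the smallest non-zero value (an assumption, not a rule).
pseudo <- min(expr[expr > 0]) / 2
log_expr <- log2(expr + pseudo)

# Shapiro-Wilk p-value per gene, before and after the transform.
p_raw <- apply(expr, 2, function(x) shapiro.test(x)$p.value)
p_log <- apply(log_expr, 2, function(x) shapiro.test(x)$p.value)
round(cbind(raw = p_raw, log2 = p_log), 4)
```

Because the zeros all map to the same value after the shift, they can form a spike at the low end of each column, which may be one reason a log transform alone does not pass a normality test.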

      library(caret)

      # Estimate Yeo-Johnson parameters for each column, then apply them
      preProcessValues <- preProcess(data, method = "YeoJohnson")
      datanorm <- predict(preProcessValues, data)

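The values in datanorm are still identical to the original data.

The other approach I tried was qqnorm(data), which does not normalize the data when I look at the output values.

Update: I understand from the comments below that Z-score normalization will not help make the data normally distributed.

Any suggestions on how to fix the problem above would be helpful, since this is the only package that accepts my data as a matrix.

The bestNormalize package can only be used on a single column (because x must be a vector), and ideally I need to transform the whole matrix of values.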
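A small sketch of why that is the case, using made-up skewed values rather than the real data. Z-scoring only shifts and rescales a column, so its mean becomes 0 and its standard deviation 1, but the shape of the distribution (measured here by a hand-written, hypothetical `skewness` helper) is unchanged.

```r
set.seed(42)
x <- rexp(200, rate = 50)      # made-up, right-skewed values
z <- as.numeric(scale(x))      # z-score: (x - mean(x)) / sd(x)

# Sample skewness (third standardized moment); helper defined for illustration.
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

c(mean = mean(z), sd = sd(z))
c(raw = skewness(x), zscore = skewness(z))
```

The two skewness values agree, which is the point: a column that is skewed before z-scoring is equally skewed afterwards.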

       library(bestNormalize)

       # Yeo-Johnson on the first column only
       values <- yeojohnson(data[, 1], standardize = TRUE)
       normoutput <- predict(values)
r normalization
1 answer
0 votes

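Although you say bestNormalize only works on a single vector, you can use a loop to run bestNormalize on each column separately, which may be what you are looking for. Here is another post that does this: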

https://stackoverflow.com/a/60592624/12777743
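A minimal sketch of the column-by-column idea, assuming the bestNormalize package is installed. It is not the code from the linked post. The matrix `expr` is simulated and hypothetical, and the names `fits` and `expr_yj` are made up for illustration.

```r
library(bestNormalize)

# Hypothetical stand-in for the expression matrix, with a couple of zeros.
set.seed(7)
expr <- matrix(rexp(27 * 3, rate = 100), nrow = 27,
               dimnames = list(rep(c("0", "1", "2"), each = 9),
                               paste0("Gene", 1:3)))
expr[c(1, 30)] <- 0

# Fit one Yeo-Johnson transform per column and keep the fitted objects,
# so the same transforms can later be applied to new samples.
fits <- lapply(seq_len(ncol(expr)), function(j) {
  yeojohnson(expr[, j], standardize = TRUE)
})

# predict() without newdata returns the transformed training values.
expr_yj <- sapply(fits, predict)
dimnames(expr_yj) <- dimnames(expr)

dim(expr_yj)
```

Keeping the fitted objects in a list, rather than only the transformed values, makes it possible to call `predict(fits[[j]], newdata = ...)` on held-out samples. Whether each transformed column is actually close to normal still needs to be checked per gene.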
