Enrichment analysis with GSEAPY

Question (votes: 0, answers: 2)

I am trying to run an enrichment analysis with gseapy enrichr on a list of gene names that looks like this:

0     RAB4B
1     TIGAR
2     RNF44
3     DNAH3
4    RPL23A
5     ARL8B
6     CALB2
7     MFSD3
8      PIGV
9    ZNF708
Name: 0, dtype: object

I am using the following code:

# run enrichr
# if you are only interested in the dataframe that enrichr returns, set no_plot=True

# list, dataframe, series inputs are supported
enr = gseapy.enrichr(gene_list = glist2,
                 gene_sets=['ARCHS4_Cell-lines', 'KEGG_2016','KEGG_2013', 'GO_Cellular_Component_2018', 'GO_Cellular_Component_AutoRIF', 'GO_Cellular_Component_AutoRIF_Predicted_zscore', 'GO_Molecular_Function_2018', 'GO_Molecular_Function_AutoRIF', 'GO_Molecular_Function_AutoRIF_Predicted_zscore'],
                 organism='Human', # don't forget to set organism to the one you desired! e.g. Yeast
                 description='test_name',
                 outdir='test/enrichr_kegg',
                 # no_plot=True,
                 cutoff=1 # test dataset, use lower value from range(0,1)
                )

However, I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/shared-libs/python3.7/py/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Adjusted P-value'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-78-dad3e0840d86> in <module>
      9                  outdir='test/enrichr_kegg',
     10                  # no_plot=True,
---> 11                  cutoff=1 # test dataset, use lower value from range(0,1)
     12                 )

~/venv/lib/python3.7/site-packages/gseapy/enrichr.py in enrichr(gene_list, gene_sets, organism, description, outdir, background, cutoff, format, figsize, top_term, no_plot, verbose)
    500     # set organism
    501     enr.set_organism()
--> 502     enr.run()
    503 
    504     return enr

~/venv/lib/python3.7/site-packages/gseapy/enrichr.py in run(self)
    418                               top_term=self.__top_term, color='salmon',
    419                               title=self._gs,
--> 420                               ofname=outfile.replace("txt", self.format))
    421                 if msg is not None : self._logger.warning(msg)
    422             self._logger.info('Done.\n')

~/venv/lib/python3.7/site-packages/gseapy/plot.py in barplot(df, column, title, cutoff, top_term, figsize, color, ofname, **kwargs)
    498     if colname in ['Adjusted P-value', 'P-value']:
    499         # check if any values in `df[colname]` can't be coerced to floats
--> 500         can_be_coerced = df[colname].map(isfloat)
    501         if np.sum(~can_be_coerced) > 0:
    502             raise ValueError('some value in %s could not be typecast to `float`'%colname)

/shared-libs/python3.7/py/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

/shared-libs/python3.7/py/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 'Adjusted P-value'

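Everything seems to run fine until the adjusted p-values come into play. When I enter the same gene names into sites such as Biomart, I do get results for my inputs, so I cannot tell what is going wrong with the adjusted P-value in the code. Can anyone point me in the right direction? Thanks.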
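The last frames of the traceback show that the failure happens inside `barplot`, at the lookup `df[colname]` with `colname` equal to `'Adjusted P-value'`. In other words, the table handed to the plotting step has no such column. The sketch below reproduces that lookup failure in plain pandas and shows a defensive check. The frame `results` and the helper `has_columns` are hypothetical stand-ins and not part of gseapy, and the sketch does not explain why the column is missing in the first place.

```python
import pandas as pd

# Hypothetical stand-in for a results table that came back without the
# expected columns (here: completely empty).
results = pd.DataFrame()


def has_columns(df, required=("Adjusted P-value", "P-value")):
    """Return True only if every required column is present in df."""
    return all(col in df.columns for col in required)


try:
    results["Adjusted P-value"]  # same kind of lookup as df[colname] in barplot
    error_name = None
except KeyError as err:
    error_name = type(err).__name__

print(error_name)            # name of the exception raised by the lookup
print(has_columns(results))  # check on the empty table

# A table that does carry the columns passes the check.
good = pd.DataFrame(
    {"Term": ["t1"], "P-value": [0.01], "Adjusted P-value": [0.05]}
)
print(has_columns(good))
```

A check like this, run on the returned table before any plotting, would turn the opaque KeyError into an explicit "no usable results" branch.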

python pandas bioinformatics
2 Answers
0
votes

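How many genes are in your gene list? I had the same problem. My gene list had about 22,000 genes; when I took only the first 5,000, the problem went away. You can of course change that number as needed.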

Here is my code:

import gseapy

enr_res = gseapy.enrichr(gene_list=glist[:5000],
                         organism='human',
                         gene_sets=['GO_Biological_Process_2018','KEGG_2019_Human','WikiPathways_2019_Human','GO_Biological_Process_2017b'],
                         description='pathway',
                         cutoff = 0.5)
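The truncation used above can be wrapped in a small helper, sketched here under a few assumptions. The default cap of 5000 is taken from this answer. Stripping whitespace and dropping blanks, missing values, and duplicates are extra steps this thread does not mention, and the name `prepare_gene_list` is hypothetical. Whether "the first N genes" is a meaningful subset depends on how the list is ordered.

```python
def prepare_gene_list(genes, max_genes=5000):
    """Clean gene symbols, drop duplicates (keeping order), then cap the length."""
    seen = set()
    cleaned = []
    for gene in genes:
        if not isinstance(gene, str):
            continue  # skip missing values such as None or NaN
        symbol = gene.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            cleaned.append(symbol)
    return cleaned[:max_genes]


# Small demonstration with a messy input list.
raw = ["RAB4B", "TIGAR ", "RNF44", None, "TIGAR", "", "DNAH3"]
subset = prepare_gene_list(raw, max_genes=3)
print(subset)  # ['RAB4B', 'TIGAR', 'RNF44']
```

The result is a plain list, so it can be passed as `gene_list=prepare_gene_list(glist)` in the call above.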

0
votes

I had a similar error. Taking the first 2,000 genes from a list of 20,000 worked for me.
