我正在尝试使用 gseapy richr 对如下所示的基因名称列表运行富集分析:
0 RAB4B
1 TIGAR
2 RNF44
3 DNAH3
4 RPL23A
5 ARL8B
6 CALB2
7 MFSD3
8 PIGV
9 ZNF708
Name: 0, dtype: object
我正在使用以下代码:
# run enrichr
# if you are only intrested in dataframe that enrichr returned, please set no_plot=True
# list, dataframe, series inputs are supported
enr = gseapy.enrichr(gene_list = glist2,
gene_sets=['ARCHS4_Cell-lines', 'KEGG_2016','KEGG_2013', 'GO_Cellular_Component_2018', 'GO_Cellular_Component_AutoRIF', 'GO_Cellular_Component_AutoRIF_Predicted_zscore', 'GO_Molecular_Function_2018', 'GO_Molecular_Function_AutoRIF', 'GO_Molecular_Function_AutoRIF_Predicted_zscore'],
organism='Human', # don't forget to set organism to the one you desired! e.g. Yeast
description='test_name',
outdir='test/enrichr_kegg',
# no_plot=True,
cutoff=1 # test dataset, use lower value from range(0,1)
)
但是,我收到以下错误:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/shared-libs/python3.7/py/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Adjusted P-value'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-78-dad3e0840d86> in <module>
9 outdir='test/enrichr_kegg',
10 # no_plot=True,
---> 11 cutoff=1 # test dataset, use lower value from range(0,1)
12 )
~/venv/lib/python3.7/site-packages/gseapy/enrichr.py in enrichr(gene_list, gene_sets, organism, description, outdir, background, cutoff, format, figsize, top_term, no_plot, verbose)
500 # set organism
501 enr.set_organism()
--> 502 enr.run()
503
504 return enr
~/venv/lib/python3.7/site-packages/gseapy/enrichr.py in run(self)
418 top_term=self.__top_term, color='salmon',
419 title=self._gs,
--> 420 ofname=outfile.replace("txt", self.format))
421 if msg is not None : self._logger.warning(msg)
422 self._logger.info('Done.\n')
~/venv/lib/python3.7/site-packages/gseapy/plot.py in barplot(df, column, title, cutoff, top_term, figsize, color, ofname, **kwargs)
498 if colname in ['Adjusted P-value', 'P-value']:
499 # check if any values in `df[colname]` can't be coerced to floats
--> 500 can_be_coerced = df[colname].map(isfloat)
501 if np.sum(~can_be_coerced) > 0:
502 raise ValueError('some value in %s could not be typecast to `float`'%colname)
/shared-libs/python3.7/py/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
3022 if self.columns.nlevels > 1:
3023 return self._getitem_multilevel(key)
-> 3024 indexer = self.columns.get_loc(key)
3025 if is_integer(indexer):
3026 indexer = [indexer]
/shared-libs/python3.7/py/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 'Adjusted P-value'
在计算调整后的 p 值之前,似乎一切都运行良好。另外,当我将基因名称插入 Biomart 等网站时,我会得到输入值的回报,但我不知道代码中调整后的 P 值出了什么问题。有人能指出我正确的方向吗?谢谢
您的基因列表中有多少个基因?我有同样的问题。我的基因列表大约有 22000 个基因。我只挑选了前 5000 个基因。然后问题就解决了。当然,您可以根据需要更改它。
这是我的代码:
import gseapy
enr_res = gseapy.enrichr(gene_list=glist[:5000],
organism='human',
gene_sets=['GO_Biological_Process_2018','KEGG_2019_Human','WikiPathways_2019_Human','GO_Biological_Process_2017b'],
description='pathway',
cutoff = 0.5)
我有一个类似的错误,它对我从 20000 个基因的列表中取出前 2000 个基因有用