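I am using scprep for some single-cell RNA sequencing analysis. I am calling scprep.stats.differential_expression_by_cluster(data, clusters)
where clusters is the output of scikit-learn's KMeans.
According to the documentation, the output is dict(pd.DataFrame).
My output looks like this: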
{0: difference rank
C1qb (ENSMUSG00000036905) 0.176254 0
C1qa (ENSMUSG00000036887) 0.145618 1
C1qc (ENSMUSG00000036896) 0.120607 2
Crybb1 (ENSMUSG00000029343) 0.105344 3
Tyrobp (ENSMUSG00000030579) 0.098916 4
... ... ...
mt-Co3 (ENSMUSG00000064358) -68.884323 16091
Malat1 (ENSMUSG00000092341) -77.371274 16092
Tuba1a (ENSMUSG00000072235) -91.835869 16093
Tmsb4x (ENSMUSG00000049775) -101.908864 16094
mt-Atp6 (ENSMUSG00000064357) -120.025289 16095
[16096 rows x 2 columns], 1: difference rank
Tmsb4x (ENSMUSG00000049775) 127.537848 0
Tuba1a (ENSMUSG00000072235) 91.644383 1
Tubb2b (ENSMUSG00000045136) 48.972048 2
mt-Atp6 (ENSMUSG00000064357) 41.105186 3
Stmn1 (ENSMUSG00000028832) 40.466334 4
... ... ...
Meg3 (ENSMUSG00000021268) -2.904875 16091
Hmgb2 (ENSMUSG00000054717) -4.784257 16092
Vim (ENSMUSG00000026728) -5.001676 16093
Dbi (ENSMUSG00000026385) -6.704505 16094
Fabp7 (ENSMUSG00000019874) -12.319859 16095
[16096 rows x 2 columns], 2: difference rank
Gria2 (ENSMUSG00000033981) 1.688701 0
Pou3f2 (ENSMUSG00000095139) 1.167767 1
Pou3f3 (ENSMUSG00000045515) 0.999804 2
Cldn5 (ENSMUSG00000041378) 0.971778 3
Robo2 (ENSMUSG00000052516) 0.877576 4
When I try pd.DataFrame.from_dict(diff), where diff is the dictionary above,
I get this error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-383-630287ba17f3> in <module>
----> 1 df = pd.DataFrame.from_dict(diff)
~/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in from_dict(cls, data, orient, dtype, columns)
1188 raise ValueError("only recognize index or columns for orient")
1189
-> 1190 return cls(data, index=index, columns=columns, dtype=dtype)
1191
1192 def to_numpy(self, dtype=None, copy=False):
~/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
409 )
410 elif isinstance(data, dict):
--> 411 mgr = init_dict(data, index, columns, dtype=dtype)
412 elif isinstance(data, ma.MaskedArray):
413 import numpy.ma.mrecords as mrecords
~/anaconda/lib/python3.6/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
255 arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
256 ]
--> 257 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
258
259
~/anaconda/lib/python3.6/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
75 # figure out the index, if necessary
76 if index is None:
---> 77 index = extract_index(arrays)
78 else:
79 index = ensure_index(index)
~/anaconda/lib/python3.6/site-packages/pandas/core/internals/construction.py in extract_index(data)
356
357 if not indexes and not raw_lengths:
--> 358 raise ValueError("If using all scalar values, you must pass an index")
359
360 if have_series:
ValueError: If using all scalar values, you must pass an index
I tried various things, such as pd.DataFrame.from_dict(diff, orient='index'), which gave me the following output:
0 0 difference ran... 1 difference ran... 2 difference rank... 3 difference rank... 4 difference ran... 5 difference ran... 6 difference ran... 7 difference rank... 8 difference ran... 9 difference ran... 10 difference ran... 11 difference ran... 12 difference ran... 13 difference ran... 14 difference ran... 15 difference ran... 16 difference ran... 17 difference ran... 18 difference rank... 19 difference rank... 20 difference ran... 21 difference ran... 22 difference rank... 23 difference rank... 24 difference rank... 25 difference ran...
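I would like to end up with 26 separate CSV files, one per cluster, each with the gene names as rows and 'difference' and 'rank' as columns.
I looked at the source code on GitHub and found that the result is built like this: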
result = {
    cluster: differential_expression(
        select.select_rows(data, idx=clusters == cluster),
        select.select_rows(data, idx=clusters != cluster),
        measure=measure,
        direction=direction,
        gene_names=gene_names,
        n_jobs=n_jobs,
    )
    for cluster in np.unique(clusters)
}
How can I get the output I want?
Thanks.
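The function already returns one DataFrame per cluster, so there is no need to convert the dictionary itself. You can retrieve each DataFrame from the dictionary and save it to its own file, for example by looping over the dictionary's items and calling to_csv (or to_excel) on each value.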
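A minimal sketch of that loop, under some assumptions: the helper name save_cluster_tables, the file prefix, the "gene" index label and the toy dictionary are all made up for illustration, and toy_results only stands in for the real dictionary returned by differential_expression_by_cluster. The last line shows one way to get a single table instead, by passing the dictionary to pd.concat so the cluster label becomes an outer index level.

```python
import os
import tempfile

import pandas as pd


def save_cluster_tables(results, out_dir, prefix="cluster"):
    """Write each per-cluster DataFrame in `results` to its own CSV file.

    Hypothetical helper: `results` is assumed to map cluster label -> DataFrame.
    Returns a dict mapping cluster label -> path of the file written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for cluster, table in results.items():
        path = os.path.join(out_dir, f"{prefix}_{cluster}.csv")
        # index_label gives the gene-name index a column header in the CSV
        table.to_csv(path, index_label="gene")
        paths[cluster] = path
    return paths


# Toy stand-in for the real output (invented values, two clusters only)
toy_results = {
    0: pd.DataFrame({"difference": [0.18, -1.5], "rank": [0, 1]},
                    index=["C1qb", "Malat1"]),
    1: pd.DataFrame({"difference": [2.0, -0.3], "rank": [0, 1]},
                    index=["Tmsb4x", "Vim"]),
}

out_dir = tempfile.mkdtemp()
paths = save_cluster_tables(toy_results, out_dir)

# Alternative: stack all clusters into one DataFrame with a (cluster, gene) index
combined = pd.concat(toy_results, names=["cluster", "gene"])
```

With the real data you would pass your own dictionary and an output folder of your choice. Swapping to_csv for to_excel should work the same way, provided an Excel writer package such as openpyxl is installed.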