Need pandas code to reshape a df with multiple samples for a box plot


I am writing a script to draw box plots from some RNA-Seq data.

Pseudocode:

1. Select a row based on gene name 
2. make a column for each type of cell 
3. make box plot 

I have steps 1 and 3 down:

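df2 = df[df[0].str.match("TCAP")]  # step 1: select the row for one gene
# ???? (step 2: reshape into CellType and Expression columns)
import plotly.express as px
fig = px.box(df, x="CellType", y="Expression", title="GENE")  # step 3
fig.show()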

The code needs to transform the following table:

Gene  Celltype-1_#1  Celltype-1_#2  Celltype-1_#3  Celltype-2_#1  Celltype-2_#2  Celltype-2_#3
A     1              1              1              3              3              3
B     5              5              5              4              4              4

Selecting one gene with df2 = df[df[0].str.match("TCAP")] gives this:

Gene  Celltype-1_#1  Celltype-1_#2  Celltype-1_#3  Celltype-2_#1  Celltype-2_#2  Celltype-2_#3
A     1              1              1              3              3              3

Then I need code to turn it into this:

Gene  CellType  Expression
A     1         1
A     1         1
A     1         1
A     2         3
A     2         3
A     2         3

python pandas biopython
1 Answer

You can use the pandas stack method for this kind of transformation.

# need to have an index to make stack work
df = df.set_index('Gene')

# stack returns a series here
df = df.stack().to_frame().reset_index()

# At this point we have:
#     Gene        level_1  0
#  0     A  Celltype-1_#1  1
#  1     A  Celltype-1_#2  1
#  2     A  Celltype-1_#3  1
#  3     A  Celltype-2_#1  3
#  4     A  Celltype-2_#2  3
#  5     A  Celltype-2_#3  3
#  6     B  Celltype-1_#1  5
#  7     B  Celltype-1_#2  5
#  8     B  Celltype-1_#3  5
#  9     B  Celltype-2_#1  4
#  10    B  Celltype-2_#2  4
#  11    B  Celltype-2_#3  4

df.columns = ['Gene', 'Celltype', 'Expression']

# optionally reduce 'Celltype-1_#1' to its cell type number '1'
# (takes the character at index 9, so this assumes single-digit cell types)
df['Celltype'] = df['Celltype'].apply(lambda t: t[9:10])

# now you can select by Gene or any other columns and pass to Plotly:
print(df[df['Gene'] == 'A'])

#     Gene Celltype  Expression
#  0     A        1           1
#  1     A        1           1
#  2     A        1           1
#  3     A        2           3
#  4     A        2           3
#  5     A        2           3
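
As an alternative to the stack approach above, the same reshape can probably be written with melt, which does not need an index to be set first. The sketch below is not part of the original answer. It rebuilds the question's example table as toy data, and the names `wide`, `tidy`, and `Sample` are made up for illustration. The regular expression assumes the columns always follow the `Celltype-<number>_#<replicate>` pattern; unlike the `t[9:10]` slice, it also copes with cell type numbers of more than one digit.

```python
import pandas as pd

# Toy data copied from the example table in the question (assumed layout).
wide = pd.DataFrame({
    "Gene": ["A", "B"],
    "Celltype-1_#1": [1, 5],
    "Celltype-1_#2": [1, 5],
    "Celltype-1_#3": [1, 5],
    "Celltype-2_#1": [3, 4],
    "Celltype-2_#2": [3, 4],
    "Celltype-2_#3": [3, 4],
})

# melt keeps "Gene" as an identifier and turns every other column
# into one (Sample, Expression) row.
tidy = wide.melt(id_vars="Gene", var_name="Sample", value_name="Expression")

# Extract the cell type number from names such as "Celltype-1_#1".
# The pattern is an assumption about the column naming scheme.
tidy["Celltype"] = tidy["Sample"].str.extract(r"Celltype-(\d+)_#\d+", expand=False)

# melt returns rows column by column; sort so each gene's rows sit together.
tidy = tidy.sort_values(["Gene", "Sample"]).reset_index(drop=True)

print(tidy[tidy["Gene"] == "A"][["Gene", "Celltype", "Expression"]])
```

Note that `Celltype` ends up as strings ("1", "2"), which suits a categorical box plot axis.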

Note that because the whole data frame is stacked up front, it is now easy to select several genes at once and pass them to Plotly together:

df_many = df[df['Gene'].isin(['A', 'B'])]