I'm writing a script to draw box plots from some RNA-Seq data.
Pseudocode:
1. Select a row based on gene name
2. Make a column for each type of cell
3. Make a box plot
I have steps 1 and 3 down:
df2 = df[df[0].str.match("TCAP")]
????
import plotly.express as px
fig = px.box(df, x="CellType", y="Expression", title="GENE")
fig.show()
The code needs to convert the table below:
Gene Celltype-1_#1 Celltype-1_#2 Celltype-1_#3 Celltype-2_#1 Celltype-2_#2 Celltype-2_#3
A 1 1 1 3 3 3
B 5 5 5 4 4 4
into this, using df2 = df[df[0].str.match("TCAP")]:
Gene Celltype-1_#1 Celltype-1_#2 Celltype-1_#3 Celltype-2_#1 Celltype-2_#2 Celltype-2_#3
A 1 1 1 3 3 3
Then I need code to turn it into this:
Gene CellType Expression
A 1 1
A 1 1
A 1 1
A 2 3
A 2 3
A 2 3
You can use the Pandas stack method for this kind of transformation.
# need to have an index to make stack work
df = df.set_index('Gene')
# stack returns a series here
df = df.stack().to_frame().reset_index()
# At this point we have:
# Gene level_1 0
# 0 A Celltype-1_#1 1
# 1 A Celltype-1_#2 1
# 2 A Celltype-1_#3 1
# 3 A Celltype-2_#1 3
# 4 A Celltype-2_#2 3
# 5 A Celltype-2_#3 3
# 6 B Celltype-1_#1 5
# 7 B Celltype-1_#2 5
# 8 B Celltype-1_#3 5
# 9 B Celltype-2_#1 4
# 10 B Celltype-2_#2 4
# 11 B Celltype-2_#3 4
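# name the columns to match the px.box call in the question (x="CellType")
df.columns = ['Gene', 'CellType', 'Expression']
# optionally reduce each label to its cell type number; index 9 is the character
# right after "Celltype-", so this assumes single-digit cell type numbers
df['CellType'] = df['CellType'].apply(lambda t: t[9:10])
# now you can select by Gene or any other columns and pass to Plotly:
print(df[df['Gene'] == 'A'])
# Gene CellType Expression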
# 0 A 1 1
# 1 A 1 1
# 2 A 1 1
# 3 A 2 3
# 4 A 2 3
# 5 A 2 3
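The fixed slice at index 9 only works while every column name starts with "Celltype-" followed by a single digit. As a rough sketch of a less brittle alternative, a regular expression can pull out both the cell type and the replicate number. The frame below is rebuilt from the sample values in the question, and the names long_df, Sample and Replicate are assumptions made for illustration, not part of the original code.

```python
import pandas as pd

# Long-format frame like the one produced by stack(), rebuilt from the sample data.
# "Sample" (assumed name) holds the original wide column labels.
long_df = pd.DataFrame({
    "Gene": ["A"] * 6,
    "Sample": ["Celltype-1_#1", "Celltype-1_#2", "Celltype-1_#3",
               "Celltype-2_#1", "Celltype-2_#2", "Celltype-2_#3"],
    "Expression": [1, 1, 1, 3, 3, 3],
})

# Capture the text between "Celltype-" and "_#", and the digits after "_#".
# Named groups become the column names of the extracted frame.
parts = long_df["Sample"].str.extract(
    r"^Celltype-(?P<CellType>.+)_#(?P<Replicate>\d+)$"
)
long_df["CellType"] = parts["CellType"]
long_df["Replicate"] = parts["Replicate"].astype(int)

print(long_df[["Gene", "CellType", "Replicate", "Expression"]])
```

Labels that do not match the pattern come back as NaN, so the astype(int) step would fail on them; that can be a useful early warning about unexpected column names.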
Note that by stacking the whole dataframe up front, it is now easy to select several genes at once and pass them to Plotly together:
df_many = df[df['Gene'].isin(['A', 'B'])]
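For comparison, the same wide-to-long reshape can also be written with DataFrame.melt, which names the new columns in one step. This is only a sketch built on the sample table from the question: the names wide, long_df, Sample and the helper gene_box_plot are hypothetical, and here the cell type label is kept as "Celltype-1" / "Celltype-2" rather than reduced to a number.

```python
import pandas as pd

# Sample wide table from the question: one row per gene, one column per replicate.
wide = pd.DataFrame({
    "Gene": ["A", "B"],
    "Celltype-1_#1": [1, 5], "Celltype-1_#2": [1, 5], "Celltype-1_#3": [1, 5],
    "Celltype-2_#1": [3, 4], "Celltype-2_#2": [3, 4], "Celltype-2_#3": [3, 4],
})

# melt keeps "Gene" as an identifier and turns every other column into a row.
long_df = wide.melt(id_vars="Gene", var_name="Sample", value_name="Expression")

# Drop the "_#<replicate>" suffix so replicates of one cell type share a label.
long_df["CellType"] = long_df["Sample"].str.split("_#").str[0]


def gene_box_plot(frame, genes):
    """Hypothetical helper: box plot of expression per cell type for the given genes."""
    import plotly.express as px  # imported here so the reshape works without Plotly

    subset = frame[frame["Gene"].isin(genes)]
    return px.box(subset, x="CellType", y="Expression", color="Gene",
                  title=", ".join(genes))
```

With Plotly installed, gene_box_plot(long_df, ["A", "B"]).show() should draw one box per gene within each cell type.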