Code to reproduce this data frame in R:
ID = c("0", "51", "7", "62", "1","10","5", "79", "62", "10","1","7")
mRNA = c("0", "0", "30", "1", "0", "14", "0", "1", "1", "16", "0", "0")
Centroid = c("d0","d0", "d0", "d0", "d1", "d1","d1", "d1", "d1", "d10", "d10", "d10")
df <- data.frame(ID,mRNA,Centroid)
I want to reshape this data so that it looks like this:
So far I have tried:
r <- reshape(df, direction = "wide", idvar="Centroid", timevar="ID")
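The output looks close to what I want, but it is not quite there (I don't want the columns to be renamed using the ID values).
I would prefer a solution in R, but if you have a suggestion in Python I can try that too. Any help would be greatly appreciated!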
We can number the rows within each Centroid to create a unique ID column, then reshape the data into wide format.
library(dplyr)
df %>%
group_by(Centroid) %>%
mutate(ID = paste0("mRNA_", row_number())) %>%
tidyr::pivot_wider(names_from = ID, values_from = mRNA)
#  Centroid mRNA_1 mRNA_2 mRNA_3 mRNA_4 mRNA_5
#  <fct>    <fct>  <fct>  <fct>  <fct>  <fct>
#1 d0       0      0      30     1      NA
#2 d1       0      14     0      1      1
#3 d10      16     0      0      NA     NA
If you ever need this in Python, here is a solution using pandas, whose DataFrame is a data structure similar to R's data frame.
# setup
import pandas as pd
ID = [0, 51, 7, 62, 1, 10, 5, 79, 62, 10, 1, 7]
mRNA = [0, 0, 30, 1, 0, 14, 0, 1, 1, 16, 0, 0]
Centroid = ['d0', 'd0', 'd0', 'd0', 'd1', 'd1', 'd1', 'd1', 'd1', 'd10', 'd10', 'd10']
df = pd.DataFrame([ID,mRNA,Centroid])
df = df.transpose()
df.rename(columns={0:'ID',1:'mRNA',2:'Centroid'},inplace=True)
# transformation
df['mRNA_idx'] = 'mRNA_' + (df.groupby(['Centroid']).cumcount() + 1).astype(str)
# pivot returns a new DataFrame rather than modifying df in place, so assign the result
df = df.pivot(index='Centroid', columns='mRNA_idx', values='mRNA')
# result
>>> df
mRNA_idx mRNA_1 mRNA_2 mRNA_3 mRNA_4 mRNA_5
Centroid
d0            0      0     30      1    NaN
d1            0     14      0      1      1
d10          16      0      0    NaN    NaN
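If you want to reuse this, the same idea (number the rows within each group, then pivot) can be wrapped in a small helper. This is only a sketch: the function name `widen_by_group`, its parameters, and the temporary `_pos` column are my own choices, not part of pandas. It differs from the snippet above in two ways that I am assuming are acceptable: it pivots on the integer position and adds the prefix afterwards, so that a group with ten or more rows would still get its columns in numeric order, and it converts the result to the nullable `Int64` dtype so that missing cells show as `<NA>` and the counts stay integers.

```python
import pandas as pd


def widen_by_group(df, group_col, value_col, prefix=None):
    """Spread value_col into numbered columns, one row per group_col value.

    Hypothetical helper for illustration; not a pandas function.
    """
    if prefix is None:
        prefix = f"{value_col}_"
    # Position of each row within its group, starting at 1.
    position = df.groupby(group_col).cumcount() + 1
    wide = (
        df.assign(_pos=position)
          .pivot(index=group_col, columns="_pos", values=value_col)
    )
    # Column labels are still integers here, so they are already in numeric
    # order; string labels would put "mRNA_10" before "mRNA_2".
    wide.columns = [f"{prefix}{i}" for i in wide.columns]
    # Nullable integers: missing cells become <NA> instead of forcing floats.
    return wide.astype("Int64")


df = pd.DataFrame({
    "ID": [0, 51, 7, 62, 1, 10, 5, 79, 62, 10, 1, 7],
    "mRNA": [0, 0, 30, 1, 0, 14, 0, 1, 1, 16, 0, 0],
    "Centroid": ["d0"] * 4 + ["d1"] * 5 + ["d10"] * 3,
})

wide = widen_by_group(df, "Centroid", "mRNA")
print(wide)
```

Building the frame from a dict, as here, also keeps `mRNA` numeric from the start, whereas the list-of-lists plus `transpose` setup gives object columns.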