How to vectorize operations in a pandas data frame?

import pandas as pd

columns = ['S1', 'S2', 'S3', 'S4', 'S5']

df = pd.DataFrame({'Patient':['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p8', 'p10'],
                   'S1':[0.7, 0.3, 0.5, 0.8, 0.9, 0.1, 0.9, 0.2, 0.6, 0.3],
                   'S2':[0.2, 0.3, 0.5, 0.4, 0.9, 0.1, 0.9, 0.7, 0.4, 0.3],
                   'S3':[0.6, 0.3, 0.5, 0.8, 0.9, 0.8, 0.9, 0.3, 0.6, 0.3],
                   'S4':[0.2, 0.3, 0.7, 0.8, 0.9, 0.1, 0.9, 0.7, 0.3, 0.3 ],
                   'S5':[0.9, 0.8, 0.5, 0.8, 0.9, 0.7, 0.2, 0.7, 0.6, 0.3 ]})

# vectorized operations in data frame

# get the number of the cells that are >=0.5 for each column
arr1 = df[columns].ge(0.5).sum().to_numpy()

# get the sum of the cells that are >=0.5 for each column
arr2 = df[df[columns]>=0.5][columns].sum().to_numpy()

print(arr1)
print(arr2)

How can I get, for each column in df, the list (or set) of patients whose value is >= 0.5, as shown below?

[('p1', 'p3', 'p4', 'p5', 'p7', 'p9'), 
 ('p3', 'p5', 'p7', 'p8'), 
 ('p1', 'p3', 'p4', 'p5', 'p6', 'p7', 'p9'), 
 (...),
 (...)]
python pandas dataframe vectorization
1 Answer

The result is not tabular, so there is no natural DataFrame shape for it. In this case, you can just use a list comprehension:

[df.Patient[df[col] >= 0.5].to_list() for col in columns]

#[['p1', 'p3', 'p4', 'p5', 'p7', 'p8'],
# ['p3', 'p5', 'p7', 'p8'],
# ['p1', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8'],
# ['p3', 'p4', 'p5', 'p7', 'p8'],
# ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p8', 'p8']]
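
If tuples keyed by column name are preferred, closer to the shape shown in the question, one possible variation is sketched below. It assumes the same `df` and `columns` as in the question; the names `mask` and `patients_by_col` are illustrative only. The comparison is still done once for the whole frame, and only the final collection step loops over the columns.

```python
import pandas as pd

columns = ['S1', 'S2', 'S3', 'S4', 'S5']

# Same data as in the question (note that 'p8' appears twice there).
df = pd.DataFrame({'Patient': ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p8', 'p10'],
                   'S1': [0.7, 0.3, 0.5, 0.8, 0.9, 0.1, 0.9, 0.2, 0.6, 0.3],
                   'S2': [0.2, 0.3, 0.5, 0.4, 0.9, 0.1, 0.9, 0.7, 0.4, 0.3],
                   'S3': [0.6, 0.3, 0.5, 0.8, 0.9, 0.8, 0.9, 0.3, 0.6, 0.3],
                   'S4': [0.2, 0.3, 0.7, 0.8, 0.9, 0.1, 0.9, 0.7, 0.3, 0.3],
                   'S5': [0.9, 0.8, 0.5, 0.8, 0.9, 0.7, 0.2, 0.7, 0.6, 0.3]})

# Build the boolean mask once for all score columns, indexed by patient label.
mask = df.set_index('Patient')[columns].ge(0.5)

# Collect one tuple of patient labels per column (illustrative name).
patients_by_col = {col: tuple(mask.index[mask[col].to_numpy()]) for col in columns}

print(patients_by_col['S2'])
```

Use `list(patients_by_col.values())` if a plain list of tuples in column order is needed instead of a dict.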