import pandas as pd
columns = ['S1', 'S2', 'S3', 'S4', 'S5']
df = pd.DataFrame({'Patient':['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p8', 'p10'],
'S1':[0.7, 0.3, 0.5, 0.8, 0.9, 0.1, 0.9, 0.2, 0.6, 0.3],
'S2':[0.2, 0.3, 0.5, 0.4, 0.9, 0.1, 0.9, 0.7, 0.4, 0.3],
'S3':[0.6, 0.3, 0.5, 0.8, 0.9, 0.8, 0.9, 0.3, 0.6, 0.3],
'S4':[0.2, 0.3, 0.7, 0.8, 0.9, 0.1, 0.9, 0.7, 0.3, 0.3 ],
'S5':[0.9, 0.8, 0.5, 0.8, 0.9, 0.7, 0.2, 0.7, 0.6, 0.3 ]})
# vectorized operations on the data frame
# count the cells that are >= 0.5 in each column
arr1 = df[columns].ge(0.5).sum().to_numpy()
# sum the cells that are >= 0.5 in each column
arr2 = df[df[columns]>=0.5][columns].sum().to_numpy()
print(arr1)
print(arr2)
How can I get the list or set of patients for each column in df, like below?
[('p1', 'p3', 'p4', 'p5', 'p7', 'p9'),
('p3', 'p5', 'p7', 'p8'),
('p1', 'p3', 'p4', 'p5', 'p6', 'p7', 'p9'),
(...),
(...)]
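The result isn't in a tabular format, so in this case you can just use a list comprehension. Note that the sample frame labels its ninth row 'p8' rather than 'p9', so 'p8' appears in the output where the question shows 'p9':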
[df.Patient[df[col] >= 0.5].to_list() for col in columns]
#[['p1', 'p3', 'p4', 'p5', 'p7', 'p8'],
# ['p3', 'p5', 'p7', 'p8'],
# ['p1', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8'],
# ['p3', 'p4', 'p5', 'p7', 'p8'],
# ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p8', 'p8']]
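
If tuples are wanted, as in the question's expected output, the same idea can be sketched with a boolean mask that is built once and reused per column. This is only a minimal sketch: the small frame below is a hypothetical subset of the question's data, and the names `mask`, `patients_by_col` and `as_tuples` are illustrative, not from the original post.

```python
import pandas as pd

# Hypothetical reduced sample (first four rows, two score columns only)
df = pd.DataFrame({
    'Patient': ['p1', 'p2', 'p3', 'p4'],
    'S1': [0.7, 0.3, 0.5, 0.8],
    'S2': [0.2, 0.3, 0.5, 0.4],
})
columns = ['S1', 'S2']

# Compute the >= 0.5 comparison once for all score columns
mask = df[columns].ge(0.5)

# Map each column name to a tuple of the patients that pass the threshold
patients_by_col = {col: tuple(df.loc[mask[col], 'Patient']) for col in columns}

# Plain list of tuples, in the same order as `columns`
as_tuples = list(patients_by_col.values())
print(as_tuples)
# [('p1', 'p3', 'p4'), ('p3',)]
```

Keeping the dict around may be handy when the column name matters later; the list of tuples is just its values in column order.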