Averaging replicates in different rows based on conditions in different columns


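I have the following minimal example, in which I want to average the replicates for each element (here, each particle size) under different conditions (Biofilm, SS_mgL, %MP, etc.). As you can see, I have done this very crudely, but I'm sure there is a more elegant way. All suggestions are appreciated.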

# Load the Pandas libraries with alias 'pd' 
import pandas as pd 
# Load the Numpy libraries with alias 'np'
import numpy as np
# Load the Matplotlib library pyplot with alias 'plt'
from matplotlib import pyplot as plt

#Load data from my public Github repository
url = 'https://raw.githubusercontent.com/matt-salter/public/master/test.csv'
df = pd.read_csv(url,sep=';')

# Define size arrays etc.
midpoint = np.array([1.5,2.5,3.5,4.5,5.5,6.5,7.5,8.5,9.5,10.5,11.5,12.5,13.5,14.5,15.5,16.5,58.5])

# Average size distribution for condition Biofilm=0, SS_mgL=10, %MP=0
size_dist = np.array([df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 1.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 2.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 3.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 4.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 5.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 6.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 7.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 8.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 9.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 10.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 11.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 12.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 13.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 14.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 15.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 16.5), 'dn/dlogDP'].mean(),
                 df.loc[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0) & (df['Midpoint'] == 58.5), 'dn/dlogDP'].mean()])

plt.semilogx(midpoint,size_dist)
plt.xlim([1,100])
# Raw string so that \m in $\mu$ is not treated as an (invalid) escape sequence
plt.xlabel(r'Particle size ($\mu$m)')
plt.ylabel('dn/dlog$_{Dp}$')
python pandas dataframe
1 Answer

You can use groupby followed by agg to get the desired output:

grp_df = df[(df['Biofilm'] == 0) & (df['SS_mgL'] == 10) & (df['%MP'] == 0)]\
                        .groupby('Midpoint').agg({'dn/dlogDP':['mean']})

grp_df.columns = ['_'.join(clm).strip() for clm in grp_df.columns.values]
grp_df.reset_index(inplace=True)

size_dist = grp_df['dn/dlogDP_mean'].to_numpy()
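
The same groupby idea can be extended to average the replicates for every condition at once, rather than filtering to one condition first. The following is only a sketch: the `demo` frame is a small hypothetical stand-in for test.csv (same column names, made-up values), and the names `means` and `one_condition` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for test.csv: two replicates per condition and size bin.
demo = pd.DataFrame({
    "Biofilm":   [0, 0, 0, 0, 1, 1, 1, 1],
    "SS_mgL":    [10, 10, 10, 10, 10, 10, 10, 10],
    "%MP":       [0, 0, 0, 0, 0, 0, 0, 0],
    "Midpoint":  [1.5, 1.5, 2.5, 2.5, 1.5, 1.5, 2.5, 2.5],
    "dn/dlogDP": [1.0, 3.0, 10.0, 20.0, 5.0, 7.0, 30.0, 50.0],
})

# Mean over replicates for every (condition, size bin) combination.
means = demo.groupby(["Biofilm", "SS_mgL", "%MP", "Midpoint"])["dn/dlogDP"].mean()

# Pick one condition; the result is a Series indexed by Midpoint only.
one_condition = means.xs((0, 10, 0), level=["Biofilm", "SS_mgL", "%MP"])

midpoint = one_condition.index.to_numpy()
size_dist = one_condition.to_numpy()
```

Because `midpoint` is taken from the grouped index, the hand-written midpoint array is no longer needed, and the two arrays are guaranteed to line up for plotting.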

If you only need a specific range of sizes, you can also apply a filter on Midpoint before grouping.
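
One way to restrict the size range before grouping is sketched below. It assumes the same column names as the question; the `demo` values and the bounds `size_min` and `size_max` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: one condition, two replicates in each of three size bins.
demo = pd.DataFrame({
    "Biofilm":   [0, 0, 0, 0, 0, 0],
    "SS_mgL":    [10, 10, 10, 10, 10, 10],
    "%MP":       [0, 0, 0, 0, 0, 0],
    "Midpoint":  [1.5, 1.5, 2.5, 2.5, 58.5, 58.5],
    "dn/dlogDP": [1.0, 3.0, 10.0, 20.0, 100.0, 200.0],
})

# Illustrative bounds; Series.between includes both endpoints by default.
size_min, size_max = 1.0, 20.0

mask = (
    (demo["Biofilm"] == 0)
    & (demo["SS_mgL"] == 10)
    & (demo["%MP"] == 0)
    & demo["Midpoint"].between(size_min, size_max)
)

# Average the replicates in each remaining size bin.
filtered_means = demo[mask].groupby("Midpoint")["dn/dlogDP"].mean()
```

Here the 58.5 bin falls outside the bounds, so it is dropped before the mean is computed.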
