I have a sample DataFrame:
import pandas as pd
df = pd.DataFrame({
'A': [10, 18, 30, 40],
'B': [50, 20, 40, 6]
})
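Given a cutoff value, for example 25, how can I replace every value above the cutoff with the largest value in the same column that is at or below the cutoff?
One working solution: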
import numpy as np
cutoff = 25
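for col in df:
    # Largest value in this column that does not exceed the cutoff
    ceiling = df[col][df[col] <= cutoff].max()
    # Replace values above the cutoff with that ceiling
    df[col] = np.where(df[col] > cutoff, ceiling, df[col])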
This gives:
    A   B
0  10  20
1  18  20
2  18  20
3  18   6
My actual DataFrame is much larger, so performance matters.
Use DataFrame.clip, with the per-column maximum of the values at or below cutoff obtained via DataFrame.where and max:
cutoff = 25
out = df.clip(upper=df.where(df.le(cutoff)).max(), axis=1)
print(out)
    A   B
0  10  20
1  18  20
2  18  20
3  18   6
How it works:
print(df.le(cutoff))
       A      B
0   True  False
1   True   True
2  False  False
3  False   True
print(df.where(df.le(cutoff)))
      A     B
0  10.0   NaN
1  18.0  20.0
2   NaN   NaN
3   NaN   6.0
print(df.where(df.le(cutoff)).max())
A    18.0
B    20.0
dtype: float64
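The three steps above can be bundled into one small function. This is only a sketch: the name cap_above_cutoff is hypothetical (it is not part of pandas), and it assumes every column is numeric and has at least one value at or below the cutoff.

```python
import pandas as pd


def cap_above_cutoff(frame: pd.DataFrame, cutoff: float) -> pd.DataFrame:
    """Hypothetical helper: cap each column at its largest value <= cutoff."""
    # Keep only values at or below the cutoff; the rest become NaN.
    kept = frame.where(frame.le(cutoff))
    # Column-wise maximum of the kept values (NaN is skipped by default).
    ceilings = kept.max()
    # Clip each column at its own ceiling; axis=1 aligns the Series
    # of ceilings with the column labels.
    return frame.clip(upper=ceilings, axis=1)


df = pd.DataFrame({
    'A': [10, 18, 30, 40],
    'B': [50, 20, 40, 6]
})
out = cap_above_cutoff(df, cutoff=25)
print(out)
```

The function returns a new DataFrame and leaves the input unchanged, unlike the loop in the question, which overwrites df column by column.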
import pandas as pd
import numpy as np
# Sample dataframe
df = pd.DataFrame({
'A': [10, 18, 30, 40],
'B': [50, 20, 40, 6]
})
# Cutoff value
cutoff = 25
# Replace values above cutoff with max values below cutoff column-wise
for col in df.columns:
    # Rows in this column that exceed the cutoff
    mask = df[col] > cutoff
    # Largest value among the rows at or below the cutoff
    max_below_cutoff = np.max(df.loc[~mask, col])
    df.loc[mask, col] = max_below_cutoff
print(df)
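Since the question mentions a much larger DataFrame, the same per-column logic can also be sketched directly in NumPy without a Python-level loop. This is an illustration, not a benchmarked recommendation. It assumes all columns are numeric with no missing values, and the variable names (values, above, ceilings, capped, result) are chosen here for the example. The result comes back as floats because -inf is used as a placeholder, and a column with nothing at or below the cutoff would end up with -inf as its ceiling.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [10, 18, 30, 40],
    'B': [50, 20, 40, 6]
})
cutoff = 25

values = df.to_numpy()
above = values > cutoff

# Hide values above the cutoff behind -inf so they cannot win the column max.
ceilings = np.where(above, -np.inf, values).max(axis=0)

# Broadcast each column's ceiling over the positions that exceed the cutoff.
capped = np.where(above, ceilings, values)

result = pd.DataFrame(capped, index=df.index, columns=df.columns)
print(result)
```

Whether this is faster than DataFrame.clip on real data depends on the frame's shape and dtypes, so it is worth timing both on a representative sample.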