Efficiently applying a maximum gradient over the rows of a pandas DataFrame

Question

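I want to apply a maximum gradient (in absolute value) to a series of values. The difficulty is that the value in each row depends on the value in the previous row.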

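If $f(t)$ is my initial series, $g(t)$ is my "smoothed" series, and I want to apply a maximum (absolute) gradient $a > 0$, then I want

$$g(t+1) = g(t) + \operatorname{sign}\bigl(f(t+1) - g(t)\bigr) \cdot \min\bigl(a, |f(t+1) - g(t)|\bigr)$$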

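Example: I want to limit the row-to-row change to at most 1 in absolute value, so from column B of the data frame below I would like to obtain the smoothed column: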

   B    smoothed
0  0.0   0.0
1  1.5   1.0
2  2.3   2.0 
3  2     2.0
4  0.4   1.0
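
The update term sign(x) * min(a, |x|) is the same as clamping x to the interval [-a, a], which can make the recurrence easier to read. The following is a minimal sketch in plain Python that checks this on the steps from the table above; the names `step` and `step_clamped` are illustrative and not part of the question.

```python
def step(g_prev, f_next, a):
    """One update of the recurrence as written: g + sign(d) * min(a, |d|)."""
    d = f_next - g_prev
    sign = (d > 0) - (d < 0)  # evaluates to -1, 0 or 1
    return g_prev + sign * min(a, abs(d))

def step_clamped(g_prev, f_next, a):
    """The same update, written as a clamp of the difference to [-a, a]."""
    d = f_next - g_prev
    return g_prev + max(-a, min(a, d))

# (previous smoothed value, next input value) pairs from the example table
for g_prev, f_next in [(0.0, 1.5), (1.0, 2.3), (2.0, 2.0), (2.0, 0.4)]:
    assert step(g_prev, f_next, 1) == step_clamped(g_prev, f_next, 1)
```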

I can do this with `iterrows()`, but is there a more efficient way for large data frames?
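
For reference, the row-by-row version alluded to above might look like the following sketch. The asker's actual code is not shown, so the function name `smooth_with_iterrows` and its structure are assumptions, as is the starting condition g(0) = f(0), which is inferred from the example table. It only illustrates the baseline that a faster method would replace.

```python
import pandas as pd

def smooth_with_iterrows(df, column, max_gradient):
    """Hypothetical baseline: apply the recurrence one row at a time."""
    smoothed = []
    for _, row in df.iterrows():
        value = row[column]
        if not smoothed:
            # Assumed start: the first smoothed value equals the first input.
            smoothed.append(value)
            continue
        diff = value - smoothed[-1]
        # Limit the change to the interval [-max_gradient, max_gradient].
        smoothed.append(smoothed[-1] + max(-max_gradient, min(max_gradient, diff)))
    return pd.Series(smoothed, index=df.index)

df = pd.DataFrame({'B': [0.0, 1.5, 2.3, 2, 0.4]})
df['smoothed'] = smooth_with_iterrows(df, 'B', 1)
print(df)
```

`iterrows()` builds a new Series object for every row, which is the main reason this pattern is slow on large frames.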

Answer 1

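Here is a straightforward approach that loops over a NumPy array instead of DataFrame rows and works for large data frames: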

import numpy as np
import pandas as pd

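def smooth_series_with_gradient_limit(series, max_gradient):
    # Work on a float array so integer input is not truncated in g.
    f = series.to_numpy(dtype=float)
    g = np.zeros_like(f)
    g[0] = f[0]  # the smoothed series starts at the first input value
    for t in range(1, len(f)):
        # Each step depends on the previous smoothed value, hence the loop.
        gradient = f[t] - g[t-1]
        g[t] = g[t-1] + np.sign(gradient) * min(max_gradient, np.abs(gradient))

    return pd.Series(g, index=series.index)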

df = pd.DataFrame({
    'B': [0.0, 1.5, 2.3, 2, 0.4]
})

df['smoothed'] = smooth_series_with_gradient_limit(df['B'], 1)
print(df)

This gives your expected output. To demonstrate the efficiency, here is an example with 1 million entries:

import numpy as np
import pandas as pd
import time

def smooth_series_with_gradient_limit(series, max_gradient):
    f = series.to_numpy()
    g = np.zeros_like(f)
    g[0] = f[0]
    for t in range(1, len(f)):
        gradient = f[t] - g[t-1]
        g[t] = g[t-1] + np.sign(gradient) * min(max_gradient, np.abs(gradient))
    return pd.Series(g, index=series.index)

np.random.seed(0)
large_df = pd.DataFrame({
    'B': np.random.randn(1000000)  
})

start_time = time.perf_counter()  # monotonic clock intended for timing
large_df['smoothed'] = smooth_series_with_gradient_limit(large_df['B'], 1)
end_time = time.perf_counter()

print(f"Execution time: {end_time - start_time:.2f} seconds")

The output is:

Execution time: 3.33 seconds
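
In a loop like this, much of the time tends to go into per-element NumPy scalar operations (`np.sign`, `np.abs`, array indexing) rather than the arithmetic itself. One possible variation, not part of the original answer, is to convert the column to a plain Python list first and use the built-in `min` and `max`. The name `limit_gradient` is hypothetical, and no timing is claimed here; measuring on your own data is the only reliable check.

```python
def limit_gradient(values, max_gradient):
    """Return a gradient-limited copy of `values`, using only Python floats."""
    values = [float(v) for v in values]
    if not values:
        return []
    prev = values[0]  # the smoothed series starts at the first input value
    smoothed = [prev]
    for v in values[1:]:
        diff = v - prev
        # Clamp the step to [-max_gradient, max_gradient].
        prev += max(-max_gradient, min(max_gradient, diff))
        smoothed.append(prev)
    return smoothed

print(limit_gradient([0.0, 1.5, 2.3, 2, 0.4], 1))
# prints [0.0, 1.0, 2.0, 2.0, 1.0]
```

With a data frame, this could be called as `df['smoothed'] = limit_gradient(df['B'].tolist(), 1)`.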
