Gaussian sum filter for irregularly spaced points

Problem description · Votes: 9 · Answers: 3

For example, I have a set of points (x, y) given as two vectors x and y:

from pylab import *
x = sorted(random(30))
y = random(30)
plot(x,y, 'o-')

[figure: plot of the randomly generated points]

Now I want to smooth these data with a Gaussian, evaluated only at certain (regularly spaced) points on the x axis, say:

x_eval = linspace(0,1,11)

I got the hint that this method is called a "Gaussian sum filter", but so far I have not found any implementation of it in numpy/scipy, even though at first glance it looks like a standard problem. Since the x values are not equally spaced, I cannot use scipy.ndimage.gaussian_filter1d.

Usually this kind of smoothing is done by going to Fourier space and multiplying with the kernel, but I don't really know whether that is possible with irregularly spaced data.

Thanks for any ideas.

python numpy scipy gaussian smooth
3 Answers

10 votes

This will become a problem for very large data sets, but the proper calculation you are asking for would be done as follows:
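For reference, each output value below is a normalized Gaussian-weighted average of all data points (identifying this as the Nadaraya–Watson kernel estimator is my gloss, not part of the original answer):

$$\hat{y}(x_i) = \frac{\sum_j \exp\!\left(-\frac{(x_i - x_j)^2}{2\sigma^2}\right) y_j}{\sum_j \exp\!\left(-\frac{(x_i - x_j)^2}{2\sigma^2}\right)}$$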

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0) # for repeatability
x = np.random.rand(30)
x.sort()
y = np.random.rand(30)

x_eval = np.linspace(0, 1, 11)
sigma = 0.1

# Pairwise distance between every evaluation point and every data point
delta_x = x_eval[:, None] - x
# Gaussian weight for each pair, then normalize so each row sums to 1
weights = np.exp(-delta_x*delta_x / (2*sigma*sigma)) / (np.sqrt(2*np.pi) * sigma)
weights /= np.sum(weights, axis=1, keepdims=True)
# Weighted average of the data at each evaluation point
y_eval = np.dot(weights, y)

plt.plot(x, y, 'bo-')
plt.plot(x_eval, y_eval, 'ro-')
plt.show()

[figure: original data (blue) and smoothed result at x_eval (red)]
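As noted above, the full weight matrix has shape (len(x_eval), len(x)), which is the part that blows up for very large data sets. One way around it, sketched below under the assumption that processing x_eval in blocks is acceptable (the helper name and chunk size are my own, not from the answer), is to chunk the computation:

import numpy as np

def gaussian_sum_chunked(x, y, x_eval, sigma, chunk=1000):
    """Same calculation as above, but over blocks of x_eval to bound memory."""
    y_eval = np.empty(len(x_eval))
    for start in range(0, len(x_eval), chunk):
        sl = slice(start, start + chunk)
        delta_x = x_eval[sl, None] - x                  # (block, len(x)) slab only
        weights = np.exp(-delta_x**2 / (2 * sigma**2))  # Gaussian weight per pair
        weights /= weights.sum(axis=1, keepdims=True)   # normalize each row
        y_eval[sl] = weights @ y                        # weighted average per point
    return y_eval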


3 votes

Before answering, I'd like to say that this is more of a DSP question than a programming question...

...that said, there is a simple two-step solution to your problem.

Step 1: Resample the data

To illustrate, we can create a randomly generated data set with uneven sampling:

import numpy as np
x = np.cumsum(np.random.randint(0,100,100))  # unevenly spaced, increasing x positions
y = np.random.normal(0,1,size=100)           # noisy samples at those positions

Which gives us something like this:

[figure: unevenly sampled data]

We can resample these data with simple linear interpolation:

nx = np.arange(x.max()) # choose new x axis sampling
ny = np.interp(nx,x,y) # generate y values for each x

This converts our data to:

[figure: the same data resampled evenly]

Step 2: Smooth the resampled data

At this stage you can use the tools available through scipy to apply a Gaussian filter to the data with a given sigma value:

from scipy.ndimage import gaussian_filter1d
fx = gaussian_filter1d(ny, sigma=100)

Plotting this against the original data, we get:

[figure: Gaussian filter applied]

The choice of the sigma value determines the width of the filter. Since the resampled axis has unit spacing, sigma here is measured in the same units as x.
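To see the effect, a quick sketch of my own (reusing x, y, nx and ny from above; the sigma values are arbitrary) overlays a few filter widths:

import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d

plt.plot(x, y, '.', alpha=0.3, label='original samples')
for s in (10, 100, 500):  # narrow to very wide filters
    plt.plot(nx, gaussian_filter1d(ny, sigma=s), label=f'sigma={s}')
plt.legend()
plt.show()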


1 vote

Based on @Jaime's answer, I wrote a function that implements this, with some extra documentation and the ability to discard estimates far from the data points.

I think confidence intervals could be obtained for this estimate by bootstrapping, but I haven't done that yet.

import numpy as np

def gaussian_sum_smooth(xdata, ydata, xeval, sigma, null_thresh=0.6):
    """Apply gaussian sum filter to data.

    xdata, ydata : array
        Arrays of x- and y-coordinates of data. 
        Must be 1d and have the same length.

    xeval : array
        Array of x-coordinates at which to evaluate the smoothed result

    sigma : float
        Standard deviation of the Gaussian to apply to each data point
        Larger values yield a smoother curve.

    null_thresh : float
        For evaluation points far from data points, the estimate will be
        based on very little data. If the total weight is below this threshold,
        return np.nan at this location. Zero means always return an estimate.
        The default of 0.6 corresponds to approximately one sigma away 
        from the nearest datapoint.
    """
    # Distance between every combination of xdata and xeval
    # each row corresponds to a value in xeval
    # each col corresponds to a value in xdata
    delta_x = xeval[:, None] - xdata

    # Calculate weight of every value in delta_x using Gaussian
    # Maximum weight is 1.0 where delta_x is 0
    weights = np.exp(-0.5 * ((delta_x / sigma) ** 2))

    # Multiply each weight by every data point, and sum over data points
    smoothed = np.dot(weights, ydata)

    # Nullify the result when the total weight is below threshold
    # This happens at evaluation points far from any data
    # 1-sigma away from a data point has a weight of ~0.6
    nan_mask = weights.sum(1) < null_thresh
    smoothed[nan_mask] = np.nan

    # Normalize by dividing by the total weight at each evaluation point
    # Nullification above avoids a divide-by-zero warning here
    smoothed = smoothed / weights.sum(1)

    return smoothed
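A hypothetical usage example with data like the question's, plus a minimal bootstrap sketch for the confidence intervals mentioned above (my own illustration; the number of resamples and the 95% band are arbitrary choices):

rng = np.random.default_rng(0)
x = np.sort(rng.random(30))
y = rng.random(30)
x_eval = np.linspace(0, 1, 11)

smoothed = gaussian_sum_smooth(x, y, x_eval, sigma=0.1)

# Bootstrap: resample (x, y) pairs with replacement and re-smooth.
n_boot = 200
boot = np.empty((n_boot, len(x_eval)))
for i in range(n_boot):
    idx = rng.integers(0, len(x), len(x))
    boot[i] = gaussian_sum_smooth(x[idx], y[idx], x_eval, sigma=0.1)

# NaN-aware percentiles, since far-from-data points are nullified.
lo, hi = np.nanpercentile(boot, [2.5, 97.5], axis=0)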