Quickly removing outliers from a list in Python?

Question

I have a very long list of time and temperature values, structured as follows:

list1 = [[1, 72], [2, 72], [3, 73], [4, 72], [5, 74], [6, 73], [7, 71], [8, 92], [9, 73]]

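Some of the time-temperature pairs are incorrect spikes in the data. For example, at time 8 the temperature spikes to 92 degrees. I want to remove these sudden jumps or dips in the temperature values.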

To do this, I wrote the following code (I stripped out the unrelated parts and kept only the part that removes the spike outliers):

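values = list1  # the list shown above
outlierpercent = 3
outliers = []

for i in values:
    temperature = i[1]
    index = values.index(i)  # linear search from the start of the list on every iteration
    if index > 0:
        prevtemp = values[index-1][1]
        pctdiff = (temperature/prevtemp - 1) * 100
        if abs(pctdiff) > outlierpercent:
            outliers.append(i)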

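This code works (I can set the minimum percent difference for a point to count as a spike via outlierpercent), but it takes an extremely long time (5-10 minutes per list). My lists are very long (about 5 million data points each), and I have several hundred of them.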
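As a worked check of the criterion, the snippet below applies the question's percent-difference formula to a few consecutive readings from the sample list. `pctdiff` is a hypothetical helper introduced only for illustration. Note that the drop back from the spike (time 8 to time 9) also exceeds the 3% threshold, so the loop above flags time 9 as well as time 8.

```python
outlierpercent = 3

def pctdiff(temperature, prevtemp):
    # Percent change relative to the previous reading, same formula as the question's loop
    return (temperature / prevtemp - 1) * 100

print(round(pctdiff(74, 72), 1))  # time 4 -> 5: 2.8, below the threshold
print(round(pctdiff(92, 71), 1))  # time 7 -> 8: 29.6, flagged
print(round(pctdiff(73, 92), 1))  # time 8 -> 9: -20.7, also flagged
```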

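I'd like to know whether there is a faster way to do this; my main concern here is time. There are other similar questions, but their solutions don't seem very efficient for extremely long lists with this structure, so I'm not sure what to do. Thanks!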

Answer 1

outlierpercent = 3
outliers = []

for index in range(1, len(values)):
    temperature = values[index][1]
    prevtemp = values[index-1][1]

    pctdiff = (temperature/prevtemp - 1) * 100
    if abs(pctdiff) > outlierpercent:
        outliers.append(index)

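This should be much faster, because it no longer calls values.index(i), which scans the list from the beginning on every iteration and makes the original loop quadratic in the length of the list.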
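The same single pass can also be written with `zip`, pairing each reading with its predecessor instead of indexing into the list twice per iteration. This is a minimal sketch, not part of the original answer; `pct_jump_outliers` is a hypothetical name. It applies the same rule as the loop above, so it still flags both the spike and the reading right after it.

```python
values = [[1, 72], [2, 72], [3, 73], [4, 72], [5, 74], [6, 73], [7, 71], [8, 92], [9, 73]]
outlierpercent = 3

def pct_jump_outliers(values, outlierpercent):
    """Return indices whose temperature differs from the previous raw reading
    by more than outlierpercent percent (same rule as the loop above)."""
    outliers = []
    # zip pairs each element with its predecessor; enumerate starts at 1 so
    # the recorded index is that of the current (later) element
    for index, (prev, cur) in enumerate(zip(values, values[1:]), start=1):
        pctdiff = (cur[1] / prev[1] - 1) * 100
        if abs(pctdiff) > outlierpercent:
            outliers.append(index)
    return outliers

print(pct_jump_outliers(values, outlierpercent))  # [7, 8]
```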

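Update:

The problem of only the first outlier being removed happens because, after we remove an outlier, the next iteration compares against the temperature of the outlier we just removed (prevtemp = values[index-1][1]).

I believe you can avoid this by keeping track of the previous valid temperature instead. Something like this: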

outlierpercent = 3
outliers = []
prevtemp = values[0][1]

for index in range(1, len(values)):
    temperature = values[index][1]

    pctdiff = (temperature/prevtemp - 1) * 100
    # outlier - add to list and don't update prev temp
    if abs(pctdiff) > outlierpercent:
        outliers.append(index)
    # valid temp, update prev temp to the current reading
    else:
        prevtemp = temperature
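Since the goal is ultimately to remove the spikes, here is the updated idea wrapped in a function and followed by one way to build the cleaned list. `find_spikes` is a hypothetical name; its loop sets `prevtemp` to the current reading only when that reading is valid, and the set-based filter at the end is an addition, not part of the original answer. On the sample data only the reading at time 8 is flagged.

```python
values = [[1, 72], [2, 72], [3, 73], [4, 72], [5, 74], [6, 73], [7, 71], [8, 92], [9, 73]]

def find_spikes(values, outlierpercent=3):
    """Indices of readings that differ by more than outlierpercent percent
    from the last *valid* reading."""
    outliers = []
    prevtemp = values[0][1]
    for index in range(1, len(values)):
        temperature = values[index][1]
        pctdiff = (temperature / prevtemp - 1) * 100
        if abs(pctdiff) > outlierpercent:
            outliers.append(index)   # outlier: keep comparing against the old prevtemp
        else:
            prevtemp = temperature   # valid: the next reading is compared to this one
    return outliers

spikes = find_spikes(values)
# Build the cleaned list; a set makes each membership test O(1)
spike_set = set(spikes)
cleaned = [pair for i, pair in enumerate(values) if i not in spike_set]

print(spikes)   # [7]
print(cleaned)  # [[1, 72], [2, 72], [3, 73], [4, 72], [5, 74], [6, 73], [7, 71], [9, 73]]
```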

Answer 2

Use NumPy to speed up the calculation.

Given:

values = [[1, 72], [2, 72], [3, 73], [4, 72], [5, 74], [6, 73], [7, 71], [8, 92], [9, 73]]

NumPy code:

import numpy as np

outlierpercent = 3

# Convert list to NumPy array
a = np.array(values)

# Percent difference of each temperature relative to the previous one
b = np.diff(a[:, 1])*100/a[:-1, 1]

# Indices of outliers (np.where returns a tuple of index arrays, so take [0])
outlier_indices = np.where(np.abs(b) > outlierpercent)[0]
if outlier_indices.size:
    # add one since b is one element shorter than a due to computing differences
    print(a[outlier_indices + 1])

# Output (same outliers as the original code):
# [[ 8 92]
#  [ 9 73]]
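To actually drop the flagged rows rather than just print them, a boolean mask works on the same arrays (recomputed here so the block runs on its own). This is a minimal sketch; the `keep` mask is an addition, not part of the original answer. Because `np.diff` compares each reading with the previous raw reading, the row after the spike (time 9) is removed too, matching the question's original loop rather than the updated loop in Answer 1.

```python
import numpy as np

values = [[1, 72], [2, 72], [3, 73], [4, 72], [5, 74], [6, 73], [7, 71], [8, 92], [9, 73]]
outlierpercent = 3

a = np.array(values)
b = np.diff(a[:, 1]) * 100 / a[:-1, 1]

# The first row has no predecessor, so it is always kept; every later row is
# kept only if its jump from the previous raw reading is within the threshold
keep = np.concatenate(([True], np.abs(b) <= outlierpercent))
cleaned = a[keep]

print(cleaned[:, 0])  # times kept: [1 2 3 4 5 6 7]
```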

Answer 3

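This should give you two lists: valid values and outliers.

For speed, I've kept the arithmetic to a minimum.

Please excuse any typos; this was typed straight in and is untested.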

lolim = None
outliers = []
outlierpercent = 3.0
lower_mult = (100.0 - outlierpercent) / 100.0
upper_mult = (100.0 + outlierpercent) / 100.0
for index, temp in values:
    if lolim is None:
        valids = [[index, temp]]                             # start the valid list
        lolim, hilim = lower_mult * temp, upper_mult * temp  # create initial range
    else:
        if lolim <= temp <= hilim:
            valids.append([index, temp])                         # new valid entry
            lolim, hilim = lower_mult * temp, upper_mult * temp  # update range
        else:
            outliers.append([index, temp])                       # save outliers, keep old range
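For comparison, here is the same band-based logic packaged as a function and run on the question's sample list. `split_outliers` is a hypothetical name, and folding the first-element case into the range check is a restructuring rather than the answer's exact code. Like the updated loop in Answer 1, it compares against the last valid reading, so only the spike at time 8 ends up in `outliers`.

```python
values = [[1, 72], [2, 72], [3, 73], [4, 72], [5, 74], [6, 73], [7, 71], [8, 92], [9, 73]]

def split_outliers(values, outlierpercent=3.0):
    """Split [time, temp] pairs into (valids, outliers) using a +/- outlierpercent
    band around the last valid temperature."""
    lower_mult = (100.0 - outlierpercent) / 100.0
    upper_mult = (100.0 + outlierpercent) / 100.0
    valids, outliers = [], []
    lolim = hilim = None
    for time, temp in values:
        # The first reading is always valid; later ones must fall inside the band
        if lolim is None or lolim <= temp <= hilim:
            valids.append([time, temp])
            lolim, hilim = lower_mult * temp, upper_mult * temp  # re-centre the band
        else:
            outliers.append([time, temp])  # keep the old band
    return valids, outliers

valids, outliers = split_outliers(values)
print(outliers)     # [[8, 92]]
print(len(valids))  # 8
```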
