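Starting from a CSV file, I have a multidimensional NumPy array (shape: 617 x 9), and I need only one column of that n x m array. That column stores sequential data like this: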
[0,0,0,0,0,0,620,625,622,710,658,2150,2142,2569,2600,21,24,30,45,32,14,1100,1119,1150 ...]
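In a more or less cyclic way, they form groups of values that are not very similar to each other. I need to put the mean and standard deviation of each group into a list. So in this specific example we would have (I will only compute the means, sorry):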
[0, 647.0, 2365.25, 27.67, 1123.0, ...]
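I'm not very proficient in Python, so my idea was to fill a list of lists, starting a new sublist whenever the n-th value and the next (n+1)-th value differ by more than 200, and then work with that, e.g.: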
[[0,0,0,0,0,0], [620,625,622,710,658], [2150,2142,2569,2600], [21,24,30,45,32,14], [1100,1119,1150] ...]
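For reference, once the values are grouped like this, the per-group statistics are short to compute. A minimal sketch, assuming the groups are already available as a list of lists; the names `groups`, `means` and `stdevs` are illustrative, and using the sample standard deviation is an assumption, since the question does not say which one is needed:

```python
import statistics as st

# Example groups copied from the list of lists above (truncated at the "...").
groups = [
    [0, 0, 0, 0, 0, 0],
    [620, 625, 622, 710, 658],
    [2150, 2142, 2569, 2600],
    [21, 24, 30, 45, 32, 14],
    [1100, 1119, 1150],
]

# Mean of each group.
means = [st.mean(g) for g in groups]

# Sample standard deviation (n - 1 in the denominator). st.stdev needs at
# least two values, so single-value groups get 0.0 here (an assumption).
stdevs = [st.stdev(g) if len(g) > 1 else 0.0 for g in groups]

for mean, sd in zip(means, stdevs):
    print(f"mean={mean:.2f}  stdev={sd:.2f}")
```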
My beginner code looks like this:
import numpy as np
import os, time, argparse, matplotlib, glob, datetime, sys, math
path = "/path_to_file/file.csv"
data = np.loadtxt(path, delimiter=',', skiprows=1)
x = [[]]
n_entries = len(data[:,0])
count = 0
m = 0
for n in range(n_entries-1):
    if (math.isclose(data[n,1], data[n+1,1], abs_tol = 200)):
        x[count][m] = data[n,1]
        m += 1
    else:
        count += 1
        m = 0
Unfortunately, I get this output:
Traceback (most recent call last):
  File "/path_to_python_file/file.py", line 49, in <module>
    x[count][m] = data[n,1]
IndexError: list assignment index out of range
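First, I would love an explanation of this error. From searching online, I get the sense that I have to initialize the list beforehand... but I don't actually know its size in advance. Second, if anyone smarter than me has suggestions for other approaches, I would be very grateful!
Thanks in advance, everyone!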
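On the error itself: `x = [[]]` creates a list holding one empty list, so `x[0]` has no position 0 yet, and assigning to `x[count][m]` raises `IndexError`. Python lists grow with `append`, not by assigning to an index that does not exist yet, so the final size does not need to be known up front. A minimal sketch of the same loop rewritten with `append`, assuming the column is a plain sequence and keeping the threshold of 200 from the question (the function name `group_by_jump` and its parameters are hypothetical):

```python
import math

def group_by_jump(values, abs_tol=200):
    """Split values into sublists, starting a new one at each jump > abs_tol."""
    if len(values) == 0:
        return []
    groups = [[values[0]]]  # the first group starts with the first value
    for prev, curr in zip(values, values[1:]):
        if math.isclose(prev, curr, abs_tol=abs_tol):
            groups[-1].append(curr)   # same group: grow the last sublist
        else:
            groups.append([curr])     # jump detected: start a new sublist
    return groups

# Sample values from the question (truncated at the "...").
column = [0, 0, 0, 0, 0, 0, 620, 625, 622, 710, 658, 2150, 2142, 2569, 2600,
          21, 24, 30, 45, 32, 14, 1100, 1119, 1150]
groups = group_by_jump(column)
print(groups)
```

Note that with a tolerance of 200, the sample values 2142 and 2569 (427 apart) end up in different sublists, so the threshold may need tuning on the real data.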
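Thank you, Barmar! Your suggestion was very enlightening! I'm posting my answer in case someone needs it in the future. There is surely a more elegant way to do this, but it works well!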
import statistics as st
import numpy as np
import math

def cyclic_values_finder(array, value):
    # Return the row indices where a new group starts, i.e. where two
    # consecutive values in column `value` differ by more than 200.
    n_entries = len(array)
    m_crutch = []
    for n in range(n_entries-1):
        if not (math.isclose(array[n,value], array[n+1,value], abs_tol = 200)):
            m_crutch.append(n+1)
    return m_crutch

def get_array_values(array, value):
    # Copy column `value` into a plain Python list.
    arr = []
    for ii in range(len(array)):
        arr.append(array[ii,value])
    return arr

n = 1 # whichever column of data you need to process
path = "/path_to_file/file.csv"
data = np.loadtxt(path, delimiter=',', skiprows=1)
data_needed = get_array_values(data,n)
n_rows = np.array(cyclic_values_finder(data,n))
# Split the column at the group boundaries found above.
splitted_data = np.split(data_needed, n_rows)
avg = []
stdev = []
for row in splitted_data:
    avg.append(st.mean(row))
    # st.stdev raises StatisticsError for a group with fewer than two values.
    stdev.append(st.stdev(row))
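A more compact variant is possible with NumPy alone. The following is a sketch under stated assumptions, not the approach used above: it assumes a non-empty 1-D column, finds the group boundaries with `np.diff`, and splits with `np.split`. The function name `group_stats`, the `threshold` parameter, and the choice of the sample standard deviation (`ddof=1`, matching `statistics.stdev`) are all assumptions for illustration.

```python
import numpy as np

def group_stats(column, threshold=200):
    """Return (means, stdevs) for runs of values separated by jumps > threshold."""
    column = np.asarray(column, dtype=float)
    # Indices where a new group starts: the position right after each jump
    # whose absolute size exceeds the threshold.
    boundaries = np.flatnonzero(np.abs(np.diff(column)) > threshold) + 1
    groups = np.split(column, boundaries)
    means = [g.mean() for g in groups]
    # ddof=1 gives the sample standard deviation; single-value groups
    # get 0.0 instead of NaN (an assumption, adjust as needed).
    stdevs = [g.std(ddof=1) if g.size > 1 else 0.0 for g in groups]
    return means, stdevs

# A subset of the sample values from the question.
column = [0, 0, 0, 0, 0, 0, 620, 625, 622, 710, 658,
          21, 24, 30, 45, 32, 14, 1100, 1119, 1150]
means, stdevs = group_stats(column)
for mean, sd in zip(means, stdevs):
    print(f"mean={mean:.2f}  stdev={sd:.2f}")
```

With real data loaded via `np.loadtxt`, the call would be something like `group_stats(data[:, 1])`, which also replaces the manual column copy with a slice.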