Consider the following data:
import numpy as np

data = np.array([[i for i in range(3)] for _ in range(9)])
print(data)
print(f'data has shape {data.shape}')
[[0 1 2]
 [0 1 2]
 [0 1 2]
 [0 1 2]
 [0 1 2]
 [0 1 2]
 [0 1 2]
 [0 1 2]
 [0 1 2]]
data has shape (9, 3)
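There is also a parameter, which we call history. What history does is stack history-many of the arrays [0 1 2] along the first dimension. For example, consider 1 iteration of the process with history=2: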
history = 2
data = np.array([[[0, 1, 2], [0, 1, 2]]])
print(f'data has now shape {data.shape}')
data has now shape (1, 2, 3)
Now, let's consider 2 iterations:
history = 2
data = np.array([[[0, 1, 2], [0, 1, 2]],[[0, 1, 2], [0, 1, 2]]])
print(f'data has now shape {data.shape}')
data has now shape (2, 2, 3)
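This process should be repeated until the data has been fully processed. That means we may lose some data at the end whenever the number of rows is not a multiple of history, i.e. when data.shape[0] % history != 0.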
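To make the arithmetic concrete, here is a small sketch of how many complete groups fit and how many trailing rows are left over. The variable names n_rows, n_groups and n_dropped are only illustrative and do not appear in the original post.

```python
n_rows = 9      # data.shape[0] for the example array
history = 2

# divmod returns (quotient, remainder) in one step:
# quotient  = number of complete groups of `history` rows
# remainder = trailing rows that do not fill a group and are dropped
n_groups, n_dropped = divmod(n_rows, history)
print(n_groups, n_dropped)  # 4 1
```

With 9 rows and history=2, four full groups are kept and the last row is discarded.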
So the final result for history=2 would be
([[[0, 1, 2],
   [0, 1, 2]],
  [[0, 1, 2],
   [0, 1, 2]],
  [[0, 1, 2],
   [0, 1, 2]],
  [[0, 1, 2],
   [0, 1, 2]]])
How can this be done efficiently?
If I understand correctly, you can slice off the incomplete trailing rows and then reshape:
history = 2
out = data[:data.shape[0]//history*history].reshape((-1, history, data.shape[1]))
Output:
array([[[0, 1, 2],
        [0, 1, 2]],

       [[0, 1, 2],
        [0, 1, 2]],

       [[0, 1, 2],
        [0, 1, 2]],

       [[0, 1, 2],
        [0, 1, 2]]])
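For reuse, the slice-and-reshape step can be wrapped in a small function. This is only a sketch under the assumption that data is a 2-D array; the name stack_history is hypothetical and is not part of NumPy or of the original answer.

```python
import numpy as np

def stack_history(data, history):
    """Group consecutive rows of a 2-D array into blocks of `history` rows.

    Trailing rows that do not fill a complete block are dropped.
    (Hypothetical helper name, chosen for illustration only.)
    """
    # Largest multiple of `history` that fits into the number of rows.
    n_full = data.shape[0] // history * history
    # -1 lets NumPy infer the number of blocks.
    return data[:n_full].reshape(-1, history, data.shape[1])

data = np.tile(np.arange(3), (9, 1))  # same (9, 3) array as in the question
out = stack_history(data, 2)
print(out.shape)                      # (4, 2, 3)
print(np.shares_memory(out, data))    # True
```

Because the leading slice of a C-contiguous array is itself contiguous, the reshape here returns a view rather than a copy, which is what makes this approach cheap. If data were not contiguous, reshape could have to copy.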