How can I make this numpy code more memory-efficient?

Question (votes: 0, answers: 1)

I have this Python code that computes the coordinate distances between different points. The input data looks like this:

IDs,X,Y,Z
0-20,193.722,175.733,0.0998975
0-21,192.895,176.727,0.0998975
7-22,187.065,178.285,0.0998975
0-23,192.296,178.648,0.0998975
7-24,189.421,179.012,0.0998975
8-25,179.755,179.347,0.0998975
8-26,180.436,179.288,0.0998975
7-27,186.453,179.2,0.0998975
8-28,178.899,180.92,0.0998975

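The code works perfectly, but the number of coordinates I now have is very large (~50000), so I need to optimize it or it will not run at all. Can anyone suggest a more memory-efficient way to do this? Thanks for any suggestions.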

#!/usr/bin/env python
import pandas as pd
import scipy.spatial as spsp

df_1 = pd.read_csv('Spots.csv', sep=',')
coords = df_1[['X', 'Y', 'Z']].to_numpy()
distances = spsp.distance_matrix(coords, coords)
df_1['dist'] = distances.tolist()

# Expand each row's list of distances into one column per point ID
dist_cols = df_1['IDs']
df_1[dist_cols] = df_1['dist'].apply(pd.Series)

df_1.to_csv("results_Spots.csv")
python pandas numpy scipy numpy-ndarray
1 answer
-1
votes

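Your code asks for the point-to-point distances as a ~50000 x ~50000 matrix. If you really want to store that, the result will be very large: 50000 x 50000 is 2.5 billion entries, which is about 20 GB as 64-bit floats. The matrix is also dense, because every point has a positive distance to every other point, so a sparse format will not help. I recommend revisiting your business requirements. Do you really need to compute all of these distances up front and store them in a file on disk? Sometimes it is better to do the required calculations on the fly. scipy.spatial is fast, perhaps not even much slower than reading precomputed values.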
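As a rough illustration of the "on the fly" idea, here is a minimal sketch that computes the distances one block of rows at a time, so only a small slice of the matrix is in memory at once. It is not the asker's method: the helper names `iter_distance_blocks` and `full_matrix_gib` and the `block_rows` parameter are made up for this example, it uses `scipy.spatial.distance.cdist` instead of `distance_matrix`, and random coordinates stand in for Spots.csv.

```python
import numpy as np
from scipy.spatial.distance import cdist


def full_matrix_gib(n_points, bytes_per_value=8):
    """Size in GiB of a dense n_points x n_points matrix (8 bytes = float64)."""
    return n_points ** 2 * bytes_per_value / 2 ** 30


def iter_distance_blocks(coords, block_rows=1000):
    """Yield (start, block) pairs (hypothetical helper).

    block[i, j] is the Euclidean distance from point start + i to point j,
    so each block is a horizontal slice of the full distance matrix.
    """
    coords = np.asarray(coords, dtype=float)
    for start in range(0, len(coords), block_rows):
        yield start, cdist(coords[start:start + block_rows], coords)


# Stand-in data; in the question this would be df_1[['X', 'Y', 'Z']].to_numpy()
rng = np.random.default_rng(0)
coords = rng.random((2000, 3))

# Example: find the largest pairwise distance without ever holding
# more than 500 rows of the matrix in memory.
max_dist = 0.0
for start, block in iter_distance_blocks(coords, block_rows=500):
    max_dist = max(max_dist, float(block.max()))

print(f"full 50000 x 50000 matrix: {full_matrix_gib(50000):.1f} GiB")
print(f"largest distance in the sample: {max_dist:.3f}")
```

Each block could equally be reduced to a summary, filtered by a distance threshold, or appended to an output file before the next one is computed. Which of these fits depends on what the distances are actually needed for.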
