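I'm working on the K-means algorithm in Python. I wrote this code in the most straightforward way I could think of, and now I'd like to optimize and refactor it.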
    # Squared Euclidean distance from every point to every centroid
    for i in range(N):
        for j in range(K):
            averages[i, j] = np.linalg.norm(trips[i] - centroids[j])**2
    # Assign each point to its nearest centroid
    for i in range(N):
        assigns[i] = int(np.argmin(averages[i]))
    # Recompute each centroid as the mean of its assigned points
    for i in range(K):
        temp = np.zeros([F])
        temp = np.expand_dims(temp, axis=0)
        for j in range(N):
            if int(assigns[j]) == i:
                temp = np.insert(temp, 0, trips[j], axis=0)
        temp = temp[:-1, :]  # drop the placeholder row of zeros
        if temp.shape[0] > 0:
            centroids[i] = temp.sum(axis=0) / temp.shape[0]
Thanks!
You can use list comprehensions, which should speed things up a little:
    for i1 in range(N):
        averages[i1] = [np.linalg.norm(trips[i1] - centroids[j])**2 for j in range(K)]
    assigns = [int(np.argmin(averages[i2])) for i2 in range(N)]
    for i3 in range(K):
        temp = np.zeros([F])
        temp = np.expand_dims(temp, axis=0)
        for j in range(N):
            if int(assigns[j]) == i3:
                temp = np.insert(temp, 0, trips[j], axis=0)
        temp = temp[:-1, :]  # drop the placeholder row of zeros
        if temp.shape[0] > 0:
            centroids[i3] = temp.sum(axis=0) / temp.shape[0]
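I renamed some of the indices, so I'm not sure I picked the right one everywhere in the formulas. In any case, I'd advise against reusing the same index name in nested loops; it can create bugs that are hard to track down.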
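If you want to go further than list comprehensions, most of the loops can probably be replaced with NumPy broadcasting. The following is only a sketch, assuming `trips` has shape `(N, F)` and `centroids` has shape `(K, F)`; the function name `kmeans_step` is made up for illustration and is not part of the original code. Like the original, it leaves a centroid unchanged when no points are assigned to it.

```python
import numpy as np

def kmeans_step(trips, centroids):
    """One assignment + update step of K-means (hypothetical helper)."""
    # Broadcasting: (N, 1, F) - (1, K, F) -> (N, K, F) differences
    diffs = trips[:, None, :] - centroids[None, :, :]
    # Squared Euclidean distances, shape (N, K)
    sq_dists = (diffs ** 2).sum(axis=2)
    # Index of the nearest centroid for each point, shape (N,)
    assigns = sq_dists.argmin(axis=1)

    # astype(float) returns a copy, so the input centroids are not modified
    new_centroids = centroids.astype(float)
    for k in range(centroids.shape[0]):
        members = trips[assigns == k]  # boolean mask selects this cluster's points
        if members.shape[0] > 0:       # keep the old centroid for empty clusters
            new_centroids[k] = members.mean(axis=0)
    return assigns, new_centroids

# Small usage example with made-up data
trips = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
assigns, centroids = kmeans_step(trips, centroids)
```

The intermediate `diffs` array takes `N * K * F` floats, so for very large inputs this trades memory for speed; the remaining loop over `K` is usually cheap because `K` is small.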