I have a scipy.sparse.csr_matrix object, full_sites_sparse, with shape (336358, 48371). I also have a DataFrame, union_df[hour], with shape (336358, 17). I want to combine the two column-wise into a single matrix of shape (336358, 48388). I tried:
full_sites_hour_sparse = np.hstack((full_sites_sparse.A, union_df[hour].values))
and
full_sites_hour_sparse = scipy.sparse.hstack(full_sites_sparse.A, union_df[hour].values)
but both raise an out-of-memory error. Is there another way to do this?
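For context on why both attempts run out of memory: `.A` converts the sparse matrix to a dense NumPy array before any stacking happens. A rough size estimate, assuming 8-byte float64 elements (the actual dtype is not stated in the question), might look like this:

```python
# Estimate the memory needed to densify full_sites_sparse (shape from the question).
# Assumption: 8 bytes per element (float64); the real dtype is not given.
n_rows, n_cols = 336358, 48371
bytes_per_element = 8

dense_bytes = n_rows * n_cols * bytes_per_element
dense_gib = dense_bytes / 2**30
print(f"{dense_gib:.1f} GiB")
```

Under that assumption the dense array alone would need roughly 121 GiB, so the sparse matrix has to stay sparse throughout.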
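Keep everything sparse: convert the DataFrame columns to a sparse matrix rather than densifying full_sites_sparse with .A, then stack the two.
from scipy.sparse import csr_matrix, hstack, vstack
# Convert the 17 dense columns to CSR
t2 = csr_matrix(union_df[hour].values)
# Pad t2 with empty rows if it has fewer rows than full_sites_sparse (0 rows here)
diff_n_rows = full_sites_sparse.shape[0] - t2.shape[0]
Xb_new = vstack((t2, csr_matrix((diff_n_rows, t2.shape[1]))))
# Column-wise stack; pass format="csr" to hstack if a CSR result is needed
X = hstack((full_sites_sparse, Xb_new))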
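A minimal, self-contained sketch of the same approach on toy data, so it can be run without the original inputs. The small shapes, the random data, and the column names `h0`, `h1`, `h2` are made up for illustration; only the `csr_matrix` and `hstack` calls mirror the code above. Because the row counts already match in the question, the padding step is skipped here.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack, random as sparse_random

# Toy stand-ins for the question's objects (hypothetical data and sizes).
full_sites_sparse = sparse_random(1000, 50, density=0.01, format="csr", random_state=0)
rng = np.random.default_rng(0)
union_df = pd.DataFrame(rng.integers(0, 2, size=(1000, 3)), columns=["h0", "h1", "h2"])
hour = ["h0", "h1", "h2"]  # assumed: `hour` is a list of column names

# Convert only the small dense block to sparse; never densify the large matrix.
dense_block = csr_matrix(union_df[hour].values)

# Column-wise stack; format="csr" asks SciPy to return the result as CSR.
X = hstack((full_sites_sparse, dense_block), format="csr")

print(X.shape)  # (1000, 53)
```

With the question's inputs, the same two calls would be expected to produce a sparse matrix of shape (336358, 48388) without ever allocating the dense array.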