The following code reads a CSV file and writes the output to another CSV file. It works fine, but when the CSV file grows (more rows), it fails with a memory error. I tried changing Xms to 512m, Xmx to 2024m, and XX:ReservedCodeCacheSize to 480m, but I still get the memory error.
Traceback (most recent call last):
  File "/root/PycharmProjects/AppAct/statfile.py", line 5, in <module>
    df = df.astype(float)
  File "pandas/core/generic.py", line 5691, in astype
    **kwargs)
  File "pandas/core/internals/managers.py", line 531, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "pandas/core/internals/managers.py", line 402, in apply
    bm._consolidate_inplace()
  File "pandas/core/internals/managers.py", line 929, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "pandas/core/internals/managers.py", line 1899, in _consolidate
    _can_consolidate=_can_consolidate)
  File "pandas/core/internals/blocks.py", line 3149, in _merge_blocks
    new_values = new_values[argsort]
MemoryError
import pandas as pd

# Load the entire CSV into memory.
all_df = pd.read_csv("/root/Desktop/Time-20ms/AllDataNew20ms.csv")
# Separate the feature columns from the "activity" label column.
df = all_df.loc[:, all_df.columns != "activity"]
df = df.astype(float)
# Keep only rows that have at least one non-zero feature value.
mask = (df != 0).any(axis=1)
df = df[mask]
recover_lines_of_activity_column = all_df["activity"][mask]
# Reattach the label column and write the result.
final_df = pd.concat([recover_lines_of_activity_column, df], axis=1)
final_df.to_csv("/root/Desktop/Dataset.csv", index=False)
Changing your PyCharm memory limits (-Xms and the other JVM settings) has absolutely no effect on the Python interpreter that actually runs your Python code. Plainly put, the system runs out of memory when you convert the whole DataFrame to floats (df = df.astype(float)).
Besides changing your code to do the operation more efficiently, you could add physical memory or enable swap. (Also, are you sure you are using 64-bit Python?)
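One way to do the operation more efficiently is to stream the file in fixed-size chunks with pd.read_csv(..., chunksize=...), so only one chunk is ever in memory. A minimal sketch, assuming the same "activity" column as in the question; the inline CSV and the chunk size are illustrative stand-ins for the real file and a value like 100_000:

```python
import io

import pandas as pd

# Inline CSV standing in for AllDataNew20ms.csv (illustrative data).
csv_data = io.StringIO(
    "activity,a,b\n"
    "walk,1.0,0.0\n"
    "sit,0.0,0.0\n"   # all-zero feature row: should be dropped
    "run,0.0,2.5\n"
)

out = io.StringIO()  # stands in for Dataset.csv
first = True
# chunksize=2 keeps only two rows in memory at a time here;
# with a real file, something like 100_000 is more typical.
for chunk in pd.read_csv(csv_data, chunksize=2):
    numeric = chunk.loc[:, chunk.columns != "activity"].astype(float)
    mask = (numeric != 0).any(axis=1)
    kept = pd.concat([chunk["activity"][mask], numeric[mask]], axis=1)
    # Write the header only once, then append filtered rows.
    kept.to_csv(out, header=first, index=False)
    first = False

print(out.getvalue())
```

The peak memory usage is then bounded by the chunk size rather than by the file size, at the cost of slightly more I/O.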
One simple optimization that reduces copying and conversion work is to pass dtype=... directly to pd.read_csv(). For example, see this answer.
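A minimal sketch of that approach, with illustrative inline data and hypothetical column names; float32 is an extra assumption that roughly halves memory versus the default float64, if the precision is acceptable:

```python
import io

import pandas as pd

# Inline CSV standing in for the real file (illustrative data).
csv_data = io.StringIO(
    "activity,a,b\n"
    "walk,1,0\n"
    "run,0,2\n"
)

# Hypothetical column list; with the real file, name every numeric column.
numeric_cols = ["a", "b"]

# Parse numeric columns as float32 while reading, so no separate
# astype(float) pass (and its extra copy) is needed afterwards.
df = pd.read_csv(csv_data, dtype={c: "float32" for c in numeric_cols})

print(df.dtypes)
```

Because the columns are already floats after the read, the later df.astype(float) call that triggered the MemoryError can be dropped entirely.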