I have a 25 GB dictionary of numpy arrays. The dictionary looks like this: keys such as "109c3708-3b0c-4868-a647-b9feb306c886_1" mapping to arrays of shape 200x23 and dtype float64.
When I repeatedly load this data with pickle in a loop, the load time gets slower and slower (see the code and results below). What could be causing this?
Code:

import pickle
import time

def load_pickle(file: int) -> dict:
    with open(f"D:/data/batched/{file}.pickle", "rb") as handle:
        return pickle.load(handle)

for i in range(0, 9):
    print(f"\nIteration {i}")
    start_time = time.time()
    file = None  # drop the reference so the previous batch can be freed
    print(f"Unloaded file in {time.time() - start_time:.2f} seconds")
    start_time = time.time()
    file = load_pickle(0)
    print(f"Loaded file in {time.time() - start_time:.2f} seconds")
Results:
Iteration 0
Unloaded file in 0.00 seconds
Loaded file in 18.80 seconds
Iteration 1
Unloaded file in 14.78 seconds
Loaded file in 30.51 seconds
Iteration 2
Unloaded file in 28.67 seconds
Loaded file in 30.21 seconds
Iteration 3
Unloaded file in 35.38 seconds
Loaded file in 40.25 seconds
Iteration 4
Unloaded file in 39.91 seconds
Loaded file in 41.24 seconds
Iteration 5
Unloaded file in 43.25 seconds
Loaded file in 45.57 seconds
Iteration 6
Unloaded file in 46.94 seconds
Loaded file in 48.19 seconds
Iteration 7
Unloaded file in 51.67 seconds
Loaded file in 51.32 seconds
Iteration 8
Unloaded file in 55.25 seconds
Loaded file in 56.11 seconds
Notes:
- Watching RAM usage, memory drops slowly during the unload step (freeing the previous data held in the `file` variable), then rises again during the load. Over time both the unload and the load step seem to slow down. I was surprised how slowly RAM drops during the unload part.
- I tried `del file` and `gc.collect()`, but neither speeds anything up.
- If I change `return pickle.load(handle)` to `return handle.read()`, the unload consistently takes 0.45 seconds and the load 4.85 seconds.
- Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:51:29) [MSC v.1929 64 bit (AMD64)]

Any ideas? I'm also open to dropping pickle if there is an alternative with a similar read speed that doesn't run into this problem (I'm not worried about compression).
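Since `handle.read()` stays fast while `pickle.load()` degrades, the raw I/O is evidently not the bottleneck; one cheap experiment is to pause Python's cyclic garbage collector around the unpickling step, since a pickle holding hundreds of thousands of array objects forces each collection pass to walk every tracked object created so far. This is a hedged sketch, not a confirmed fix; the file layout below is a small stand-in for one real batch:

```python
import gc
import os
import pickle
import tempfile

import numpy as np

def load_pickle_nogc(path: str) -> dict:
    # Disable the cyclic GC only for the duration of the load; unpickling
    # creates a large number of tracked container objects, and repeated
    # collection passes over them can dominate the load time.
    gc.disable()
    try:
        with open(path, "rb") as handle:
            return pickle.load(handle)
    finally:
        gc.enable()

# Small stand-in for one batch: a dict of 200x23 float64 arrays.
batch = {f"key_{i}": np.zeros((200, 23)) for i in range(100)}
path = os.path.join(tempfile.mkdtemp(), "0.pickle")
with open(path, "wb") as handle:
    pickle.dump(batch, handle, pickle.HIGHEST_PROTOCOL)

loaded = load_pickle_nogc(path)
```

If the timings stop degrading with this variant, the slowdown is GC pressure rather than disk or allocator behavior.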
Edit: I have run the load/unload loop above on pickles of different sizes. The results show the relative change in speed over time: for anything above about 3 GB, the unload time starts to increase significantly.
I would love to know the cause of this slowdown; I have run into it in similar tasks myself. I "solved" it by using h5py instead of pickle. All tests below were run on Windows 11; I will run them on Linux next week.

My task is to read millions of numpy images and extract regions from them on the fly. The application dictates that the images are stored in files in batches of roughly 3000 to 6000.
import os
import pickle
import time

import numpy as np
import psutil

filePath = "images.pickle"  # any local path

imagesDict = {i: np.random.randint(0, 255, (300, 300), dtype=np.uint8) for i in range(4000)}
with open(filePath, 'wb') as file:
    pickle.dump(imagesDict, file, pickle.HIGHEST_PROTOCOL)

thumbs = []
num_image_sets = 0
durations_s_sum = 0.
for i in range(500):
    start_s = time.perf_counter()
    with open(filePath, 'rb') as file:
        imagesDict: dict[int, np.ndarray] = pickle.load(file)
    for key in imagesDict.keys():
        image = imagesDict[key]
        thumb = image[:50, :50].copy()
        thumbs.append(thumb)
    durations_s_sum += (time.perf_counter() - start_s)
    num_image_sets += 1
    if 50 <= num_image_sets:
        memory_info = psutil.Process(os.getpid()).memory_info()
        print(f"{durations_s_sum:4.1f}s for 50 image sets of 4000 images, rss={memory_info.rss/1024/1024:6,.0f}MB, vms={memory_info.vms/1024/1024:6,.0f}MB")
        durations_s_sum = 0.
        num_image_sets = 0
pickle.load() slows down with every iteration and quickly reaches unacceptable levels:
10.6s for 50 image sets of 4000 images, rss= 1,575MB, vms= 1,579MB
10.0s for 50 image sets of 4000 images, rss= 2,117MB, vms= 2,134MB
11.5s for 50 image sets of 4000 images, rss= 2,632MB, vms= 2,662MB
14.2s for 50 image sets of 4000 images, rss= 3,150MB, vms= 3,193MB
16.3s for 50 image sets of 4000 images, rss= 3,670MB, vms= 3,726MB
19.1s for 50 image sets of 4000 images, rss= 4,212MB, vms= 4,280MB
22.6s for 50 image sets of 4000 images, rss= 4,746MB, vms= 4,824MB
25.4s for 50 image sets of 4000 images, rss= 5,276MB, vms= 5,367MB
29.2s for 50 image sets of 4000 images, rss= 5,817MB, vms= 5,919MB
35.3s for 50 image sets of 4000 images, rss= 6,360MB, vms= 6,472MB
The same benchmark with the batch stored in HDF5:

import h5py

with h5py.File(filePath, 'w') as h5:
    for i in range(4000):
        image = np.random.randint(0, 255, (300, 300), dtype=np.uint8)
        h5.create_dataset(str(i), data=image)

thumbs = []
num_image_sets = 0
durations_s_sum = 0.
for i in range(500):
    start_s = time.perf_counter()
    with h5py.File(filePath, "r") as h5:
        for key in h5.keys():
            image = h5[key]
            thumb = image[:50, :50]  # slicing a Dataset reads only this region
            thumbs.append(thumb)
    durations_s_sum += (time.perf_counter() - start_s)
    num_image_sets += 1
    if 50 <= num_image_sets:
        memory_info = psutil.Process(os.getpid()).memory_info()
        print(f"{durations_s_sum:4.1f}s for 50 image sets of 4000 images, rss={memory_info.rss/1024/1024:6,.0f}MB, vms={memory_info.vms/1024/1024:6,.0f}MB")
        durations_s_sum = 0.
        num_image_sets = 0
h5py starts out slower, but its duration stays almost constant at around 19 seconds, so over time it wins:
20.3s for 50 image sets of 4000 images, rss= 646MB, vms= 637MB
20.3s for 50 image sets of 4000 images, rss= 1,166MB, vms= 1,167MB
19.7s for 50 image sets of 4000 images, rss= 1,685MB, vms= 1,697MB
19.4s for 50 image sets of 4000 images, rss= 2,208MB, vms= 2,229MB
19.7s for 50 image sets of 4000 images, rss= 2,731MB, vms= 2,764MB
19.8s for 50 image sets of 4000 images, rss= 3,255MB, vms= 3,298MB
19.4s for 50 image sets of 4000 images, rss= 3,778MB, vms= 3,832MB
19.9s for 50 image sets of 4000 images, rss= 4,303MB, vms= 4,366MB
19.6s for 50 image sets of 4000 images, rss= 4,826MB, vms= 4,899MB
19.9s for 50 image sets of 4000 images, rss= 5,349MB, vms= 5,434MB
Besides, if memory fragmentation were the problem, why doesn't h5py show similar behavior?
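A possible explanation for the difference is that h5py never materializes the whole batch: only the sliced regions are ever allocated in Python. When all images in a batch share a shape, the same lazy-read behavior is also available without HDF5, by stacking the batch into one .npy file and memory-mapping it. This is a sketch under that same-shape assumption, not a drop-in replacement for the dict-of-arrays layout:

```python
import os
import tempfile

import numpy as np

# Stack one batch of 4000 images into a single uint8 array and save it once.
batch = np.random.randint(0, 255, (4000, 300, 300), dtype=np.uint8)
path = os.path.join(tempfile.mkdtemp(), "batch0.npy")
np.save(path, batch)

# "Loading" is now just opening the file: no deserialization, no per-image
# allocations. Only the sliced region is actually read from disk.
images = np.load(path, mmap_mode="r")
thumb = np.asarray(images[123, :50, :50])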