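I'm doing some image processing on a very large image (25,088 x 36,864 pixels). Because the image is so large, I process it in 256x256-pixel "tiles". While my function runs, Windows Task Manager shows that none of my CPU, RAM, GPU, or SSD reaches even 50% utilization, which leads me to believe I can squeeze out more performance somehow.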
    def processImage(self, img, tileSize = 256, numberOfThreads = 8): # a function within a class
        height, width, depth = img.shape
        print(height,width,depth,img.dtype)
        #create a duplicate but empty matrix same as the img
        processedImage = np.zeros((height,width,3), dtype=np.uint8)
        #calculate left and top offsets
        leftExcessPixels = int((width%tileSize)/2)
        topExcessPixels = int((height%tileSize)/2)
        #calculate the number of tiles columns(X) and row(Y)
        XNumberOfTiles = int(width/tileSize)
        YNumberOfTiles = int(height/tileSize)
        #
        for y in range(YNumberOfTiles):
            for x in range(XNumberOfTiles):
                XStart = (leftExcessPixels + (tileSize * x))
                YStart = (topExcessPixels + (tileSize * y))
                XEnd = XStart + tileSize
                YEnd = YStart + tileSize
                croppedImage = img[YStart:YEnd, XStart:XEnd]
                print('Y: ' + str(y) + ' X: ' + str(x),end=" ")
                #process the cropped images and store it on the same location on the empty image
                processedImage[YStart:YEnd, XStart:XEnd] = self.doSomeImageProcessing(croppedImage)
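Multithreading seems to be the way to process the tiles in parallel. Since each tile is processed independently of the others, working on several tiles at the same time should not be a problem. I'm not sure how to go about it, though: the matrix returned by self.doSomeImageProcessing(croppedImage) has to be put back at the same coordinates, but in a different variable named processedImage. I'm worried that with multiple threads all trying to write to processedImage, Python might not like that. Any ideas on how to handle it?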
Edit:
Here is sample code for testing:
    from multiprocessing import Process, Value, Array
    from time import monotonic
    import numpy as np

    def doSomeImageProcessing(npRGBImage):
        #Dummy image processing, just set all values to 255 or make the image white
        print('I have been called')
        a = npRGBImage
        a[:] = 255
        return a

    def processImage(tileSize = 256):
        #create a large dummy image
        img = np.zeros((25088,36864,3), dtype=np.uint8)
        height, width, depth = img.shape
        print(height,width,depth,img.dtype)
        processedImage = np.zeros((height,width,depth), dtype=np.uint8)
        leftExcessPixels = int((width%tileSize)/2)
        topExcessPixels = int((height%tileSize)/2)
        XNumberOfTiles = int(width/tileSize)
        YNumberOfTiles = int(height/tileSize)
        for y in range(YNumberOfTiles):
            for x in range(XNumberOfTiles):
                XStart = (leftExcessPixels + (tileSize * x))
                YStart = (topExcessPixels + (tileSize * y))
                XEnd = XStart + tileSize
                YEnd = YStart + tileSize
                croppedImage = img[YStart:YEnd, XStart:XEnd]
                print('Y: ' + str(y) + ' X: ' + str(x),end=" ")
                #Recreate the full image using the processed tiles
                #Original Approach
                #Run time 6.375 seconds
                processedImage[YStart:YEnd, XStart:XEnd] = doSomeImageProcessing(croppedImage)
        #check if all indexes were set to 255
        mean = np.mean(processedImage)
        if mean == 255:
            print('Image Processing successful: ', mean)
        else:
            print('Image Processing failed: ', mean)

    if __name__ == "__main__":
        start_time = monotonic()
        processImage()
        print(f"Run time {monotonic() - start_time} seconds")
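Running the image processing in multiple threads in Python is unlikely to improve overall performance much for CPU-bound work like this. The reason is the GIL (Global Interpreter Lock), which in CPython prevents more than one thread from executing Python bytecode at any given time.
What you can use to improve processing performance is the multiprocessing module, as @Grismar already mentioned.
Without actual data to run tests on, it is hard to give you a working solution. That said, from what I can see in your code, you could do something like the following: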
    from multiprocessing import Process, Array
    import numpy as np

    def processImage(self, img, tileSize = 256, numberOfThreads = 8): # a function within a class
        height, width, depth = img.shape
        print(height,width,depth,img.dtype)
        #create a duplicate but empty matrix same as the img
        processedImage = np.zeros((height,width,3), dtype=np.uint8)
        #calculate left and top offsets
        leftExcessPixels = int((width%tileSize)/2)
        topExcessPixels = int((height%tileSize)/2)
        #calculate the number of tiles columns(X) and row(Y)
        XNumberOfTiles = int(width/tileSize)
        YNumberOfTiles = int(height/tileSize)
        #
        for y in range(YNumberOfTiles):
            for x in range(XNumberOfTiles):
                XStart = (leftExcessPixels + (tileSize * x))
                YStart = (topExcessPixels + (tileSize * y))
                XEnd = XStart + tileSize
                YEnd = YStart + tileSize
                croppedImage = img[YStart:YEnd, XStart:XEnd]
                print('Y: ' + str(y) + ' X: ' + str(x),end=" ")
                #process the cropped images and store it on the same location on the empty image
                # XXX use a Shared Memory Data Structure
                # Array needs a flat sequence; 'B' is unsigned char, matching uint8
                image_array = Array('B', croppedImage.tobytes())
                # assumes doSomeImageProcessing modifies the shared array in place
                process = Process(target=self.doSomeImageProcessing, args=(image_array,))
                process.start()
                # NOTE: joining right away runs one tile at a time; start several
                # processes before joining them to actually work in parallel
                process.join()
                # copy the processed tile back into the full image
                processedImage[YStart:YEnd, XStart:XEnd] = np.frombuffer(
                    image_array.get_obj(), dtype=np.uint8).reshape(croppedImage.shape)
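Of course, without real sample data to test with, we can't tell how much tuning this needs, but in general, spreading the tiles across several processes should give you a worthwhile reduction in image processing time.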
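To make the idea more concrete, here is a minimal sketch that hands tiles to a multiprocessing.Pool and lets only the parent process write into the output array, so there is no concurrent writing to worry about. The names process_tile, tile_jobs and process_image_parallel are made up for illustration, and the "make the tile white" step is just a stand-in for your real doSomeImageProcessing. Each tile is pickled on its way to and from the worker, so this only pays off if the per-tile work is reasonably heavy.

```python
from multiprocessing import Pool

import numpy as np


def process_tile(job):
    """Runs in a worker process: process one tile and return it with its position."""
    y_start, x_start, tile = job
    # Hypothetical stand-in for the real per-tile processing: make the tile white.
    result = np.full_like(tile, 255)
    return y_start, x_start, result


def tile_jobs(img, tile_size):
    """Yield (y_start, x_start, tile) for every full tile, centred like the original loop."""
    height, width = img.shape[:2]
    top = (height % tile_size) // 2
    left = (width % tile_size) // 2
    for y in range(height // tile_size):
        for x in range(width // tile_size):
            ys = top + y * tile_size
            xs = left + x * tile_size
            yield ys, xs, img[ys:ys + tile_size, xs:xs + tile_size]


def process_image_parallel(img, tile_size=256, workers=4):
    """Process all tiles in a pool of worker processes and reassemble the image."""
    processed = np.zeros_like(img)
    with Pool(processes=workers) as pool:
        jobs = tile_jobs(img, tile_size)
        # Results arrive in arbitrary order; the coordinates say where each one goes.
        # Only the parent writes into `processed`, so no locking is needed.
        for ys, xs, result in pool.imap_unordered(process_tile, jobs, chunksize=8):
            processed[ys:ys + tile_size, xs:xs + tile_size] = result
    return processed


if __name__ == "__main__":
    # Small dummy image so the demo finishes quickly.
    demo = np.zeros((1024, 1536, 3), dtype=np.uint8)
    out = process_image_parallel(demo)
    print("mean:", out.mean())
```

The `if __name__ == "__main__":` guard matters on Windows, where worker processes are started by re-importing the main module.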
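I think one important aspect you need to consider is having some shared state between these processes. If you do end up going down this route, there is shared memory mapping, which lets you access the same data across processes.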
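As a rough illustration of the shared-state idea, the sketch below uses multiprocessing.shared_memory (Python 3.8+) so that workers read from and write to the same buffers instead of receiving pickled tiles. It assumes a uint8 image; process_tile_shared and process_image_shared are hypothetical names, and the inversion (255 - tile) is only a placeholder for the real processing. Because every worker writes to its own, non-overlapping tile region, no lock is used here.

```python
from multiprocessing import Pool, shared_memory

import numpy as np


def process_tile_shared(job):
    """Worker: attach to both shared blocks by name and process one tile in place."""
    in_name, out_name, shape, ys, xs, tile_size = job
    shm_in = shared_memory.SharedMemory(name=in_name)
    shm_out = shared_memory.SharedMemory(name=out_name)
    src = np.ndarray(shape, dtype=np.uint8, buffer=shm_in.buf)
    dst = np.ndarray(shape, dtype=np.uint8, buffer=shm_out.buf)
    tile = src[ys:ys + tile_size, xs:xs + tile_size]
    # Hypothetical stand-in for the real processing: invert the tile.
    dst[ys:ys + tile_size, xs:xs + tile_size] = 255 - tile
    # Drop the NumPy views before closing, otherwise close() raises BufferError.
    del src, dst, tile
    shm_in.close()
    shm_out.close()
    return ys, xs


def process_image_shared(img, tile_size=256, workers=4):
    """Parent: copy img into shared memory, fan tiles out to a pool, return the result."""
    shm_in = shared_memory.SharedMemory(create=True, size=img.nbytes)
    shm_out = shared_memory.SharedMemory(create=True, size=img.nbytes)
    src = dst = None
    try:
        src = np.ndarray(img.shape, dtype=np.uint8, buffer=shm_in.buf)
        dst = np.ndarray(img.shape, dtype=np.uint8, buffer=shm_out.buf)
        src[:] = img  # single copy of the input into shared memory
        dst[:] = 0    # pixels outside any full tile stay black
        height, width = img.shape[:2]
        top = (height % tile_size) // 2
        left = (width % tile_size) // 2
        # Jobs carry only names and coordinates, never pixel data.
        jobs = [
            (shm_in.name, shm_out.name, img.shape,
             top + y * tile_size, left + x * tile_size, tile_size)
            for y in range(height // tile_size)
            for x in range(width // tile_size)
        ]
        with Pool(processes=workers) as pool:
            pool.map(process_tile_shared, jobs, chunksize=16)
        return dst.copy()  # copy out before the shared block is released
    finally:
        src = dst = None  # release the views so the buffers can be closed
        for shm in (shm_in, shm_out):
            shm.close()
            shm.unlink()


if __name__ == "__main__":
    demo = np.full((1024, 1536, 3), 10, dtype=np.uint8)
    out = process_image_shared(demo)
    print("min/max:", out.min(), out.max())
```

Keep in mind that this holds two full-size copies of the image in shared memory on top of the original, which adds up quickly for an image as large as yours.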
That said, you would need to provide a Short, Self Contained, Correct Example before anyone can test this properly.