Converting .IMG (classic disk image) to .PNG / .JPG in Python

Question

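I have a dataset of more than 10,000 .IMG files that I need to convert to .PNG / .JPG format so that I can apply a CNN to a simple classification task. I referred to this answer, and the solution works for me only partially: some images are not converted correctly. As far as I understand, the reason is that some images have a pixel depth of 16 while others have a pixel depth of 8.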

for file in fileList:
    rawData = open(file, 'rb').read()
    size = re.search("(LINES              = \d\d\d\d)|(LINES              = \d\d\d)", str(rawData))
    pixelDepth = re.search("(SAMPLE_BITS        = \d\d)|(SAMPLE_BITS        = \d)", str(rawData))
    size = (str(size)[-6:-2])
    pixelDepth = (str(pixelDepth)[-4:-2])
    print(int(size))
    print(int(pixelDepth))
    imgSize = (int(size), int(size))



    img = Image.frombytes('L', imgSize, rawData)
    img.save(str(file)+'.jpg')

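Data source: NASA Messenger Mission .IMG files and their corresponding converted .JPG files

Files with a pixel depth of 8 are converted successfully:

Files with a pixel depth of 16 are NOT converted correctly:

Please let me know if I need to provide any more information.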

Answer 1

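Hopefully, from my other answer here, you now have a better understanding of how your files are formatted. So, the code should look something like this: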

#!/usr/bin/env python3

import sys
import re
import numpy as np
from PIL import Image
import cv2

rawData  = open('EW0220137564B.IMG', 'rb').read()
# File size in bytes
fs       = len(rawData)
# Search the raw bytes directly with raw byte patterns (no str() copy, no invalid-escape warnings)
bitDepth = int(re.search(rb"SAMPLE_BITS\s+=\s+(\d+)", rawData).group(1))
bytespp  = bitDepth // 8
height   = int(re.search(rb"LINES\s+=\s+(\d+)", rawData).group(1))
width    = int(re.search(rb"LINE_SAMPLES\s+=\s+(\d+)", rawData).group(1))
print(bitDepth,height,width)

# Offset from start of file to image data - assumes image at tail end of file
offset = fs - (width*height*bytespp)

# Check bitDepth
if bitDepth == 8:
    na = np.frombuffer(rawData, offset=offset, dtype=np.uint8).reshape(height,width)
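elif bitDepth == 16:
    # Read the 16-bit samples as big-endian ('>')
    dt = np.dtype(np.uint16)
    dt = dt.newbyteorder('>')
    # NOTE: astype(np.uint8) keeps only the low 8 bits of each sample
    na = np.frombuffer(rawData, offset=offset, dtype=dt).reshape(height,width).astype(np.uint8)
else:
    print(f'ERROR: Unexpected bit depth: {bitDepth}',file=sys.stderr)
    sys.exit(1)  # stop here: 'na' is undefined for unsupported depths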

# Save either with PIL
Image.fromarray(na).save('result.jpg')
# Or with OpenCV, which may be faster
cv2.imwrite('result.jpg', na)
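
One caveat with the 16-bit branch above: `.astype(np.uint8)` keeps only the low 8 bits of each sample, so any value above 255 wraps around. Below is a minimal sketch of one alternative, assuming a simple per-image min-max stretch is acceptable for your classification task. The helper name `to_uint8` and the synthetic `demo` array are hypothetical and not part of the original answer.

```python
import numpy as np

def to_uint8(samples):
    """Linearly stretch an integer array to the 0..255 range (hypothetical helper)."""
    a = samples.astype(np.float64)
    lo, hi = a.min(), a.max()
    if hi == lo:
        # Flat image: avoid dividing by zero and return all zeros
        return np.zeros(a.shape, dtype=np.uint8)
    return np.round((a - lo) * 255.0 / (hi - lo)).astype(np.uint8)

# Synthetic big-endian 16-bit samples standing in for real image data
demo = np.array([[0, 1024], [2048, 4095]], dtype='>u2')

print(demo.astype(np.uint8).tolist())  # low 8 bits only: [[0, 0], [0, 255]]
print(to_uint8(demo).tolist())         # stretched:       [[0, 64], [128, 255]]
```

In the script above, this would mean calling `to_uint8(...)` on the reshaped array in place of `.astype(np.uint8)`. Note that a per-image stretch changes the relative brightness between images, so a fixed divisor may be preferable if absolute levels matter.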

If you have thousands of files to process, I would recommend GNU Parallel, which you can easily install on a Mac with homebrew using:

brew install parallel

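Then you can change my program above to accept a filename as a parameter in place of the hard-coded filename. The command to process all the files in parallel is shown below; `--dry-run` only prints the commands that would be run, so remove it once they look right: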

parallel --dry-run script.py {} ::: *.IMG

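With a little more effort, you can get the job done even faster by putting the code above in a function and calling that function for each file given as a parameter. That way you avoid starting a new Python interpreter for every image, and you tell GNU Parallel to pass as many files as possible to each invocation of your script, like this: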

parallel -X --dry-run script.py ::: *.IMG

The structure of the script then looks like this:

import sys

def processOne(filename):
    # open, read, search, extract, save as per my code above
    ...

# Main - process all filenames received as parameters
for filename in sys.argv[1:]:
    processOne(filename)
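
Filling in that structure, a minimal sketch might look like the following. It reuses the parsing steps from the code above. Writing each result to `<input name>.jpg`, returning `True`/`False`, and checking for a negative offset are assumptions made here for illustration, and the 16-bit case keeps the same truncating `astype(np.uint8)` conversion as the original code.

```python
#!/usr/bin/env python3
import re
import sys

import numpy as np
from PIL import Image

def processOne(filename):
    """Convert one .IMG file to <filename>.jpg; return True on success."""
    with open(filename, 'rb') as f:
        rawData = f.read()

    # Pull the image geometry out of the text label at the start of the file
    bitDepth = int(re.search(rb"SAMPLE_BITS\s+=\s+(\d+)", rawData).group(1))
    height   = int(re.search(rb"LINES\s+=\s+(\d+)", rawData).group(1))
    width    = int(re.search(rb"LINE_SAMPLES\s+=\s+(\d+)", rawData).group(1))

    if bitDepth == 8:
        dt = np.dtype(np.uint8)
    elif bitDepth == 16:
        dt = np.dtype('>u2')  # big-endian unsigned 16-bit
    else:
        print(f'ERROR: {filename}: unexpected bit depth {bitDepth}', file=sys.stderr)
        return False

    # Assumes the image sits at the tail end of the file, as in the code above
    offset = len(rawData) - width * height * dt.itemsize
    if offset < 0:
        print(f'ERROR: {filename}: file too small for {width}x{height}', file=sys.stderr)
        return False

    na = np.frombuffer(rawData, offset=offset, dtype=dt).reshape(height, width)
    # Same low-8-bits conversion as the original code (assumption: output name)
    Image.fromarray(na.astype(np.uint8)).save(filename + '.jpg')
    return True

if __name__ == '__main__':
    # Main - process all filenames received as parameters
    for filename in sys.argv[1:]:
        processOne(filename)
```

Saved as `script.py` and made executable, this can be driven by the `parallel -X` command shown earlier, with each invocation receiving a batch of filenames.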
