OSError 24 when opening FITS files with Astropy

Question

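First of all, I have already read the following:

as well as some of the links from the first one, but none of them worked...

My problem occurs when opening large (> 80 MB each) and numerous (~3000) FITS files in a Jupyter notebook. The relevant code snippet is the following: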

# Dictionary to store NxN data matrices of cropped image tiles
CroppedObjects = {}

# Defining some other, here used variable....
# ...

# Iterate over all images ('j'), which contain the current object, indexed by 'i'
for i in range(0, len(finalObjects)):
    for j in range(0, len(containingImages[containedObj[i]])):

        countImages += 1

        # Path to the current image: 'mnt/...'
        current_image_path = ImagePaths[int(containingImages[containedObj[i]][j])]

        # Open .fits images
        with fits.open(current_image_path, memmap=False) as hdul:
            # Collect image data
            image_data = fits.getdata(current_image_path)

            # Collect WCS data from the current .fits's header
            ImageWCS = wcs.WCS(hdul[1].header)

            # Cropping parameters:
            # 1. Sky-coordinates of the croppable object
            # 2. Size of the crop, already defined above
            Coordinates = coordinates.SkyCoord(finalObjects[i][1]*u.deg,finalObjects[i][2]*u.deg, frame='fk5')
            size = (cropSize*u.pixel, cropSize*u.pixel)

            try:
                # Cut out the image tile
                cutout = Cutout2D(image_data, position=Coordinates, size=size, wcs=ImageWCS, mode='strict')

                # Write the cutout to a new FITS file
                cutout_filename = "Cropped_Images_Sorted/Cropped_" + str(containedObj[i]) + current_image_path[-23:]

                # Save data to dictionary
                CroppedObjects[cutout_filename] = cutout.data

                foundImages += 1

            except:
                pass

            else:
                del image_data
                continue

        # Memory maintenance
        gc.collect()

        # Progress bar
        sys.stdout.write("\rProgress: [{0}{1}] {2:.3f}%\tElapsed: {3}\tRemaining: {4}  {5}".format(u'\u2588' * int(countImages/allCrops * progressbar_width),
                                                                                                   u'\u2591' * (progressbar_width - int(countImages/allCrops * progressbar_width)),
                                                                                                   countImages/allCrops * 100,
                                                                                                   datetime.now()-starttime,
                                                                                                   (datetime.now()-starttime)/countImages * (allCrops - countImages),
                                                                                                   foundImages))

        sys.stdout.flush()

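Well, it actually does three things:

Opens a specific FITS file
Cuts out a square from it (but strictly, so if the arrays only partially overlap, the try statement skips to the next step of the loop)
Updates the progress bar

Then it moves on to the next file, does the same thing, and iterates over all my FITS files.

BUT: if I try to run this code, it stops after about 1000 found images and gives an OSError: [Errno 24] Too many open files on the line: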

image_data = fits.getdata(current_image_path)

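I have tried everything that was supposed to solve the problem, but nothing helped... not even setting memory mapping to false, or using fits.getdata and gc.collect()... I also tried many small changes, such as running without the try statement and cutting out all the image tiles without any restriction. The del in the else statement is another of my miserable attempts. What else could I try to finally make this work? Also, if something is unclear, feel free to ask me! I will do my best to help you understand the problem too!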

Answer 1

This line is what is hurting you:

image_data = fits.getdata(current_image_path)

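You have just opened that file on the previous line with memmap=False, but here you are re-opening it with memmap=True. You then keep that second file open by holding a reference to its data: you wrap the array in a Cutout2D and store a reference to the cutout in: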

CroppedObjects[cutout_filename] = cutout.data

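As far as I know, Cutout2D does not necessarily make a copy of the data if it does not have to, so you are still effectively holding a reference to image_data, which is mmap'd.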
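The mechanism can be illustrated with the standard library alone. This is only an analogy and not Astropy's internal code: a plain mmap object stands in for the memory-mapped FITS data, and a memoryview stands in for the array that refers to it. The scratch file and all variable names are hypothetical.

```python
import mmap
import os
import tempfile

# Create a small scratch file to map (a hypothetical stand-in for a FITS file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)
    scratch_path = tmp.name

f = open(scratch_path, "rb")
mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
f.close()  # closing the file object does not release the mapping

# The mapping keeps its own handle on the file, so the bytes stay readable.
still_readable = mapped[:4] == b"xxxx"

# A live buffer view (analogous to an array backed by the map) blocks close().
view = memoryview(mapped)
try:
    mapped.close()
    close_blocked = False
except BufferError:
    close_blocked = True

# Only after the last view is released can the mapping really be closed.
view.release()
mapped.close()
os.remove(scratch_path)
```

As long as something still refers to the mapped bytes, the mapping and its file handle stay alive, which is how thousands of stored cutouts can add up to thousands of open files.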

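Solution: do not use fits.getdata here. See the warning about this in the docs:

These functions are useful for interactive Python sessions and simple analysis scripts, but should not be used for application code, as they are highly inefficient. For example, each call to getval() requires re-parsing the entire FITS file. Code that makes repeated use of these functions should instead open the file with open() and access the data structures directly.

So in your case, you want to replace the line: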

image_data = fits.getdata(current_image_path)

with

image_data = hdul[1].data

As @Christoph wrote in his answer, get rid of all the del image_data and gc.collect() stuff, since it is not helping you anyway.

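Addendum: from the API documentation for Cutout2D:

If False (default), then the cutout data will be a view into the original data array. If True, then the cutout data will hold a copy of the original data array.

This states explicitly (and I confirmed it by looking at the code) that Cutout2D just takes a view onto the original data array, which means it holds on to a reference to it. You can avoid this, if you want, by calling Cutout2D(..., copy=True). If you do this you can probably also do away with memmap=False. Using mmap may or may not be useful: it depends in part on the size of the images and how much physical RAM you have available. In your case it might be faster, since you are not using the whole images, just taking cutouts of them. That means using memmap=True could be more efficient, as it may avoid paging the entire image array into memory.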
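The difference between a view and a copy can be seen with plain NumPy, independently of Astropy. In this minimal sketch the array sizes and variable names are arbitrary; the slice plays the role of a cutout made with copy=False, and the .copy() call plays the role of copy=True.

```python
import numpy as np

image = np.zeros((1000, 1000), dtype=np.float32)  # stand-in for a full image
tile_view = image[100:110, 200:210]               # roughly what copy=False gives
tile_copy = image[100:110, 200:210].copy()        # roughly what copy=True gives

# The view shares memory with the full image and keeps it alive via .base.
view_shares = np.shares_memory(tile_view, image)
view_keeps_parent = tile_view.base is image

# The copy is independent, so the full image can be freed once it goes out of scope.
copy_shares = np.shares_memory(tile_copy, image)
```

Storing tile_view in a dictionary would keep all one million pixels of image in memory (or mapped), while storing tile_copy keeps only the 100 pixels of the tile.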

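Whether memory mapping helps can also depend on many other things, so you might want to run some performance tests with fits.open(..., memmap=False) + Cutout2D(..., copy=False) versus fits.open(..., memmap=True) + Cutout2D(..., copy=True), perhaps on just a small number of files.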

Answer 2

I had a similar problem in the past (see here). What worked for me in the end was roughly the following:

total = 0
for filename in filenames:
    with fits.open(filename, memmap=False) as hdulist:
        data = hdulist['spam'].data
    total += data.sum()

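Some notes:

Open the file with fits.open, using memmap=False
Use it in a with block, so that the file is closed reliably
Keep the with block short: just load the data you need into memory, then exit the block so the file is closed
Do whatever processing you need on the data after the file is closed; this may not really be necessary, but if Python references to data in the file are what prevents it from being closed, this simplifies the situation. I don't think the cutout code is the problem in your example, but it could be. Try commenting it out?
Don't do the extra fits.getdata, which I think opens the file again
del and gc.collect should not be needed; if the code is as simple as it is here, there are no circular references and Python reliably deletes objects at the end of the scope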

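Now, this might not help and you may still have the problem. In that case, the way to proceed is to make a minimal reproducible example of the failure that the Astropy developers can run (as I did here), and then file an issue with Astropy, giving your Python version, Astropy version and operating system, or post it here. The point is: this is complicated and probably runtime/version dependent, so try to pin it down to an example that anyone can run and that fails for you.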
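The version details for such a report can be collected with a few lines of standard-library code. In this small sketch, environment_report is a hypothetical helper name, and including numpy alongside astropy is an assumption about what would be useful in the report.

```python
import platform
from importlib import metadata


def environment_report(packages=("astropy", "numpy")):
    """Collect the Python version, OS and package versions for a bug report."""
    report = {"python": platform.python_version(), "os": platform.platform()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```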
