Why can't Python find the files inside a .tar object/file?


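I am using python3 with boto3.resource (s3_target = boto3.resource), which lets me access my bucket through s3_target.Bucket('dataset').

Ultimately, I want to iterate over the bucket's .tar files and extract information about the .tiff files contained in each .tar file.

However, my problem appears when I try to access the .tiff files.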

import io
import os
import tarfile

import tifffile as tf

# Function to extract TIFF files from a tar archive
def extract_tiff_from_tar(bucket, tar_key):
    obj = bucket.Object(tar_key)
    response = obj.get()
    
    # Extract .tar file from S3 object
    tar_buffer = io.BytesIO(response['Body'].read())

    # Extract TIFF files from the .tar archive
    with tarfile.open(fileobj=tar_buffer, mode='r') as tar:

        # Determine the size of the .tar file
        tar_size_bytes = sum(member.size for member in tar.getmembers())
        tar_size_gb = tar_size_bytes / (1024 * 1024 * 1024)
        print(f"Size of the tar file: {tar_size_gb:.2f} GB")

        for member in tar.getmembers():
            # Filter the TIFF files
            if member.name.endswith('.tiff') and member.isfile():
                print("Current file type in tar: " + str(member.name))
                # Check if the file exists
                if os.path.exists(member.name):
                    print('This file exists: ' + str(member.name))

                # Use the try/except block to attempt to validate the existence of the TIFF
                try:
                    with open(member.name, 'rb') as tiff_file:
                        tiff = tf.imread(tiff_file)
                        print('This is the shape of the tiff: ' + str(tiff.shape))

                except FileNotFoundError as e:
                    continue

The code above produces the following output:

Size of the tar file: 1.59 GB
Current file type in tar: POW-xx0
This file exists: POW-xx0
This is the shape of the tiff: (2048, 3072, 15)
Current file type in tar: POW-xx1
Current file type in tar: POW-xx2

My question is why this code

if os.path.exists(member.name):
    print('This file exists: ' + str(member.name))

is not executed, even though member.name is one of the files in the .tar file, as shown by the fact that we can iterate over it on every pass of

for member in tar.getmembers():

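Any input would be appreciated. I find it strange that the first member of the .tar object raises no error, while the following members, even though their names are printed (are these also called their keys?), do not evaluate to true when passed to os.path.exists().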

python amazon-web-services boto3 tar filenotfounderror
1 Answer

I found an answer to my question, although I'm still not sure what was happening in the original problem.

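If I use the tar.extractfile() method and pass it a member of the .tar archive, I can assign the return value to a variable. From there, I can read it, wrap the bytes in an io.BytesIO buffer, and then read the image with tifffile (tf here) and access it.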

Thanks to those who commented; I hope this helps someone in the future. Do I need to close this post somehow now? Sorry, I'm new here.

with tarfile.open(fileobj=tar_buffer, mode='r') as tar:

    # Determine the size of the .tar file
    tar_size_bytes = sum(member.size for member in tar.getmembers())
    tar_size_gb = tar_size_bytes / (1024 * 1024 * 1024)
    print(f"Size of the tar file: {tar_size_gb:.2f} GB")

    for member in tar.getmembers():
        # Filter the TIFF files
        if member.name.endswith('.tiff') and member.isfile():

            a = tar.extractfile(member)
            tiff_data = a.read()
            tiff_io = io.BytesIO(tiff_data)
            tiff = tf.imread(tiff_io)
            print(tiff.shape)
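
A possible explanation for the original behaviour, offered as a guess rather than something confirmed in the question: opening the archive from an in-memory buffer does not write anything to disk, so member.name is only a label inside the archive, while os.path.exists() and open() look at the local filesystem relative to the current working directory. The first member may have passed the check simply because a file with that same name already happened to exist locally. The sketch below uses only the standard library to show the difference; the helper names (build_demo_tar, read_members) and the member names are made up for illustration and are not part of the original code.

```python
import io
import os
import tarfile


def build_demo_tar(files):
    """Return the bytes of an in-memory .tar holding the given {name: bytes} files."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_members(tar_bytes, suffix):
    """Map each regular member whose name ends with `suffix` to its raw bytes."""
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                # extractfile() reads from the archive itself; nothing is written to disk.
                contents[member.name] = tar.extractfile(member).read()
    return contents


# Hypothetical member names, chosen so they are unlikely to exist on disk.
demo_files = {
    "demo_no_such_dir_8f3a/a.tiff": b"first payload",
    "demo_no_such_dir_8f3a/b.tiff": b"second payload",
    "demo_no_such_dir_8f3a/notes.txt": b"ignored",
}
tar_bytes = build_demo_tar(demo_files)
found = read_members(tar_bytes, ".tiff")

for name, payload in found.items():
    # The name is only a label inside the archive, so os.path.exists() is False
    # unless a file with the same relative path happens to exist locally.
    print(name, os.path.exists(name), len(payload))
```

In the S3 case, tar_bytes would be the result of response['Body'].read(), and each value in the returned dictionary could be wrapped in io.BytesIO and handed to tf.imread as in the code above.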