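I am using Python 3 with boto3.resource (s3_target = boto3.resource), which lets me access my bucket through s3_target.Bucket('dataset').
Ultimately, I want to iterate over the bucket's .tar files and extract information about the .tiff files contained in each .tar file.
My problem, however, arises when I try to access the .tiff files.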

    import io
    import os
    import tarfile
    import tifffile as tf

    # Function to extract TIFF files from a tar archive
    def extract_tiff_from_tar(bucket, tar_key):
        obj = bucket.Object(tar_key)
        response = obj.get()
        # Read the whole .tar object from S3 into an in-memory buffer
        tar_buffer = io.BytesIO(response['Body'].read())
        # Open the in-memory .tar archive
        with tarfile.open(fileobj=tar_buffer, mode='r') as tar:
            # Approximate the archive size by summing the member sizes
            tar_size_bytes = sum(member.size for member in tar.getmembers())
            tar_size_gb = tar_size_bytes / (1024 * 1024 * 1024)
            print(f"Size of the tar file: {tar_size_gb:.2f} GB")
            for member in tar.getmembers():
                # Filter the TIFF files
                if member.name.endswith('.tiff') and member.isfile():
                    print("Current file type in tar: " + str(member.name))
                    # Check if the file exists (on the local filesystem)
                    if os.path.exists(member.name):
                        print('This file exists: ' + str(member.name))
                    # Use the try/except block to attempt to validate the existence of the TIFF
                    try:
                        with open(member.name, 'rb') as tiff_file:
                            tiff = tf.imread(tiff_file)
                            print('This is the shape of the tiff: ' + str(tiff.shape))
                    except FileNotFoundError as e:
                        continue

The code above produces the following output:

    Size of the tar file: 1.59 GB
    Current file type in tar: POW-xx0
    This file exists: POW-xx0
    This is the shape of the tiff: (2048, 3072, 15)
    Current file type in tar: POW-xx1
    Current file type in tar: POW-xx2

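My question is: why is this code

    if os.path.exists(member.name):
        print('This file exists: ' + str(member.name))

not executed for every member, even though member.name is clearly one of the files in the .tar archive, as shown by the fact that the loop visits it on each pass of

    for member in tar.getmembers():

Any input would be appreciated. I find it strange that the first member of the .tar object works without error, while the following members, whose names are printed (are these also called their keys?), do not evaluate to true when passed to os.path.exists().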
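
A likely explanation, offered as a sketch rather than a confirmed diagnosis: os.path.exists() checks the local filesystem, while member.name is only a label stored inside the archive, and here the archive lives in an in-memory buffer. The first name may simply have matched a file that was already on disk, for example one left over from an earlier extraction. The member name and payload below are made up for illustration.

```python
import io
import os
import tarfile

# Build a small tar archive entirely in memory.
# The member name and payload are hypothetical placeholders, not real TIFF data.
payload = b"placeholder bytes standing in for a TIFF"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    info = tarfile.TarInfo(name="no_such_demo_dir/example_image.tiff")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buffer.seek(0)

with tarfile.open(fileobj=buffer, mode="r") as tar:
    member = tar.getmembers()[0]
    # The name is just metadata in the archive; nothing was written to disk.
    exists_on_disk = os.path.exists(member.name)
    # The bytes are still readable straight from the archive.
    data = tar.extractfile(member).read()

print(exists_on_disk)   # False, unless that path happens to exist locally
print(data == payload)  # True
```

On the terminology: in S3, "key" usually refers to the name of an object in the bucket (here, the .tar file), whereas the entries inside the archive are tar members with names.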
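I found an answer to my problem, although I am still not sure what was going wrong in the original code.
If I call tar.extractfile() with a member of the tar archive, I can assign the returned file object to a variable. From there, I can read it, wrap the bytes in an io.BytesIO buffer, and then read the image with tifffile (imported here as tf) and access it.
Thanks to those who commented; I hope this helps someone in the future. Do I need to close this post somehow now? Sorry, I'm new here.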

    with tarfile.open(fileobj=tar_buffer, mode='r') as tar:
        # Approximate the archive size by summing the member sizes
        tar_size_bytes = sum(member.size for member in tar.getmembers())
        tar_size_gb = tar_size_bytes / (1024 * 1024 * 1024)
        print(f"Size of the tar file: {tar_size_gb:.2f} GB")
        for member in tar.getmembers():
            # Filter the TIFF files
            if member.name.endswith('.tiff') and member.isfile():
                # Read the member straight from the archive, without touching disk
                a = tar.extractfile(member)
                tiff_data = a.read()
                tiff_io = io.BytesIO(tiff_data)
                tiff = tf.imread(tiff_io)
                print(tiff.shape)
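
Since the archive is around 1.59 GB, one possible variation is to read it as a forward-only stream instead of loading the whole object into a BytesIO first. The sketch below is not what the code above does: it assumes the source object only needs a read() method, and it uses tarfile's stream mode "r|*", which requires consuming members in the order they appear. The helper names iter_tiff_bytes and build_demo_archive are hypothetical, and the demo payloads are placeholders rather than real TIFF data.

```python
import io
import tarfile


def iter_tiff_bytes(fileobj):
    """Yield (member name, raw bytes) for each .tiff file in a tar stream.

    `fileobj` only needs a read() method. Mode "r|*" makes tarfile read
    forward-only, so each member is read as soon as it is encountered.
    """
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".tiff"):
                yield member.name, tar.extractfile(member).read()


def build_demo_archive():
    """Build a tiny in-memory tar with placeholder (non-TIFF) payloads."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in ("a.tiff", "notes.txt", "b.tiff"):
            payload = name.encode()
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


names = [name for name, _ in iter_tiff_bytes(build_demo_archive())]
print(names)  # ['a.tiff', 'b.tiff']
```

With S3, the idea would be to pass response['Body'] in place of the demo buffer (assuming the streaming body can be read like a file object) and hand each result to tf.imread(io.BytesIO(data)). Only one image's bytes would then be held in memory at a time, at the cost of not being able to call tar.getmembers() up front.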