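I am trying to read and write tar.gz files in memory using Python. I have read the relevant Python documentation and put together the following minimal working example to demonstrate my problem.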
import io
import tarfile

text = "This is a test."
file_name = "test.txt"
text_buffer = io.BytesIO()
text_buffer.write(text.encode(encoding="utf-8"))
tar_buffer = io.BytesIO()
# Start a tar file with the memory buffer as the "file".
with tarfile.open(fileobj=tar_buffer, mode="w:gz") as archive:
    # We must create a TarInfo object for each file we put into the tar file.
    info = tarfile.TarInfo(file_name)
    text_buffer.seek(0, io.SEEK_END)
    info.size = text_buffer.tell()
    # We have to rewind the text buffer, as tarfile.addfile doesn't do this for us.
    text_buffer.seek(0, io.SEEK_SET)
    # Add the text to the tarfile.
    archive.addfile(info, text_buffer)
with open("test.tar.gz", "wb") as f:
    f.write(tar_buffer.getvalue())
# The following command works fine.
# tar -zxvf test.tar.gz
archive_contents = dict()
# Open the tar file for reading, again with the memory buffer as the "file".
with tarfile.open(fileobj=tar_buffer, mode="r:*") as archive:
    for entry in archive:
        entry_fd = archive.extractfile(entry.name)
        archive_contents[entry.name] = entry_fd.read().decode("utf-8")
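The odd thing is that extracting the archive with the tar command works perfectly: I get a file test.txt containing the string This is a test. However, the loop for entry in archive finishes immediately, as if the archive contained no files, and archive.getmembers() returns an empty list.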
Another odd thing is that when I set mode="r:gz" when opening the byte stream, I get the following exception:
Exception has occurred: ReadError
empty file
tarfile.EmptyHeaderError: empty header
During handling of the above exception, another exception occurred:
File ".../test.py", line 283, in <module>
with tarfile.open(fileobj=tar_buffer, mode="r:gz") as archive:
tarfile.ReadError: empty file
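I have also tried creating the test.tar.gz file with the tar command (on the assumption that something might be wrong with how I write the tar file), but I get the same exception.
I must be missing something basic, but I cannot seem to find any examples online.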
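Before extracting the files, you need to reset the buffer's position to the beginning, because after writing to tar_buffer its position is at the end of the data. When you then try to read from it, there is nothing left to read, so no files are found: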
with open("test.tar.gz", "wb") as f:
    f.write(tar_buffer.getvalue())
# Rewind the in-memory archive before opening it for reading.
tar_buffer.seek(0)
archive_contents = dict()
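
For reference, here is a minimal end-to-end sketch of the round trip with the rewind in place. It assumes the same names as the question (text, file_name, tar_buffer, archive_contents) and, as a simplification that is not in the original code, takes the member size from len(data) instead of seeking through a separate text buffer.

```python
import io
import tarfile

text = "This is a test."
file_name = "test.txt"

# Write a gzip-compressed tar archive into an in-memory buffer.
tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w:gz") as archive:
    data = text.encode("utf-8")
    info = tarfile.TarInfo(file_name)
    info.size = len(data)  # size must be set before calling addfile
    archive.addfile(info, io.BytesIO(data))

# The buffer position now sits at the end of the written data,
# so rewind it before handing the buffer to a reader.
tar_buffer.seek(0)

# Read the archive back from the same buffer.
archive_contents = {}
with tarfile.open(fileobj=tar_buffer, mode="r:gz") as archive:
    for entry in archive:
        entry_fd = archive.extractfile(entry)
        if entry_fd is not None:  # extractfile returns None for non-file members
            archive_contents[entry.name] = entry_fd.read().decode("utf-8")

print(archive_contents)  # {'test.txt': 'This is a test.'}
```

With the seek(0) call in place, mode="r:gz" and mode="r:*" should both find the member, since the reader now starts at the first byte of the archive.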