I want to add a new line at the beginning of a 2GB+ file. I tried the following code, but it fails with an out-of-memory error:
myfile = open(tableTempFile, "r+")
myfile.read() # read everything in the file
myfile.seek(0) # rewind
myfile.write("WRITE IN THE FIRST LINE ")
myfile.close();
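Note that none of Python's built-in functions can do this directly.
On Linux you can do it easily with tail, cat and similar tools.
To do it from Python you have to go through an auxiliary file, and for very large files I think this approach works: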
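import fileinput

def add_line_at_start(filename, line_to_be_added):
    # inplace=True redirects print() output into the file being rewritten;
    # fileinput handles the auxiliary (backup) file behind the scenes
    with fileinput.input(filename, inplace=True) as f:
        for xline in f:
            if f.isfirstline():
                # emit the new line, then the original first line
                print(line_to_be_added.rstrip('\r\n') + '\n' + xline, end='')
            else:
                # xline already ends with '\n', so don't add another
                print(xline, end='')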
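Notes:
When working with large files, never use read() / readlines(). These methods load the entire file into memory.
In your code, seek(0) does take you back to the start, but whatever you write then overwrites the existing content instead of being inserted before it.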
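To see why the question's code overwrites rather than inserts, here is a small self-contained demonstration on a throwaway temporary file (the file name and contents are made up for illustration): after `seek(0)`, `write()` replaces bytes in place and never shifts the rest of the file.

```python
import os
import tempfile

# Create a small throwaway file (hypothetical contents, for illustration only)
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("line one\nline two\n")

# Rewind and write: the new characters overwrite the start of the file in place
with open(path, "r+") as f:
    f.seek(0)
    f.write("HEAD")

with open(path) as f:
    result = f.read()
os.remove(path)

print(repr(result))  # 'HEAD one\nline two\n'
```

The first four characters `line` were replaced by `HEAD`; nothing was pushed down.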
If you can afford to hold the whole file in memory at once:
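first_line_update = "WRITE IN THE FIRST LINE \n"
with open(tableTempFile, 'r+') as f:
    lines = f.readlines()
    lines[0] = first_line_update  # replace the first line
    f.seek(0)            # rewind, otherwise writelines() appends at the end
    f.writelines(lines)
    f.truncate()         # drop leftover bytes if the content got shorter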
Otherwise:
from shutil import copy
from itertools import islice, chain

# TODO: use a NamedTemporaryFile from the tempfile module
first_line_update = "WRITE IN THE FIRST LINE \n"
with open("inputfile", 'r') as infile, open("tmpfile", 'w') as outfile:
    # replace the first line with the string provided:
    outfile.writelines(
        line for line in chain((first_line_update,), islice(infile, 1, None)))
    # if you don't want to replace the first line but to insert another line before it,
    # this simplifies to:
    # outfile.writelines(line for line in chain((first_line_update,), infile))

# copy only after the with block, once tmpfile has been closed and flushed
copy("tmpfile", "inputfile")
# TODO: remove temporary file
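In general, you can't do this. A file is a sequence of bytes, not a sequence of lines. That data model doesn't allow insertion at an arbitrary point: you can only replace existing bytes with other bytes, or append bytes at the end.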
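Within that byte model, the one edit you can make without copying the file is overwriting bytes in place. A minimal sketch, assuming the replacement first line has exactly the same length in bytes as the existing one; the helper name `overwrite_first_line` is hypothetical:

```python
import os
import tempfile


def overwrite_first_line(path: str, new_line: bytes) -> None:
    """Overwrite the first line of `path` in place (hypothetical helper).

    Only safe when `new_line` (including its trailing newline) has the same
    byte length as the current first line; otherwise following bytes would
    be clobbered or stale bytes would remain.
    """
    with open(path, "r+b") as f:
        old_line = f.readline()  # reads just the first line, not the whole file
        if len(old_line) != len(new_line):
            raise ValueError(
                f"length mismatch: {len(old_line)} != {len(new_line)} bytes")
        f.seek(0)
        f.write(new_line)  # replaces bytes; nothing after them moves


# Usage on a small throwaway file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"old header\nrow 1\nrow 2\n")

overwrite_first_line(path, b"NEW HEADER\n")
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
print(result)  # b'NEW HEADER\nrow 1\nrow 2\n'
```

This costs O(1) I/O regardless of file size, but it cannot add a line; for that you still need to rewrite the file.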
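What you can do is create a temporary file (the tempfile module will help you), write the new first line to it, then open the base file in r mode and copy its contents after the first line into the temporary file piece by piece. (Note that appending to the end of a file is much easier: all you need to do is open the file in append (a) mode.)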
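The temporary-file approach can be sketched as follows, assuming the goal is to insert a new line before the existing first line (to replace it instead, call `src.readline()` once before copying). The helper name `prepend_line` and the 1 MiB default chunk size are illustrative choices; `shutil.copyfileobj` copies in fixed-size chunks so memory use stays bounded regardless of file size, and `os.replace` then moves the finished temporary file over the original. Binary mode is used so no decoding is needed.

```python
import os
import shutil
import tempfile


def prepend_line(path: str, line: bytes, chunk_size: int = 1 << 20) -> None:
    """Insert `line` at the start of `path` (hypothetical helper).

    Streams the original file into a temporary file in chunks of
    `chunk_size` bytes, so memory use does not grow with the file size.
    """
    # Create the temp file next to the original so os.replace is a rename
    # on the same filesystem rather than a cross-device move
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            out.write(line)                           # new first line
            shutil.copyfileobj(src, out, chunk_size)  # rest, chunk by chunk
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # swap in the rewritten file
    except BaseException:
        os.remove(tmp_path)  # don't leave the temp file behind on failure
        raise


# Usage on a small throwaway file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"row 1\nrow 2\n")

prepend_line(path, b"WRITE IN THE FIRST LINE\n", chunk_size=4)
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
print(result)  # b'WRITE IN THE FIRST LINE\nrow 1\nrow 2\n'
```

The tiny `chunk_size=4` in the usage only exercises the chunked loop; for a 2GB file the default of a few MiB is a more reasonable trade-off.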
Based on the previous solution that uses a temporary file:
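import os
from itertools import chain, islice
from shutil import copy
from tempfile import NamedTemporaryFile

def add_lines_at_beginning(filename: str, text: str):
    # delete=False keeps the temporary file after it is closed so it can be
    # copied over the original; it is removed explicitly at the end
    # (parenthesized context managers require Python 3.10+)
    with (open(filename, 'r', encoding="utf-8") as infile,
          NamedTemporaryFile(mode='w+', encoding="utf-8", delete=False) as outfile):
        # replace the first line with the string provided:
        outfile.writelines(
            line for line in chain((text,), islice(infile, 1, None)))
        # if you don't want to replace the first line but to insert another line before it,
        # this simplifies to:
        # outfile.writelines(line for line in chain((text,), infile))
    # both files are closed (and flushed) here
    copy(outfile.name, filename)
    os.remove(outfile.name)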