How do I read a file line by line into a list?

Question

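How do I read every line of a file in Python and store each line as an element in a list?

I want to read the file line by line and append each line to the end of the list.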

Answer 1

with open(fname) as f:
    content = f.readlines()
# you may also want to remove whitespace characters like `\n` at the end of each line
content = [x.strip() for x in content]

Answer 2

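To read a file into a list you need to do three things:

Open the file
Read the file
Store the contents as a list

Fortunately, Python makes these things very easy, so the shortest way to read a file into a list is: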

lst = list(open(filename))

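However, I'll add some more explanation.

Opening the file

I assume that you want to open a specific file and that you don't deal directly with a file handle (or a file-like handle). The most commonly used function to open a file in Python is open; in Python 2.7 it takes one mandatory argument and two optional ones:

Filename
Mode
Buffering (I'll ignore this argument in this answer)

The filename should be a string that represents the path to the file. For example: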

open('afile')   # opens the file named afile in the current working directory
open('adir/afile')            # relative path (relative to the current working directory)
open('C:/users/aname/afile')  # absolute path (windows)
open('/usr/local/afile')      # absolute path (linux)

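Note that the file extension needs to be specified. This is especially important for Windows users, because file extensions like .txt or .doc are hidden by default when viewed in the Explorer.

The second argument is the mode; it is r by default, which means "read-only". That's exactly what you need in this case.

But if you actually want to create a file and/or write to a file, you'll need a different argument here. There is an excellent answer if you want an overview.

To read a file, you can omit the mode or pass it in explicitly: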

open(filename)
open(filename, 'r')

Both will open the file in read-only mode. If you want to read a binary file on Windows, you need to use the mode rb:

open(filename, 'rb')

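On other platforms the 'b' (binary mode) is simply ignored. (That describes Python 2; in Python 3, 'b' makes reads return bytes on every platform.)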
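
To make the text/binary distinction a bit more concrete, here is a minimal sketch for Python 3. The temporary file and its contents are made up for the illustration; it only shows that text mode yields str with newlines translated, while 'rb' yields the raw bytes.

```python
import os
import tempfile

# Create a throwaway file; its name and contents are hypothetical.
fd, filename = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"first\r\nsecond\r\n")

# Text mode (the default): lines are str and "\r\n" is translated to "\n".
with open(filename, "r") as f:
    text_lines = list(f)

# Binary mode: lines are bytes and nothing is translated.
with open(filename, "rb") as f:
    byte_lines = list(f)

os.remove(filename)
print(text_lines)
print(byte_lines)
```

So if the goal is a list of text lines, plain open(filename) is usually what you want.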

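Now that I've shown how to open the file, let's talk about the fact that you always need to close it again. Otherwise it will keep an open file handle to the file until the process exits (or Python garbage-collects the file handle).

While you could use: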

f = open(filename)
# ... do stuff with f
f.close()

That will fail to close the file when an exception occurs between open and close. You can avoid that by using try and finally:

f = open(filename)
# nothing in between!
try:
    pass  # do stuff with f
finally:
    f.close()

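However, Python provides context managers, which have a prettier syntax (although for open it is almost identical to the try and finally above):

with open(filename) as f:
    pass  # do stuff with f
# The file is always closed after the with-scope ends.

The last approach is the recommended way to open a file in Python!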
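
As a small sanity check of that claim, the sketch below raises an exception inside a with block and then inspects f.closed. The temporary file and the simulated error are just stand-ins for real work.

```python
import os
import tempfile

# An empty throwaway file is enough for this demonstration.
fd, filename = tempfile.mkstemp()
os.close(fd)

try:
    with open(filename) as f:
        raise ValueError("simulated failure while working with f")
except ValueError:
    pass

# The name f is still bound here, but the file behind it has been closed.
closed_after_error = f.closed
os.remove(filename)
print(closed_after_error)
```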

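Reading the file

Okay, you've opened the file; now how do you read it?

The open function returns a file object, and it supports Python's iteration protocol. Each iteration will give you a line: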

with open(filename) as f:
    for line in f:
        print(line)

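This will print each line of the file. Note, however, that each line will contain a newline character \n at the end (you might want to check whether your Python is built with universal newlines support; otherwise you could also have \r\n on Windows or \r on Mac as newlines). If you don't want that, you can simply remove the last character (or the last two characters on Windows):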

with open(filename) as f:
    for line in f:
        print(line[:-1])

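But the last line doesn't necessarily have a trailing newline, so one shouldn't use that. One could check whether the line ends with a trailing newline and, if so, remove it: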

with open(filename) as f:
    for line in f:
        if line.endswith('\n'):
            line = line[:-1]
        print(line)

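But you could simply remove all whitespace (including the \n character) from the end of the string. This will also remove all other trailing whitespace, so you have to be careful if that whitespace is important:

with open(filename) as f:
    for line in f:
        print(line.rstrip())

However, if the lines end with \r\n (Windows "newlines"), .rstrip() will also take care of the \r!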
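
If trailing spaces or tabs matter, one option is to pass the characters to strip explicitly. The sample line below is invented; the sketch just contrasts the two calls.

```python
# A made-up line with leading spaces, trailing whitespace and a Windows newline.
line = "  indented value \t\r\n"

# No argument: every kind of trailing whitespace is removed.
everything = line.rstrip()

# Explicit argument: only trailing "\r" and "\n" characters are removed.
newline_only = line.rstrip("\r\n")

print(repr(everything))
print(repr(newline_only))
```

Neither call touches the leading spaces; that would take .strip() or .lstrip().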

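Storing the contents as a list

Now that you know how to open the file and read it, it's time to store the contents in a list. The simplest option is to use the list function: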

with open(filename) as f:
    lst = list(f)

If you want to strip the trailing newlines, you can use a list comprehension instead:

with open(filename) as f:
    lst = [line.rstrip() for line in f]

Or even simpler: the .readlines() method of the file object returns a list of the lines by default:

with open(filename) as f:
    lst = f.readlines()

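This will also include the trailing newline characters. If you don't want them, I would recommend the [line.rstrip() for line in f] approach, because it avoids keeping two lists containing all the lines in memory.

There's an additional option to get the desired output, but it's rather "suboptimal": read the complete file into a string and then split on newlines: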

with open(filename) as f:
    lst = f.read().split('\n')

or:

with open(filename) as f:
    lst = f.read().splitlines()

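These drop the newline characters automatically because the split character isn't included in the results (although split('\n') leaves an empty string as the last element when the file ends with a newline, which splitlines() does not). However, they aren't ideal, because you keep the file both as a string and as a list of lines in memory!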
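
The two calls are not quite interchangeable. A quick sketch with made-up strings (standing in for f.read()) shows where they differ:

```python
# Stand-in for the result of f.read() on a file that ends with a newline.
text = "foo\nbar\nbaz\n"

with_split = text.split("\n")      # keeps an empty string after the last "\n"
with_splitlines = text.splitlines()

# With Windows line endings, split("\n") also leaves the "\r" in place.
windows_text = "foo\r\nbar"
windows_split = windows_text.split("\n")
windows_splitlines = windows_text.splitlines()

print(with_split, with_splitlines)
print(windows_split, windows_splitlines)
```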

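Summary

Use with open(...) as f when opening files, because you don't need to take care of closing the file yourself and it is closed even if an exception happens.
file objects support the iteration protocol, so reading a file line by line is as simple as for line in the_file_object:.
Always browse the documentation for the available functions/classes. Most of the time there's a perfect match for the task, or at least one or two good ones. The obvious choice in this case would be readlines(), but if you want to process the lines before storing them in the list, I would recommend a simple list comprehension.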

Answer 3

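Here's one more option, using a list comprehension over the file:

lines = [line.rstrip() for line in open('file.txt')]

This should be a more efficient way, as most of the work is done inside the Python interpreter.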

Answer 4

Another option is numpy.genfromtxt, for example:

import numpy as np
data = np.genfromtxt("yourfile.dat",delimiter="\n")

This will make data a NumPy array with as many rows as there are in your file.

Answer 5

If you'd like to read a file from the command line or from stdin, you can also use the fileinput module:

# reader.py
import fileinput

content = []
for line in fileinput.input():
    content.append(line.strip())

fileinput.close()

Pass a file to it like this:

$ python reader.py textfile.txt

Read more here: http://docs.python.org/2/library/fileinput.html
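
fileinput.input also accepts file names directly instead of taking them from sys.argv. A minimal sketch, assuming Python 3.2+ (where it can be used as a context manager) and a throwaway file created just for the example:

```python
import fileinput
import os
import tempfile

# Hypothetical sample file for the demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("alpha\nbeta\n")

# Passing files=... bypasses sys.argv; the with block closes the input for us.
with fileinput.input(files=[path]) as lines:
    content = [line.rstrip("\n") for line in lines]

os.remove(path)
print(content)
```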

Answer 6

Simplest way

A simple way is to:

Read the whole file as a string
Split the string line by line

In one line, that gives:

lines = open('C:/path/file.txt').read().splitlines()

Answer 7

Read and write text files with Python 2 and Python 3; it works with Unicode

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Define data
lines = ['     A first string  ',
         'A Unicode sample: €',
         'German: äöüß']

# Write text file
with open('file.txt', 'w') as fp:
    fp.write('\n'.join(lines))

# Read text file
with open('file.txt', 'r') as fp:
    read_lines = fp.readlines()
    read_lines = [line.rstrip('\n') for line in read_lines]

print(lines == read_lines)

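Things to notice:

with is a so-called context manager. It makes sure that the opened file is closed again.
All the solutions here that simply call .strip() or .rstrip() will fail to reproduce lines, because they also strip the whitespace.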
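
One thing the script leaves to the platform default is the encoding. In Python 3 you can pass it explicitly; the sketch below repeats the round trip with encoding="utf-8" (my choice for the example, not something the answer prescribes) and a temporary file.

```python
import os
import tempfile

lines = ["     A first string  ", "A Unicode sample: €", "German: äöüß"]

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Write and read with the same explicit encoding (Python 3 open()).
with open(path, "w", encoding="utf-8") as fp:
    fp.write("\n".join(lines))

with open(path, encoding="utf-8") as fp:
    # rstrip("\n") removes only the newline, so the spaces survive.
    read_lines = [line.rstrip("\n") for line in fp]

os.remove(path)
round_trip_ok = (lines == read_lines)
print(round_trip_ok)
```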

Common file endings

.txt

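More advanced file writing/reading

CSV: super simple format (read & write)
JSON: nice for writing human-readable data; very commonly used (read & write)
YAML: YAML is a superset of JSON, but easier to read (read & write, comparison of JSON and YAML)
pickle: a Python serialization format (read & write)
MessagePack (Python package): more compact representation (read & write)
HDF5 (Python package): nice for matrices (read & write)
XML: exists too *sigh* (read & write)

For your application, the following might be important:

Support by other programming languages
Reading/writing performance
Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python.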

Answer 8

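Introduced in Python 3.4, pathlib has a really convenient method for reading in text from files (read_text itself arrived in Python 3.5), as follows: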

from pathlib import Path
p = Path('my_text_file')
lines = p.read_text().splitlines()

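(The splitlines call is what turns it from a string containing the whole contents of the file into a list of the lines in the file.)

pathlib has a lot of handy conveniences in it. read_text is nice and concise, and you don't have to worry about opening and closing the file. If all you need to do with the file is read it all in one go, it's a good choice.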
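
For a self-contained run, the sketch below first writes a small file with Path.write_text and then reads it back. The temporary directory, the file contents and the explicit encoding are assumptions made for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "my_text_file"
    # Create some sample content so the example can run on its own.
    p.write_text("foo\nbar\nbaz\n", encoding="utf-8")
    # read_text opens, reads and closes the file in one call.
    lines = p.read_text(encoding="utf-8").splitlines()

print(lines)
```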

Answer 9

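f = open("your_file.txt", 'r')
out = f.readlines()  # reads all the lines into the list out

Now the variable out is a list (array) of what you want. You could either do:

for line in out:
    print(line)

or

for line in f:
    print(line)

You'll get the same results (provided the second loop runs on a freshly opened file, since readlines() has already consumed f).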
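
One caveat worth spelling out: a file object can only be iterated once per pass, so the two loops match only when each starts from the beginning of the file. A small sketch with a made-up temporary file:

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("one\ntwo\n")

with open(path) as f:
    out = f.readlines()   # consumes the whole file
    leftover = list(f)    # nothing is left to iterate over
    f.seek(0)             # rewind to the start of the file
    again = list(f)       # now the lines are available again

os.remove(path)
print(out, leftover, again)
```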

Answer 10

Just use the splitlines() function. Here is an example.

inp = "file.txt"
data = open(inp)
dat = data.read()
lst = dat.splitlines()
print lst
# print(lst) # for python 3

In the output, you will have the list of lines.

Answer 11

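If you're faced with a very large/huge file and want to read it faster (imagine you are in a Topcoder/Hackerrank coding competition), you might read a considerably bigger chunk of lines into a memory buffer at one time, rather than just iterating line by line at the file level.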

buffersize = 2**16
with open(path) as f: 
    while True:
        lines_buffer = f.readlines(buffersize)
        if not lines_buffer:
            break
        for line in lines_buffer:
            process(line)
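
The same loop can be wrapped in a generator so that callers still see one line at a time. This is only a sketch: iter_lines_buffered is a hypothetical helper name, and the sample file is generated on the spot.

```python
import os
import tempfile

def iter_lines_buffered(path, buffersize=2**16):
    """Yield lines from path, fetching them in batches via readlines(hint)."""
    with open(path) as f:
        while True:
            lines_buffer = f.readlines(buffersize)
            if not lines_buffer:
                break
            for line in lines_buffer:
                yield line

# Build a throwaway file with 1000 short lines.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.writelines("row %d\n" % i for i in range(1000))

# A tiny hint forces many small batches; the lines come out the same either way.
small_batches = list(iter_lines_buffered(path, buffersize=16))
one_batch = list(iter_lines_buffered(path))
os.remove(path)
print(len(small_batches), small_batches == one_batch)
```

Whether this is actually faster than plain iteration depends on the file and the Python version, so it is worth timing on your own data.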

Answer 12

See Input and Output:

with open('filename') as f:
    lines = f.readlines()

or with stripping the newline character:

lines = [line.rstrip('\n') for line in open('filename')]

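Editor's note: This answer's original whitespace-stripping command, line.strip(), as implied by Janus Troelsen's comment, would remove all leading and trailing whitespace, not just the trailing \n.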

Answer 13

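The simplest way to do it, with some additional benefits, is:

lines = list(open('filename'))

or

lines = tuple(open('filename'))

or

lines = set(open('filename'))

In the case of set, we must remember that the line order is not preserved and that duplicate lines are dropped.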
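
If you want the de-duplication but also need the original order, one possibility (not part of the answer above) is dict.fromkeys, which keeps first occurrences in insertion order on Python 3.7+. The list below stands in for list(open('filename')).

```python
# Stand-in for list(open('filename')); the contents are made up.
raw_lines = ["b\n", "a\n", "b\n", "c\n", "a\n"]

# Dict keys are unique and keep insertion order (guaranteed since Python 3.7).
unique_in_order = list(dict.fromkeys(raw_lines))

print(unique_in_order)
```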

Answer 14

Use this:

import pandas as pd
data = pd.read_csv(filename) # You can also add parameters such as header, sep, etc.
array = data.values

data is a DataFrame, and .values returns an ndarray. You can also get a list by using array.tolist().

Answer 15

You could also use the loadtxt command in NumPy. This checks for fewer conditions than genfromtxt, so it may be faster.

import numpy
data = numpy.loadtxt(filename, delimiter="\n")

Answer 16

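Outline and Summary

With a filename, handling the file from a Path(filename) object, or directly with open(filename) as f, do one of the following:

list(fileinput.input(filename))
using with path.open() as f, call f.readlines()
list(f)
path.read_text().splitlines()
path.read_text().splitlines(keepends=True)
iterate over fileinput.input or f and list.append each line one at a time
pass f to a bound list.extend method
use f in a list comprehension

I explain the use case for each below.

In Python, how do I read a file line by line?

This is an excellent question. First, let's create some sample data: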

from pathlib import Path
Path('filename').write_text('foo\nbar\nbaz')

File objects are lazy iterators, so just iterate over them.

filename = 'filename'
with open(filename) as f:
    for line in f:
        line # do something with the line

Alternatively, if you have multiple files, use fileinput.input, another lazy iterator. With just one file:

import fileinput

for line in fileinput.input(filename): 
    line # process the line

or for multiple files, pass it a list of filenames:

for line in fileinput.input([filename]*2): 
    line # process the line

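Again, f and fileinput.input above both are (or return) lazy iterators. You can only use an iterator once, so to provide functional code while avoiding verbosity, I'll use the slightly more terse fileinput.input(filename) where apropos from here on.

In Python, how do I read a file line by line into a list?

Ah, but you want it in a list for some reason? I'd avoid that if possible. But if you insist... just pass the result of fileinput.input(filename) to list: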

list(fileinput.input(filename))

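Another direct answer is to call f.readlines, which returns the contents of the file (up to an optional hint number of characters, so you could break this up into multiple lists that way).

You can get to this file object in two ways. One way is to pass the filename to the open builtin: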

filename = 'filename'

with open(filename) as f:
    f.readlines()

or use the new Path object from the pathlib module (which I have become quite fond of, and will use from here on):

from pathlib import Path

path = Path(filename)

with path.open() as f:
    f.readlines()

list will also consume the file iterator and return a list, which is quite a direct method as well:

with path.open() as f:
    list(f)

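If you don't mind reading the entire text into memory as a single string before splitting it, you can do this as a one-liner with the Path object and the splitlines() string method. By default, splitlines removes the newlines: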

path.read_text().splitlines()

If you want to keep the newlines, pass keepends=True:

path.read_text().splitlines(keepends=True)

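I want to read the file line by line and append each line to the end of the list.

Now this is a bit silly to ask for, given that we've demonstrated the end result easily with several methods. But you might need to filter or operate on the lines as you make your list, so let's humor this request.

Using list.append would allow you to filter or operate on each line before you append it: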

line_list = []
for line in fileinput.input(filename):
    line_list.append(line)

line_list

Using list.extend would be a bit more direct, and perhaps useful if you have a preexisting list:

line_list = []
line_list.extend(fileinput.input(filename))
line_list

Or, more idiomatically, we could instead use a list comprehension, and map and filter inside it if desirable:

[line for line in fileinput.input(filename)]

Or even more directly, to close the circle, just pass it to list to create a new list directly without operating on the lines:

list(fileinput.input(filename))

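Conclusion

You've seen many ways to get lines from a file into a list, but I'd recommend that you avoid materializing large quantities of data into a list and instead use Python's lazy iteration to process the data if possible.

That is, prefer fileinput.input or with path.open() as f.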
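
As a rough illustration of what lazy processing can look like, the sketch below chains a generator expression onto the file iterator and keeps only the lines it needs. The sample data and the "ba" prefix filter are invented for the example.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("foo\nbar\nbaz")

with open(path) as f:
    # Generator expression: no line is read until something asks for it.
    stripped = (line.rstrip("\n") for line in f)
    # Only the matching lines are ever held in memory at once.
    matches = [line for line in stripped if line.startswith("ba")]

os.remove(path)
print(matches)
```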

Answer 17

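I'd try one of the methods mentioned below. The example file that I use is named dummy.txt. You can find the file here. I presume that the file is in the same directory as the code (you can change fpath to include the proper file name and folder path).

In both of the examples below, the list that you want is given by lst.

1.> First method: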

fpath = 'dummy.txt'
with open(fpath, "r") as f: lst = [line.rstrip('\n \t') for line in f]

print(lst)
>>>['THIS IS LINE1.', 'THIS IS LINE2.', 'THIS IS LINE3.', 'THIS IS LINE4.']

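2.> In the second method, one can use the csv.reader function from the Python standard library:

import csv
fpath = 'dummy.txt'
with open(fpath) as csv_file:
    # csv requires a one-character delimiter; a tab is assumed here
    csv_reader = csv.reader(csv_file, delimiter='\t')
    lst = [row[0] for row in csv_reader]

print(lst)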
>>>['THIS IS LINE1.', 'THIS IS LINE2.', 'THIS IS LINE3.', 'THIS IS LINE4.']

You can use either of the two methods. The time taken to create lst is almost equal for both.

Answer 18

Command line version

#!/bin/python3
import os
import sys
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
filename = os.path.join(dname, sys.argv[1])  # join with a path separator
arr = open(filename).read().split("\n") 
print(arr)

Run with:

python3 somefile.py input_file_name.txt

Answer 19

I like to use the following. It reads the lines immediately.

contents = []
for line in open(filepath, 'r').readlines():
    contents.append(line.strip())

Or using a list comprehension:

contents = [line.strip() for line in open(filepath, 'r').readlines()]

Answer 20

Here is a Python (3) helper library class that I use to simplify file I/O:

import os

# handle files using a callback method, prevents repetition
def _FileIO__file_handler(file_path, mode, callback = lambda f: None):
  f = open(file_path, mode)
  try:
    return callback(f)
  except Exception as e:
    raise IOError("Failed to %s file" % ["write to", "read from"][mode.lower() in "r rb r+".split(" ")])
  finally:
    f.close()


class FileIO:
  # return the contents of a file
  def read(file_path, mode = "r"):
    return __file_handler(file_path, mode, lambda rf: rf.read())

  # get the lines of a file
  def lines(file_path, mode = "r", filter_fn = lambda line: len(line) > 0):
    return [line for line in FileIO.read(file_path, mode).strip().split("\n") if filter_fn(line)]

  # create or update a file (NOTE: can also be used to replace a file's original content)
  def write(file_path, new_content, mode = "w"):
    return __file_handler(file_path, mode, lambda wf: wf.write(new_content))

  # delete a file (if it exists)
  def delete(file_path):
    return os.remove(file_path) if os.path.isfile(file_path) else None

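You would then use the FileIO.lines function, like this:

file_ext_lines = FileIO.lines("./path/to/file.ext")
for i, line in enumerate(file_ext_lines):
  print("Line {}: {}".format(i + 1, line))

Remember that the mode ("r" by default) and filter_fn (checks for empty lines by default) parameters are optional.

You could even remove the read, write and delete methods and just leave FileIO.lines, or even turn it into a separate function called read_lines.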

Answer 21

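If there are also empty lines in the document, I like to read in the content and pass it through filter to prevent empty string elements: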

with open(myFile, "r") as f:
    excludeFileContent = list(filter(None, f.read().splitlines()))
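
Note that filter(None, ...) only drops strings that are truly empty; a line containing just spaces is kept. If such lines should go too, a comprehension with line.strip() is one way to do it. The text below is a made-up stand-in for f.read().

```python
# Stand-in for f.read(): one empty line and one whitespace-only line.
text = "first\n\n   \nsecond\n"

# filter(None, ...) removes only falsy items, i.e. empty strings.
non_empty = list(filter(None, text.splitlines()))

# Testing line.strip() also removes whitespace-only lines.
non_blank = [line for line in text.splitlines() if line.strip()]

print(non_empty)
print(non_blank)
```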

Answer 22

This is more explicit than necessary, but does what you want.

with open("file.txt", "r") as ins:
    array = []
    for line in ins:
        array.append(line)

Answer 23

This will yield an "array" of lines from the file.

lines = tuple(open(filename, 'r'))

Answer 24

If you want the \n included:

with open(fname) as f:
    content = f.readlines()

If you do not want the \n included:

with open(fname) as f:
    content = f.read().splitlines()

Answer 25

You could simply do the following:

with open('/your/path/file') as f:
    my_lines = f.readlines()

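Note that this approach has two downsides:

1) You store all the lines in memory. In the general case, this is a very bad idea. The file could be very large, and you could run out of memory. Even if it's not large, it is simply a waste of memory.

2) This does not allow processing of each line as you read it. So if you process your lines after this, it is not efficient (it requires two passes rather than one).

A better approach for the general case would be the following: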

with open('/your/path/file') as f:
    for line in f:
        process(line)

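where you define your process function any way you want. For example:

def process(line):
    if 'save the world' in line.lower():
        superman.save_the_world()

(The implementation of the Superman class is left as an exercise for you.)

This works nicely for any file size, and you go through your file in just one pass. This is typically how generic parsers work.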

Answer 26

According to Python's Methods of File Objects, the simplest way to convert a text file into a list is:

with open('file.txt') as f:
    my_list = list(f)

Other ways to read a text file into a list:

Using with and readlines():

with open('file.txt') as fp:
    lines = fp.readlines()

If you don't care about closing the file, this one-liner works:

lines = open('file.txt').readlines()

The traditional way:

fp = open('file.txt') # Open file on read mode
lines = fp.read().split("\n") # Create a list containing all lines
fp.close() # Close file

Answer 27

Data into list

Assume that we have a text file with our data, like in the following lines:

Text file content:

line 1
line 2
line 3

Open cmd in the same directory (right-click and choose cmd or PowerShell)
Run python and write in the interpreter:

The Python script

>>> with open("myfile.txt", encoding="utf-8") as file:
...     x = [l.strip() for l in file]
>>> x
['line 1', 'line 2', 'line 3']

Using append

x = []
with open("myfile.txt") as file:
    for l in file:
        x.append(l.strip())

Or...

>>> x = open("myfile.txt").read().splitlines()
>>> x
['line 1', 'line 2', 'line 3']

Or...

>>> x = open("myfile.txt").readlines()
>>> x
['line 1\n', 'line 2\n', 'line 3\n']

Or...

>>> y = [x.rstrip() for x in open("myfile.txt")]
>>> y
['line 1', 'line 2', 'line 3']


with open('testodiprova.txt', 'r', encoding='utf-8') as file:
    lines = file.read().splitlines()
    print(lines)

with open('testodiprova.txt', 'r', encoding='utf-8') as file:
    lines = file.readlines()
    print(lines)

Answer 28

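A clean and Pythonic way of reading the lines of a file into a list

First and foremost, you should focus on opening your file and reading its contents in an efficient and Pythonic way. Here is an example of the way I personally do NOT prefer: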

infile = open('my_file.txt', 'r')  # Open the file for reading.

data = infile.read()  # Read the contents of the file.

infile.close()  # Close the file since we're done using it.

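Instead, I prefer the method below for opening files for both reading and writing, as it is very clean and does not require an extra step of closing the file once you are done using it. In the statement below, we're opening the file for reading and assigning it to the variable 'infile'. Once the code within this statement has finished running, the file will be closed automatically.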

# Open the file for reading.
with open('my_file.txt', 'r') as infile:

    data = infile.read()  # Read the contents of the file into memory.

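Now we need to focus on bringing this data into a Python list, because lists are iterable, efficient, and flexible. In your case, the desired goal is to bring each line of the text file into a separate element. To accomplish this, we will use the splitlines() method as follows: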

# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()

The final product:

# Open the file for reading.
with open('my_file.txt', 'r') as infile:

    data = infile.read()  # Read the contents of the file into memory.

# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()

Testing our code:

Contents of the text file:

     A fost odatã ca-n povesti,
     A fost ca niciodatã,
     Din rude mãri împãrãtesti,
     O prea frumoasã fatã.

Print statements for testing purposes:

    print my_list  # Print the list.

    # Print each line in the list.
    for line in my_list:
        print line

    # Print the fourth element in this list.
    print my_list[3]

Output (different-looking because of Unicode characters):

     ['A fost odat\xc3\xa3 ca-n povesti,', 'A fost ca niciodat\xc3\xa3,',
     'Din rude m\xc3\xa3ri \xc3\xaemp\xc3\xa3r\xc3\xa3testi,', 'O prea
     frumoas\xc3\xa3 fat\xc3\xa3.']

     A fost odatã ca-n povesti, A fost ca niciodatã, Din rude mãri
     împãrãtesti, O prea frumoasã fatã.

     O prea frumoasã fatã.

Votes: 2029, answers: 28