Python: return the lines in a file that have too many or too few tabs

Question · Votes: -1 · Answers: 2

I need some Python that can do the following:

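  • Find all lines in a tab-delimited text file that have more or fewer than X tabs (i.e., not exactly X).
  • Print those lines (each on its own line, of course).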

For example, "the_file.txt" has the following content:

Field1[TAB]Field2[TAB]Field3[TAB]Field4[TAB]Field5
Field1[TAB]Field2[TAB]Field3
Field1[TAB]Field2[TAB]Field3[TAB]Field4
Field1[TAB]Field2[TAB]Field3[TAB]Field4[TAB]Field5

Pseudopython:

Read the_file.txt
Find all rows that do not have 4 tabs
Print the entire content of those rows

Expected output:

Field1[TAB]Field2[TAB]Field3
Field1[TAB]Field2[TAB]Field3[TAB]Field4

One thing to consider: the files I will run this against are usually very large. They always have 1,000+ lines, often 10,000+, and sometimes 100,000+.
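A minimal sketch of the pseudocode above, assuming "X" is a fixed expected tab count. It iterates over the input one line at a time instead of loading the whole file, so memory use stays flat even at 100,000+ lines. The function name `find_bad_rows` and the choice to report 1-based line numbers are illustrative assumptions, not part of the question.

```python
import io
from typing import Iterable, Iterator, Tuple


def find_bad_rows(lines: Iterable[str], expected_tabs: int) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each line whose tab count != expected_tabs.

    `lines` may be an open file object; it is consumed lazily, one line at a time.
    (Hypothetical helper written for this example.)
    """
    for number, line in enumerate(lines, start=1):
        # Drop only the line terminator so trailing tabs are still counted.
        row = line.rstrip("\r\n")
        if row.count("\t") != expected_tabs:
            yield number, row


if __name__ == "__main__":
    # Stand-in for the_file.txt, with real tab characters instead of [TAB].
    sample = io.StringIO(
        "Field1\tField2\tField3\tField4\tField5\n"
        "Field1\tField2\tField3\n"
        "Field1\tField2\tField3\tField4\n"
        "Field1\tField2\tField3\tField4\tField5\n"
    )
    for number, row in find_bad_rows(sample, expected_tabs=4):
        print(number, row.replace("\t", "[TAB]"))
```

On the sample this prints lines 2 and 3. For a real file, pass the file object directly, e.g. `with open("the_file.txt") as f:` followed by `for number, row in find_bad_rows(f, 4): print(row)`.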

Thanks!

python tabs delimited-text
2 Answers
0
votes

Just do this:

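import pandas as pd

# header=None so the first line is checked as data instead of becoming the header
df = pd.read_csv('the_file.txt', sep='\t', header=None,
                 names=['Col1', 'Col2', 'Col3', 'Col4', 'Col5'])
# Short rows are padded with NaN (empty fields are NaN too); rows with more
# than 5 fields make read_csv raise a ParserError instead.
bad_rows = df[df.isnull().any(axis=1)]

print(bad_rows)

Output:

    Col1    Col2    Col3    Col4    Col5
1   Field1  Field2  Field3  NaN     NaN
2   Field1  Field2  Field3  Field4  NaN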

0
votes

Here you go:

expected_tabs = 4  # a well-formed line has 4 tabs (5 fields)

with open('the_file.txt') as f:
    # Iterate over the file lazily so large files are not read into memory at once
    for line in f:
        if line.count("\t") != expected_tabs:
            # The line still ends with '\n', so don't let print() add another one
            print(line, end="")
© www.soinside.com 2019 - 2024. All rights reserved.