I need some Python that can do the following.
For example, "the_file.txt" has the following contents:
Field1[TAB]Field2[TAB]Field3[TAB]Field4[TAB]Field5
Field1[TAB]Field2[TAB]Field3
Field1[TAB]Field2[TAB]Field3[TAB]Field4
Field1[TAB]Field2[TAB]Field3[TAB]Field4[TAB]Field5
Pseudopython:
Read the_file.txt
Find all rows that do not have 4 tabs
Print the entire content of those rows
Returns:
Field1[TAB]Field2[TAB]Field3
Field1[TAB]Field2[TAB]Field3[TAB]Field4
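One thing to consider: the files I will be running this Python against are usually very large, always 1,000+ lines, often 10,000+ lines, and sometimes 100,000+ lines.
Thanks!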
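Just do this:
import pandas as pd
# header=None keeps the first line as data; names= pads short rows with NaN
df = pd.read_csv('the_file.txt', sep='\t', header=None, names=['Col1', 'Col2', 'Col3', 'Col4', 'Col5'])
# Rows with fewer than five fields contain at least one NaN
# (note: an empty field in a complete row is also read as NaN)
print(df[df.isnull().any(axis=1)])
Output:
Col1 Col2 Col3 Col4 Col5
1 Field1 Field2 Field3 NaN NaN
2 Field1 Field2 Field3 Field4 NaN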
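The pandas approach above infers missing fields from NaN values, so a complete row with an empty field would also be reported. As a rough alternative, the field count can be checked directly with the standard `csv` module. This is only a sketch: the function name `rows_with_wrong_field_count` and the `expected_fields=5` default are assumptions for illustration, chosen to match the five-column sample in the question.

```python
import csv


def rows_with_wrong_field_count(path, expected_fields=5):
    """Yield each row (as a list of fields) whose field count differs.

    Hypothetical helper: the name and the default of 5 fields are
    assumptions based on the sample file in the question.
    """
    # newline="" is the mode recommended by the csv module for file input
    with open(path, newline="") as f:
        # QUOTE_NONE treats quote characters as ordinary text, so the
        # field count is always the number of tabs on the line plus one
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != expected_fields:
                yield row
```

Calling `for row in rows_with_wrong_field_count('the_file.txt'): print('\t'.join(row))` would print the offending rows. Because the function is a generator, the file is read one row at a time, which should suit the 100,000+ line files mentioned in the question.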
Here you go:
number_not_tabs = 4
with open('the_file.txt') as f:
    # Iterate over the file directly so large files are read one line at a time
    for x in f:
        if x.count("\t") != number_not_tabs:
            # Strip the trailing newline so print() does not add a blank line
            print(x.rstrip("\n"))
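With files of 100,000+ lines it may also help to know where each offending row sits. The tab-count check above could be wrapped in a small generator that reports line numbers too. This is a sketch under assumptions: `lines_with_wrong_tab_count` and `expected_tabs` are hypothetical names, not part of the answer above.

```python
def lines_with_wrong_tab_count(lines, expected_tabs=4):
    """Yield (line_number, line) for each line whose tab count differs.

    `lines` can be any iterable of strings, such as an open file, so the
    input is consumed one line at a time rather than loaded all at once.
    The function name and the default of 4 tabs are assumptions that
    match the example in the question.
    """
    for line_number, line in enumerate(lines, start=1):
        if line.count("\t") != expected_tabs:
            # Drop the trailing newline so callers can print the line as-is
            yield line_number, line.rstrip("\n")


# Possible usage (assumes 'the_file.txt' exists in the working directory):
# with open('the_file.txt') as f:
#     for line_number, line in lines_with_wrong_tab_count(f):
#         print(f"{line_number}: {line}")
```

Taking an iterable instead of a file name keeps the check separate from file handling, so the same function can be tried on a short list of strings before running it against a large file.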