I have 2 text files. file1 has 6 columns and 2 rows, while file2 has 2 columns and 5 rows, as in the example below:
file1:
Code S1 S2 S3 S4 S5
X2019060656_12 4.068522 1889.299282 1547.771971 434.392935 4346.019078
X2019060657_05 1.318325 1290.142988 285.579601 73.329331 2222.198520
file2:
Class group
X2019060656_12 A
X2019060657_05 A
X2019060658_04 A
X2019060659_03 A
X2019060660_08 A
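I want to make a subset of file2 that keeps only the rows whose "Class" column matches a value in the "Code" column of file1. Here is the expected output:
Expected output: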
Class group
X2019060656_12 A
X2019060657_05 A
To do this, I wrote the following code in Python:
file1 = open("file1.txt", "r")
file2 = open("file2.txt", "r")
file1 = {}
keys1 = []
values1 = []
with open("file1.txt") as file1:
    for line in file1.lines():
        keys1.append(line[0])
        values1.append(line[1])
dict_file1 = dict(zip(keys1, values1))
file2 = {}
keys2 = []
values2 = []
with open("file2.txt") as file2:
    for line in file2.lines():
        keys2.append(line[0])
        values2.append(line[1])
dict_file2 = dict(zip(keys2, values2))
newlist = []
for item in dict_file1:
    for item2 in dict_file2:
        if item1 == item2:
            new_list.append(line)
with open('new_file.txt', 'w') as f:
    for i in new_list:
        f.write("%s\n" % i)
But the output file does not contain the expected rows. Do you know how to fix it?
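The script above fails for a few concrete reasons: file objects have no lines() method, line[0] and line[1] pick single characters rather than columns, and the names item1 and new_list are never defined (the loop variable is item and the list is newlist). Below is a minimal standard-library sketch of the same idea. The function names first_column and subset_by_code are made up for illustration, and it assumes both files are whitespace-separated with a single header row.

```python
def first_column(path):
    """Return the set of first-column values in a file, skipping the header row."""
    with open(path) as handle:
        next(handle, None)  # skip the header line (assumed to be the first line)
        # split() with no arguments splits on any run of whitespace
        return {line.split()[0] for line in handle if line.strip()}


def subset_by_code(file1_path, file2_path, out_path):
    """Copy the header of file2 plus every row whose first column appears in file1."""
    codes = first_column(file1_path)
    with open(file2_path) as src, open(out_path, "w") as dst:
        dst.write(next(src, ""))  # keep the "Class group" header as-is
        for line in src:
            fields = line.split()
            # a set lookup replaces the nested loop over both files
            if fields and fields[0] in codes:
                dst.write(line)
```

With the sample files from the question, subset_by_code("file1.txt", "file2.txt", "new_file.txt") should write the header plus the two matching rows to new_file.txt.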
You can use pandas, like this:
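import pandas as pd

# Read both files, splitting columns on runs of whitespace
df1 = pd.read_csv("file1.txt", sep=r"\s+")
df2 = pd.read_csv("file2.txt", sep=r"\s+")
# Keep only the rows of df2 whose Class value appears in df1's Code column
df2[df2['Class'].isin(df1['Code'])]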
Output:
Class group
0 X2019060656_12 A
1 X2019060657_05 A
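If you want to export the result to a file, call to_csv on the filtered DataFrame (calling it on df2 itself would write all five rows).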
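A minimal sketch of that export step, assuming the same whitespace-separated input files. The function name export_matching_rows and the output path are made up for illustration. It writes a space-separated file without the index column, so the result has the same layout as file2.

```python
import pandas as pd


def export_matching_rows(file1_path, file2_path, out_path):
    """Write the rows of file2 whose Class value appears in file1's Code column."""
    df1 = pd.read_csv(file1_path, sep=r"\s+")
    df2 = pd.read_csv(file2_path, sep=r"\s+")
    # Boolean mask: True where the Class value is one of the Code values
    subset = df2[df2["Class"].isin(df1["Code"])]
    # index=False drops the 0, 1, ... row labels shown in the printed output
    subset.to_csv(out_path, sep=" ", index=False)
    return subset
```

For the files in the question, export_matching_rows("file1.txt", "file2.txt", "new_file.txt") should produce a file containing the header and the two matching rows.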