Python: compare two DataFrames and get the values from the first DataFrame that are not in the second

Question (votes: 0, answers: 3)

I have two DataFrames, shown below:

import pandas as pd
df1 = pd.DataFrame(
    {
        "Server": ["Server1", "Server1","Server1","Server1","Server1"],
        "FileName": [
            "2020-05-01T18:18:00Z/Server1/file1",
            "2020-05-01T18:18:13Z/Server1/file2",
            "2020-05-01T18:20:47Z/Server1/file3",
            "2020-05-01T18:21:46Z/Server1/file4",
            "2020-05-01T18:24:43Z/Server1/file5",
        ],
    }
)


df2 = pd.DataFrame(
    {
        "Server": ["Server1", "Server1","Server1","Server1","Server1"],
        "FileName": [
            "2020-05-01T18:18:00Z/Server1/file1",
            "2020-05-01T18:18:13Z/Server1/file2",
            "2020-05-01T18:20:47Z/Server1/file3",
            "2020-05-01T18:33:08Z/Server1/file6",
            "2020-05-01T18:33:11Z/Server1/file7",
        ],
    }
)

df1:

                             FileName   Server
0  2020-05-01T18:18:00Z/Server1/file1  Server1
1  2020-05-01T18:18:13Z/Server1/file2  Server1
2  2020-05-01T18:20:47Z/Server1/file3  Server1
3  2020-05-01T18:21:46Z/Server1/file4  Server1
4  2020-05-01T18:24:43Z/Server1/file5  Server1

df2:

                             FileName   Server
0  2020-05-01T18:18:00Z/Server1/file1  Server1
1  2020-05-01T18:18:13Z/Server1/file2  Server1
2  2020-05-01T18:20:47Z/Server1/file3  Server1
3  2020-05-01T18:33:08Z/Server1/file6  Server1
4  2020-05-01T18:33:11Z/Server1/file7  Server1

I want the files from df1 that are not in df2. The Server column doesn't matter here. I want the following DataFrame:

                             FileName   Server
0  2020-05-01T18:21:46Z/Server1/file4  Server1
1  2020-05-01T18:24:43Z/Server1/file5  Server1

I have achieved this by iterating over every value. Is there a simpler way to do it?

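rows = []
for index1, row1 in df1.iterrows():
    found = False
    for index2, row2 in df2.iterrows():
        if row1['FileName'] == row2['FileName']:
            found = True
            break  # no need to keep scanning df2 once a match is found
    if not found:
        rows.append({'Server': row1['Server'], 'FileName': row1['FileName']})
# DataFrame.append was removed in pandas 2.0, so collect rows and build the frame once
df = pd.DataFrame(rows, columns=['Server', 'FileName'])
print(df)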
python dataframe
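If you want to keep an explicit loop like the one in the question, a minimal sketch of a faster variant is to build a set of df2's file names once and test each df1 row against it. Set membership is O(1) on average, so this avoids rescanning df2 for every row of df1. The name seen is illustrative, not from the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Server": ["Server1"] * 5,
    "FileName": [
        "2020-05-01T18:18:00Z/Server1/file1",
        "2020-05-01T18:18:13Z/Server1/file2",
        "2020-05-01T18:20:47Z/Server1/file3",
        "2020-05-01T18:21:46Z/Server1/file4",
        "2020-05-01T18:24:43Z/Server1/file5",
    ],
})
df2 = pd.DataFrame({
    "Server": ["Server1"] * 5,
    "FileName": [
        "2020-05-01T18:18:00Z/Server1/file1",
        "2020-05-01T18:18:13Z/Server1/file2",
        "2020-05-01T18:20:47Z/Server1/file3",
        "2020-05-01T18:33:08Z/Server1/file6",
        "2020-05-01T18:33:11Z/Server1/file7",
    ],
})

# Build the lookup once: the loop is then O(len(df1) + len(df2))
# instead of O(len(df1) * len(df2)) for the nested iterrows() version.
seen = set(df2["FileName"])

# Keep each df1 row (as a dict) whose FileName never appears in df2.
rows = [row for row in df1.to_dict("records") if row["FileName"] not in seen]

# Passing columns= keeps the column order even if no rows survive.
df = pd.DataFrame(rows, columns=df1.columns)
print(df)
```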
3 Answers

0 votes

I'm not sure how efficient this is, but instead of looping over the DataFrames you can use this one-liner:

result = df1.merge(df2, how='outer', indicator=True).loc[lambda x: x['_merge'] == 'left_only']
result = result.drop(columns='_merge')  # optional: keep the _merge column if you want to see where each row came from

print(result)

Output:

    Server                            FileName
3  Server1  2020-05-01T18:21:46Z/Server1/file4
4  Server1  2020-05-01T18:24:43Z/Server1/file5
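Called without on=, merge joins on every column the two frames share, here both Server and FileName, so a file present in df2 under a different server would still be reported as missing. Since the question says Server is irrelevant, a minimal sketch of a variant (my choice, not the answerer's) merges on FileName only and uses how="left", because right_only rows are discarded anyway.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Server": ["Server1"] * 5,
    "FileName": [
        "2020-05-01T18:18:00Z/Server1/file1",
        "2020-05-01T18:18:13Z/Server1/file2",
        "2020-05-01T18:20:47Z/Server1/file3",
        "2020-05-01T18:21:46Z/Server1/file4",
        "2020-05-01T18:24:43Z/Server1/file5",
    ],
})
df2 = pd.DataFrame({
    "Server": ["Server1"] * 5,
    "FileName": [
        "2020-05-01T18:18:00Z/Server1/file1",
        "2020-05-01T18:18:13Z/Server1/file2",
        "2020-05-01T18:20:47Z/Server1/file3",
        "2020-05-01T18:33:08Z/Server1/file6",
        "2020-05-01T18:33:11Z/Server1/file7",
    ],
})

# Join on FileName only; selecting df2[["FileName"]] avoids
# Server_x / Server_y suffix columns in the merged frame.
merged = df1.merge(df2[["FileName"]], on="FileName", how="left", indicator=True)

# indicator=True adds a "_merge" column whose values are
# "left_only", "right_only" or "both".
result = (
    merged[merged["_merge"] == "left_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)
print(result)
```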

0 votes

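This works for the sample data, but note that != compares the two columns row by row (aligned on the index), so it only gives the right answer when matching files sit at the same row positions in both DataFrames: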

df1[df1['FileName'] != df2['FileName']].reset_index(drop=True)
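To see the difference between a row-by-row comparison and a membership test, here is a small sketch using shortened stand-in file names (an assumption for brevity): df2 holds the same files as in the question but in a different order. If the two frames had different lengths or index labels, pandas would instead raise a ValueError ("Can only compare identically-labeled Series objects").

```python
import pandas as pd

df1 = pd.DataFrame({"FileName": ["file1", "file2", "file3", "file4", "file5"]})
# Same files as the question's df2, just listed in a different order.
df2 = pd.DataFrame({"FileName": ["file6", "file7", "file1", "file2", "file3"]})

# Element-wise: row i of df1 is compared only with row i of df2.
elementwise = df1[df1["FileName"] != df2["FileName"]].reset_index(drop=True)

# Membership: each df1 value is looked up among all df2 values.
membership = df1[~df1["FileName"].isin(df2["FileName"])].reset_index(drop=True)

print(elementwise["FileName"].tolist())  # all five files: no positions line up
print(membership["FileName"].tolist())   # ['file4', 'file5']
```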

0 votes

You can use the isin method:

df1[~df1['FileName'].isin(df2['FileName'])]
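A minimal end-to-end sketch of the isin approach on the question's data, with reset_index added (my addition) so the row labels start at 0 as in the desired output. The Server column is simply carried along and plays no part in the match.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Server": ["Server1"] * 5,
    "FileName": [
        "2020-05-01T18:18:00Z/Server1/file1",
        "2020-05-01T18:18:13Z/Server1/file2",
        "2020-05-01T18:20:47Z/Server1/file3",
        "2020-05-01T18:21:46Z/Server1/file4",
        "2020-05-01T18:24:43Z/Server1/file5",
    ],
})
df2 = pd.DataFrame({
    "Server": ["Server1"] * 5,
    "FileName": [
        "2020-05-01T18:18:00Z/Server1/file1",
        "2020-05-01T18:18:13Z/Server1/file2",
        "2020-05-01T18:20:47Z/Server1/file3",
        "2020-05-01T18:33:08Z/Server1/file6",
        "2020-05-01T18:33:11Z/Server1/file7",
    ],
})

# isin() gives a boolean Series: True where df1's FileName appears anywhere
# in df2["FileName"], regardless of row order or index labels.
mask = df1["FileName"].isin(df2["FileName"])

# ~ negates the mask to keep only files missing from df2;
# reset_index(drop=True) renumbers the surviving rows 0, 1, ...
result = df1[~mask].reset_index(drop=True)
print(result)
```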