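I have two 2D lists with more than 200 rows. For each sublist in df2, I want to find the values that also appear in df1, and return the matching and non-matching values as separate lists.
Here is a short example: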
df1 = [[1, 7, 3, 5], [5, 5, 14, 10]]
df2 = [[1, 17, 3, 5], [34, 14, 74], [34, 3, 87], [25, 14, 10]]
Desired result:
no_match = [[17],[34,87],[34,74],[25]]
match = [[1,3,5],[3],[14],[14,10]]
Try this:
df1 = [[1, 7, 3, 5], [5, 5, 14, 10]]
df2 = [[1, 17, 3, 5], [34, 14, 74], [34, 3, 87], [25, 14, 10]]
no_match = []
match = []
for sublist in df2:
    sublist_match = []
    sublist_no_match = []
    for item in sublist:
        # look for the item in every sublist of df1
        for sublist_df1 in df1:
            if item in sublist_df1:
                sublist_match.append(item)
                break
        else:
            # for/else: runs only if the inner loop never hit break
            sublist_no_match.append(item)
    no_match.append(sublist_no_match)
    match.append(sublist_match)
print("no_match =", no_match)
print("match =", match)
Output:
no_match = [[17], [34, 74], [34, 87], [25]]
match = [[1, 3, 5], [14], [3], [14, 10]]
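The for/else search above can also be written with any(). The following is only a sketch of that idea, not a replacement for the answer's code; in_any_row is a hypothetical helper name introduced here for illustration.

```python
df1 = [[1, 7, 3, 5], [5, 5, 14, 10]]
df2 = [[1, 17, 3, 5], [34, 14, 74], [34, 3, 87], [25, 14, 10]]


def in_any_row(item, rows):
    # Hypothetical helper: True if item occurs in at least one sublist of rows.
    return any(item in row for row in rows)


# One output sublist per sublist of df2, in the same order as df2.
match = [[x for x in sub if in_any_row(x, df1)] for sub in df2]
no_match = [[x for x in sub if not in_any_row(x, df1)] for sub in df2]

print("no_match =", no_match)
print("match =", match)
```

Like the nested loops, this scans df1 once per item, so it may become slow if both lists grow large.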
You can simply flatten df1 and remove its duplicates, then iterate over df2 and check whether each item is in the flattened df1.
# data
df1 = [[1, 7, 3, 5], [5, 5, 14, 10]]
df2 = [[1, 17, 3, 5], [34, 14, 74], [34, 3, 87], [25, 14, 10]]
# flatten df1 and remove duplicates
s1 = set([i for l in df1 for i in l])
# init containers
match = []
no_match = []
for data in df2:
    # prime containers for this iteration of data
    match.append([])
    no_match.append([])
    # append item to the proper container
    for item in data:
        (no_match, match)[item in s1][-1].append(item)
# print results
print('match:', match)
print('no match:', no_match)
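The line (no_match, match)[item in s1] works because a bool is an int in Python: False indexes the tuple at 0 and True at 1. If that reads as too terse, the same set-based approach could be sketched as a small function. The name split_matches and its parameters are assumptions made for this example, and it assumes every value is hashable.

```python
def split_matches(reference, rows):
    """Hypothetical helper: split each row into values found / not found in reference."""
    # Flatten the reference rows into a set for fast membership tests.
    seen = {value for row in reference for value in row}
    match, no_match = [], []
    for row in rows:
        match.append([v for v in row if v in seen])
        no_match.append([v for v in row if v not in seen])
    return match, no_match


df1 = [[1, 7, 3, 5], [5, 5, 14, 10]]
df2 = [[1, 17, 3, 5], [34, 14, 74], [34, 3, 87], [25, 14, 10]]

match, no_match = split_matches(df1, df2)
print('match:', match)
print('no match:', no_match)
```

The set is built once, so each lookup no longer rescans df1, which should help with a few hundred rows or more.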