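OK, here goes:
Say I have two guest lists in DataFrames, and I want to know which guests dropped out and which were added, and get the results as new DataFrames. How do I do that?
The two DFs are: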
list1 = {'First Name': ['Adi', 'Beni', 'Dimi', 'Sergei'], 'Last Name': ['Beer', 'Wine', 'Liquor', 'Vodka'], 'Job': ['Brewer', 'Farmer', 'Shopowner', 'Guest']}
list2 = {'First Name': ['Adi', 'Beni', 'Sergei', 'Don'], 'Last Name': ['Beer', 'Wine', 'Vodka', 'Brown']}
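If NameX is in both list1 and list2, give me a new df containing those entries.
If NameX is in list1 but has dropped out of list2, show it to me in a new df.
And: if NameX is only in list2, show it to me in another df.
How can I do this?
By the way, the 'Job' entry exists in only one of the dfs. The goal is to keep it in the new dfs.
Thanks a lot!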
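The simplest approach is to combine first and last name into a single key, compare the keys, and then split them apart again.
Here is the corresponding code: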
FIRST_NAME = 'First Name'
LAST_NAME = 'Last Name'
JOB = 'Job'
KEYS = [FIRST_NAME, LAST_NAME, JOB]

def intersection(lst1, lst2):
    # Names present in both lists, in the order of lst1.
    return [value for value in lst1 if value in lst2]

def unzip(lst):
    # Split a list of (first, last) pairs into two parallel lists.
    a = []
    b = []
    for x, y in lst:
        a.append(x)
        b.append(y)
    return a, b

def get_jobs(dct, lst):
    # Look up the job for each name; None if the name has no job entry.
    return [dct.get(x) for x in lst]

def get_dict(lst, dict0, keys=KEYS):
    # Rebuild a column-oriented dict (first names, last names, jobs).
    a, b = unzip(lst)
    values = [a, b, get_jobs(dict0, lst)]
    return dict(zip(keys, values))

list1 = {FIRST_NAME: ['Adi', 'Beni', 'Dimi', 'Sergei'], LAST_NAME: ['Beer', 'Wine', 'Liquor', 'Vodka'],
         JOB: ['Brewer', 'Farmer', 'Shopowner', 'Guest']}
list2 = {FIRST_NAME: ['Adi', 'Beni', 'Sergei', 'Don'], LAST_NAME: ['Beer', 'Wine', 'Vodka', 'Brown']}

# Combine first and last name into one (first, last) tuple per guest.
mapped1 = list(zip(list1.get(FIRST_NAME), list1.get(LAST_NAME)))
mapped2 = list(zip(list2.get(FIRST_NAME), list2.get(LAST_NAME)))
dict_jobs = dict(zip(mapped1, list1.get(JOB)))

intersec = intersection(mapped1, mapped2)
# List comprehensions keep the original order; iterating over a set would not.
left = [name for name in mapped1 if name not in mapped2]
right = [name for name in mapped2 if name not in mapped1]

dict_intersec = get_dict(intersec, dict_jobs)
dict_left = get_dict(left, dict_jobs)
dict_right = get_dict(right, dict_jobs)

print(dict_intersec)
print(dict_left)
print(dict_right)
Output:
{'First Name': ['Adi', 'Beni', 'Sergei'], 'Last Name': ['Beer', 'Wine', 'Vodka'], 'Job': ['Brewer', 'Farmer', 'Guest']}
{'First Name': ['Dimi'], 'Last Name': ['Liquor'], 'Job': ['Shopowner']}
{'First Name': ['Don'], 'Last Name': ['Brown'], 'Job': [None]}
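In dict_jobs, the combined names are mapped to their jobs. intersec is the list of names that appear in both original lists. left contains the names that appear only in the first list, and right contains the names that appear only in the second list.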
pandas.merge(...) is what you are looking for: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
To get all the options at once, do:
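import pandas as pd

# list1 and list2 are the dicts from the question.
df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
# indicator=True adds a _merge column telling where each row came from.
df3 = pd.merge(df1, df2, on=["First Name", "Last Name"], how="outer", indicator=True)
print(df3)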
Output:
  First Name Last Name        Job      _merge
0        Adi      Beer     Brewer        both
1       Beni      Wine     Farmer        both
2       Dimi    Liquor  Shopowner   left_only
3     Sergei     Vodka      Guest        both
4        Don     Brown        NaN  right_only
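The question asks for three separate dfs rather than one combined frame. A minimal sketch of one way to get them is to filter the merged frame on the _merge column. The helper name select and the result names in_both, dropped_out and newly_added are made up for illustration, and the row order of the merged frame may differ between pandas versions.

```python
import pandas as pd

list1 = {'First Name': ['Adi', 'Beni', 'Dimi', 'Sergei'],
         'Last Name': ['Beer', 'Wine', 'Liquor', 'Vodka'],
         'Job': ['Brewer', 'Farmer', 'Shopowner', 'Guest']}
list2 = {'First Name': ['Adi', 'Beni', 'Sergei', 'Don'],
         'Last Name': ['Beer', 'Wine', 'Vodka', 'Brown']}

df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
merged = pd.merge(df1, df2, on=["First Name", "Last Name"],
                  how="outer", indicator=True)


def select(frame, label):
    # Hypothetical helper: keep the rows with the given _merge label,
    # then drop the indicator column and renumber the index.
    rows = frame[frame["_merge"] == label]
    return rows.drop(columns="_merge").reset_index(drop=True)


in_both = select(merged, "both")            # guests on both lists
dropped_out = select(merged, "left_only")   # on list1 only
newly_added = select(merged, "right_only")  # on list2 only (Job is NaN)

print(in_both)
print(dropped_out)
print(newly_added)
```

The 'Job' column is carried along in all three frames; for guests who appear only in list2 there is no job to carry over, so the value is NaN.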