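I have imported a spreadsheet of my connections, exported from LinkedIn, and I want to classify people's positions into different levels.
To do that, I created a dictionary containing the terms to look for at each position level.
The first version of the dictionary is: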
    dicpositions = {
        '0 - CEO, Founder': ['CEO', 'Founder', 'Co-Founder', 'Cofounder', 'Owner'],
        '1 - Director of': ['Director', 'Head'],
        '2 - Manager': ['Manager', 'Administrador'],
        '3 - Engenheiro': ['Engenheiro', 'Engineering'],
        '4 - Consultor': ['Consultor', 'Consultant'],
        '5 - Estagiário': ['Estagiário', 'Intern'],
        '6 - Desempregado': ['Self-Employed', 'Autônomo'],
        '7 - Professor': ['Professor', 'Researcher'],
    }
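I need code that reads each position in the spreadsheet, checks whether it contains any of these terms, and returns the corresponding key in a separate column.
Sample data from the DataFrame I am reading would be: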
    sample = pd.Series(data=(
        ['(blank)'],
        ['Estagiário'],
        ['Professor', 'Adjunto'],
        ['CEO', 'and', 'Founder'],
        ['Engenheiro', 'de', 'Produção'],
        ['Consultant'],
        ['Founder', 'and', 'CTO'],
        ['Intern'],
        ['Manager', 'Specialist'],
        ['Administrador', 'de', 'Novos', 'Negócios'],
        ['Administrador', 'de', 'Serviços'],
    ))
which returns:
    0                                [(blank)]
    1                             [Estagiário]
    2                     [Professor, Adjunto]
    3                      [CEO, and, Founder]
    4               [Engenheiro, de, Produção]
    5                             [Consultant]
    6                      [Founder, and, CTO]
    7                                 [Intern]
    8                    [Manager, Specialist]
    9     [Administrador, de, Novos, Negócios]
    10            [Administrador, de, Serviços]
    dtype: object
I wrote the following code:
    import pandas as pd

    plan = pd.read_excel('SpreadSheet Name.xlsx', sheet_name='Positions')

    list0 = ['CEO', 'Founder', 'Co-Founder', 'Cofounder', 'Owner']
    list1 = ['Director', 'Head']
    list2 = ['Manager', 'Administrador']
    listgeral = [list0, list1, list2]

    def in_list(list_to_search, terms_to_search):
        # Keep the words of the position that also appear in the list of terms.
        results = [item for item in list_to_search if item in terms_to_search]
        if len(results) > 0:
            # The label is hard-coded, so only level 0 can ever be reported.
            return '0 - CEO, Founder'
        # No match: the function falls through and returns None.

    # Only the first list (listgeral[0]) is checked here.
    plan['PositionLevel'] = plan['Position'].str.split().apply(lambda x: in_list(x, listgeral[0]))
Actual output:
       Position                           PositionLevel
    0  '(blank)'                          None
    1  'Estagiário'                       None
    2  'Professor Adjunto'                None
    3  'CEO and Founder'                  '0 - CEO, Founder'
    4  'Engenheiro de produção'           None
    5  'Consultant'                       None
    6  'Founder and CTO'                  '0 - CEO, Founder'
    7  'Intern'                           None
    8  'Manager Specialist'               None
    9  'Administrador de Novos Negócios'  None
Expected output:
       Position                           PositionLevel
    0  '(blank)'                          None
    1  'Estagiário'                       '5 - Estagiário'
    2  'Professor Adjunto'                '7 - Professor'
    3  'CEO and Founder'                  '0 - CEO, Founder'
    4  'Engenheiro de produção'           '3 - Engenheiro'
    5  'Consultant'                       '4 - Consultor'
    6  'Founder and CTO'                  '0 - CEO, Founder'
    7  'Intern'                           '5 - Estagiário'
    8  'Manager Specialist'               '2 - Manager'
    9  'Administrador de Novos Negócios'  '2 - Manager'
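At first, I intended to run that code once for each list in my listgeral, but I did not manage to do it. Then I came to believe it would be better to work with one large dictionary, like dicpositions from the beginning of the question, and return the key of the matching term.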
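A minimal sketch of that large-dictionary idea, assuming each position has already been split into a list of words as in the sample data. The names term_to_level and level_of are hypothetical, and the matching is exact and case-sensitive, so 'Engenheiro' matches but 'engenheiro' would not.

```python
dicpositions = {
    '0 - CEO, Founder': ['CEO', 'Founder', 'Co-Founder', 'Cofounder', 'Owner'],
    '1 - Director of': ['Director', 'Head'],
    '2 - Manager': ['Manager', 'Administrador'],
    '3 - Engenheiro': ['Engenheiro', 'Engineering'],
    '4 - Consultor': ['Consultor', 'Consultant'],
    '5 - Estagiário': ['Estagiário', 'Intern'],
    '6 - Desempregado': ['Self-Employed', 'Autônomo'],
    '7 - Professor': ['Professor', 'Researcher'],
}

# Invert the level -> terms dictionary into a flat term -> level lookup
# (term_to_level is a hypothetical name, not part of the original code).
term_to_level = {
    term: level
    for level, terms in dicpositions.items()
    for term in terms
}

def level_of(words):
    """Return the level of the first word found in term_to_level, else None."""
    if not isinstance(words, list):  # e.g. NaN coming from an empty cell
        return None
    for word in words:
        if word in term_to_level:
            return term_to_level[word]
    return None

print(level_of(['Professor', 'Adjunto']))
print(level_of(['(blank)']))
```

With the spreadsheet loaded as in the code above, this could presumably be applied as plan['Position'].str.split().apply(level_of). Note that the first matching word in the position decides the level, regardless of how the levels are ordered in the dictionary.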
I tried to adapt the following code to this problem:
    dictest = {
        '0 - CEO, Founder': ['CEO', 'Founder', 'Co-Founder', 'Cofounder', 'Owner'],
        '1 - Director of': ['Director', 'Head'],
        '2 - Manager': ['Manager', 'Administrador'],
    }

    def in_dic(x, dictest):
        # Return the first key whose list of terms contains x, or False if none does.
        for key in dictest:
            for elem in dictest[key]:
                if elem == x:
                    return key
        return False
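The output of in_dic('CEO', dictest) is '0 - CEO, Founder', and the output of in_dic('Banana', dictest), for example, is False.
However, I could not work out how to go from there and apply this in_dic() function to my problem.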
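One possible way to bridge that gap, sketched under the assumption that each position is a list of words: call in_dic() on every word and keep the matched key that comes first in the dictionary. The helper position_level is a hypothetical name, and the tie-breaking rule (dictionary order wins) is an assumption, not something the expected output strictly requires.

```python
dictest = {
    '0 - CEO, Founder': ['CEO', 'Founder', 'Co-Founder', 'Cofounder', 'Owner'],
    '1 - Director of': ['Director', 'Head'],
    '2 - Manager': ['Manager', 'Administrador'],
}

def in_dic(x, dictest):
    # Return the first key whose list of terms contains x, or False if none does.
    for key in dictest:
        for elem in dictest[key]:
            if elem == x:
                return key
    return False

def position_level(words, dictest):
    """Hypothetical helper: level of a whole position, or None if nothing matches."""
    if not isinstance(words, list):  # e.g. NaN coming from an empty cell
        return None
    # Look up every word, then drop the False results for unmatched words.
    keys = [key for key in (in_dic(word, dictest) for word in words)
            if key is not False]
    if not keys:
        return None
    # When several levels match, prefer the one listed first in the dictionary.
    order = list(dictest)
    return min(keys, key=order.index)

print(position_level(['Manager', 'and', 'Founder'], dictest))
print(position_level(['Banana'], dictest))
```

Following the pattern of the earlier attempt, this might be applied as plan['Position'].str.split().apply(lambda x: position_level(x, dictest)).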
I would really appreciate any help.
Thank you very much.