I need a pandas apply function that iterates over sublists and returns the first value found for each sublist.
I have a DataFrame like this:
import pandas as pd
data = {'project_names': [['Datalabor', 'test', 'tpuframework'], ['regETU', 'register', 'tpuframework'], [], ['gpuframework', 'cpuframework']]}
df = pd.DataFrame(data)
df
I have a nested project list made up of sublists like these:
project_list_1 = [
    ['labor', 'DataLab', 'Anotherdatalabor'],
    ['reg', 'register'],
    ['gpu'],
    ['tpu']
]
project_list_1
The final output should look like this:
data = {'matches': [['labor', 'tpu'], ['reg', 'tpu'], [None], ['gpu']]}
final_df = pd.DataFrame(data)
final_df
I tried something like this:
df2['matches'] = df['project_names'].apply(lambda row: next((project for project in project_list_2 if any(project.lower() in word.lower() for word in row)), None))
df2
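This approach only works for a flat list such as project_list_2 below. To keep only the first match found, I use next() rather than a list comprehension.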
project_list_2 = ['labor', 'DataLab', 'register', 'gpu', 'reg', 'tpu']
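To illustrate the difference between next() and a list comprehension on the flat list, here is a minimal sketch. The helper name matches and the sample row are only illustrative and are not part of the original attempt.

```python
project_list_2 = ['labor', 'DataLab', 'register', 'gpu', 'reg', 'tpu']
row = ['regETU', 'register', 'tpuframework']  # sample row, for illustration only

def matches(project, row):
    # hypothetical helper: case-insensitive substring test against every name in the row
    return any(project.lower() in word.lower() for word in row)

# next() stops at the first project that matches and falls back to None
first_match = next((p for p in project_list_2 if matches(p, row)), None)

# a list comprehension collects every project that matches
all_matches = [p for p in project_list_2 if matches(p, row)]

# with an empty row nothing matches, so the default is returned
no_match = next((p for p in project_list_2 if matches(p, [])), None)

print(first_match, all_matches, no_match)
```

Note that the first match depends on the order of project_list_2, not on the order of the names in the row.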
I need to run this approach on project_list_1 and get the desired output shown above.
Try:
project_list_1 = [
    ["labor", "DataLab", "Anotherdatalabor"],
    ["reg", "register"],
    ["gpu"],
    ["tpu"],
]
def fn(v, project_list):
    # v is the list of project names from one row
    out = []
    for project in project_list:
        for p in project:
            # keep the first entry of this sublist that is a substring of any name
            if any(p in w for w in v):
                out.append(p)
                break
    return out or [None]
df["matches"] = df["project_names"].apply(fn, project_list=project_list_1)
print(df)
Prints:
                      project_names       matches
0   [Datalabor, test, tpuframework]  [labor, tpu]
1  [regETU, register, tpuframework]    [reg, tpu]
2                                []        [None]
3      [gpuframework, cpuframework]         [gpu]
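As a possible variant, the inner loop can be written with next(), which stays closer to the attempt in the question and keeps its case-insensitive comparison. This is only a sketch: the function name first_match_per_group and the parameter name groups are made up for illustration, and plain lists stand in for the DataFrame column.

```python
project_list_1 = [
    ["labor", "DataLab", "Anotherdatalabor"],
    ["reg", "register"],
    ["gpu"],
    ["tpu"],
]

# same rows as the project_names column in the question
rows = [
    ["Datalabor", "test", "tpuframework"],
    ["regETU", "register", "tpuframework"],
    [],
    ["gpuframework", "cpuframework"],
]

def first_match_per_group(names, groups):
    """Return the first matching entry of each group, or [None] if no group matches."""
    found = []
    for group in groups:
        # first entry of this group that occurs, ignoring case, inside any name
        hit = next(
            (p for p in group if any(p.lower() in name.lower() for name in names)),
            None,
        )
        if hit is not None:
            found.append(hit)
    return found or [None]

results = [first_match_per_group(names, project_list_1) for names in rows]
print(results)
```

With the DataFrame from the question, the same function should plug into apply as df["project_names"].apply(first_match_per_group, groups=project_list_1), since Series.apply forwards extra keyword arguments to the function.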