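I have a data frame with a single text column. For each row, I need to put the keys whose values match the text into a new column. With the code below I get only one key per row; it moves on to the next row without giving the second key. Please see the sample code I tried below. Any help would be appreciated.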
Dict_new = { 'key1': ['orange', 'yellow', 'blue'], 'key2': ['red', 'saffron',
'purple'], 'key3': ['white', 'grey', 'black']}
Column of the data frame:
white beard and purple hairs.
orange coloured car with black tilted windows.
eyes are red and grey hair.
My output is:
new_code:
key3,
key1,
key2.
I get only the first key and can't get the second one.
Here is the code I tried.
def new_code(x):
    for keys, values in Dict_new.items():
        for value in values:
            if value in x:
                return keys

df2['new_code'] = df1['column'].apply(new_code)
What I expect as output:
new_code:
key3 key2,
key1 key3,
key2 key3.
Any help would be much appreciated.
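Try this:
One caveat: the text must be split on whitespace only, so before doing anything else you either have to strip all punctuation (I used replace here, since your sample only contains periods), or use re.split().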
import pandas as pd
_data={'txt': ["white beard and purple hairs.", "orange coloured car with black tilted windows.","eyes are red and grey hair."]}
df=pd.DataFrame(data=_data)
Dict_new = { 'key1': ['orange', 'yellow', 'blue'], 'key2': ['red', 'saffron',
'purple'], 'key3': ['white', 'grey', 'black']}
# For each row, join every key that has at least one value among the row's words.
df['new_code'] = df['txt'].apply(
    lambda x: ' '.join(
        k for k in Dict_new
        if set(x.replace('.', '').split()) & set(Dict_new[k])
    )
)
print(df)
Output:
                                              txt   new_code
0                   white beard and purple hairs.  key2 key3
1  orange coloured car with black tilted windows.  key1 key3
2                     eyes are red and grey hair.  key2 key3
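If the text can contain punctuation other than periods, the re.split() route mentioned above might look roughly like the sketch below. The function name matching_keys and the lowercasing step are my own assumptions, not part of the question; it also collects every matching key instead of returning at the first hit, which is what the original loop was missing.

```python
import re

Dict_new = {
    'key1': ['orange', 'yellow', 'blue'],
    'key2': ['red', 'saffron', 'purple'],
    'key3': ['white', 'grey', 'black'],
}

def matching_keys(text, mapping=Dict_new):
    """Return all keys (space-separated) that have a value among the words of text."""
    # Split on any run of non-word characters, so "hairs." or "red," become clean tokens.
    # Lowercasing is an assumption: drop .lower() if matching should be case-sensitive.
    words = set(re.split(r'\W+', text.lower()))
    # Keep every key whose value list shares a word with the text,
    # rather than returning as soon as the first match is found.
    return ' '.join(key for key, values in mapping.items() if words & set(values))

print(matching_keys("white beard and purple hairs."))
```

With the data frame from above, this could be applied as df['new_code'] = df['txt'].apply(matching_keys). Note that the keys come out in dictionary order, not in the order the colours appear in the text, so the first row gives "key2 key3" rather than "key3 key2".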