I have a dataset in Excel that I want to replicate in pandas.
My Python code looks like this:
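from functools import reduce
import pandas as pd

data_frames = [df_mainstore, df_store_A, df_store_B]
# outer-merge all frames pairwise on the shared key column
df_merged = reduce(lambda left, right: pd.merge(left, right, on=["Id_number"], how='outer'), data_frames)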
print(df_merged)
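Because I merge several data frames (which can differ in the number and names of their columns), writing out every column name is tedious, and that is what this code requires. Example: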
isY = lambda x:int(x=='Y')
countEmail= lambda row: isY(row['Store Contact A']) + isY(row['Store B Contact'])
df['Contact Email'] = df.apply(countEmail,axis=1)
I'm also struggling with this expression: isY = lambda x:int(x=='@')
How can I add a "Contact has Email" column in a way similar to how it is done in Excel?
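You can use filter to select the columns whose names contain Contact, then use str.contains with a suitable pattern for the email address format, and finally apply any across each row.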
#data sample
df_merged = pd.DataFrame({'id': [0,1,2,3],
'Store A': list('abcd'),
'Store Contact A':['[email protected]', '', 'e', 'f'],
'Store B': list('ghij'),
'Store B Contact':['[email protected]', '', '[email protected]', '']})
# define the pattern as in the link
pat = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
# create the column as wanted
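# na=False treats missing values (e.g. from an outer merge) as "no email";
# axis must be passed by keyword in recent pandas versions
df_merged['Contact has Email'] = df_merged.filter(like='Contact')\
.apply(lambda x: x.str.contains(pat, na=False))\
.any(axis=1)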
print (df_merged)
   id  Store A    Store Contact A  Store B    Store B Contact  Contact has Email
0   0        a  [email protected]        g  [email protected]               True
1   1        b                           h                                 False
2   2        c                  e        i  [email protected]               True
3   3        d                  f        j                                 False
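The question's countEmail adds up matches per row rather than only flagging them, so here is a minimal sketch of the same filter / str.contains approach that produces both a count and a flag. The sample addresses and the column values are made up for illustration, and the column name 'Contact Email' is taken from the question; treat it as one possible way to do this, not the only one.

```python
import pandas as pd

# Hypothetical sample data; the addresses are invented for illustration.
df_merged = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "Store Contact A": ["a@example.com", "", "e", None],
    "Store B Contact": ["b@example.com", "", "c@example.org", ""],
})

# Same email pattern as in the answer above.
pat = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"

# Select every column whose name contains "Contact" and test each cell.
# na=False makes missing values count as "not an email".
contact_cols = df_merged.filter(like="Contact")
is_email = contact_cols.apply(lambda col: col.str.contains(pat, na=False))

# Summing booleans counts the matches per row; any() gives a yes/no flag.
df_merged["Contact Email"] = is_email.sum(axis=1)
df_merged["Contact has Email"] = is_email.any(axis=1)
print(df_merged)
```

Note that both new column names also contain "Contact", so running the filter a second time on the same frame would pick them up and the .str accessor would fail on the non-string columns. Computing is_email before adding the columns, as above, or choosing result names without "Contact" avoids that.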
You can use pandas.Series.str.contains:
df_merged['Contact has Email'] = df_merged['Store Contact A'].str.contains('@', na=False)|df_merged['Store B Contact'].str.contains('@', na=False)
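Since the question mentions that naming every column by hand is tedious, the same '@' check could be extended to however many contact columns the merge produces. The sketch below assumes that every contact column has "Contact" in its name and holds strings or missing values; the sample frame, its addresses, and the result column name 'Has Email' are hypothetical.

```python
import pandas as pd

# Hypothetical frame; None stands in for the gaps an outer merge can leave.
df_merged = pd.DataFrame({
    "Id_number": [1, 2, 3],
    "Store Contact A": ["a@example.com", None, "n/a"],
    "Store B Contact": [None, None, "b@example.com"],
})

# Collect the contact columns by name instead of listing them by hand.
contact_cols = [c for c in df_merged.columns if "Contact" in c]

# Start with all False and OR in the result for each contact column.
has_at = pd.Series(False, index=df_merged.index)
for col in contact_cols:
    # regex=False does a plain substring search; na=False maps NaN to False.
    has_at |= df_merged[col].str.contains("@", regex=False, na=False)

df_merged["Has Email"] = has_at
print(df_merged)
```

Checking only for '@' is looser than a full email pattern, so a cell such as "@" alone would also count as an email here.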