For example, I have this DataFrame:
import pandas as pd
df = pd.DataFrame({
"EmailAdd": ["[email protected]", "[email protected]"],
"Subject": ["Report submission", "Meeting update"]
})
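I want to go through each element of "EmailAdd", split it on @, and add two new columns: the first, "EmailAdd_root", containing "pamelasilvera", and the second, "EmailAdd_ext", containing "gmail.com". How can I do this?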
We can use join together with str.split:
df=df.join(df.EmailAdd.str.split('@',expand=True))
Out[138]:
            EmailAdd            Subject              0          1
0  [email protected]  Report submission  pamelasilvera  gmail.com
1  [email protected]     Meeting update   indiejesse.d  gmail.com
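The split result gets the default column names 0 and 1, while the question asks for "EmailAdd_root" and "EmailAdd_ext". One way to get those names is to rename the split columns before joining. This is a minimal sketch, assuming every address contains at least one @; the addresses are placeholders made up for illustration, not the data from the question.

```python
import pandas as pd

# Placeholder addresses (assumed for illustration only)
df = pd.DataFrame({
    "EmailAdd": ["alice@example.com", "bob.smith@example.org"],
    "Subject": ["Report submission", "Meeting update"],
})

# n=1 splits on the first "@" only, so expand=True yields two columns
parts = df["EmailAdd"].str.split("@", n=1, expand=True)

# Rename the default 0/1 columns to the names requested in the question
parts.columns = ["EmailAdd_root", "EmailAdd_ext"]

# join aligns on the index, so each row keeps its own pieces
df = df.join(parts)
print(df)
```

Passing n=1 keeps the column count at two even if an address happens to contain a second @ in the part after the first one.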
We can also use str.extract with named regex groups:
df.join(df.EmailAdd.str.extract('^(?P<Email>[^@]+)@(?P<Domain>.+)'))
Output:
            EmailAdd            Subject          Email     Domain
0  [email protected]  Report submission  pamelasilvera  gmail.com
1  [email protected]     Meeting update   indiejesse.d  gmail.com
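Since the named groups become the column names, the groups can be named after the columns requested in the question. The sketch below does that and also shows what happens to a value with no @ in it: str.extract leaves NaN in both columns for that row. The sample values are placeholders assumed for illustration.

```python
import pandas as pd

# Placeholder data (assumed); the second value deliberately has no "@"
df = pd.DataFrame({
    "EmailAdd": ["alice@example.com", "not-an-address"],
    "Subject": ["Report submission", "Meeting update"],
})

# Group names become the new column names
pattern = r"^(?P<EmailAdd_root>[^@]+)@(?P<EmailAdd_ext>.+)$"

# Rows that do not match the pattern get NaN in both new columns
result = df.join(df["EmailAdd"].str.extract(pattern))
print(result)
```

Compared with str.split, this version makes malformed addresses easy to spot afterwards, for example with result["EmailAdd_ext"].isna().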