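I'm currently trying to figure out how to parse all the .msg files stored in a specific folder and save the body text to a DataFrame. However, when I extract the body of an email, it also extracts the emails attached to it. I only want the body of the first email in each .msg file.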
#src-code:https://stackoverflow.com/questions/52608069/parsing-multiple-msg-files-and-storing-the-body-text-in-a-csv-file
#reading multiple .msg files using python
from pathlib import Path
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
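# Assuming \Documents\Email Reader is the directory containing the files
for p in Path(r'C:\Users\XY\Documents\Email Reader').iterdir():
    # Match the extension case-insensitively (.msg or .MSG)
    if p.is_file() and p.suffix.lower() == '.msg':
        # Pass the path to the COM call as a plain string
        msg = outlook.OpenSharedItem(str(p))
        print(msg.Body)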
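If the extra emails are the earlier messages of the thread that Outlook keeps quoted inside `msg.Body` (attached items are normally exposed separately through `msg.Attachments`), one option is to cut the body at the first line that looks like a reply header. This is a heuristic sketch, not an Outlook feature: the marker patterns are assumptions for English-language clients and may need extending for other locales, and a body line that happens to start with "From:" would also trigger the cut.

```python
import re

# Assumed separators that mail clients insert before a quoted earlier message.
# English-locale heuristics only; extend this list for other languages or clients.
_REPLY_MARKERS = re.compile(
    r"^(?:-{2,}\s*Original Message\s*-{2,}"  # "-----Original Message-----"
    r"|_{10,}"                                # long underscore divider used by Outlook
    r"|From:[ \t].*"                          # "From: Name <addr>" header of a quoted mail
    r"|On .+ wrote:)\s*$",                    # "On <date>, <name> wrote:" attribution
    re.IGNORECASE | re.MULTILINE,
)


def first_message_body(body: str) -> str:
    """Return only the text before the first line that looks like a quoted-reply header."""
    # Outlook bodies use \r\n; normalise so the ^/$ anchors behave predictably
    body = body.replace("\r\n", "\n")
    match = _REPLY_MARKERS.search(body)
    # Keep everything before the marker (or the whole body if none is found)
    return (body[:match.start()] if match else body).rstrip()
```

Inside the loop above, `print(first_message_body(msg.Body))` would then print only the newest message of each file.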
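I had a similar requirement. The full code is here: https://medium.com/@theamazingexposure/accessing-shared-mailbox-using-exchangelib-python-f020e71a96ab
For your purpose, I think this snippet will work. It reads the most recent message whose subject line contains a given search string: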
from exchangelib import Credentials, Account, FileAttachment
credentials = Credentials('First_Name.Last_Name@some_domain.com', 'Your_Password_Here')
account = Account('First_Name.Last_Name@some_domain.com', credentials=credentials, autodiscover=True)
filtered_items = account.inbox.filter(subject__contains='Your Search String Here')
print("Getting latest email from Given Search String...")
# Reuse the filtered query: newest first, keep only the most recent match
for item in filtered_items.order_by('-datetime_received')[:1]:
    # The body of the email is extracted using item.text_body
    print(item.subject, item.text_body.encode('UTF-8'), item.sender, item.datetime_received)
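Neither snippet stores the extracted text, which is what the question ultimately asks for. A minimal sketch, assuming each message has already been reduced to a dict with hypothetical `file` and `body` keys (for example `{"file": p.name, "body": msg.Body}` in the win32com loop). It uses only the standard `csv` module so it runs without pandas; if a DataFrame is preferred, `pandas.DataFrame(rows)` accepts the same list of dicts.

```python
import csv
import io
from typing import Dict, Iterable


def bodies_to_csv(rows: Iterable[Dict[str, str]]) -> str:
    """Serialise records with hypothetical 'file' and 'body' keys to CSV text."""
    buffer = io.StringIO()
    # DictWriter quotes fields containing commas or newlines, so multi-line bodies survive
    writer = csv.DictWriter(buffer, fieldnames=["file", "body"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```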