Error when extracting Outlook email data with Python


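I have a Python script that uses os.walk and win32com.client to extract information from Outlook email files (.msg) in a folder on my C:/ drive and its subfolders. It appears to work, but Python crashes as soon as I try to do anything with the returned dataframe (for example, emailData.head()). I also cannot write the dataframe to a .csv file because of a permission error.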

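I wonder whether my code is failing to close Outlook or each message properly, and whether that is what causes the problem. Any help would be much appreciated.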

import os
import win32com.client
import pandas as pd

# initialize Outlook client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")

# set input directory (where the emails are) and output directory (where you
# would like the email data saved)
inputDir = 'C:/Users/.../myFolderPath'
outputDir = 'C:/Users/.../myOutputPath'


def emailDataCollection(inputDir,outputDir):
    """ This function loops through an input directory to find
    all '.msg' email files in all folders and subfolders in the
    directory, extracting information from the email into lists,
    then converting the lists to a Pandas dataframe before exporting
    to a '.csv' file in the output directory
    """
    # Initialize lists
    msg_Path = []
    msg_SenderName = []
    msg_SenderEmailAddress = []
    msg_SentOn = []
    msg_To = []
    msg_CC = []
    msg_BCC = []
    msg_Subject = []
    msg_Body = []
    msg_AttachmentCount = []

    # Loop through the directory
    for root, dirnames, filenames in os.walk(inputDir):
        for filename in filenames:
            if filename.endswith('.msg'): # check to see if the file is an email
                filepath = os.path.join(root,filename) # save the full filepath
                # Extract email data into lists
                msg = outlook.OpenSharedItem(filepath)
                msg_Path.append(filepath)
                msg_SenderName.append(msg.SenderName)
                msg_SenderEmailAddress.append(msg.SenderEmailAddress)
                msg_SentOn.append(msg.SentOn)
                msg_To.append(msg.To)
                msg_CC.append(msg.CC)
                msg_BCC.append(msg.BCC)
                msg_Subject.append(msg.Subject)
                msg_Body.append(msg.Body)
                msg_AttachmentCount.append(msg.Attachments.Count)
                del msg

    # Convert lists to Pandas dataframe
    emailData = pd.DataFrame({'Path' : msg_Path,
                          'SenderName' : msg_SenderName,
                          'SenderEmailAddress' : msg_SenderEmailAddress,
                          'SentOn' : msg_SentOn,
                          'To' : msg_To,
                          'CC' : msg_CC,
                          'BCC' : msg_BCC,
                          'Subject' : msg_Subject,
                          'Body' : msg_Body,
                          'AttachmentCount' : msg_AttachmentCount
    }, columns=['Path','SenderName','SenderEmailAddress','SentOn','To','CC',
            'BCC','Subject','Body','AttachmentCount'])


    return(emailData)


# Call the function
emailData = emailDataCollection(inputDir,outputDir)

# Causes Python to crash
emailData.head()
# Fails due to permission error
emailData.to_csv(outputDir,header=True,index=False)

Tags: python pandas email outlook win32com

1 Answer

Hopefully this is not too late, but I managed to track down the root cause:

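The kernel crashes because of the datetime data in msg_SentOn. If you check the type() of the values in msg_SentOn, they are reported as pywintypes.datetime, which is not compatible with pandas.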

You need to convert the elements of msg_SentOn to datetime.datetime before building the dataframe.

This source is useful: http://timgolden.me.uk/python/win32_how_do_i/use-a-pytime-value.html
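
A minimal sketch of one way to do that conversion, rebuilding each value field by field so that only a plain datetime.datetime reaches pandas. The helper name to_plain_datetime is hypothetical, and the sample values are ordinary timezone-aware datetimes standing in for what msg.SentOn returns, so that the snippet runs without Outlook. Note that this drops the time zone information and keeps the clock fields as they are, which may or may not be what you want.

```python
import datetime


def to_plain_datetime(value):
    """Build a naive datetime.datetime from the date and time fields of value.

    Hypothetical helper: it only assumes that value exposes year, month,
    day, hour, minute and second, as datetime-like objects do.
    """
    return datetime.datetime(
        value.year, value.month, value.day,
        value.hour, value.minute, value.second,
    )


# Stand-ins for the values collected from msg.SentOn (assumption: the real
# ones are timezone-aware, datetime-like objects).
utc = datetime.timezone.utc
msg_SentOn = [
    datetime.datetime(2019, 3, 1, 9, 30, 0, tzinfo=utc),
    datetime.datetime(2019, 3, 2, 17, 45, 10, tzinfo=utc),
]

# Convert every element before the list is handed to pd.DataFrame.
sent_on_plain = [to_plain_datetime(v) for v in msg_SentOn]

for original, plain in zip(msg_SentOn, sent_on_plain):
    print(type(plain).__name__, plain.isoformat(), "from", original.isoformat())
```

In the question's function, the same conversion could be applied at the point where msg.SentOn is appended. The permission error may be a separate issue: to_csv is given outputDir, which is described as a folder, so passing a file path inside it, for example os.path.join(outputDir, 'emailData.csv') with a file name of your choosing, is worth trying.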
