CSV file is not created in the output folder

Question

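I am trying to extract email addresses from multiple Excel files and append them to a CSV file. The program had been working for the past few days, but now it no longer creates the CSV file in the output folder. I even tried creating one manually, but as soon as I run the code, the manually created CSV file is deleted.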

Here is my program:

import os
import re
import pandas as pd

# Regular expression pattern to match email addresses
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,7}\b'


# Function to extract email addresses from a string
def extract_emails(text):
    return re.findall(email_pattern, text)


# Function to extract emails from an Excel file
def extract_emails_from_excel(file_path):
    email_list = []
    try:
        df = pd.read_excel(file_path)
        for column in df.columns:
            for cell in df[column]:
                if isinstance(cell, str):
                    emails = extract_emails(cell)
                    email_list.extend(emails)
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
    return email_list


# Specify the folder containing Excel files
folder_path = r'E:\1DW\Excel'

# Specify the path for the output CSV file
output_csv_file = r'C:\Users\HAL-2023\Desktop\Py_out\output_emails.csv'
# Ensure the output CSV file is empty or create it if it doesn't exist
if os.path.exists(output_csv_file):
    os.remove(output_csv_file)

# Loop through Excel files in the folder
for filename in os.listdir(folder_path):
    if filename.endswith('.xlsx') or filename.endswith('.xls'):
        input_file_path = os.path.join(folder_path, filename)
        email_addresses = extract_emails_from_excel(input_file_path)

        # Append extracted email addresses to the CSV file
        if email_addresses:
            df = pd.DataFrame({'Email Addresses': email_addresses})
            df.to_csv(output_csv_file, mode='a', index=False, header=False)

print(f"Extracted email addresses written to {output_csv_file}")

Result:

C:\Users\HAL-2023\Desktop\Python\venv\Scripts\python.exe C:\Users\HAL-2023\Desktop\Python\email_from_excel.py 
C:\Users\HAL-2023\Desktop\Python\venv\lib\site-packages\openpyxl\styles\stylesheet.py:226: UserWarning: Workbook contains no default style, apply openpyxl's default
  warn("Workbook contains no default style, apply openpyxl's default")
Extracted email addresses written to C:\Users\HAL-2023\Desktop\Py_out\output_emails.csv

Process finished with exit code 0

But there is no file named "output_emails.csv" in that folder.

python pandas export-to-csv
1 Answer

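The problem appears to be the approach of deleting the output_emails.csv file at the start of the script and then trying to append to it. The file is only recreated by the first append, and an append only happens when an Excel file yields at least one email address. If none of the Excel files produces a match (or the folder contains no .xlsx/.xls files), the CSV is deleted and never recreated, which would lead to the behaviour you describe.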
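The effect can be reproduced without any Excel files. The following is only a minimal sketch, assuming every input file yields an empty list; `run_original_logic` and the temporary directory are hypothetical stand-ins for the script's loop and output folder, not part of the original program.

```python
import os
import tempfile


def run_original_logic(output_csv_file, per_file_emails):
    """Mimic the question's flow: delete first, append only when emails exist."""
    if os.path.exists(output_csv_file):
        os.remove(output_csv_file)
    for email_addresses in per_file_emails:
        if email_addresses:  # skipped entirely when a file yields no matches
            with open(output_csv_file, "a", newline="") as f:
                f.write("\n".join(email_addresses) + "\n")
    return os.path.exists(output_csv_file)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output_emails.csv")
    # A manually created file, as described in the question
    open(path, "w").close()
    exists_when_empty = run_original_logic(path, [[], []])
    exists_when_found = run_original_logic(path, [[], ["a@example.com"]])

print(exists_when_empty)  # False: the file was deleted and never recreated
print(exists_when_found)  # True: the first non-empty append recreates it
```

If this matches what you see, it is worth printing the number of addresses found per file to find out why the extraction now returns nothing.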

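So instead of deleting the CSV file, check whether it exists and append to it; if it does not exist, create it with a header row. You also don't need to build a new DataFrame every time you append to the CSV file; you can append the rows with the csv module instead.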

import os
import re
import pandas as pd
import csv

# Regular expression pattern to match email addresses
# (the TLD class is [A-Za-z]; a '|' inside brackets would match a literal pipe)
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}\b'


# Function to extract email addresses from a string
def extract_emails(text):
    return re.findall(email_pattern, text)


# Function to extract emails from an Excel file
def extract_emails_from_excel(file_path):
    email_list = []
    try:
        df = pd.read_excel(file_path)
        for column in df.columns:
            for cell in df[column]:
                if isinstance(cell, str):
                    emails = extract_emails(cell)
                    email_list.extend(emails)
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
    return email_list


# Specify the folder containing Excel files
folder_path = r'E:\1DW\Excel'

# Specify the path for the output CSV file
output_csv_file = r'C:\Users\HAL-2023\Desktop\Py_out\output_emails.csv'

# Check if the CSV file exists, if not create it with a header
if not os.path.exists(output_csv_file):
    with open(output_csv_file, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Email Addresses'])

# Loop through Excel files in the folder
for filename in os.listdir(folder_path):
    if filename.endswith('.xlsx') or filename.endswith('.xls'):
        input_file_path = os.path.join(folder_path, filename)
        email_addresses = extract_emails_from_excel(input_file_path)

        # Append extracted email addresses to the CSV file
        if email_addresses:
            with open(output_csv_file, 'a', newline='') as f:
                writer = csv.writer(f)
                writer.writerows([[email] for email in email_addresses])

print(f"Extracted email addresses written to {output_csv_file}")
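Note that the version above keeps rows from earlier runs, so running it twice will duplicate entries. If a fresh file on every run is preferred, one alternative is to collect all addresses first and write the file once. This is a minimal sketch rather than a definitive implementation: `write_emails` is a hypothetical helper that is not part of the original answer, and it assumes the collected addresses fit in memory.

```python
import csv
import os
import tempfile


def write_emails(output_csv_file, email_addresses):
    """Overwrite the CSV with a header plus one row per address.

    The file is created even when the list is empty, so an empty
    result is visible as a header-only file rather than a missing one.
    """
    out_dir = os.path.dirname(output_csv_file)
    if out_dir:
        # Create the output folder if it does not exist yet
        os.makedirs(out_dir, exist_ok=True)
    with open(output_csv_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Email Addresses"])
        writer.writerows([email] for email in email_addresses)
    return len(email_addresses)


# Demo in a temporary folder (the real script would pass its own path)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "Py_out", "output_emails.csv")
    write_emails(demo_path, [])
    created_when_empty = os.path.exists(demo_path)
    write_emails(demo_path, ["a@example.com", "b@example.com"])
    with open(demo_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
```

In the script, the loop would extend a single list with the result of `extract_emails_from_excel` for each file and call `write_emails` once after the loop.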