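I'm trying to extract email addresses from multiple Excel files and append them to a CSV file. The program had been working for the past few days, but now it no longer creates the CSV file in the output folder. I even tried creating one manually, but as soon as I run the code, it deletes the manually created CSV file.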
Here is my program:
```python
import os
import re
import pandas as pd

# Regular expression pattern to match email addresses
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,7}\b'

# Function to extract email addresses from a string
def extract_emails(text):
    return re.findall(email_pattern, text)

# Function to extract emails from an Excel file
def extract_emails_from_excel(file_path):
    email_list = []
    try:
        df = pd.read_excel(file_path)
        for column in df.columns:
            for cell in df[column]:
                if isinstance(cell, str):
                    emails = extract_emails(cell)
                    email_list.extend(emails)
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
    return email_list

# Specify the folder containing Excel files
folder_path = r'E:\1DW\Excel'
# Specify the path for the output CSV file
output_csv_file = r'C:\Users\HAL-2023\Desktop\Py_out\output_emails.csv'

# Ensure the output CSV file is empty or create it if it doesn't exist
if os.path.exists(output_csv_file):
    os.remove(output_csv_file)

# Loop through Excel files in the folder
for filename in os.listdir(folder_path):
    if filename.endswith('.xlsx') or filename.endswith('.xls'):
        input_file_path = os.path.join(folder_path, filename)
        email_addresses = extract_emails_from_excel(input_file_path)
        # Append extracted email addresses to the CSV file
        if email_addresses:
            df = pd.DataFrame({'Email Addresses': email_addresses})
            df.to_csv(output_csv_file, mode='a', index=False, header=False)

print(f"Extracted email addresses written to {output_csv_file}")
```
Result:
```
C:\Users\HAL-2023\Desktop\Python\venv\Scripts\python.exe C:\Users\HAL-2023\Desktop\Python\email_from_excel.py
C:\Users\HAL-2023\Desktop\Python\venv\lib\site-packages\openpyxl\styles\stylesheet.py:226: UserWarning: Workbook contains no default style, apply openpyxl's default
  warn("Workbook contains no default style, apply openpyxl's default")
Extracted email addresses written to C:\Users\HAL-2023\Desktop\Py_out\output_emails.csv

Process finished with exit code 0
```
But there is no file named `output_emails.csv` in that folder.
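Your problem appears to be the way the script deletes `output_emails.csv` at the start and then only ever appends to it. After the deletion, the file is recreated only if at least one Excel file yields an email address, because `to_csv(..., mode='a')` runs only inside `if email_addresses:`. If none of the files produce a match, the CSV is deleted and never recreated, while the final `print` still runs unconditionally. That would explain what you describe: the success message appears, no file is created, and a manually created file disappears. (Possible reasons for finding nothing include addresses on a sheet other than the first one, since `pd.read_excel` reads only the first sheet by default, or cell text the pattern doesn't match.)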
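To see the mechanism in isolation, here is a minimal, stdlib-only sketch that mimics the question's control flow. `run_like_original` is a hypothetical helper, and each inner list stands in for what `extract_emails_from_excel` returned for one workbook; no Excel files are read.

```python
import csv
import os
import tempfile


def run_like_original(results, output_csv_file):
    """Hypothetical helper mimicking the question's loop.

    results: one list of email strings per "workbook", standing in for
    extract_emails_from_excel(). Returns whether the CSV exists afterwards.
    """
    # Step 1: delete the output file up front, as the question's script does
    if os.path.exists(output_csv_file):
        os.remove(output_csv_file)
    # Step 2: append only when a workbook yielded at least one address
    for email_addresses in results:
        if email_addresses:
            with open(output_csv_file, 'a', newline='') as f:
                csv.writer(f).writerows([email] for email in email_addresses)
    return os.path.exists(output_csv_file)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output_emails.csv')
    open(path, 'w').close()  # simulate the manually created CSV
    print(run_like_original([[], []], path))                 # False: deleted, never recreated
    print(run_like_original([[], ['a@example.com']], path))  # True: one match recreates it
```

A single workbook without emails does no harm; the file only goes missing when every workbook comes back empty.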
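So instead of deleting the CSV file, check whether it exists and append to it; if it doesn't exist, create it with a header row first. This way the file always exists after the script runs, even when no emails are found. You also don't need to build a new DataFrame every time you append to the CSV; the `csv` module can append the rows directly. Keep in mind that, since the file is no longer cleared, each run appends its addresses again, so rerunning the script produces duplicates.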
```python
import os
import re
import csv
import pandas as pd

# Regular expression pattern to match email addresses
# (TLD class is [A-Za-z]; the original [A-Z|a-z] also matched a literal '|')
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}\b'

# Function to extract email addresses from a string
def extract_emails(text):
    return re.findall(email_pattern, text)

# Function to extract emails from an Excel file
# (pd.read_excel reads only the first sheet by default)
def extract_emails_from_excel(file_path):
    email_list = []
    try:
        df = pd.read_excel(file_path)
        for column in df.columns:
            for cell in df[column]:
                if isinstance(cell, str):
                    emails = extract_emails(cell)
                    email_list.extend(emails)
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
    return email_list

# Specify the folder containing Excel files
folder_path = r'E:\1DW\Excel'
# Specify the path for the output CSV file
output_csv_file = r'C:\Users\HAL-2023\Desktop\Py_out\output_emails.csv'

# Check if the CSV file exists; if not, create it with a header.
# The file therefore exists even when no emails are found.
if not os.path.exists(output_csv_file):
    with open(output_csv_file, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Email Addresses'])

# Loop through Excel files in the folder (extension check is case-insensitive)
for filename in os.listdir(folder_path):
    if filename.lower().endswith(('.xlsx', '.xls')):
        input_file_path = os.path.join(folder_path, filename)
        email_addresses = extract_emails_from_excel(input_file_path)
        # Append extracted email addresses to the CSV file
        if email_addresses:
            with open(output_csv_file, 'a', newline='') as f:
                writer = csv.writer(f)
                writer.writerows([[email] for email in email_addresses])

print(f"Extracted email addresses written to {output_csv_file}")
```
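If you would rather have each run produce a fresh file without duplicates, one alternative is to collect everything first and write the CSV once in `'w'` mode. The sketch below assumes the cell values have already been gathered (for instance by the nested loops in `extract_emails_from_excel`); `write_unique_emails` is a hypothetical name, and `dict.fromkeys` is used to drop exact duplicates while keeping first-seen order.

```python
import csv
import os
import re
import tempfile

# The answer's pattern, with the TLD class written as [A-Za-z]
EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}\b')


def write_unique_emails(cell_values, output_csv_file):
    """Hypothetical helper: extract, de-duplicate and write emails in one pass.

    cell_values: any iterable of cell contents; non-strings (numbers, NaN) are skipped.
    Returns the number of unique addresses written.
    """
    found = []
    for value in cell_values:
        if isinstance(value, str):
            found.extend(EMAIL_PATTERN.findall(value))
    # dict.fromkeys drops exact duplicates while keeping first-seen order
    unique = list(dict.fromkeys(found))
    # 'w' truncates any previous output, so reruns don't accumulate duplicates;
    # the header row means the file exists even if nothing matched
    with open(output_csv_file, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Email Addresses'])
        writer.writerows([email] for email in unique)
    return len(unique)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output_emails.csv')
    cells = ['Contact: a@example.com', 'a@example.com, b@example.org', 42]
    print(write_unique_emails(cells, path))  # 2
```

In the script itself, you would gather the cell values from every workbook inside the loop and call the helper once afterwards, instead of opening the CSV in append mode per file.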