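I am working with an Excel file that has three different sheets. Ideally, the Dummy column should be filled with zeros by default, unless the student's name is "Roberto" or "Leonardo", in which case the Dummy column values should stay blank. The first printout below is the current data; the second is the output I expect.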
{'Sheet_1': ID Name Surname Grade favourite color favourite sport Dummy
0 104 Eleanor Rigby 6 blue American football NaN
1 168 Barbara Ann 8 pink Hockey 0.0
2 450 Polly Cracker 7 black Skateboarding NaN
3 90 Little Josy 10 orange Cycling NaN,
'Sheet_2': ID Name Surname Grade favourite color favourite sport Dummy
0 106 Lucy Sky 8 yellow Tennis NaN
1 128 Delilah Perez 5 light green Basketball 0.0
2 100 Christina Rodwell 3 black Badminton NaN
3 40 Ziggy Stardust 7 red Squash NaN,
'Sheet_3': ID Name Surname Grade favourite color favourite sport Dummy
0 22 Lucy Diamonds 9 brown Judo NaN
1 50 Grace Kelly 7 white Taekwondo NaN
2 105 Uma Thurman 7 purple videogames NaN
3 29 Lola McQueen 3 red Surf 0.0}
{'Sheet_1': ID Name Surname Grade favourite color favourite sport Dummy
0 104 Eleanor Rigby 6 blue American football NaN
1 168 Barbara Ann 8 pink Hockey NaN
2 450 Polly Cracker 7 black Skateboarding NaN
3 90 Little Josy 10 orange Cycling NaN,
'Sheet_2': ID Name Surname Grade favourite color favourite sport Dummy
0 106 Lucy Sky 8 yellow Tennis 0
1 128 Delilah Perez 5 light green Basketball 0
2 100 Christina Rodwell 3 black Badminton 0
3 40 Ziggy Stardust 7 red Squash 0,
'Sheet_3': ID Name Surname Grade favourite color favourite sport Dummy
0 22 Lucy Diamonds 9 brown Judo NaN
1 50 Grace Kelly 7 white Taekwondo NaN
2 105 Uma Thurman 7 purple videogames NaN
3 29 Lola McQueen 3 red Surf NaN}
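Any help is appreciated. I am struggling to work out how to make the program take into account what is written in cell B2 of each sheet, so that if "Roberto" or "Leonardo" is found there, the loop leaves the Dummy column blank. The code below just fills the Dummy column with zeros in all sheets (which is not my expected output):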
# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
import openpyxl
from openpyxl import Workbook, load_workbook
# Defining the file path
file_path = r'C:/Users/machukovich/stack_2.xlsx'
# Load workbook as openpyxl
reference_workbook = openpyxl.load_workbook(file_path)
wb = load_workbook(file_path)
# We will maintain the workbook open
wb = wb.active
# Loading the file into a dictionary of Dataframes
dict_of_df = pd.read_excel(file_path, sheet_name=None, skiprows=2)
# Reading up the B2 cell for later use:
student_name = wb['B2'].value
# Writing the loop itself (it fills all the 'Dummy' columns with zeros in all sheets):
for sheet_name, df in dict_of_df.items():
    df['Dummy'] = df['Dummy'].fillna(0)
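Edit: I have modified the code to make it reproducible. Note that cell B2 does not appear in the DataFrames above, because I skip the first two rows of each Excel sheet. Leonardo and Roberto appear in B2 of Sheet_1 and Sheet_3, respectively.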
Try this:
import pandas as pd
import openpyxl
# Defining the file path
file_path = r'C:/Users/machukovich/stack_2.xlsx'
# Load workbook as openpyxl
wb = openpyxl.load_workbook(file_path)
# Loading the file into a dictionary of DataFrames
dict_of_df = pd.read_excel(file_path, sheet_name=None, skiprows=2)
# Writing the loop itself
for sheet_name, df in dict_of_df.items():
    # Read the student's name from cell B2 of the current sheet
    student_name = wb[sheet_name]['B2'].value
    # Blank the whole 'Dummy' column for 'Roberto' or 'Leonardo', else fill it with 0
    df['Dummy'] = float('nan') if student_name in ('Roberto', 'Leonardo') else 0
# Save the modified DataFrames back to the Excel file.
# Note: this overwrites the workbook, so the two rows skipped by skiprows=2
# (including cell B2) are not written back.
with pd.ExcelWriter(file_path) as writer:
    for sheet_name, df in dict_of_df.items():
        df.to_excel(writer, sheet_name=sheet_name, index=False)
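The per-sheet decision can also be pulled out into a small helper so it can be checked without an Excel file. This is only a sketch: `dummy_fill_value`, `BLANK_NAMES`, and the sample B2 values are hypothetical names introduced here, and stripping whitespace around the name is an assumption that the question does not state.

```python
# Names whose sheets should keep a blank Dummy column (taken from the question).
BLANK_NAMES = frozenset({"Roberto", "Leonardo"})


def dummy_fill_value(student_name):
    """Return None (blank) for a listed name, otherwise 0.

    Stripping surrounding whitespace is an assumption; an empty B2 cell
    (None) or a non-string value falls through to 0.
    """
    if isinstance(student_name, str) and student_name.strip() in BLANK_NAMES:
        return None
    return 0


# Hypothetical B2 values per sheet; "Another student" is a made-up placeholder,
# since the question does not say what Sheet_2 holds in B2.
b2_values = {
    "Sheet_1": "Leonardo",
    "Sheet_2": "Another student",
    "Sheet_3": "Roberto",
}
fill_by_sheet = {sheet: dummy_fill_value(name) for sheet, name in b2_values.items()}
print(fill_by_sheet)  # {'Sheet_1': None, 'Sheet_2': 0, 'Sheet_3': None}
```

Inside the loop, this could be used as `df['Dummy'] = dummy_fill_value(wb[sheet_name]['B2'].value)`, which should leave the column empty when the DataFrame is written back to Excel.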