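I opened an Excel file with three sheets as an OrderedDict. I would like every row of each dataframe to be repeated three times. Am I going about this the wrong way with numpy? Could you suggest another solution using pandas? My original ordered dictionary has the following shape: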
{'Sheet_1': ID Name Surname Grade
0 104 Eleanor Rigby 6
1 168 Barbara Ann 8
2 450 Polly Cracker 7
3 90 Little Joe 10,
'Sheet_2': ID Name Surname Grade
0 106 Lucy Sky 8
1 128 Delilah Gonzalez 5
2 100 Christina Rodwell 3
3 40 Ziggy Stardust 7,
'Sheet_3': ID Name Surname Grade
0 22 Lucy Diamonds 9
1 50 Grace Kelly 7
2 105 Uma Thurman 7
3 29 Lola King 3}
The ordered dictionary I want has the following shape:
{'Sheet_1': ID Name Surname Grade
0 104 Eleanor Rigby 6
1 104 Eleanor Rigby 6
2 104 Eleanor Rigby 6
3 168 Barbara Ann 8
4 168 Barbara Ann 8
5 168 Barbara Ann 8
6 450 Polly Cracker 7
7 450 Polly Cracker 7
8 450 Polly Cracker 7
9 90 Little Joe 10
10 90 Little Joe 10
11 90 Little Joe 10 ,
'Sheet_2': ID Name Surname Grade
0 106 Lucy Sky 8
1 106 Lucy Sky 8
2 106 Lucy Sky 8
3 128 Delilah Gonzalez 5
4 128 Delilah Gonzalez 5
5 128 Delilah Gonzalez 5
6 100 Christina Rodwell 3
7 100 Christina Rodwell 3
8 100 Christina Rodwell 3
9 40 Ziggy Stardust 7
10 40 Ziggy Stardust 7
11 40 Ziggy Stardust 7 ,
'Sheet_3': ID Name Surname Grade
0 22 Lucy Diamonds 9
1 22 Lucy Diamonds 9
2 22 Lucy Diamonds 9
3 50 Grace Kelly 7
4 50 Grace Kelly 7
5 50 Grace Kelly 7
6 105 Uma Thurman 7
7 105 Uma Thurman 7
8 105 Uma Thurman 7
9 29 Lola King 3
10 29 Lola King 3
11 29 Lola King 3 }
The code I have tried so far:
# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook
# Defining the two file paths
path_excel_file = r'C:\Users\machukovich\Desktop\stack.xlsx'
# Loading the files into a dictionary of Dataframes
dfs = pd.read_excel(path_excel_file, sheet_name=None, skiprows=2)
# Trying to repeat each row in every dataframe three times
for sheet_name, df in dfs.items():
    df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
# Adding up the list as a new column (opinion) in each sheet.
mylist = ['good song','average song', 'bad song']
for sheet_name, df in dfs.items():
    df = dfs['opinion'] = np.resize(mylist, len(dfs))
# Creating a new column for the concatenation
for sheet_name, df in dfs.items():
    df = dfs.insert(5, 'concatenation', dfs['Name'].map(str) + dfs['Surname'].map(str) + dfs['opinion'].map(str))
# We try to create a new excel file with the manipulated data
Path_new_file = r'C:\Users\machukovich\Desktop\new_file.xlsx'
# Create a Pandas Excel writer using XlsxWriter as the engine.
with pd.ExcelWriter(Path_new_file, engine='xlsxwriter') as writer:
    for sheet_name, df in dfs.items():
        df.to_excel(writer, sheet_name=sheet_name, startrow=2, index=False)
# I am not obtaining my desired output but an excel file on which each sheet is equal to one single column of one sheet out of my three excel sheets.
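Edit: I am not getting the desired output, and I believe something must be wrong with the line that repeats each row three times. Any help is appreciated.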
NumPy solution
You appear to be using np.repeat correctly in your solution. The problem is
for sheet_name, df in dfs.items():
    df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
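Reassigning df inside the loop does not modify dfs. dfs.items() only gives you a "view" of the dictionary's key/value pairs to iterate over, and the assignment merely rebinds the loop variable df to a new DataFrame while the dictionary entry still refers to the original one. The fix is to set the values of dfs directly: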
for sheet_name, df in dfs.items():
    dfs[sheet_name] = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
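To see the difference between rebinding the loop variable and assigning through the key, here is a minimal sketch. It assumes a small in-memory dictionary of DataFrames (the sheet contents and the names rows_after_rebinding and rows_after_assignment are made up for illustration) in place of the result of pd.read_excel:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., sheet_name=None): two small sheets.
dfs = {
    "Sheet_1": pd.DataFrame({"ID": [104, 168], "Name": ["Eleanor", "Barbara"]}),
    "Sheet_2": pd.DataFrame({"ID": [106, 128], "Name": ["Lucy", "Delilah"]}),
}

# Rebinding the loop variable: the dictionary still holds the original frames.
for sheet_name, df in dfs.items():
    df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
rows_after_rebinding = {name: len(frame) for name, frame in dfs.items()}

# Assigning through the key: the stored DataFrame is replaced.
for sheet_name, df in dfs.items():
    dfs[sheet_name] = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
rows_after_assignment = {name: len(frame) for name, frame in dfs.items()}

print(rows_after_rebinding)   # {'Sheet_1': 2, 'Sheet_2': 2}
print(rows_after_assignment)  # {'Sheet_1': 6, 'Sheet_2': 6}
```

One thing to keep in mind with this approach: on a frame with mixed column types, df.values is an object array, so the rebuilt columns come back with object dtype. Calling .infer_objects() on the result is one way to recover numeric dtypes if you need them.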
Pandas solution
You can do this in pandas with pd.concat, giving it a list of copies of the dataframe:
dfs[sheet_name] = pd.concat([df, df, df])
or
dfs[sheet_name] = pd.concat([df for _ in range(3)])
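If you try either of these, you will notice that the index values are repeated as well (numpy does not keep track of them the way pandas does), and that the rows are not in the order you want, because we have simply concatenated copies of the dataframe end to end. We can fix both with a classic pandas method chain that sorts by index and then resets it: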
dfs[sheet_name] = pd.concat([df for _ in range(3)]).sort_index().reset_index(drop = True)
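Put together inside the loop, the pandas version might look like the sketch below. It again assumes a small made-up dictionary instead of the real Excel file, and it relies on each sheet having the default 0..n-1 index, as in the question:

```python
import pandas as pd

# Hypothetical one-sheet dictionary standing in for the pd.read_excel result.
dfs = {
    "Sheet_1": pd.DataFrame(
        {"ID": [104, 168, 450], "Name": ["Eleanor", "Barbara", "Polly"], "Grade": [6, 8, 7]}
    ),
}

for sheet_name, df in dfs.items():
    dfs[sheet_name] = (
        pd.concat([df for _ in range(3)])  # three copies stacked end to end
        .sort_index()                      # bring the copies of each row together
        .reset_index(drop=True)            # renumber the rows 0..3n-1
    )

result = dfs["Sheet_1"]
print(result)
```

Unlike the np.repeat route, this keeps the original column dtypes. If the index is not already unique and in the order you want, sorting on it will not reproduce the original row order; in that case df.loc[df.index.repeat(3)].reset_index(drop=True) is another pandas option (not part of the answer above) that repeats rows in place.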