How do I write a loop to repeat the rows of every df in an ordered dictionary?

Question

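I opened an Excel file made up of three sheets as an OrderedDict of dataframes.

  1. I want every row of each original dataframe to be repeated three times.
  2. I would like to use numpy.
  3. Could you also suggest another solution using pandas?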

My original ordered dictionary looks like this:

    {'Sheet_1':     ID      Name  Surname  Grade
     0  104  Eleanor     Rigby      6
     1  168  Barbara       Ann      8
     2  450    Polly   Cracker      7
     3   90   Little       Joe     10,
     'Sheet_2':     ID       Name   Surname  Grade
     0  106       Lucy       Sky      8
     1  128    Delilah  Gonzalez      5
     2  100  Christina   Rodwell      3
     3   40      Ziggy  Stardust      7,
     'Sheet_3':     ID   Name   Surname  Grade
     0   22   Lucy  Diamonds      9
     1   50  Grace     Kelly      7
     2  105    Uma   Thurman      7
     3   29   Lola      King      3}

The ordered dictionary I want looks like this:

{'Sheet_1':      ID      Name  Surname  Grade
 0   104  Eleanor     Rigby      6          
 1   104  Eleanor     Rigby      6    
 2   104  Eleanor     Rigby      6            
 3   168  Barbara       Ann      8            
 4   168  Barbara       Ann      8      
 5   168  Barbara       Ann      8              
 6   450    Polly   Cracker      7          
 7   450    Polly   Cracker      7    
 8   450    Polly   Cracker      7            
 9    90   Little       Joe     10             
 10   90   Little       Joe     10       
 11   90   Little       Joe     10              ,
 'Sheet_2':      ID       Name   Surname  Grade
 0   106       Lucy       Sky      8      
 1   106       Lucy       Sky      8    
 2   106       Lucy       Sky      8       
 3   128    Delilah  Gonzalez      5       
 4   128    Delilah  Gonzalez      5    
 5   128    Delilah  Gonzalez      5        
 6   100  Christina   Rodwell      3      
 7   100  Christina   Rodwell      3    
 8   100  Christina   Rodwell      3        
 9    40      Ziggy  Stardust      7       
 10   40      Ziggy  Stardust      7    
 11   40      Ziggy  Stardust      7         ,
 'Sheet_3':      ID   Name   Surname  Grade                 
 0    22   Lucy  Diamonds      9     
 1    22   Lucy  Diamonds      9  
 2    22   Lucy  Diamonds      9      
 3    50  Grace     Kelly      7     
 4    50  Grace     Kelly      7  
 5    50  Grace     Kelly      7      
 6   105    Uma   Thurman      7     
 7   105    Uma   Thurman      7  
 8   105    Uma   Thurman      7      
 9    29   Lola      King      3    
 10   29   Lola      King      3  
 11   29   Lola      King      3      }

The code I have tried so far:

# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook

# Defining the two file paths
path_excel_file = r'C:\Users\machukovich\Desktop\stack.xlsx'

# Loading the files into a dictionary of Dataframes
dfs = pd.read_excel(path_excel_file, sheet_name=None, skiprows=2)

# Trying to repeat each row in every dataframe three times
for sheet_name, df in dfs.items():
    df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
    
# Adding up the list as a new column (opinion) in each sheet.
mylist = ['good song','average song', 'bad song']
for sheet_name, df in dfs.items():
    df = dfs['opinion'] = np.resize(mylist, len(dfs))
    
# Creating a new column for the concatenation
for sheet_name, df in dfs.items():
    df = dfs.insert(5, 'concatenation', dfs['Name'].map(str)  + dfs['Surname'].map(str) + dfs['opinion'].map(str))
    
# We try to create a new excel file with the manipulated data

Path_new_file = r'C:\Users\machukovich\Desktop\new_file.xlsx'

# Create a Pandas Excel writer using XlsxWriter as the engine.
with pd.ExcelWriter(Path_new_file, engine='xlsxwriter') as writer:
    for sheet_name, df in dfs.items():
        df.to_excel(writer, sheet_name=sheet_name, startrow=2, index=False)
        
        
# I am not obtaining my desired output but an excel file on which each sheet is equal to one single column of one sheet out of my three excel sheets.

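Edit: I am not getting the desired output, and I believe something must be wrong with the line that repeats each row three times. Any help is appreciated.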

pandas dataframe numpy for-loop ordereddictionary
1 Answer

NumPy solution

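You appear to be using np.repeat correctly in your solution. The problem is this loop:

for sheet_name, df in dfs.items():
    df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)

Reassigning df inside the loop does not modify dfs. dfs.items() only gives you a view of the dictionary to iterate over, and the assignment merely rebinds the loop variable df to a new dataframe while dfs keeps pointing at the original one. The fix is to set the value in dfs directly, by key: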

for sheet_name, df in dfs.items():
    dfs[sheet_name] = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
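
To make the difference concrete, here is a minimal sketch that runs both loops on a small in-memory dictionary. The two tiny dataframes are hypothetical stand-ins for what pd.read_excel(..., sheet_name=None) would return, and the names rows_after_rebinding and rows_after_assignment are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dict returned by pd.read_excel(..., sheet_name=None)
dfs = {
    "Sheet_1": pd.DataFrame({"ID": [104, 168], "Name": ["Eleanor", "Barbara"]}),
    "Sheet_2": pd.DataFrame({"ID": [106, 128], "Name": ["Lucy", "Delilah"]}),
}

# Rebinding the loop variable: the dataframes stored in dfs are left untouched
for sheet_name, df in dfs.items():
    df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
rows_after_rebinding = {name: len(frame) for name, frame in dfs.items()}

# Assigning by key: dfs now holds the repeated dataframes
for sheet_name, df in dfs.items():
    dfs[sheet_name] = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
rows_after_assignment = {name: len(frame) for name, frame in dfs.items()}

print(rows_after_rebinding)   # {'Sheet_1': 2, 'Sheet_2': 2}
print(rows_after_assignment)  # {'Sheet_1': 6, 'Sheet_2': 6}
```

One caveat: because df.values mixes numbers and strings here, the rebuilt columns come back with object dtype. Calling .infer_objects() on the new dataframe is one way to recover the numeric dtypes.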

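pandas solution

With pandas you can do this using pd.concat, passing it a list of copies of the dataframe. Inside the same loop over dfs.items():

dfs[sheet_name] = pd.concat([df, df, df])

or, equivalently:

dfs[sheet_name] = pd.concat([df for _ in range(3)])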
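
If you try either of these, you will notice that the index values are repeated as well (NumPy, unlike pandas, does not keep track of them) and that the rows are not in the order you want, because we have simply concatenated copies of the dataframe end to end. We can fix this with a classic pandas method chain that sorts by index and then resets it: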

dfs[sheet_name] = pd.concat([df for _ in range(3)]).sort_index().reset_index(drop = True)
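
As a rough end-to-end sketch, the same chain can be combined with the two later steps from the question (the opinion column and the concatenation column). The single-sheet dictionary below is a hypothetical stand-in for the workbook, the names opinions and repeated are made up for illustration, and the sketch assumes those later steps are meant to work on each df rather than on dfs.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory stand-in for one sheet of the workbook
dfs = {
    "Sheet_1": pd.DataFrame({
        "ID": [104, 168],
        "Name": ["Eleanor", "Barbara"],
        "Surname": ["Rigby", "Ann"],
        "Grade": [6, 8],
    }),
}
opinions = ["good song", "average song", "bad song"]

for sheet_name, df in dfs.items():
    # Repeat each row three times, keeping the copies of a row adjacent
    repeated = pd.concat([df] * 3).sort_index().reset_index(drop=True)
    # Cycle the opinions down the rows; note len(repeated), not len(dfs)
    repeated["opinion"] = np.resize(opinions, len(repeated))
    # Append the concatenated text as the last column
    repeated["concatenation"] = (
        repeated["Name"] + repeated["Surname"] + repeated["opinion"]
    )
    # Store the result by key so that dfs is actually updated
    dfs[sheet_name] = repeated

print(dfs["Sheet_1"])
```

If the original index is unique, df.loc[df.index.repeat(3)].reset_index(drop=True) should give the same repeated rows without the sort.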
    