Create a Python script which compares several excel files (snapshots) and compares and creates a new dataframe with diffirent

问题描述 投票:0回答:0

我是 Python 的新手,非常感谢您的帮助。

我想创建一个 python 脚本,它通过使用我的第一个文件 excel_file[0] 作为 df1 并将其与其他几个 excel_file[0:100] 进行比较并在与 df1 比较时循环遍历它们并附加那些行来执行数据验证与新数据框 df3 不同。尽管我有多个列,但我还是希望将我的比较基于包含主键列的两列;这样如果两个数据帧中的键匹配;然后比较 df1 和 df2(循环)。

这是我试过的..

## import python module: pandasql which allows SQL syntax for Pandas; 
It needs installation first though:pip install -U pandasql
    from pandasql import sqldf
    pysqldf = lambda q: sqldf(q, locals(), globals() )
    dateTimeObj = dt.datetime.now()
    print('start file merge: ' ,dateTimeObj)
#path = os.getcwd()
##files = os.listdir(path1)
files=os.path.abspath(mydrive')
files
dff1 = pd.DataFrame()
##df2 = pd.DataFrame()
# method 1
excel_files = glob.glob(files+ "\*.xlsx")
##excel_files = [f for f in files if f[-4:] == '\*.xlsx' or f[-3:] == '*.xls']
df1=pd.read_excel(excel_files[14])
for f in excel_files[0:100]:
    
    df2 = pd.read_excel(f)
    ## Lets drop the any unanamed column
    ##df1=df1.drop(df1.iloc[:, [0]], axis = 1)
    ### Gets all Rows and columns which are diffirent after comparing the two dataframes ; The 
    clause " _key HAVING COUNT(*)= 1" resolves to True if the two dataframes are diffirent
    ### Else we use The clause " _key HAVING COUNT(*)= 2" to output similar rows and columns
    data=pysqldf("SELECT * FROM ( SELECT * FROM df1 UNION ALL SELECT * FROM df2) df1 GROUP BY _key 
    HAVING COUNT(*) = 1 ;")
    
   ## df = dff1.append(data).reset_index(drop = True)
print(dt.datetime.now().strftime("%x %X")+': files appended to make a Master file')
pandas dataframe validation compare difference
© www.soinside.com 2019 - 2024. All rights reserved.