Comparing two Excel files for differences

Question (votes: 0, answers: 3)

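I want to compare two input csv files to see whether rows have been added or removed. What is the best way to approach this? I am not using column names, because the column names are not consistent across the files.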

private void compare_btn_Click(object sender, EventArgs e)
        {
            string firstFile = firstExcel_txt.Text;
            var results = ReadExcel(openFileDialog1);
            string secondFile = secondExcel_txt.Text;
            var results2 = ReadExcel(openFileDialog2);

        }

Reading the file:

public object ReadExcel(OpenFileDialog openFileDialog)
        {
            var _excelFile = new ExcelQueryFactory(openFileDialog.FileName);
            var _info = from c in _excelFile.WorksheetNoHeader() select c;
            string header1, header2, header3;
            foreach (var item in _info)
            {
                header1 = item.ElementAt(0);
                header2 = item.ElementAt(1);
                header3 = item.ElementAt(2);
            }
            return _info;
        }

Any help on how I could do this would be great.

c# winforms linq excel csv
3 Answers
2
votes

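I suggest computing a hash for each row of the Excel file. You can then check whether each row's hash matches any hash in the other file (see the comments in the source code).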

I have also included some classes to store the contents of the Excel file.

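using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using LinqToExcel; // provides ExcelQueryFactory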

private void compare_btn_Click(object sender, EventArgs e)
{
    string firstFile = firstExcel_txt.Text;
    ExcelInfo file1 = ReadExcel(openFileDialog1);

    string secondFile = secondExcel_txt.Text;
    ExcelInfo file2 = ReadExcel(openFileDialog2);

    CompareExcels(file1,file2) ;
}    

public void CompareExcels(ExcelInfo fileA, ExcelInfo fileB)
{
    foreach(ExcelRow rowA in fileA.excelRows)
    {
        //If the hash of a row in fileA does not exist in fileB, the row was removed
        if(! fileB.ContainsHash(rowA.hash))
        {
            Console.WriteLine("Row removed: " + rowA.ToString());
        }
    }

    foreach(ExcelRow rowB in fileB.excelRows)
    {
        //If the hash of a row in fileB does not exist in fileA, the row was added
        if(! fileA.ContainsHash(rowB.hash))
        {
            Console.WriteLine("Row added: " + rowB.ToString());
        }
    }
}

public class ExcelRow
{
    public List<String> lstCells ;
    public byte[] hash ;

    public ExcelRow()
    {
        lstCells = new List<String>() ;
    }
    public override string ToString()
    {
        //Join the cell texts with commas, keeping empty cells in place
        return string.Join(",", lstCells) ;
    }
    public void CalculateHash()
    {
        byte[] rowBytes ;
        byte[] cellBytes ;
        int pos ;
        int numRowBytes ;

        //Determine how many bytes are required to store a single Excel row
        numRowBytes = 0 ;
        foreach(string cellText in lstCells)
        {
            numRowBytes += NumBytes(cellText) ;
        }       

        //Allocate space to calculate the hash of a single row

        rowBytes = new byte[numRowBytes] ;
        pos = 0 ;

        //Concatenate the text of each cell, converted to bytes, into a single byte array
        foreach(string cellText in lstCells)
        {
            cellBytes = GetBytes(cellText) ;
            System.Buffer.BlockCopy(cellBytes, 0, rowBytes, pos, cellBytes.Length);
            //Advance the write position past the bytes just copied
            pos += cellBytes.Length ;

        }

        hash = new MD5CryptoServiceProvider().ComputeHash(rowBytes);

    }
    static int NumBytes(string str)
    {
        return str.Length * sizeof(char);
    }

    static byte[] GetBytes(string str)
    {
        byte[] bytes = new byte[NumBytes(str)];
        System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }
}
public class ExcelInfo
{
    public List<ExcelRow> excelRows ;

    public ExcelInfo()
    {
        excelRows = new List<ExcelRow>();
    }
    public bool ContainsHash(byte[] hashToLook)
    {
        bool found ;

        found = false ;

        foreach(ExcelRow eRow in excelRows)
        {
            found = EqualHash(eRow.hash, hashToLook) ;

            if(found)
            {
                break ;
            }
        }

        return found ;
    }
    public static bool EqualHash(byte[] hashA, byte[] hashB)
    {
        bool bEqual ;
        int i ;

        bEqual  = false;
        if (hashA.Length == hashB.Length)
        {
            i = 0;
            while ((i < hashA.Length) && (hashA[i] == hashB[i]))
            {
                i++ ;
            }
            if (i == hashA.Length)
            {
                bEqual = true;
            }
        }
        return bEqual ;
    }
}

public ExcelInfo ReadExcel(OpenFileDialog openFileDialog)
{
    var _excelFile = new ExcelQueryFactory(openFileDialog.FileName);
    var _info = from c in _excelFile.WorksheetNoHeader() select c;

    ExcelRow excelRow ;
    ExcelInfo resp ;

    resp = new ExcelInfo() ;

    foreach (var item in _info)
    {
        excelRow = new ExcelRow() ;

        //Add every cell of the row
        foreach (var cell in item)
        {
            excelRow.lstCells.Add(cell.ToString()) ;
        }

        //Calculate the hash of the row
        excelRow.CalculateHash() ;

        //Add the row to the ExcelInfo object
        resp.excelRows.Add(excelRow) ;
    }
    return resp ;
}
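
As a possible variation on the approach above: `ContainsHash` scans every row of the other file for each lookup, which can become slow on large files. The sketch below is not part of the original answer. It keys each row by its joined cell text and uses a `HashSet<string>` for the lookups. The names `RowDiff`, `RowKey`, and `MissingFrom` are hypothetical, and the rows are assumed to have already been read into lists of strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowDiff
{
    // Joins the cells with a separator that is unlikely to occur in cell text,
    // so that ("ab", "c") and ("a", "bc") produce different keys.
    public static string RowKey(IEnumerable<string> cells)
    {
        return string.Join("\u001F", cells);
    }

    // Returns the rows of 'source' (comma-joined for display) that do not occur in 'other'.
    public static List<string> MissingFrom(
        IEnumerable<IEnumerable<string>> source,
        IEnumerable<IEnumerable<string>> other)
    {
        var otherKeys = new HashSet<string>(other.Select(row => RowKey(row)));
        return source.Where(row => !otherKeys.Contains(RowKey(row)))
                     .Select(row => string.Join(",", row))
                     .ToList();
    }
}
```

Calling `RowDiff.MissingFrom(rowsA, rowsB)` would give the removed rows, and swapping the arguments would give the added rows. Because a set ignores how many times a row occurs, this sketch does not detect a change in the number of duplicate rows.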

0
votes

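The most accurate approach is to convert both files to byte arrays and check whether the arrays differ. The following link has a simple example of how to convert an Excel sheet to a byte array: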

Convert Excel to Byte[]

Now that you have converted both Excel sheets to byte[], you should check whether they differ by testing the byte arrays for equality.

There are several ways to check this; the following uses LINQ:

using System.Linq; //SequenceEqual

 byte[] FirstExcelFileBytes = null;
 byte[] SecondExcelFileBytes = null;

 FirstExcelFileBytes = GetFirstExcelFile();
 SecondExcelFileBytes = GetSecondExcelFile();

 if (FirstExcelFileBytes.SequenceEqual<byte>(SecondExcelFileBytes) == true)
 {
      MessageBox.Show("Arrays are equal");
 }
 else
 {
     MessageBox.Show("Arrays don't match");
 }

There are plenty of other ways to compare byte arrays; you should do some research to find out which one suits you best.
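
The snippet above leaves `GetFirstExcelFile` and `GetSecondExcelFile` undefined. As a minimal sketch, assuming both files are on disk, the bytes could be read with `File.ReadAllBytes`; the `FileBytes` class and `AreIdentical` method are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileBytes
{
    // Returns true when both files contain exactly the same bytes.
    public static bool AreIdentical(string pathA, string pathB)
    {
        byte[] a = File.ReadAllBytes(pathA);
        byte[] b = File.ReadAllBytes(pathB);

        // Comparing lengths first avoids walking the arrays when the sizes differ.
        return a.Length == b.Length && a.SequenceEqual(b);
    }
}
```

A byte comparison only says whether the two files are identical; it does not say which rows were added or removed. Two .xlsx files with the same cell contents may also differ at the byte level, for example because of embedded metadata.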

Use the following link to check for things such as "Row added", "Row removed", and so on:

Compare Excel sheets


0
votes

I recently worked on the same problem: I had to compare two Excel files and print any differences between them.

The approach I took was to export both Excel files to a DataTable and then compare them cell by cell.

C# code snippet using Syncfusion.XlsIO:

public static DataTable SaveExcelToDataTable(string filePath)
{
    //Create an instance of ExcelEngine and dispose it when done
    using (ExcelEngine excelEngine = new ExcelEngine())
    //Open the existing workbook file for reading
    using (FileStream inputStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        //Initialize application
        IApplication application = excelEngine.Excel;
        application.DefaultVersion = ExcelVersion.Xlsx;

        //Open existing workbook with data entered
        IWorkbook workbook = application.Workbooks.Open(inputStream);

        //Access first worksheet from the workbook instance
        IWorksheet worksheet = workbook.Worksheets[0];

        //Export Excel to DataTable, using the first row as column names
        return worksheet.ExportDataTable(worksheet.UsedRange, ExcelExportDataTableOptions.ColumnNames);
    }
}

public static bool CompareDataTables(DataTable dt1, DataTable dt2)
{

    if (dt1.Rows.Count != dt2.Rows.Count || dt1.Columns.Count != dt2.Columns.Count)
        return false;

    for (int i = 0; i < dt1.Rows.Count; i++)
    {
        for (int c = 0; c < dt1.Columns.Count; c++)
        {
            if (!Equals(dt1.Rows[i][c], dt2.Rows[i][c]))
                return false;
        }
    }
    return true;
}
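
`CompareDataTables` only reports whether the tables match. Since the goal was to print the differences, the following is a minimal sketch, not part of the original answer, that lists each differing cell. The `DataTableDiff` class and `FindCellDifferences` method are hypothetical names, and only the rows and columns present in both tables are compared.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DataTableDiff
{
    // Lists each cell whose value differs between the two tables.
    // Rows or columns that exist in only one table are not reported.
    public static List<string> FindCellDifferences(DataTable dt1, DataTable dt2)
    {
        var differences = new List<string>();
        int rows = Math.Min(dt1.Rows.Count, dt2.Rows.Count);
        int cols = Math.Min(dt1.Columns.Count, dt2.Columns.Count);

        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                object left = dt1.Rows[r][c];
                object right = dt2.Rows[r][c];
                if (!object.Equals(left, right))
                {
                    differences.Add($"Row {r}, column {c}: '{left}' -> '{right}'");
                }
            }
        }
        return differences;
    }
}
```

The row and column indexes in the messages are zero-based. Because rows are matched by position, an inserted or deleted row makes every later row appear changed.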