Java / JavaScript: compare files line by line while ignoring certain sections

Question

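Problem: is there a better way to compare two small files (about 100 KB each) while selectively ignoring part of the text?

I am looking for a default/existing Java library or any Windows native application.

Here is the scenario.

Expected file 1 is located at D:\expected\FileA_61613.txt. Actual file 2 is located at D:\actuals\FileA_61613.txt.

Contents of the expected file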

Some first line here
There may be whitespaces, line breaks, indentation and here is another line
Key : SomeValue
Date : 01022012
Time : 18:20
key2 : Value2
key3 : Value3
key4 : Value4
key5 : Value5
Some other text again to indicate that his is end of this file.

Contents of the actual file to compare

Some first line here
There may be whitespaces, line breaks, indentation and here is another line
Key : SomeValue
Date : 18092013
Timestamp : 15:10.345+10.00
key2 : Value2
key3 : Value3
key4 : Something Different
key5 : Value5
Some other text again to indicate his is end of this file.

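File 1 and file 2 need to be compared line by line, WITHOUT ignoring whitespace, indentation, line breaks, etc.

The comparison result should look something like this:
Line 8 - expected Time, but actual Timestamp
Line 8 - expected HH.mm, but actual HH.mm.345+10.00
Line 11 - expected N spaces, but actual only X spaces
Line 13 - expected a line break, but there is no line break

The following also differ, but should be IGNORED:
Line 7 - expected 01022012, but actual 18092013 (exactly 10 characters only)
Line 8 - expected 18:20, but actual 15:10 (exactly 5 characters only should be ignored)
Note: the remaining .345+10.00 should still be reported.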
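One way to express the "ignore exactly these characters" rule is to mask the ignorable values in both files before comparing. The following is only a sketch: the class name LineMasker, the method mask, and both regular expressions are assumptions based on the sample files above (an 8-digit date after "Date :" and an HH:mm value after a label starting with "Time"). It uses only java.util.regex from the standard library.

```java
import java.util.regex.Pattern;

final class LineMasker {
    // Assumed format: "Date : ddMMyyyy", i.e. exactly 8 digits after the label.
    private static final Pattern DATE =
            Pattern.compile("^(\\s*Date\\s*:\\s*)\\d{8}");
    // Assumed format: a label starting with "Time", then HH:mm (5 characters).
    private static final Pattern TIME =
            Pattern.compile("^(\\s*Time\\w*\\s*:\\s*)\\d{2}:\\d{2}");

    /** Replaces only the ignorable characters with '#'; everything else is kept. */
    static String mask(String line) {
        String masked = DATE.matcher(line).replaceFirst("$1########");
        return TIME.matcher(masked).replaceFirst("$1#####");
    }

    public static void main(String[] args) {
        System.out.println(mask("Date : 18092013"));
        System.out.println(mask("Timestamp : 15:10.345+10.00"));
    }
}
```

Because only the digits are masked, the label change (Time vs Timestamp) and the trailing .345+10.00 remain visible to whatever comparison runs afterwards.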

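It is fine even if the result contains only line numbers, with no analysis of why each line failed. But it should not report a failure at line 8 and then exit. It should report all the changes, except the excluded "date" and "time" values.

Some search results point to solutions using Perl, but I am looking for a Java / JavaScript solution. The input to the solution will be the full file paths of both files.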
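As a rough sketch of the "report every differing line, given two file paths" requirement, the following uses only java.nio.file from the standard library. The class name AllLineDiffs and the method differingLines are made up for illustration. It assumes both files are UTF-8 (the default for Files.readAllLines). Note that readAllLines drops line terminators, so a missing final newline would not be detected this way.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class AllLineDiffs {
    /** Reads both files and returns the 1-based numbers of all differing lines. */
    static List<Integer> differingLines(Path expected, Path actual) throws IOException {
        return differingLines(Files.readAllLines(expected), Files.readAllLines(actual));
    }

    /** Same comparison on already loaded lines; never stops at the first mismatch. */
    static List<Integer> differingLines(List<String> expected, List<String> actual) {
        List<Integer> result = new ArrayList<>();
        int max = Math.max(expected.size(), actual.size());
        for (int i = 0; i < max; i++) {
            // A line missing from the shorter file counts as a difference.
            String e = i < expected.size() ? expected.get(i) : null;
            String a = i < actual.size() ? actual.get(i) : null;
            if (e == null || !e.equals(a)) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java AllLineDiffs <expected file path> <actual file path>
        System.out.println(differingLines(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Iterating up to the longer file's length means trailing lines that exist in only one file are reported too, instead of being silently skipped.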

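My current working approach is to replace the text to be ignored with '#'. During the comparison, if we encounter a '#', it is not treated as a difference. Below is my working code. But I need to know whether I can use some default/existing library or function to achieve this.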

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class fileComparison {
    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes both readers even if an exception is thrown
        try (BufferedReader br1 = new BufferedReader(new FileReader(
                     "D:\\expected\\FileA_61613.txt"));
             BufferedReader br2 = new BufferedReader(new FileReader(
                     "D:\\actuals\\FileA_61613.txt"))) {
            int lineNumber = 0;
            String strLine1;
            String strLine2;

            // Stops at the end of the shorter file; extra trailing lines are not reported
            while (((strLine1 = br1.readLine()) != null)
                    && ((strLine2 = br2.readLine()) != null)) {
                lineNumber++;
                if (!strLine1.equals(strLine2)) {
                    int maxIndex = Math.min(strLine1.length(), strLine2.length());
                    if (maxIndex == 0) {
                        // One of the two lines is empty
                        sb.append("Mismatch at line " + lineNumber
                                + " all characters " + '\n');
                        continue; // keep going so later lines are still compared
                    }
                    boolean isIgnored = false; // reset for every line
                    int i;
                    for (i = 0; i < maxIndex; i++) {
                        // '#' in the expected line marks a character to ignore
                        if (strLine1.charAt(i) == '#') {
                            isIgnored = true;
                            continue;
                        }
                        if (strLine1.charAt(i) != strLine2.charAt(i)) {
                            isIgnored = false;
                            break; // only the first mismatch in the line is recorded
                        }
                    }
                    if (isIgnored) {
                        sb.append("Ignored line " + lineNumber + '\n');
                    } else {
                        sb.append("Mismatch at line " + lineNumber + " at char "
                                + i + '\n');
                    }
                }
            }
        }
        System.out.println(sb.toString());
    }
}

The output I am able to get is:

Ignored line 7
Mismatch at line 8 at char 4
Mismatch at line 11 at char 13
Mismatch at line 12 at char 8
Mismatch at line 14 all characters

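However, when there are multiple differences on the same line, I am not able to log all of them, because I am comparing character by character rather than word by word. I would rather not compare word by word because I think that would make it impossible to compare line breaks and whitespace. Is my understanding correct?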
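A character-by-character comparison can still record more than one difference per line if it keeps scanning instead of breaking at the first mismatch, and whitespace is compared like any other character. The following is a minimal sketch of that idea, keeping the '#' convention from the code above; the class name LineDiff, the method diffRanges, and the output format are assumptions made for illustration. The comparison is purely positional, so an inserted character shifts everything after it into one long mismatching range.

```java
import java.util.ArrayList;
import java.util.List;

final class LineDiff {
    /**
     * Returns every mismatching 0-based column range ("start-end") between the
     * two lines. A '#' in the expected line matches any single character.
     */
    static List<String> diffRanges(String expected, String actual) {
        List<String> ranges = new ArrayList<>();
        int common = Math.min(expected.length(), actual.length());
        int start = -1; // start of the mismatch run currently open, or -1
        for (int i = 0; i < common; i++) {
            char e = expected.charAt(i);
            boolean same = e == '#' || e == actual.charAt(i);
            if (!same && start < 0) {
                start = i;                          // a new run of mismatches begins
            } else if (same && start >= 0) {
                ranges.add(start + "-" + (i - 1));  // the run ended at the previous column
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(start + "-" + (common - 1));
        }
        if (expected.length() != actual.length()) {
            ranges.add("length " + expected.length() + " vs " + actual.length());
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(diffRanges("key4 : Value4", "key4 : Xalue5"));
    }
}
```

An empty list means the two lines are equal apart from the '#' positions.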

Answer 1

The java.lang.StringIndexOutOfBoundsException comes from this code:

for (int i = 0; i < strLine1.length(); i++) {
   if (strLine1.charAt(i) != strLine2.charAt(i)) {
       System.out.println("char not same at " + i);
   }   
}

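When you iterate over the longer String strLine1 up to an index that is greater than the length of strLine2 (the line from the second file is shorter than the one from the first), you get this exception. It occurs because strLine2, being shorter, has no character at those indices.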
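A bounds-safe variant of that loop stops at the length of the shorter string and treats a length difference as a mismatch at that position. This is just a sketch; the class name SafeCompare and the method firstMismatch are made up for illustration.

```java
final class SafeCompare {
    /**
     * Returns the index of the first differing character, or -1 if the
     * strings are equal. Never reads past the end of the shorter string.
     */
    static int firstMismatch(String a, String b) {
        int common = Math.min(a.length(), b.length());
        for (int i = 0; i < common; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // Same prefix: equal only if the lengths also match.
        return a.length() == b.length() ? -1 : common;
    }

    public static void main(String[] args) {
        System.out.println(firstMismatch("Timestamp", "Time"));
    }
}
```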
