How can I read a large file and avoid java.lang.OutOfMemoryError?

I have a program that reads a file, collects the unique lines, and then splits them into disjoint groups.

When I read a large file (1 GB or more), I get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
    at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:228)
    at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:582)
    at java.base/java.lang.StringBuilder.append(StringBuilder.java:179)
    at org.example.Main.main(Main.java:149)

I know I can change the heap size in the settings, but I want to solve this problem programmatically.

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

public class HugeFileReader {

    public static void main(String[] args) throws IOException {
        String OutputFile = "File created";
        StringBuilder stringBuilder = new StringBuilder();
        LineIterator bufferedReader = FileUtils.lineIterator(new File(args[0]), "UTF-8");
        BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(OutputFile));
        // numberOfGroups.get(g) holds the lines that belong to group g.
        List<Set<String>> numberOfGroups = new ArrayList<>();
        // positionOfNumbers.get(c) maps a value seen in column c to its group index.
        List<Map<String, Integer>> positionOfNumbers = new ArrayList<>();

        // Read inside the loop so that the last line of the file is processed too.
        while (bufferedReader.hasNext()) {
            String line = bufferedReader.nextLine();
            String[] columns = getColumns(line);
            Integer numOfGroup = null;

            for (int i = 0; i < Math.min(positionOfNumbers.size(), columns.length); i++) {
                Integer numOfGroup2 = positionOfNumbers.get(i).get(columns[i]);

                if (numOfGroup2 == null) {
                    continue;
                }
                if (numOfGroup == null) {
                    numOfGroup = numOfGroup2;
                } else if (!numOfGroup.equals(numOfGroup2)) {
                    // This line links two groups: merge numOfGroup2 into numOfGroup.
                    for (String numbersOfGroup : numberOfGroups.get(numOfGroup2)) {
                        numberOfGroups.get(numOfGroup).add(numbersOfGroup);
                        indexColumns(positionOfNumbers, getColumns(numbersOfGroup), numOfGroup);
                    }
                    numberOfGroups.set(numOfGroup2, new HashSet<>());
                }
            }

            if (numOfGroup == null) {
                // No shared value found: start a new group unless every column is empty.
                if (Arrays.stream(columns).anyMatch(s -> !s.isEmpty())) {
                    numberOfGroups.add(new HashSet<>(List.of(line)));
                    indexColumns(positionOfNumbers, columns, numberOfGroups.size() - 1);
                }
            } else {
                numberOfGroups.get(numOfGroup).add(line);
                indexColumns(positionOfNumbers, columns, numOfGroup);
            }
        }

        // The whole report is built in memory before anything is written to the file.
        stringBuilder.append("group that contains the highest amount of elements ")
                .append(numberOfGroups.stream().filter(s -> s.size() > 1).count());

        numberOfGroups.sort(Comparator.comparingInt(s -> -s.size()));
        int iterationOfGroups = 0;

        for (Set<String> perGroup : numberOfGroups) {
            iterationOfGroups++;
            stringBuilder.append("\n").append("Group ").append(iterationOfGroups).append("\n");

            for (String setsOfNumbers : perGroup) {
                stringBuilder.append(setsOfNumbers).append("\n");
            }
        }

        bufferedWriter.write(stringBuilder.toString());
        bufferedWriter.close();
        bufferedReader.close();
    }

    // Records the group index for every non-empty column value.
    private static void indexColumns(List<Map<String, Integer>> positionOfNumbers, String[] columns, int group) {
        for (int i = 0; i < columns.length; i++) {
            if (columns[i].isEmpty()) {
                continue;
            }
            // Grow the list so that index i always corresponds to column i.
            while (positionOfNumbers.size() <= i) {
                positionOfNumbers.add(new HashMap<>());
            }
            positionOfNumbers.get(i).put(columns[i], group);
        }
    }

    // Splits a line on ';' after stripping quotes. A line with a quote inside a
    // value (not next to a ';') is treated as malformed and yields no columns.
    private static String[] getColumns(String line) {
        for (int i = 1; i < line.length() - 1; i++) {
            if (line.charAt(i - 1) != ';' && line.charAt(i + 1) != ';' && line.charAt(i) == '"') {
                return new String[0];
            }
        }
        return line.replace("\"", "").split(";");
    }
}
java out-of-memory heap-memory
1 Answer

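The OutOfMemoryError is thrown while you are appending to the StringBuilder. So you do manage to read the whole file into memory, but you then buffer the complete output in RAM a second time, in the StringBuilder, before writing it to the file. That is not necessary. You should be able to write the output directly to the file and reduce RAM usage like this: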

bufferedWriter.write("group that contains the highest amount of elements " + numberOfGroups.stream().filter(s -> s.size() > 1).count());
bufferedWriter.newLine();

numberOfGroups.sort(Comparator.comparingInt(s -> -s.size()));
int iterationOfGroups = 0;

for (Set<String> perGroup : numberOfGroups) {
    iterationOfGroups++;
    bufferedWriter.write("Group " + iterationOfGroups);
    bufferedWriter.newLine();

    for (String setsOfNumbers : perGroup) {
        bufferedWriter.write(setsOfNumbers);
        bufferedWriter.newLine();
    }
}

bufferedWriter.close();
bufferedReader.close();
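
For reference, here is the same idea as a minimal, self-contained sketch. The class name `StreamingGroupWriter`, the method `writeGroups`, and the header wording are hypothetical and not part of the original code. It also assumes that groups emptied by merging should be skipped, which the snippet above does not do. The point is only that each line goes straight to the `BufferedWriter`, so the output never exists as one large string in the heap.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the original post.
final class StreamingGroupWriter {

    private StreamingGroupWriter() {
    }

    // Writes the groups largest-first, one line at a time, without building the report in memory.
    static void writeGroups(List<Set<String>> groups, Path target) throws IOException {
        // Sort a copy so the caller's list is left untouched.
        List<Set<String>> sorted = new ArrayList<>(groups);
        sorted.sort(Comparator.comparingInt((Set<String> g) -> g.size()).reversed());

        long multiElementGroups = sorted.stream().filter(g -> g.size() > 1).count();

        // try-with-resources closes (and therefore flushes) the writer even if a write fails.
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write("Groups with more than one element: " + multiElementGroups);
            writer.newLine();

            int groupNumber = 0;
            for (Set<String> group : sorted) {
                if (group.isEmpty()) {
                    continue; // assumption: groups emptied by merging are not reported
                }
                groupNumber++;
                writer.write("Group " + groupNumber);
                writer.newLine();
                for (String line : group) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
    }
}
```

In the question's code this would be called as `StreamingGroupWriter.writeGroups(numberOfGroups, Path.of(OutputFile));` in place of the StringBuilder section. Note that the groups themselves still live in the heap, so this only removes the second copy of the data. If the input keeps growing, the grouping step will eventually need more heap as well.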