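I have a program that reads a file, finds the set of unique lines, and then splits those lines into disjoint groups.
When I read large files (1 GB or more), I get the following error: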
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:228)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:582)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:179)
at org.example.Main.main(Main.java:149).
I know I can change the heap size in the settings, but I would like to solve this programmatically.
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

public class HugeFileReader {
    public static void main(String[] args) throws IOException {
        String OutputFile = "File created";
        StringBuilder stringBuilder = new StringBuilder();
        LineIterator bufferedReader = FileUtils.lineIterator(new File(args[0]), "UTF-8");
        BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(OutputFile));
        // numberOfGroups.get(g) holds the lines that belong to group g.
        List<Set<String>> numberOfGroups = new ArrayList<>();
        // positionOfNumbers.get(i) maps a value seen in column i to its group index.
        List<Map<String, Integer>> positionOfNumbers = new ArrayList<>();
        while (bufferedReader.hasNext()) {
            // Read inside the loop so that the last line of the file is processed too.
            String line = bufferedReader.nextLine();
            String[] columns = getColumns(line);
            Integer numOfGroup = null;
            for (int i = 0; i < Math.min(positionOfNumbers.size(), columns.length); i++) {
                Integer numOfGroup2 = positionOfNumbers.get(i).get(columns[i]);
                if (numOfGroup2 != null) {
                    if (numOfGroup == null) {
                        numOfGroup = numOfGroup2;
                    } else if (!numOfGroup.equals(numOfGroup2)) {
                        // The line links two groups: merge group numOfGroup2 into numOfGroup.
                        for (String numbersOfGroup : numberOfGroups.get(numOfGroup2)) {
                            numberOfGroups.get(numOfGroup).add(numbersOfGroup);
                            for (int ii = 0; ii < getColumns(numbersOfGroup).length; ii++) {
                                if (getColumns(numbersOfGroup)[ii].isEmpty()) {
                                    continue;
                                }
                                if (ii < positionOfNumbers.size()) {
                                    positionOfNumbers.get(ii).put(getColumns(numbersOfGroup)[ii], numOfGroup);
                                } else {
                                    HashMap<String, Integer> map = new HashMap<>();
                                    map.put(getColumns(numbersOfGroup)[ii], numOfGroup);
                                    positionOfNumbers.add(map);
                                }
                            }
                        }
                        numberOfGroups.set(numOfGroup2, new HashSet<>());
                    }
                }
            }
            if (numOfGroup == null) {
                // No column value matched an existing group: start a new one.
                if (Arrays.stream(columns).anyMatch(s -> !s.isEmpty())) {
                    numberOfGroups.add(new HashSet<>(List.of(line)));
                    for (int ii = 0; ii < columns.length; ii++) {
                        if (columns[ii].isEmpty()) {
                            continue;
                        }
                        if (ii < positionOfNumbers.size()) {
                            positionOfNumbers.get(ii).put(columns[ii], numberOfGroups.size() - 1);
                        } else {
                            HashMap<String, Integer> map = new HashMap<>();
                            map.put(columns[ii], numberOfGroups.size() - 1);
                            positionOfNumbers.add(map);
                        }
                    }
                }
            } else {
                numberOfGroups.get(numOfGroup).add(line);
                for (int ii = 0; ii < columns.length; ii++) {
                    if (columns[ii].isEmpty()) {
                        continue;
                    }
                    if (ii < positionOfNumbers.size()) {
                        positionOfNumbers.get(ii).put(columns[ii], numOfGroup);
                    } else {
                        HashMap<String, Integer> map = new HashMap<>();
                        map.put(columns[ii], numOfGroup);
                        positionOfNumbers.add(map);
                    }
                }
            }
        }
        // The whole report is assembled in memory here before anything is written.
        stringBuilder.append("group that contains the highest amount of elements ").append(numberOfGroups.stream().filter(s -> s.size() > 1).count());
        numberOfGroups.sort(Comparator.comparingInt(s -> -s.size()));
        int iterationOfGroups = 0;
        for (Set<String> perGroup : numberOfGroups) {
            iterationOfGroups++;
            stringBuilder.append("\n").append("Группа ").append(iterationOfGroups).append("\n");
            for (String setsOfNumbers : perGroup) {
                stringBuilder.append(setsOfNumbers).append("\n");
            }
        }
        bufferedWriter.write(stringBuilder.toString());
        bufferedWriter.close();
        bufferedReader.close();
    }

    // Splits a line on ';' after stripping quotes. Returns an empty array for a
    // malformed line, i.e. one with a quote that is not adjacent to a ';'.
    private static String[] getColumns(String line) {
        for (int i = 1; i < line.length() - 1; i++) {
            if (line.charAt(i - 1) != ';' && line.charAt(i + 1) != ';' && line.charAt(i) == '"') {
                return new String[0];
            }
        }
        return line.replaceAll("\"", "").split(";");
    }
}
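The OutOfMemoryError occurs when you append to the StringBuilder. So you are able to read the whole file into memory, but before writing the result to a file you buffer the complete output in RAM in the StringBuilder. That is not necessary: you should be able to write the output directly to the file, which reduces RAM usage, like this: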
bufferedWriter.write("group that contains the highest amount of elements " + numberOfGroups.stream().filter(s -> s.size() > 1).count());
bufferedWriter.newLine();
numberOfGroups.sort(Comparator.comparingInt(s -> -s.size()));
int iterationOfGroups = 0;
for (Set<String> perGroup : numberOfGroups) {
    iterationOfGroups++;
    bufferedWriter.write("Group " + iterationOfGroups);
    bufferedWriter.newLine();
    for (String setsOfNumbers : perGroup) {
        bufferedWriter.write(setsOfNumbers);
        bufferedWriter.newLine();
    }
}
bufferedWriter.close();
bufferedReader.close();
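For reference, the same idea can be sketched as a small self-contained program. This is only an illustration under a few assumptions: the class name `GroupWriter`, the helper `writeGroups`, and the sample data are hypothetical and not part of the original code, and the groups are assumed to be already built in memory. The point is that each line goes straight to the writer, so the full report never exists as one large string, and try-with-resources closes the file even if a write fails.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class GroupWriter {

    // Hypothetical helper (not in the original post): streams the report to
    // `out` line by line instead of collecting it in a StringBuilder first.
    static void writeGroups(List<Set<String>> groups, Writer out) throws IOException {
        BufferedWriter writer = new BufferedWriter(out);

        long groupsWithSeveralLines = groups.stream().filter(g -> g.size() > 1).count();
        writer.write("group that contains the highest amount of elements " + groupsWithSeveralLines);
        writer.newLine();

        // Sort a copy so the caller's list is left untouched; largest group first.
        List<Set<String>> sorted = new ArrayList<>(groups);
        sorted.sort(Comparator.comparingInt((Set<String> g) -> g.size()).reversed());

        int groupNumber = 0;
        for (Set<String> group : sorted) {
            groupNumber++;
            writer.write("Group " + groupNumber);
            writer.newLine();
            for (String line : group) {
                writer.write(line);
                writer.newLine();
            }
        }
        // Flush the local buffer; the caller owns `out` and is responsible for closing it.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data standing in for the groups computed from the input file.
        List<Set<String>> groups = new ArrayList<>();
        groups.add(new LinkedHashSet<>(List.of("\"1\";\"2\"")));
        groups.add(new LinkedHashSet<>(List.of("\"3\";\"4\"", "\"3\";\"5\"")));

        Path output = Files.createTempFile("groups", ".txt");
        // try-with-resources closes the file even if writeGroups throws.
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            writeGroups(groups, writer);
        }
        System.out.print(Files.readString(output));
    }
}
```

Note that this only removes the second copy of the data held by the StringBuilder. The groups themselves still live on the heap, so a sufficiently large input may still require a larger heap or a different grouping strategy.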