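Use case: a huge CSV file (~500 MB) that has to be read very quickly without loading the whole file into memory.
The approach I have used so far:
Read the CSV line by line and save the transformed data directly to the database. (Later I will fetch the data from the database and send it to another service, but that is not relevant right now.)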
public void importData() {
    // Assumption: "reader" is a field that supplies the CSV source. The local
    // variable is named "source" so it no longer refers to itself in its initializer.
    try (
        Reader source = this.reader.readData();
        BufferedReader bufferedReader = new BufferedReader(source)
    ) {
        String line;
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");
        bufferedReader.readLine(); // skip the header row
        while ((line = bufferedReader.readLine()) != null) {
            String[] parts = line.split(",");
            // The date column may be empty
            LocalDate date = !parts[2].isEmpty() ? LocalDate.parse(parts[2], formatter) : null;
            String partThree = parts[3];
            String partZero = parts[0];
            String partOne = parts[1];
            String partFour = parts[4];
            // The sixth column is optional
            String partFive = parts.length >= 6 ? parts[5] : null;
            service.saveDog(DogEntry.builder()
                    .breed(partZero)
                    .originSystem(partOne)
                    .date(date)
                    .state(partThree)
                    .center(partFour)
                    .partFive(partFive)
                    .build());
        }
    } catch (IOException e) {
        throw new DOGException(ErrorCodes.CODES, "Cannot read Dog data", e);
    }
}
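A side note on the parsing step above: `String.split(",")` silently drops trailing empty fields, so a row that ends in an empty column yields a shorter array than expected, and an index such as `parts[4]` can then throw `ArrayIndexOutOfBoundsException`. The following is a minimal sketch of the difference; the class name `SplitDemo` and the sample row are made up for illustration and are not part of the original code.

```java
public class SplitDemo {

    // Default split: trailing empty strings are removed from the result.
    static String[] splitDroppingTrailing(String line) {
        return line.split(",");
    }

    // A negative limit keeps trailing empty strings.
    static String[] splitKeepingTrailing(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // Hypothetical row: empty date column and empty last column.
        String line = "husky,sysA,,active,";

        System.out.println(splitDroppingTrailing(line).length); // prints 4
        System.out.println(splitKeepingTrailing(line).length);  // prints 5
    }
}
```

Neither variant handles quoted fields that contain commas; a dedicated CSV parser would be needed for that.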
The service methods:
public void saveDog(DogEntry entry) {
    LOGGER.info("Receiving Dog {}", entry.getBreed());
    final Dog dog = updateOrCreateDog(entry);
    dogRepository.save(dog);
}

// Looks up an existing Dog by breed and origin; updates it if found, otherwise creates a new one.
private Dog updateOrCreateDog(final DogEntry entry) {
    // DogEntry is built with originSystem(...), so the getter is getOriginSystem()
    Optional<Dog> existingDog = dogRepository.findByBreedAndOrigin(entry.getBreed(), entry.getOriginSystem());
    return existingDog.map(dog -> getUpdatedDog(dog, entry)).orElseGet(() -> createNewDog(entry));
}

private Dog getUpdatedDog(Dog existingDog, DogEntry entry) {
    existingDog.setBreed(entry.getBreed());
    existingDog.setOrigin(entry.getOriginSystem());
    existingDog.setStatus(entry.getState());
    existingDog.setCenter(entry.getCenter());
    return existingDog;
}

private Dog createNewDog(final DogEntry entry) {
    return Dog.builder()
            .breed(entry.getBreed())
            .origin(entry.getOriginSystem())
            .status(entry.getState())
            .center(entry.getCenter())
            .build();
}
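The problem is that I cannot read everything from the CSV and keep it in a list, because that leads to an OOM.
Is there a faster way, so that I do not run into a timeout while reading and processing the CSV file?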
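You can try the BufferedReader.lines() method. It would look similar to the code below. This method reads the file line by line instead of loading the entire file into memory.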
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Test {
    public static void main(String[] args) {
        Path path = Paths.get("customers-2000000.csv");
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            // lines() returns a lazy stream, so only the current line is held in memory
            reader.lines()
                    .forEach(Test::createObject);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Placeholder for the per-line processing
    private static void createObject(String s) {
        System.out.println("s = " + s);
    }
}
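Reading line by line keeps memory flat, but if every line triggers its own lookup and save, the database round trips are likely to dominate the run time. One option is to collect lines into small fixed-size batches and hand each batch to the persistence layer in one go. Below is a minimal sketch of that idea, not a drop-in solution: the class `BatchedCsvReader`, the method `readInBatches`, and the batch size are made up for illustration, and the sink is just a callback. In the code from the question the sink would map each line to a `DogEntry` and persist the whole batch, for example through a bulk save, if the repository offers one (an assumption, since the repository type is not shown).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedCsvReader {

    /**
     * Skips the header line, then passes the remaining lines to the sink in
     * batches of at most batchSize lines. Returns the number of data lines read.
     * Only one batch is held in memory at a time.
     */
    public static long readInBatches(BufferedReader reader, int batchSize,
                                     Consumer<List<String>> sink) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batch = new ArrayList<>(batchSize);
        long total = 0;
        reader.readLine(); // skip the header row
        String line;
        while ((line = reader.readLine()) != null) {
            batch.add(line);
            total++;
            if (batch.size() == batchSize) {
                sink.accept(batch);
                // Start a fresh list so the sink may keep the one it received
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final, possibly smaller batch
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory sample standing in for the real file
        String csv = "breed,origin,date,state,center\n"
                + "husky,sysA,2020-01-01,active,c1\n"
                + "beagle,sysB,,active,c2\n"
                + "poodle,sysA,2021-05-05,inactive,c3\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            long total = readInBatches(reader, 2,
                    batch -> System.out.println("batch of " + batch.size()));
            System.out.println("total = " + total);
        }
    }
}
```

With the sample above, the sink is called twice (two lines, then one line). A suitable batch size depends on the database and driver, so it is worth measuring rather than guessing.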