读取CSV文件java的最快方法

问题描述 投票:3回答:4

我一直在尝试使用openCSV读取几个csv文件(大约20 MB),但到目前为止它一直很慢。我试图读取我加载到堆中的4个csv文件,这是我设计的。我想知道,如果还有其他方法可以在更短的时间内完成。

private Heap<VOMovingViolations> datosHeap; 

public void loadMovingViolations()
{
    Runtime garbage = Runtime.getRuntime(); 
    garbage.gc();
    try 
    {
        FileReader fileReaderMes1 = new FileReader(FECHAS[0]);
        FileReader fileReaderMes2 = new FileReader(FECHAS[1]); 
        FileReader fileReaderMes3 = new FileReader(FECHAS[2]); 
        FileReader fileReaderMes4 = new FileReader(FECHAS[3]); 
        CSVReader enero = new CSVReaderBuilder(fileReaderMes1).withSkipLines(1).build();
        CSVReader febrero = new CSVReaderBuilder(fileReaderMes2).withSkipLines(1).build();
        CSVReader marzo = new CSVReaderBuilder(fileReaderMes3).withSkipLines(1).build();
        CSVReader abril = new CSVReaderBuilder(fileReaderMes4).withSkipLines(1).build();

        String[] row; 


        while((row = enero.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

        while((row = febrero.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

        while((row = marzo.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

        while((row = abril.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

    } 
    catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } 


}

我真的很感激任何帮助或任何有人可以请你给我的想法。

java eclipse csv data-structures reader
4个回答
4
投票

tl;dr

读取20 MB CSV文件并每行实例化一个对象,总耗时不到1秒。

Details

你没有定义术语“慢”。所以我做了一个实验,一个休闲的基准测试。

首先,我们创建一个包含40,000个Person记录的20 MB文件。每个Person都有法语的名字和名字,UUID,以及一些任意文字作为描述。数据在CSVUTF-8文件中写成四列。我使用Apache Commons CSV库来编写和阅读。

其次,读取该文件。每行数据都被读入内存,然后用于实例化和收集Person对象。

读取此文件,并为每行实例化Person对象所花费的总时间不到一秒。每排大约需要20K nanoseconds。实际上,这包括两次读取文件,因为我们进行扫描以计算数据行数以设置收集实例的初始容量。此外,我们正在将十六进制字符串输入解析为UUID的128位值,因此我们花了一些时间在数据处理上(而不仅仅是读取)。

这是Person类。

package work.basil.example;

import java.util.UUID;

public class Person
{
    // Static
   static public  String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    // Member variables.
    public String givenName, surname, description;
    public UUID id;

    public Person ( String givenName , String surname , UUID id , String description)
    {
        this.givenName = givenName;
        this.surname = surname;
        this.id = id;
        this.description = description ;
    }

    @Override
    public String toString ()
    {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}

这是完整的应用程序,写入然后读取20 MB文件。请学习和评论,因为我快速地鞭打了这个。我没有仔细检查过我的工作。

你会发现一个write方法和一个read方法。 main方法调用两者,并跟踪时间。

package work.basil.example;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class CsvSpeed
{
    public List < Person > read ( Path path )
    {
        // TODO: Add a check for valid file existing.

        List < Person > list = List.of();  // Default to empty list.
        try
        {
            // Prepare list.
            int initialCapacity = ( int ) Files.lines( path ).count();
            list = new ArrayList <>( initialCapacity );

            // Read CSV file. For each row, instantiate and collect `DailyProduct`.
            BufferedReader reader = Files.newBufferedReader( path );
            Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
            for ( CSVRecord record : records )
            {
                String givenName = record.get( "givenName" );
                String surname = record.get( "surname" );
                UUID id = UUID.fromString( record.get( "id" ) );
                String description = record.get( "description" );
                // Instantiate `Person` object, and collect it.
                Person person = new Person( givenName , surname , id , description );
                list.add( person );
            }
        } catch ( IOException e )
        {
            e.printStackTrace();
        }
        return list;
    }

    public void write ( final Path path )
    {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        try ( final CSVPrinter printer = CSVFormat.RFC4180.withHeader( "givenName" , "surname" , "id" , "description" ).print( path , StandardCharsets.UTF_8 ) ; )
        {
            int limit = 40_000;  // 40_000 yields about 20 MB of data.
            List < String > givenNames = List.of( "Adrien" , "Aimon" , "Alerion" , "Alexis" , "Alezan" , "Ancil" , "Andre" , "Antoine" , "Archard" , "Aurélien" , "Averill" , "Baptiste" , "Barnard" , "Bartelemy" , "Bastien" , "Baylee" , "Beale" , "Beau" , "Beaumont" , "Beauregard" , "Bellamy" , "Berger" , "Blaize" , "Blondel" , "Boyce" , "Bruce" , "Brunelle" , "Brys" , "Burcet" , "Burnell" , "Burrell" , "Byron" , "Canaan" , "Carden" , "Carolas" , "Cavell" , "Chace" , "Chanler" , "Chante" , "Chappel" , "Charles" , "Chasen" , "Chason" , "Chemin" , "Chene" , "Cher" , "Chevalier" , "Cheyne" , "Clément" , "Clemence" , "Corbin" , "Coty" , "Cygne" , "Damien" , "Dandre" , "Dariel" , "Darl" , "Dauphine" , "Davet" , "Dax" , "Dean" , "Delice" , "Delmon" , "Destin" , "Dominique" , "Donatien" , "Duke" , "Eliott" , "Elroy" , "Enzo" , "Erwan" , "Etalon" , "Ethan" , "Fabron" , "Ferrand" , "Filberte" , "Florent" , "Florian" , "Fontaine" , "Forest" , "Fortune" , "Franchot" , "Francois" , "Fraser" , "Frayne" , "Gaëtan" , "Gabin" , "Gage" , "Gaige" , "Garland" , "Garner" , "Gaston" , "Gauge" , "Gaylord" , "Germain" , "Germaine" , "German" , "Gervaise" , "Giles" , "Gilles" , "Gitan" , "Grosvener" , "Guifford" , "Guion" , "Guy" , "Guzman" , "Henri" , "Holland" , "Hugo" , "Hugues" , "Hyacinthe" , "Jérémy" , "Jacquan" , "Jacques" , "Jacquez" , "Janvier" , "Jardan" , "Jay" , "Jaye" , "Jehan" , "Jemond" , "Jocquez" , "Jonathan" , "Jules" , "Julien" , "Justus" , "Karoly" , "Lado" , "Lafayette" , "Lamond" , "Lancelin" , "Landis" , "Landry" , "Laron" , "Larrimore" , "Laurent" , "LaValle" , "Leandre" , "Leggett" , "Leonce" , "Leron" , "Leverett" , "Lilian" , "Loïc" , "Lorenzo" , "Louis" , "Lowell" , "Luc" , "Lucien" , "Lukas" , "Macaire" , "Mace" , "Mahieu" , "Maison" , "Malleville" , "Manneville" , "Mantel" , "Marc" , "Marcel" , "Marion" , "Marius" , "Markez" , "Markis" , "Marmion" , "Marquis" , "Marquise" , "Marshall" , "Martial" , "Maslin" , "Mason" , "Matheo" , "Mathias" , "Mathys" , "Matthieu" , "Maxence" , "Mayson" , "Mehdi" , "Merle" , "Merville" , "Montague" , "Montaigu" , "Monte" , "Montgomery" , "Montreal" , "Montrel" , "Moore" , "Morel" , "Mortimer" , "Nerville" , "Neuveville" , "Nicolas" , "Noë" , "Noah" , "Noe" , "Norman" , "Norville" , "Nouel" , "Olivier" , "Onfroi" , "Paien" , "Parfait" , "Parnell" , "Pascal" , "Patrice" , "Paul" , "Peppin" , "Percival" , "Percy" , "Pernell" , "Peverell" , "Philipe" , "Pierpont" , "Pierre" , "Pomeroy" , "Prewitt" , "Purvis" , "Quennell" , "Quentin" , "Quincey" , "Quincy" , "Quintin" , "Rémi" , "Rafaelle" , "Ranger" , "Raoul" , "Raphaël" , "Rapier" , "Rawlins" , "Ray" , "Raynard" , "Remi" , "René" , "Renard" , "Rene" , "Reule" , "Reynard" , "Robin" , "Romain" , "Rondel" , "Roy" , "Royal" , "Ruff" , "Rush" , "Russel" , "Rustin" , "Sabastien" , "Sacha" , "Salomon" , "Samuel" , "Satordi" , "Saville" , "Scoville" , "Sebastien" , "Sennett" , "Severin" , "Shant" , "Shantae" , "Sidney" , "Siffre" , "Simeon" , "Simon" , "Sinclair" , "Sofiane" , "Somer" , "Stephane" , "Sully" , "Sydney" , "Sylvain" , "Talbot" , "Talon" , "Telford" , "Tempest" , "Teppo" , "Théo" , "Thayer" , "Thibault" , "Thibaut" , "Thiery" , "Tiennan" , "Tiennot" , "Titouan" , "Toussaint" , "Travaris" , "Tyson" , "Urson" , "Vachel" , "Valentin" , "Valere" , "Vallis" , "Verdun" , "Victoir" , "Victor" , "Waltier" , "William" , "Wyatt" , "Yanis" , "Yann" , "Yves" , "Yvon" , "Zosime" , "Abrial" , "Abrielle" , "Abril" , "Adele" , "Alair" , "Alerion" , "Amee" , "Angelique" , "Annette" , "Antonella" , "Arian" , "Ariane" , "Armandina" , "Aubree" , "Aubrielle" , "Audra" , "Avril" , "Bella" , "Berneta" , "Bette" , "Blaise" , "Blanche" , "Blasa" , "Bonte" , "Brie" , "Brienne" , "Brigit" , "Cachay" , "Calice" , "Camille" , "Camylle" , "Caprice" , "Caressa" , "Caroline" , "Catin" , "Celesta" , "Celeste" , "Cera" , "Cerise" , "Chablis" , "Chalice" , "Chambray" , "Champagne" , "Chandell" , "Chaney" , "Chantal" , "Chante" , "Chanterelle" , "Chantile" , "Chantilly" , "Chantrice" , "Charla" , "Charlotte" , "Charmane" , "Chaton" , "Chemin" , "Chenetta" , "Cher" , "Chere" , "Cheri" , "Cheryl" , "Christine" , "Cidney" , "Cinderella" , "Claire" , "Claudette" , "Colette" , "Cordelle" , "Cydnee" , "Daeja" , "Daija" , "Daja" , "Damzel" , "Darelle" , "Darlene" , "Darselle" , "Dejanelle" , "Deleena" , "Delice" , "Demeri" , "Deni" , "Denise" , "Desgracias" , "Desire" , "Desiree" , "Destanee" , "Destiny" , "Dior" , "Domanique" , "Dominique" , "Elaina" , "Elaine" , "Elayna" , "Elise" , "Eloisa" , "Elyse" , "Emeline" , "Emmaline" , "Emmeline" , "Estella" , "Estrella" , "Etiennette" , "Evette" , "Fabienne" , "Fabrienne" , "Fanchon" , "Fancy" , "Fawna" , "Fayana" , "Fayette" , "Fifi" , "Fleur" , "Fleurette" , "Fontanna" , "Fosette" , "Francine" , "Frederique" , "Gabriel" , "Gabriele" , "Gabrielle" , "Gaby" , "Garcelle" , "Gena" , "Genie" , "Georgette" , "Germaine" , "Gervaise" , "Gitana" , "Harriet" , "Heloisa" , "Holland" , "Honnetta" , "Isabelle" , "Ivette" , "Ivonne" , "Jacqueena" , "Jacquetta" , "Jacquiline" , "Jacyline" , "Jaime" , "Jakqueline" , "Janeen" , "Janelly" , "Janina" , "Janiqua" , "Janique" , "Jannnelle" , "Jaquita" , "Jardena" , "Jeanetta" , "Jermaine" , "Jessamine" , "Jewel" , "Jewell" , "Joli" , "Jolie" , "Josephine" , "Jozephine" , "Julieta" , "Karessa" , "Karmaine" , "Klara" , "Laine" , "Lanelle" , "Laramie" , "Layne" , "Layney" , "Leala" , "Leonette" , "Lissette" , "Lizette" , "Lourdes" , "Lucienne" , "Ly" , "Lyla" , "Lysette" , "Madelaine" , "Malerie" , "Manette" , "Marais" , "Marcelle" , "Marché" , "Mardi" , "Margo" , "Marguerite" , "Marie" , "Marie Claude" , "Marie Frances" , "Marie Joelle" , "Marie Pascale" , "Marie Sophie" , "Marjolaine" , "Marquise" , "Marvella" , "Mathieu" , "Matisse" , "Maurelle" , "Maurissa" , "Mavis" , "Melisande" , "Michelle" , "Miette" , "Mignon" , "Mimi" , "Mirya" , "Monet" , "Moniqua" , "Monteen" , "Musetta" , "Myrlie" , "Nadeen" , "Nadia" , "Nadiyah" , "Naeva" , "Nanon" , "Natalle" , "Naudia" , "Nettie" , "Nicholas" , "Nicki" , "Nicky" , "Nicole" , "Nicolette" , "Nicolina" , "Nicolle" , "Nikolette" , "Ninette" , "Ninon" , "Noelle" , "Nycole" , "Odelette" , "Opaline" , "Orane" , "Orva" , "Page" , "Parisa" , "Parnel" , "Parris" , "Patrice" , "Peridot" , "Pippi" , "Prairie" , "Rachele" , "Rachelle" , "Racquel" , "Raphaelle" , "Raquelle" , "Remi" , "Renée" , "Renea" , "Renelle" , "Renita" , "Risette" , "Rochelle" , "Romy" , "Rosabel" , "Rosiclara" , "Ruba" , "Russhell" , "Saleena" , "Salina" , "Satin" , "Sedona" , "Serene" , "Shandelle" , "Shanta" , "Shante" , "Shariah" , "Sharita" , "Sharleen" , "Sheree" , "Shereen" , "Sherell" , "Sherice" , "Sherry" , "Sidnee" , "Sidney" , "Sidnie" , "Sidonie" , "Sinclaire" , "Solange" , "Solen" , "Sorrel" , "Suzette" , "Sydnee" , "Sydney" , "Tallis" , "Tempest" , "Toinette" , "Turquoise" , "Veronique" , "Vignette" , "Villette" , "Violeta" , "Virginie" , "Voleta" , "Vonny" );
            List < String > surnames = List.of( "Arceneau" , "Aucoin" , "Babin" , "Babineaux" , "Benoit" , "Bergeron" , "Bernard" , "Bertrand" , "Bessette" , "Blanc" , "Blanchard" , "Bonnet" , "Boucher" , "Bourg" , "Bourque" , "Boutin" , "Bouvier" , "Braud" , "Broussard" , "Brun" , "Chevalier" , "David" , "Depaul" , "Desmarais" , "Disney" , "Dubois" , "Dupont" , "Dupuis" , "Durand" , "Fortescue" , "Fournier" , "Garnier" , "Gaudet" , "Gillet" , "Gillette" , "Girard" , "Gravois" , "Grosvenor" , "Lambert" , "Landry" , "Laroche" , "Laurent" , "Lefevre" , "Leroy" , "Leveque" , "Lisle" , "Martin" , "Michel" , "Molyneux" , "Moreau" , "Morel" , "Neville" , "Pelletier" , "Petit" , "Prideux" , "Renard" , "Richard" , "Robert" , "Rousseau" , "Roux" , "Rufus" , "Simon" , "Thomas" );
            for ( int i = 1 ; i <= limit ; i++ )
            {
                String givenName = givenNames.get( random.nextInt( 0 , givenNames.size() ) );
                String surname = surnames.get( random.nextInt( 0 , surnames.size() ) );
                UUID id = UUID.randomUUID();
                String description = Person.LOREM_IPSUM;
                printer.printRecord( givenName , surname , id , description );
            }
        } catch ( IOException e )
        {
            e.printStackTrace();
        }
    }

    public static void main ( final String[] args )
    {
        // Launch the app.
        CsvSpeed app = new CsvSpeed();

        // Write.
        String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" );
        Path pathOutput = Paths.get( "/Users/basilbourque/persons.csv" );
        app.write( pathOutput );
        System.out.println( "Writing file: " + pathOutput );

        // Read.
        long start = System.nanoTime();
        Path pathInput = Paths.get( "/Users/basilbourque/persons.csv" );
        List < Person > list = app.read( pathInput );
        long stop = System.nanoTime();

        // Time.
        long elapsed = ( stop - start );
        Duration d = Duration.ofNanos( elapsed );
        System.out.println( "Reading elapsed: " + d );
        System.out.println( "Reading took nanos per row: " + ( elapsed / list.size() ) );
        System.out.println( "nanos elapsed: " + elapsed + "  |  list.size: " + list.size() );
    }
}

运行时:

编写文件:/Users/basilbourque/persons.csv

读取已过:PT0.857816234S

阅读每行纳米:21445

nanos逝去:857816234 | list.size:40000

技术堆栈:

  • Java 11.0.2 - Zulu by Azul Systems(由OpenJDK构建)
  • 在IntelliJ 2019.1中运行
  • macOS Mojave
  • MacBook Pro(Retina,15英寸,2013年末)
  • 处理器:2.3 GHz Intel Core i7(4核,8超)
  • 16 GB 1600 MHz DDR3
  • 存储:Apple内置的固态

3
投票

除了使用@Basil建议的java.nio之外,简单地用FileReader包装BufferedReader应该可以实现显着的加速。

FileReader fileReaderMes1 = new BufferedReader( new FileReader(FECHAS[0]));

3
投票

csv-parsers-comparison,我们可以找到CSV Reader / Writer-s之间的比较。最快的是uniVocity CSV parser。第三是Jackson,我个人更喜欢。使用@Basil Bourque很好的例子我改变了一点并使用了Jackson类。方法读取返回MappingIterator,您可以使用它来初始化堆对象(请参阅我如何向List添加元素)。我没有包含时间细节,但您可以使用Basil's和此解决方案自行完成:

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.File;
import java.io.FileWriter;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class CsvSpeed {

    public static void main(String[] args) throws Exception {
        File csvFile = new File("./resource/persons.csv").getAbsoluteFile();

        CsvSchema schema = CsvSchema.builder()
                .addColumn("givenName")
                .addColumn("surname")
                .addColumn("id")
                .addColumn("description")
                .build().withHeader();

        CsvSpeed csvSpeed = new CsvSpeed();
        csvSpeed.write(csvFile, schema);

        // Read.
        long start = System.nanoTime();
        MappingIterator<Person> personMappingIterator = csvSpeed.read(csvFile, schema);
        List<Person> persons = new ArrayList<>(40_000);
        personMappingIterator.forEachRemaining(persons::add);

        long stop = System.nanoTime();

        System.out.println(persons.size());

        // Time.
        long elapsed = (stop - start);
        Duration d = Duration.ofNanos(elapsed);
        System.out.println("Reading elapsed: " + d);
        System.out.println("Reading took nanos per row: " + (elapsed / persons.size()));
        System.out.println("nanos elapsed: " + elapsed + "  |  list.size: " + persons.size());
    }

    public MappingIterator<Person> read(final File path, CsvSchema schema) throws Exception {
        CsvMapper csvMapper = new CsvMapper();

        ObjectReader reader = csvMapper.readerFor(Person.class).with(schema);
        return reader.readValues(path);
    }

    public void write(final File path, CsvSchema schema) throws Exception {
        ThreadLocalRandom random = ThreadLocalRandom.current();

        CsvMapper csvMapper = new CsvMapper();
        ObjectWriter writer = csvMapper.writerFor(Person.class).with(schema);

        try (FileWriter fileWriter = new FileWriter(path)) {
            List<String> givenNames = Arrays.asList("Adrien", "Aimon", "Alerion", "Alexis", "Alezan", "Ancil", "Andre", "Antoine", "Archard", "Aurélien", "Averill", "Baptiste", "Barnard", "Bartelemy", "Bastien", "Baylee", "Beale", "Beau", "Beaumont", "Beauregard", "Bellamy", "Berger", "Blaize", "Blondel", "Boyce", "Bruce", "Brunelle", "Brys", "Burcet", "Burnell", "Burrell", "Byron", "Canaan", "Carden", "Carolas", "Cavell", "Chace", "Chanler", "Chante", "Chappel", "Charles", "Chasen", "Chason", "Chemin", "Chene", "Cher", "Chevalier", "Cheyne", "Clément", "Clemence", "Corbin", "Coty", "Cygne", "Damien", "Dandre", "Dariel", "Darl", "Dauphine", "Davet", "Dax", "Dean", "Delice", "Delmon", "Destin", "Dominique", "Donatien", "Duke", "Eliott", "Elroy", "Enzo", "Erwan", "Etalon", "Ethan", "Fabron", "Ferrand", "Filberte", "Florent", "Florian", "Fontaine", "Forest", "Fortune", "Franchot", "Francois", "Fraser", "Frayne", "Gaëtan", "Gabin", "Gage", "Gaige", "Garland", "Garner", "Gaston", "Gauge", "Gaylord", "Germain", "Germaine", "German", "Gervaise", "Giles", "Gilles", "Gitan", "Grosvener", "Guifford", "Guion", "Guy", "Guzman", "Henri", "Holland", "Hugo", "Hugues", "Hyacinthe", "Jérémy", "Jacquan", "Jacques", "Jacquez", "Janvier", "Jardan", "Jay", "Jaye", "Jehan", "Jemond", "Jocquez", "Jonathan", "Jules", "Julien", "Justus", "Karoly", "Lado", "Lafayette", "Lamond", "Lancelin", "Landis", "Landry", "Laron", "Larrimore", "Laurent", "LaValle", "Leandre", "Leggett", "Leonce", "Leron", "Leverett", "Lilian", "Loïc", "Lorenzo", "Louis", "Lowell", "Luc", "Lucien", "Lukas", "Macaire", "Mace", "Mahieu", "Maison", "Malleville", "Manneville", "Mantel", "Marc", "Marcel", "Marion", "Marius", "Markez", "Markis", "Marmion", "Marquis", "Marquise", "Marshall", "Martial", "Maslin", "Mason", "Matheo", "Mathias", "Mathys", "Matthieu", "Maxence", "Mayson", "Mehdi", "Merle", "Merville", "Montague", "Montaigu", "Monte", "Montgomery", "Montreal", "Montrel", "Moore", "Morel", "Mortimer", "Nerville", "Neuveville", "Nicolas", "Noë", "Noah", "Noe", "Norman", "Norville", "Nouel", "Olivier", "Onfroi", "Paien", "Parfait", "Parnell", "Pascal", "Patrice", "Paul", "Peppin", "Percival", "Percy", "Pernell", "Peverell", "Philipe", "Pierpont", "Pierre", "Pomeroy", "Prewitt", "Purvis", "Quennell", "Quentin", "Quincey", "Quincy", "Quintin", "Rémi", "Rafaelle", "Ranger", "Raoul", "Raphaël", "Rapier", "Rawlins", "Ray", "Raynard", "Remi", "René", "Renard", "Rene", "Reule", "Reynard", "Robin", "Romain", "Rondel", "Roy", "Royal", "Ruff", "Rush", "Russel", "Rustin", "Sabastien", "Sacha", "Salomon", "Samuel", "Satordi", "Saville", "Scoville", "Sebastien", "Sennett", "Severin", "Shant", "Shantae", "Sidney", "Siffre", "Simeon", "Simon", "Sinclair", "Sofiane", "Somer", "Stephane", "Sully", "Sydney", "Sylvain", "Talbot", "Talon", "Telford", "Tempest", "Teppo", "Théo", "Thayer", "Thibault", "Thibaut", "Thiery", "Tiennan", "Tiennot", "Titouan", "Toussaint", "Travaris", "Tyson", "Urson", "Vachel", "Valentin", "Valere", "Vallis", "Verdun", "Victoir", "Victor", "Waltier", "William", "Wyatt", "Yanis", "Yann", "Yves", "Yvon", "Zosime", "Abrial", "Abrielle", "Abril", "Adele", "Alair", "Alerion", "Amee", "Angelique", "Annette", "Antonella", "Arian", "Ariane", "Armandina", "Aubree", "Aubrielle", "Audra", "Avril", "Bella", "Berneta", "Bette", "Blaise", "Blanche", "Blasa", "Bonte", "Brie", "Brienne", "Brigit", "Cachay", "Calice", "Camille", "Camylle", "Caprice", "Caressa", "Caroline", "Catin", "Celesta", "Celeste", "Cera", "Cerise", "Chablis", "Chalice", "Chambray", "Champagne", "Chandell", "Chaney", "Chantal", "Chante", "Chanterelle", "Chantile", "Chantilly", "Chantrice", "Charla", "Charlotte", "Charmane", "Chaton", "Chemin", "Chenetta", "Cher", "Chere", "Cheri", "Cheryl", "Christine", "Cidney", "Cinderella", "Claire", "Claudette", "Colette", "Cordelle", "Cydnee", "Daeja", "Daija", "Daja", "Damzel", "Darelle", "Darlene", "Darselle", "Dejanelle", "Deleena", "Delice", "Demeri", "Deni", "Denise", "Desgracias", "Desire", "Desiree", "Destanee", "Destiny", "Dior", "Domanique", "Dominique", "Elaina", "Elaine", "Elayna", "Elise", "Eloisa", "Elyse", "Emeline", "Emmaline", "Emmeline", "Estella", "Estrella", "Etiennette", "Evette", "Fabienne", "Fabrienne", "Fanchon", "Fancy", "Fawna", "Fayana", "Fayette", "Fifi", "Fleur", "Fleurette", "Fontanna", "Fosette", "Francine", "Frederique", "Gabriel", "Gabriele", "Gabrielle", "Gaby", "Garcelle", "Gena", "Genie", "Georgette", "Germaine", "Gervaise", "Gitana", "Harriet", "Heloisa", "Holland", "Honnetta", "Isabelle", "Ivette", "Ivonne", "Jacqueena", "Jacquetta", "Jacquiline", "Jacyline", "Jaime", "Jakqueline", "Janeen", "Janelly", "Janina", "Janiqua", "Janique", "Jannnelle", "Jaquita", "Jardena", "Jeanetta", "Jermaine", "Jessamine", "Jewel", "Jewell", "Joli", "Jolie", "Josephine", "Jozephine", "Julieta", "Karessa", "Karmaine", "Klara", "Laine", "Lanelle", "Laramie", "Layne", "Layney", "Leala", "Leonette", "Lissette", "Lizette", "Lourdes", "Lucienne", "Ly", "Lyla", "Lysette", "Madelaine", "Malerie", "Manette", "Marais", "Marcelle", "Marché", "Mardi", "Margo", "Marguerite", "Marie", "Marie Claude", "Marie Frances", "Marie Joelle", "Marie Pascale", "Marie Sophie", "Marjolaine", "Marquise", "Marvella", "Mathieu", "Matisse", "Maurelle", "Maurissa", "Mavis", "Melisande", "Michelle", "Miette", "Mignon", "Mimi", "Mirya", "Monet", "Moniqua", "Monteen", "Musetta", "Myrlie", "Nadeen", "Nadia", "Nadiyah", "Naeva", "Nanon", "Natalle", "Naudia", "Nettie", "Nicholas", "Nicki", "Nicky", "Nicole", "Nicolette", "Nicolina", "Nicolle", "Nikolette", "Ninette", "Ninon", "Noelle", "Nycole", "Odelette", "Opaline", "Orane", "Orva", "Page", "Parisa", "Parnel", "Parris", "Patrice", "Peridot", "Pippi", "Prairie", "Rachele", "Rachelle", "Racquel", "Raphaelle", "Raquelle", "Remi", "Renée", "Renea", "Renelle", "Renita", "Risette", "Rochelle", "Romy", "Rosabel", "Rosiclara", "Ruba", "Russhell", "Saleena", "Salina", "Satin", "Sedona", "Serene", "Shandelle", "Shanta", "Shante", "Shariah", "Sharita", "Sharleen", "Sheree", "Shereen", "Sherell", "Sherice", "Sherry", "Sidnee", "Sidney", "Sidnie", "Sidonie", "Sinclaire", "Solange", "Solen", "Sorrel", "Suzette", "Sydnee", "Sydney", "Tallis", "Tempest", "Toinette", "Turquoise", "Veronique", "Vignette", "Villette", "Violeta", "Virginie", "Voleta", "Vonny");
            List<String> surnames = Arrays.asList("Arceneau", "Aucoin", "Babin", "Babineaux", "Benoit", "Bergeron", "Bernard", "Bertrand", "Bessette", "Blanc", "Blanchard", "Bonnet", "Boucher", "Bourg", "Bourque", "Boutin", "Bouvier", "Braud", "Broussard", "Brun", "Chevalier", "David", "Depaul", "Desmarais", "Disney", "Dubois", "Dupont", "Dupuis", "Durand", "Fortescue", "Fournier", "Garnier", "Gaudet", "Gillet", "Gillette", "Girard", "Gravois", "Grosvenor", "Lambert", "Landry", "Laroche", "Laurent", "Lefevre", "Leroy", "Leveque", "Lisle", "Martin", "Michel", "Molyneux", "Moreau", "Morel", "Neville", "Pelletier", "Petit", "Prideux", "Renard", "Richard", "Robert", "Rousseau", "Roux", "Rufus", "Simon", "Thomas");
            Iterable<Person> persons = () -> {
                return new Iterator<Person>() {
                    int counter = 40_000; //0_000;  // 40_000 yields about 20 MB of data.

                    @Override
                    public boolean hasNext() {
                        return counter-- > 0;
                    }

                    @Override
                    public Person next() {
                        String givenName = givenNames.get(random.nextInt(0, givenNames.size()));
                        String surname = surnames.get(random.nextInt(0, surnames.size()));
                        UUID id = UUID.randomUUID();
                        String description = Person.LOREM_IPSUM;
                        return new Person(givenName, surname, id, description);
                    }
                };
            };
            writer.writeValues(fileWriter).writeAll(persons);
        }
    }
}

class Person {
    // Static
    static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    // Member variables.
    private String givenName, surname, description;
    private UUID id;

    public Person() {

    }

    public Person(String givenName, String surname, UUID id, String description) {
        this.givenName = givenName;
        this.surname = surname;
        this.id = id;
        this.description = description;
    }

    public String getGivenName() {
        return givenName;
    }

    public void setGivenName(String givenName) {
        this.givenName = givenName;
    }

    public String getSurname() {
        return surname;
    }

    public void setSurname(String surname) {
        this.surname = surname;
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    public UUID getId() {
        return id;
    }

    public void setId(UUID id) {
        this.id = id;
    }

    @Override
    public String toString() {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}

2
投票

这是我的Basil提供的解决方案版本,但这使用univocity-parsers

public class CsvSpeed {

public static class Person {
    // Static
    static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    // Member variables.
    @Parsed
    public String givenName, surname, description;

    public UUID id;

    @Parsed
    public void id(String id) {
        this.id = UUID.fromString(id);
    }

    @Override
    public String toString() {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}

public List<Person> read(Path path) {
    return new CsvRoutines(Csv.parseRfc4180()).parseAll(Person.class, path.toFile(), "UTF-8", 40_000);
}

public void write(final Path path) {
    ThreadLocalRandom random = ThreadLocalRandom.current();
    CsvWriter writer = new CsvWriter(path.toFile(), "UTF-8", Csv.writeRfc4180());
    writer.writeHeaders("givenName" , "surname" , "id" , "description");

    int limit = 40_000;  // 40_000 yields about 20 MB of data.
    List<String> givenNames = List.of("Adrien", "Aimon", "Alerion", "Alexis", "Alezan", "Ancil", "Andre", "Antoine", "Archard", "Aurélien", "Averill", "Baptiste", "Barnard", "Bartelemy", "Bastien", "Baylee", "Beale", "Beau", "Beaumont", "Beauregard", "Bellamy", "Berger", "Blaize", "Blondel", "Boyce", "Bruce", "Brunelle", "Brys", "Burcet", "Burnell", "Burrell", "Byron", "Canaan", "Carden", "Carolas", "Cavell", "Chace", "Chanler", "Chante", "Chappel", "Charles", "Chasen", "Chason", "Chemin", "Chene", "Cher", "Chevalier", "Cheyne", "Clément", "Clemence", "Corbin", "Coty", "Cygne", "Damien", "Dandre", "Dariel", "Darl", "Dauphine", "Davet", "Dax", "Dean", "Delice", "Delmon", "Destin", "Dominique", "Donatien", "Duke", "Eliott", "Elroy", "Enzo", "Erwan", "Etalon", "Ethan", "Fabron", "Ferrand", "Filberte", "Florent", "Florian", "Fontaine", "Forest", "Fortune", "Franchot", "Francois", "Fraser", "Frayne", "Gaëtan", "Gabin", "Gage", "Gaige", "Garland", "Garner", "Gaston", "Gauge", "Gaylord", "Germain", "Germaine", "German", "Gervaise", "Giles", "Gilles", "Gitan", "Grosvener", "Guifford", "Guion", "Guy", "Guzman", "Henri", "Holland", "Hugo", "Hugues", "Hyacinthe", "Jérémy", "Jacquan", "Jacques", "Jacquez", "Janvier", "Jardan", "Jay", "Jaye", "Jehan", "Jemond", "Jocquez", "Jonathan", "Jules", "Julien", "Justus", "Karoly", "Lado", "Lafayette", "Lamond", "Lancelin", "Landis", "Landry", "Laron", "Larrimore", "Laurent", "LaValle", "Leandre", "Leggett", "Leonce", "Leron", "Leverett", "Lilian", "Loïc", "Lorenzo", "Louis", "Lowell", "Luc", "Lucien", "Lukas", "Macaire", "Mace", "Mahieu", "Maison", "Malleville", "Manneville", "Mantel", "Marc", "Marcel", "Marion", "Marius", "Markez", "Markis", "Marmion", "Marquis", "Marquise", "Marshall", "Martial", "Maslin", "Mason", "Matheo", "Mathias", "Mathys", "Matthieu", "Maxence", "Mayson", "Mehdi", "Merle", "Merville", "Montague", "Montaigu", "Monte", "Montgomery", "Montreal", "Montrel", "Moore", "Morel", "Mortimer", "Nerville", "Neuveville", "Nicolas", "Noë", "Noah", "Noe", "Norman", "Norville", "Nouel", "Olivier", "Onfroi", "Paien", "Parfait", "Parnell", "Pascal", "Patrice", "Paul", "Peppin", "Percival", "Percy", "Pernell", "Peverell", "Philipe", "Pierpont", "Pierre", "Pomeroy", "Prewitt", "Purvis", "Quennell", "Quentin", "Quincey", "Quincy", "Quintin", "Rémi", "Rafaelle", "Ranger", "Raoul", "Raphaël", "Rapier", "Rawlins", "Ray", "Raynard", "Remi", "René", "Renard", "Rene", "Reule", "Reynard", "Robin", "Romain", "Rondel", "Roy", "Royal", "Ruff", "Rush", "Russel", "Rustin", "Sabastien", "Sacha", "Salomon", "Samuel", "Satordi", "Saville", "Scoville", "Sebastien", "Sennett", "Severin", "Shant", "Shantae", "Sidney", "Siffre", "Simeon", "Simon", "Sinclair", "Sofiane", "Somer", "Stephane", "Sully", "Sydney", "Sylvain", "Talbot", "Talon", "Telford", "Tempest", "Teppo", "Théo", "Thayer", "Thibault", "Thibaut", "Thiery", "Tiennan", "Tiennot", "Titouan", "Toussaint", "Travaris", "Tyson", "Urson", "Vachel", "Valentin", "Valere", "Vallis", "Verdun", "Victoir", "Victor", "Waltier", "William", "Wyatt", "Yanis", "Yann", "Yves", "Yvon", "Zosime", "Abrial", "Abrielle", "Abril", "Adele", "Alair", "Alerion", "Amee", "Angelique", "Annette", "Antonella", "Arian", "Ariane", "Armandina", "Aubree", "Aubrielle", "Audra", "Avril", "Bella", "Berneta", "Bette", "Blaise", "Blanche", "Blasa", "Bonte", "Brie", "Brienne", "Brigit", "Cachay", "Calice", "Camille", "Camylle", "Caprice", "Caressa", "Caroline", "Catin", "Celesta", "Celeste", "Cera", "Cerise", "Chablis", "Chalice", "Chambray", "Champagne", "Chandell", "Chaney", "Chantal", "Chante", "Chanterelle", "Chantile", "Chantilly", "Chantrice", "Charla", "Charlotte", "Charmane", "Chaton", "Chemin", "Chenetta", "Cher", "Chere", "Cheri", "Cheryl", "Christine", "Cidney", "Cinderella", "Claire", "Claudette", "Colette", "Cordelle", "Cydnee", "Daeja", "Daija", "Daja", "Damzel", "Darelle", "Darlene", "Darselle", "Dejanelle", "Deleena", "Delice", "Demeri", "Deni", "Denise", "Desgracias", "Desire", "Desiree", "Destanee", "Destiny", "Dior", "Domanique", "Dominique", "Elaina", "Elaine", "Elayna", "Elise", "Eloisa", "Elyse", "Emeline", "Emmaline", "Emmeline", "Estella", "Estrella", "Etiennette", "Evette", "Fabienne", "Fabrienne", "Fanchon", "Fancy", "Fawna", "Fayana", "Fayette", "Fifi", "Fleur", "Fleurette", "Fontanna", "Fosette", "Francine", "Frederique", "Gabriel", "Gabriele", "Gabrielle", "Gaby", "Garcelle", "Gena", "Genie", "Georgette", "Germaine", "Gervaise", "Gitana", "Harriet", "Heloisa", "Holland", "Honnetta", "Isabelle", "Ivette", "Ivonne", "Jacqueena", "Jacquetta", "Jacquiline", "Jacyline", "Jaime", "Jakqueline", "Janeen", "Janelly", "Janina", "Janiqua", "Janique", "Jannnelle", "Jaquita", "Jardena", "Jeanetta", "Jermaine", "Jessamine", "Jewel", "Jewell", "Joli", "Jolie", "Josephine", "Jozephine", "Julieta", "Karessa", "Karmaine", "Klara", "Laine", "Lanelle", "Laramie", "Layne", "Layney", "Leala", "Leonette", "Lissette", "Lizette", "Lourdes", "Lucienne", "Ly", "Lyla", "Lysette", "Madelaine", "Malerie", "Manette", "Marais", "Marcelle", "Marché", "Mardi", "Margo", "Marguerite", "Marie", "Marie Claude", "Marie Frances", "Marie Joelle", "Marie Pascale", "Marie Sophie", "Marjolaine", "Marquise", "Marvella", "Mathieu", "Matisse", "Maurelle", "Maurissa", "Mavis", "Melisande", "Michelle", "Miette", "Mignon", "Mimi", "Mirya", "Monet", "Moniqua", "Monteen", "Musetta", "Myrlie", "Nadeen", "Nadia", "Nadiyah", "Naeva", "Nanon", "Natalle", "Naudia", "Nettie", "Nicholas", "Nicki", "Nicky", "Nicole", "Nicolette", "Nicolina", "Nicolle", "Nikolette", "Ninette", "Ninon", "Noelle", "Nycole", "Odelette", "Opaline", "Orane", "Orva", "Page", "Parisa", "Parnel", "Parris", "Patrice", "Peridot", "Pippi", "Prairie", "Rachele", "Rachelle", "Racquel", "Raphaelle", "Raquelle", "Remi", "Renée", "Renea", "Renelle", "Renita", "Risette", "Rochelle", "Romy", "Rosabel", "Rosiclara", "Ruba", "Russhell", "Saleena", "Salina", "Satin", "Sedona", "Serene", "Shandelle", "Shanta", "Shante", "Shariah", "Sharita", "Sharleen", "Sheree", "Shereen", "Sherell", "Sherice", "Sherry", "Sidnee", "Sidney", "Sidnie", "Sidonie", "Sinclaire", "Solange", "Solen", "Sorrel", "Suzette", "Sydnee", "Sydney", "Tallis", "Tempest", "Toinette", "Turquoise", "Veronique", "Vignette", "Villette", "Violeta", "Virginie", "Voleta", "Vonny");
    List<String> surnames = List.of("Arceneau", "Aucoin", "Babin", "Babineaux", "Benoit", "Bergeron", "Bernard", "Bertrand", "Bessette", "Blanc", "Blanchard", "Bonnet", "Boucher", "Bourg", "Bourque", "Boutin", "Bouvier", "Braud", "Broussard", "Brun", "Chevalier", "David", "Depaul", "Desmarais", "Disney", "Dubois", "Dupont", "Dupuis", "Durand", "Fortescue", "Fournier", "Garnier", "Gaudet", "Gillet", "Gillette", "Girard", "Gravois", "Grosvenor", "Lambert", "Landry", "Laroche", "Laurent", "Lefevre", "Leroy", "Leveque", "Lisle", "Martin", "Michel", "Molyneux", "Moreau", "Morel", "Neville", "Pelletier", "Petit", "Prideux", "Renard", "Richard", "Robert", "Rousseau", "Roux", "Rufus", "Simon", "Thomas");
    for (int i = 1; i <= limit; i++) {
        String givenName = givenNames.get(random.nextInt(0, givenNames.size()));
        String surname = surnames.get(random.nextInt(0, surnames.size()));
        UUID id = UUID.randomUUID();
        String description = Person.LOREM_IPSUM;
        writer.writeRow(givenName, surname, id, description);
    }
    writer.close();
}

public static void main(final String[] args) {
    // Launch the app.
    CsvSpeed app = new CsvSpeed();

    // Write.
    String when = Instant.now().truncatedTo(ChronoUnit.SECONDS).toString().replace(":", "•");
    Path pathOutput = Paths.get("/tmp/persons.csv");
    app.write(pathOutput);
    System.out.println("Writing file: " + pathOutput);

    // Read.
    long start = System.nanoTime();
    Path pathInput = Paths.get("/tmp/persons.csv");
    List<Person> list = app.read(pathInput);
    long stop = System.nanoTime();

    // Time.
    long elapsed = (stop - start);
    Duration d = Duration.ofNanos(elapsed);
    System.out.println("Reading elapsed: " + d);
    System.out.println("Reading took nanos per row: " + (elapsed / list.size()));
    System.out.println("nanos elapsed: " + elapsed + "  |  list.size: " + list.size());
}
}

在我的机器上运行我得到以下时间:

Writing file: /tmp/persons.csv
Reading elapsed: PT0.230395859S
Reading took nanos per row: 5759
nanos elapsed: 230395859  |  list.size: 40000

它没有显示一旦你考虑JIT踢入并优化代码,解析器可以获得多快。我已经更改了代码以生成400K记录(导致200 MB文件)。现在代码打印:

Reading elapsed: PT0.993483883S
Reading took nanos per row: 2483
nanos elapsed: 993483883  |  list.size: 400000

并且有4M行(几乎2GB的数据):

Reading elapsed: PT7.961481755S
Reading took nanos per row: 1990
nanos elapsed: 7961481755  |  list.size: 4000000
© www.soinside.com 2019 - 2024. All rights reserved.