I am reading a text file containing about 1 million records. Each line has 6 values separated by "#". I read it line by line with a BufferedReader and split each line with a StringTokenizer. I then store the tokens into variables, bind them into an insert query, add that query to a batch, and execute the batch. But inserting just ~150,000 records takes 1 hour. Somewhere I read that a batch update should insert 150,000 records in about 6 seconds.
Please advise? The file path is built in the constructor, so ignore that part.
Here is my insert code:
class CallLog extends Thread
{
    private String var_callerNumber;
    private String var_shortCode;
    private String var_crbt_callDate;
    private String var_crbt_startTime;
    private String var_crbt_endTime;
    private String var_crbt_duration;
    private String val_filename = "";
    private String filepath = "";
    String line = "";
    int nToken = 0;
    Connection con = null; // assumed to be initialized in the constructor
    BufferedReader reader;

    public void run()
    {
        try {
            logger.info("Final call_log file path is " + filepath);
            PreparedStatement pst = con.prepareStatement(
                "insert into tbl_crbt_calllog(caller_no,short_code,call_date,start_time,end_time,duration) values(?,?,?,?,?,?)");
            reader = new BufferedReader(new FileReader(new File(filepath)));
            while ((line = reader.readLine()) != null)
            {
                StringTokenizer token = new StringTokenizer(line, "#");
                nToken = token.countTokens();
                if (nToken == 6)
                {
                    var_callerNumber = token.nextToken().trim();
                    var_shortCode = token.nextToken().trim();
                    var_crbt_callDate = token.nextToken().trim();
                    var_crbt_startTime = token.nextToken().trim();
                    var_crbt_endTime = token.nextToken().trim();
                    var_crbt_duration = token.nextToken().trim();
                    pst.setString(1, var_callerNumber);
                    pst.setString(2, var_shortCode);
                    pst.setString(3, var_crbt_callDate);
                    pst.setString(4, var_crbt_startTime);
                    pst.setString(5, var_crbt_endTime);
                    pst.setString(6, var_crbt_duration);
                    pst.addBatch();
                }
                else
                {
                    logger.info("Number of tokens is greater or less than 6: " + line);
                }
            }
            pst.executeBatch();
            con.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
I also tried a batch-size approach, but it did not work for me either:
pst.addBatch();
if (++batchsize == 10000)  // pre-increment, so a batch is flushed every 10,000 rows
{
    System.out.println("Uploading batch of size " + batchsize);
    pst.executeBatch();
    pst.clearBatch();
    batchsize = 0;
}
...
// after the read loop, flush the remaining partial batch:
if (batchsize > 0)
{
    pst.executeBatch();
}
Here is a sample of my file:
237664016726#811#20190218#220207#000207#3600
237665946738#811#20190218#222747#002747#3600
237664016726#811#20190218#224234#004234#3600
237661183627#81152#20190219#020741#020900#79
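As an aside, the sample lines above can also be parsed with `String.split`, the more idiomatic modern replacement for `StringTokenizer`. A minimal sketch (the helper name `parseLine` is my own illustration, not from the question):

```java
public class ParseCallLog {
    // Split one '#'-separated record into its 6 fields; returns null if malformed,
    // mirroring the token-count check in the original code.
    static String[] parseLine(String line) {
        String[] fields = line.trim().split("#");
        if (fields.length != 6) {
            return null;
        }
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] f = parseLine("237664016726#811#20190218#220207#000207#3600");
        System.out.println(f[0] + " lasted " + f[5] + "s");
    }
}
```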
When running the SQL statement, you should insert several rows at once. So you would use an insert statement like this:
INSERT INTO tbl_crbt_calllog
(caller_no, short_code, call_date, start_time, end_time, duration)
VALUES
(..., ..., ..., ..., ..., ...),
(..., ..., ..., ..., ..., ...),
(..., ..., ..., ..., ..., ...),
(..., ..., ..., ..., ..., ...),
[...]
(..., ..., ..., ..., ..., ...),
(..., ..., ..., ..., ..., ...),
(..., ..., ..., ..., ..., ...);
instead of one like this:
INSERT INTO tbl_crbt_calllog
(caller_no, short_code, call_date, start_time, end_time, duration)
VALUES
(..., ..., ..., ..., ..., ...);
With this technique you insert many values at once, which ends with the same result but takes much less time. Enjoy!
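As a sketch of how the multi-row statement above could be generated from Java (the helper name `buildMultiRowInsert` and the row count are my own illustration, not from the answer), the placeholder groups can be built programmatically and then bound per row with a PreparedStatement:

```java
// Sketch: programmatically build a multi-row INSERT for tbl_crbt_calllog.
public class MultiRowInsert {
    static String buildMultiRowInsert(int rows) {
        StringBuilder sql = new StringBuilder(
            "INSERT INTO tbl_crbt_calllog " +
            "(caller_no, short_code, call_date, start_time, end_time, duration) VALUES ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) sql.append(", ");
            sql.append("(?, ?, ?, ?, ?, ?)"); // one placeholder group per row
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // With a real Connection you would then do:
        //   PreparedStatement pst = con.prepareStatement(buildMultiRowInsert(1000));
        // and bind JDBC parameter index (row * 6 + column + 1) for each field.
        System.out.println(buildMultiRowInsert(3));
    }
}
```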