I want to load a large number of rows, so my plan is to split the statement into several parts by timestamp range and run them asynchronously.
...
// List to save ResultSets
List<CompletableFuture<AsyncResultSet>> pending = new ArrayList<>();
for (Range range : ranges) {
    System.out.println("Asynchronous execute query will be called soon!");
    pending.add(executeQuery(session, preparedStatement, range));
}
...
private static CompletableFuture<AsyncResultSet> executeQuery(CqlSession session,
        PreparedStatement preparedStatement, Range range) {
    return session
        .executeAsync(preparedStatement.bind()
            .setInstant("startDateTime", range.getStartDateTime().toInstant())
            .setInstant("endDateTime", range.getEndDateTime().toInstant())
            .setPageSize(1000000))
        .toCompletableFuture()
        .whenCompleteAsync((asyncResultSet, throwable) -> {
            if (throwable == null) {
                System.out.println("Range " + range.getStart() + " to " + range.getEnd() +
                    " has " + asyncResultSet.remaining() + " records.");
                fetchResultSet(asyncResultSet, throwable);
                if (asyncResultSet.hasMorePages()) {
                    asyncResultSet.fetchNextPage().whenComplete(LoadCassandraAsync::fetchResultSet);
                }
            } else {
                throwable.printStackTrace();
            }
        }, Executors.newFixedThreadPool(4))
        .exceptionally(throwable -> {
            throwable.printStackTrace();
            return null;
        });
}
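At random, the program ends with exit code 0 (not returned from the main method), meaning it has shut down. Other times, after fetching some rows, nothing more happens, as if a thread were still running but doing nothing.
If I comment out the row-fetching part, I get: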
...
Asynchronous execute query will be called soon!
Asynchronous execute query will be called soon!
Asynchronous execute query will be called soon!
Asynchronous execute query will be called soon!
Range 2020-02-14 00:00:00+0700 to 2020-02-14 01:00:00+0700 has 102974 records.
Range 2020-02-14 01:00:00+0700 to 2020-02-14 02:00:00+0700 has 98201 records.
Range 2020-02-14 06:00:00+0700 to 2020-02-14 07:00:00+0700 has 104529 records.
Range 2020-02-14 08:00:00+0700 to 2020-02-14 09:00:00+0700 has 105257 records.
...
I think this means the executeQuery() method works fine.
What am I doing wrong?
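Depending on the number of queries, you may be exhausting Cassandra's read threads, parallel_reads (the default is 250, if I remember correctly). If you check the log (/var/log/cassandra/system.log), there should be a message related to this problem. To work around it, add an artificial pause, for example a Thread.sleep after every 200 queries sent.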
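As an alternative to a fixed pause, the number of queries in flight can be capped with a semaphore, so the submitting loop blocks only while the limit is reached. The following is a minimal, driver-agnostic sketch rather than a definitive fix: `ThrottledQueries`, `runThrottled`, `sleepQuietly`, the simulated query, and the limit of 4 are all hypothetical and chosen for illustration. With the code from the question, the task passed in would be something like `range -> executeQuery(session, preparedStatement, range)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ThrottledQueries {

    /**
     * Starts one async task per input, but never allows more than
     * maxInFlight tasks to be outstanding at once. Blocks until every
     * task has completed and returns the results in input order.
     * (Hypothetical helper, not part of any Cassandra driver.)
     */
    public static <T, R> List<R> runThrottled(List<T> inputs,
            Function<T, CompletableFuture<R>> asyncTask, int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<R>> pending = new ArrayList<>();
        for (T input : inputs) {
            // Blocks the submitting thread while maxInFlight tasks are outstanding.
            permits.acquireUninterruptibly();
            CompletableFuture<R> future;
            try {
                future = asyncTask.apply(input);
            } catch (RuntimeException e) {
                permits.release(); // the task never started, so give the permit back
                throw e;
            }
            // Release the permit when the task finishes, whether it succeeded or failed.
            pending.add(future.whenComplete((result, error) -> permits.release()));
        }
        List<R> results = new ArrayList<>();
        for (CompletableFuture<R> future : pending) {
            results.add(future.join()); // throws CompletionException if a task failed
        }
        return results;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Integer> ranges = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            ranges.add(i);
        }
        // Simulated query: sleeps briefly, then returns a made-up row count.
        List<Integer> rowCounts = runThrottled(ranges, range -> {
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            return CompletableFuture.supplyAsync(() -> {
                sleepQuietly(20);
                return range * 100;
            }).whenComplete((rows, error) -> inFlight.decrementAndGet());
        }, 4);
        System.out.println("queries finished: " + rowCounts.size());
        System.out.println("peak in flight:   " + peak.get());
    }
}
```

Because the permit is released only after a task completes, the peak reported by `main` should never exceed the configured limit. What limit suits a real cluster depends on its configuration and load, so treat the value here as a placeholder to tune.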