PySpark JDBC write to MySQL (TiDB)

Question (votes: 0, answers: 1)

I am trying to write a PySpark DataFrame (millions of rows) to TiDB using Spark 2.3:

df.write.format('jdbc').options(
  url='jdbc:mysql://<host>:<port>/<database>',  # the URL path is the database name, not the table
  driver='com.mysql.jdbc.Driver',
  dbtable='<tablename>',
  user='<username>',
  password='<password>',
  batchsize=30000,
  truncate=True
).mode('overwrite').save()

However, all I ever get is this error:

Caused by: java.sql.BatchUpdateException: statement count 5001 exceeds the transaction limitation, autocommit = false
....
....
....
Caused by: java.sql.SQLException: statement count 5001 exceeds the transaction limitation, autocommit = false

Any ideas on how I can fix this?

pyspark pyspark-dataframes tidb

1 Answer

1 vote

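You should add ?rewriteBatchedStatements=true to the JDBC URI so that DML statements are actually batched. Without it, the MySQL driver sends each row of a batch as its own INSERT statement, so a batch of 30000 rows quickly exceeds the per-transaction statement count reported in the error. With it, the driver rewrites the batch into multi-row INSERTs: writes get faster, and you are much less likely to hit the database's transaction limit.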
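As a rough illustration of that suggestion, the sketch below builds the JDBC options with rewriteBatchedStatements=true appended to the URL. The helper names build_tidb_jdbc_options and write_to_tidb are hypothetical, not part of PySpark, and the host, port, database and table values are placeholders you would substitute; the batch size simply reuses the value from the question and may still need tuning.

```python
from urllib.parse import urlencode


def build_tidb_jdbc_options(host, port, database, table, user, password,
                            batchsize=30000):
    """Hypothetical helper: assemble options for df.write.format('jdbc')."""
    # rewriteBatchedStatements=true asks the MySQL driver to rewrite a batch
    # of single-row INSERTs into multi-row INSERT statements.
    params = urlencode({"rewriteBatchedStatements": "true"})
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}?{params}",
        "driver": "com.mysql.jdbc.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
        "batchsize": str(batchsize),  # value taken from the question
        "truncate": "true",
    }


def write_to_tidb(df, opts):
    """Hypothetical helper: write a PySpark DataFrame using the options above."""
    df.write.format("jdbc").options(**opts).mode("overwrite").save()
```

With an existing DataFrame df, this would be called as write_to_tidb(df, build_tidb_jdbc_options('<host>', '<port>', '<database>', '<tablename>', '<username>', '<password>')).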
