Spark unable to write into a new Hive table in partitioned append mode

Problem description:
  1. Created a new table in Hive with partitioned ORC format.
  2. Wrote into this table using Spark with append mode, orc format and partitioned by the same column.

It fails with the exception:

org.apache.spark.sql.AnalysisException: The format of the existing table test.table1 is `HiveFileFormat`. It doesn't match the specified format `OrcFileFormat`.;
  3. I changed the format from "orc" to "hive" while writing. It still fails, with an exception saying that Spark cannot understand the underlying structure of the table (a rough sketch of this attempt follows below).
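Roughly, that step-3 attempt looked like this (a sketch; table and column names as in Edit 2 below):

// Same write as in Edit 2, but with format("hive") instead of format("orc")
inputdf.write.mode(SaveMode.Append).partitionBy("code").format("hive").saveAsTable("test.t2")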

So this issue occurs because Spark is not able to write into the Hive table in append mode, since it cannot create a new table. I can perform an overwrite successfully, because Spark creates the table again.
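For example, this overwrite succeeds because the table is recreated (a sketch, same names as Edit 2):

inputdf.write.mode(SaveMode.Overwrite).partitionBy("code").format("orc").saveAsTable("test.t2")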

But my use case is to write in append mode from the start. insertInto also does not work, specifically for partitioned tables. My use case is almost completely blocked. Any help would be great.

Edit 1: Working in an HDP 3.1.0 environment.

Spark version is 2.3.2

Hive version is 3.1.0

Edit 2:

// Reading the table 
val inputdf=spark.sql("select id,code,amount from t1")

// Writing into the table
inputdf.write.mode(SaveMode.Append).partitionBy("code").format("orc").saveAsTable("test.t2")

Edit 3: Using insertInto()

val df2 =spark.sql("select id,code,amount from t1")
df2.write.format("orc").mode("append").insertInto("test.t2");

I get the error:

20/05/17 19:15:12 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
20/05/17 19:15:12 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
20/05/17 19:15:13 WARN AcidUtils: Cannot get ACID state for test.t1 from null
20/05/17 19:15:13 WARN AcidUtils: Cannot get ACID state for test.t1 from null
20/05/17 19:15:13 WARN HiveMetastoreCatalog: Unable to infer schema for table test.t1 from file format ORC (inference mode: INFER_AND_SAVE). Using metastore schema.

If I re-run the insertInto command, I get the following exception:

20/05/17 19:16:37 ERROR Hive: MetaException(message:The transaction for alter partition did not commit successfully.)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partitions_req_result$alter_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partitions_req_result$alter_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)

Errors in the Hive Metastore log:

2020-05-17T21:17:43,891 INFO  [pool-8-thread-198]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(907)) - 163: alter_partitions : tbl=hive.test.t1
2020-05-17T21:17:43,891 INFO  [pool-8-thread-198]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(349)) - [email protected]  ip=10.10.1.36   cmd=alter_partitions : tbl=hive.test.t1
2020-05-17T21:17:43,891 INFO  [pool-8-thread-198]: metastore.HiveMetaStore (HiveMetaStore.java:alter_partitions_with_environment_context(5119)) - New partition values:[BR]
2020-05-17T21:17:43,913 ERROR [pool-8-thread-198]: metastore.ObjectStore (ObjectStore.java:alterPartitions(4397)) - Alter failed
org.apache.hadoop.hive.metastore.api.MetaException: Cannot change stats state for a transactional table without providing the transactional write state for verification (new write ID -1, valid write IDs null; current state null; new state {}
1 Answer:

I was able to resolve this in my use case by using an external table. There is currently an unresolved issue in Spark related to Hive ACID tables. Once the Hive table is created as external, I am able to perform appends into both partitioned and non-partitioned tables: https://issues.apache.org/jira/browse/SPARK-15348
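A minimal sketch of the workaround (the DDL, column types and HDFS location are illustrative assumptions, not from the original post):

// Create the target as an EXTERNAL table, so it is not a managed (transactional/ACID) Hive table
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS test.t2 (id INT, amount DOUBLE)
  PARTITIONED BY (code STRING)
  STORED AS ORC
  LOCATION '/tmp/external/test/t2'
""")

// Dynamic partitioning is typically required for insertInto on a partitioned table
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

// insertInto maps columns by position, so select the partition column last
val inputdf = spark.sql("select id, amount, code from t1")
inputdf.write.mode("append").insertInto("test.t2")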
