Hive fails when joining an external and an internal table

Problem description

Our environment/versions:

hadoop 3.2.3
hive    3.1.3
spark   2.3.0

Our internal (managed) table in Hive is defined as:

CREATE TABLE dw.CLIENT
(
client_id integer,
client_abbrev string,
client_name string,
effective_start_ts timestamp,
effective_end_ts timestamp,
active_flag string,
record_version integer
)
stored as orc tblproperties ('transactional'='true');
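
As an aside, a transactional ORC table like this one can only be queried when the session has ACID support enabled. A minimal sketch of the settings usually involved (standard Hive properties, shown here as an assumption; a cluster may already set them server-side):

-- Sketch: session settings commonly required for transactional (ACID) tables.
-- Standard Hive properties; values shown are the usual ones, not from the question.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;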

and the external table as:

CREATE EXTERNAL TABLE ClientProcess_21
 ( ClientId string, ClientDescription string, IsActive string, OldClientId string, NewClientId string, Description string,
   TinyName string, FinanceCode string, ParentClientId string, ClientStatus string, FSPortalClientId string )
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 STORED AS TEXTFILE
 LOCATION '.../client_extract_20220801.csv/'
 TBLPROPERTIES ("skip.header.line.count"="1");

I can select from each table individually.
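
For reference, sanity checks along these lines run without error against both tables (hypothetical queries, not from the original post):

-- Hypothetical sanity checks; both complete without error.
SELECT * FROM ClientProcess_21 LIMIT 5;
SELECT COUNT(*) FROM dw.CLIENT;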

When I try to join them (the internal table is still empty at this point):

select
    null, s.*
from  ClientProcess_21  s
join dw.client t
    on s.ClientId = t.client_id

Hive fails with:

SQL Error [3] [42000]: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed during runtime. Please check stacktrace for the root cause.
Partial stack trace from the Hive log:
2022-08-01T18:53:39,012  INFO [RPC-Handler-1] client.SparkClientImpl: Received result for 07a38056-5ba8-45e0-8783-397f25f398cb
2022-08-01T18:53:39,219 ERROR [HiveServer2-Background-Pool: Thread-1667] status.SparkJobMonitor: Job failed with java.lang.NoSuchMethodError: org.apache.orc.OrcFile$WriterOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$WriterOptions;
        at org.apache.hadoop.hive.ql.io.orc.OrcFile$WriterOptions.useUTCTimestamp(OrcFile.java:286)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile$WriterOptions.<init>(OrcFile.java:113)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.writerOptions(OrcFile.java:317)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getOptions(OrcOutputFormat.java:126)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getHiveRecordWriter(OrcOutputFormat.java:184)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getHiveRecordWriter(OrcOutputFormat.java:61)
        at org.apache.hadoop.hive.ql.exec.Utilities.createEmptyFile(Utilities.java:3458)
        at org.apache.hadoop.hive.ql.exec.Utilities.createDummyFileForEmptyPartition(Utilities.java:3489)
        at org.apache.hadoop.hive.ql.exec.Utilities.access$300(Utilities.java:222)
        at org.apache.hadoop.hive.ql.exec.Utilities$GetInputPathsCallable.call(Utilities.java:3433)
        at org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:3370)
        at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.cloneJobConf(SparkPlanGenerator.java:318)
        at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:241)
        at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:113)
        at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:359)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:378)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:343)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
java.lang.NoSuchMethodError: org.apache.orc.OrcFile$WriterOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$WriterOptions;
        at org.apache.hadoop.hive.ql.io.orc.OrcFile$WriterOptions.useUTCTimestamp(OrcFile.java:286)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile$WriterOptions.<init>(OrcFile.java:113)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.writerOptions(OrcFile.java:317)
        at org.apache.hadoop.hive.q
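
A NoSuchMethodError like this usually means the ORC classes loaded at runtime are older than the ones Hive was compiled against: OrcFile$WriterOptions.useUTCTimestamp appears in the ORC 1.5.x line that Hive 3.1 builds against, but not in the older ORC jars that Spark 2.3.0 bundles. A quick way to compare what each side ships (a diagnostic sketch, assuming HIVE_HOME and SPARK_HOME are set; exact jar names vary by install):

# Diagnostic sketch: compare the ORC jars each side ships with.
# Assumes HIVE_HOME and SPARK_HOME are set in the environment.
ls $HIVE_HOME/lib | grep -i orc
ls $SPARK_HOME/jars | grep -i orc
ls $SPARK_HOME/jars | grep -i storage-api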

******** Update: DML against any table defined with ..stored as orc tblproperties ('transactional'='true');

also fails with:

2022-08-02 09:47:42 ERROR SparkJobMonitor:1250 - Job failed with java.lang.NoSuchMethodError: org.apache.orc.OrcFile$WriterOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$WriterOptions;
java.util.concurrent.ExecutionException: Exception thrown by job
..
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.222.108.202, executor 0): java.lang.RuntimeException: Error processing row: java.lang.NoSuchMethodError: org.apache.orc.OrcFile$WriterOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$WriterOptions;
        at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:149)
..
Caused by: java.lang.NoSuchMethodError: org.apache.orc.OrcFile$WriterOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$WriterOptions;
        at org.apache.hadoop.hive.ql.io.orc.OrcFile$WriterOptions.useUTCTimestamp(OrcFile.java:286)
Tags: apache-spark, hive, orc
2 Answers

Answer 1 (0 votes):

I think this is related to data type conversion in the join: one join column is a string and the other is an int. Can you try this:

select
    null, s.*
from  ClientProcess_21  s
join dw.client t
    on s.ClientId = cast(t.client_id as string) -- cast it to string
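
A note on the direction of the cast (my addition, not part of the answer): casting the numeric column to string, as above, keeps every row comparable, whereas the hypothetical reverse cast below turns any ClientId that does not parse as an integer into NULL, silently dropping that row from the join.

-- Hypothetical alternative: cast the string side instead. Unparseable
-- ClientId values become NULL and quietly fall out of the join.
select
    null, s.*
from  ClientProcess_21  s
join dw.client t
    on cast(s.ClientId as int) = t.client_id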

Answer 2 (0 votes):

Solved by copying the ORC jars into the Spark home:

cp $HIVE_HOME/lib/*orc* $SPARK_HOME/jars/
cp $HIVE_HOME/hive-storage-api-2.7.0.jar $SPARK_HOME/jars/
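
After the copy, it may be worth confirming that no older orc-* jars from the original Spark distribution are still on the classpath, and restarting the services so the new jars are picked up. A hypothetical verification:

# Hypothetical check: the listing should now show a single, consistent ORC version.
ls $SPARK_HOME/jars | grep -iE 'orc|storage-api'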