Guava / Spark issue

Problem description · Votes: 0 · Answers: 1

My Spark version is 2.2.0. The job runs fine locally, but on EMR (with the same Spark version) it fails with the following exception.

org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
    at org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438)
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
    at org.apache.spark.sql.DataFrameWriter.json(DataFrameWriter.scala:488)
    at DataFrameFromTo.dataFrameToFile(DataFrameFromTo.scala:80)
    at Migration.migrate(Migration.scala:196)
    at DataMigrationFramework$$anonfun$main$6.apply(DataMigrationFramework.scala:257)
    at DataMigrationFramework$$anonfun$main$6.apply(DataMigrationFramework.scala:255)
    at scala.collection.immutable.Range.foreach(Range.scala:160)
    at DataMigrationFramework$.main(DataMigrationFramework.scala:255)
    at DataMigrationFramework.main(DataMigrationFramework.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 4.0 failed 16 times, most recent failure: Lost task 3.15 in stage 4.0 (TID 115, ip-10-124-29-109.ec2.internal, executor 2): org.apache.spark.SparkException: Task failed while writing rows
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.RateLimiter.acquire(I)D
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2$$anonfun$apply$1.apply$mcDI$sp(DynamoDBRelation.scala:138)
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2$$anonfun$apply$1.apply(DynamoDBRelation.scala:137)
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2$$anonfun$apply$1.apply(DynamoDBRelation.scala:137)
    at scala.Option.foreach(Option.scala:257)
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2.apply(DynamoDBRelation.scala:137)
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2.apply(DynamoDBRelation.scala:131)
    at scala.Option.foreach(Option.scala:257)
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1.apply(DynamoDBRelation.scala:131)
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1.apply(DynamoDBRelation.scala:115)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:315)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
    ... 8 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1690)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1678)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1677)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1677)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:855)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:855)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:855)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1905)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1860)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1849)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:671)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:188)
    ... 46 more
Caused by: org.apache.spark.SparkException: Task failed while writing rows
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.RateLimiter.acquire(I)D
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2$$anonfun$apply$1.apply$mcDI$sp(DynamoDBRelation.scala:138)
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2$$anonfun$apply$1.apply(DynamoDBRelation.scala:137)
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2$$anonfun$apply$1.apply(DynamoDBRelation.scala:137)
    at scala.Option.foreach(Option.scala:257)
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2.apply(DynamoDBRelation.scala:137)
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1$$anonfun$apply$2.apply(DynamoDBRelation.scala:131)
    at scala.Option.foreach(Option.scala:257)
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1.apply(DynamoDBRelation.scala:131)
    at com.github.traviscrawford.spark.dynamodb.DynamoDBRelation$$anonfun$scan$1.apply(DynamoDBRelation.scala:115)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:315)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
    ... 8 more

apache-spark guava emr
1 Answer
0 votes

The fix for this error is to use the Maven Shade Plugin to relocate (shade) Guava. Add this to your pom.xml:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.1</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <relocations>
                        <relocation>
                            <pattern>com.google.common</pattern>
                            <shadedPattern>shaded.com.google.common</shadedPattern>
                        </relocation>
                    </relocations>
                    <artifactSet>
                        <includes>
                            <include>com.google.guava:guava</include>
                        </includes>
                    </artifactSet>
                </configuration>
            </execution>
        </executions>
    </plugin>

This will create an uber JAR with the same name as your original artifact, containing only a shaded (relocated) copy of Guava alongside your own classes.
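
For context on why the relocation helps: the NoSuchMethodError in the trace comes from the spark-dynamodb connector calling Guava's RateLimiter.acquire(Int) and expecting the Double return value (seconds spent blocked) that newer Guava versions provide, while the older Guava that EMR's Hadoop/Hive stack puts on the classpath still has the void-returning signature. Below is a minimal sketch of that kind of call, assuming Guava 16.0 or later on the compile classpath (a hypothetical standalone check, not the connector's actual code):

    import com.google.common.util.concurrent.RateLimiter

    object GuavaRateLimiterCheck {
      def main(args: Array[String]): Unit = {
        // A limiter allowing 25 permits per second, similar in spirit to how a
        // DynamoDB connector throttles scans against provisioned read capacity.
        val limiter = RateLimiter.create(25.0)

        // Compiled against Guava >= 16.0 this resolves to `double acquire(int)`,
        // i.e. the (I)D descriptor in the error. If an older Guava with the
        // void-returning acquire is loaded at runtime instead, this exact call
        // site throws the NoSuchMethodError seen in the stack trace.
        val waitedSeconds: Double = limiter.acquire(1)
        println(s"Blocked for $waitedSeconds seconds acquiring a permit")
      }
    }

With the relocation in place, the classes packed into the shaded JAR reference shaded.com.google.common.util.concurrent.RateLimiter and carry their own relocated copy of Guava, so they no longer resolve against whatever Guava version EMR provides on the cluster classpath.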
