Custom job error on Flink using the Flink Kubernetes Operator


I need advice on how to debug this: what can I check for a Flink job that keeps failing?

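I installed Flink Kubernetes Operator v1.8 on K8s, created a job from the "statemachine.jar" template, and it ran successfully. I then built a new Docker image containing a custom jar (our development team's application), and during pod startup I get the following error: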

ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.
java.util.concurrent.CompletionException: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
    at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
    at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:337) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:254) ~[flink-dist-1.18.1.jar:1.18.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
    at org.apache.flink.runtime.concurrent.pekko.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:172) ~[?:?]
    at org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.pekko.dispatch.TaskInvocation.run(AbstractDispatcher.scala:59) [flink-rpc-akka109c04d3-d759-4da9-a716-b702329308fe.jar:1.18.1]
    at org.apache.pekko.dispatch.ForkJoinExecutorConfigurator$PekkoForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:57) [flink-rpc-akka109c04d3-d759-4da9-a716-b702329308fe.jar:1.18.1]
    at java.util.concurrent.ForkJoinTask.doExec(Unknown Source) [?:?]
    at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) [?:?]
    at java.util.concurrent.ForkJoinPool.scan(Unknown Source) [?:?]
    at java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) [?:?]
    at java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) [?:?]
Caused by: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
    ... 14 more
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: unknown protocol: local
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:301) ~[flink-dist-1.18.1.jar:1.18.1]
    ... 13 more
Caused by: java.net.MalformedURLException: unknown protocol: local
    at java.net.URL.<init>(Unknown Source) ~[?:?]
    at java.net.URL.<init>(Unknown Source) ~[?:?]
    at java.net.URL.<init>(Unknown Source) ~[?:?]
    at org.apache.flink.configuration.ConfigUtils.decodeListFromConfig(ConfigUtils.java:133) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.cli.ExecutionConfigAccessor.getJars(ExecutionConfigAccessor.java:77) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:77) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.submitAndGetJobClientFuture(EmbeddedExecutor.java:123) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.execute(EmbeddedExecutor.java:104) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2238) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:189) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:118) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099) ~[flink-dist-1.18.1.jar:1.18.1]
    at com.app.Application$.main(Application.scala:50) ~[?:?]
    at com.app.Application.main(Application.scala) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
    at java.lang.reflect.Method.invoke(Unknown Source) ~[?:?]
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105) ~[flink-dist-1.18.1.jar:1.18.1]
    at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:301) ~[flink-dist-1.18.1.jar:1.18.1]
    ... 13 more

I tried several versions of the Flink image: flink:1.19-scala_2.12-java11, flink:1.18.1-scala_2.12-java11, and flink:1.17.2-scala_2.12-java11. The problem is the same every time.

What can I check, and how do I find the root cause of this error?

apache-flink java-11
1 Answer

You mentioned that you are using a custom image that contains your jar. Have you made sure that your FlinkDeployment explicitly references that image (via the `image` field)?

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: ...
spec:
  image: "{{ yourCustomImage }}"
  ...
  flinkConfiguration:

You only need to make sure that the base image used in your Dockerfile matches the Flink version you want to target:

# Dockerfile (example targeting 1.18)
FROM .../flink:1.18.1

COPY .../statemachine.jar /opt/flink/jars/statemachine.jar

Then, in your deployment, point the job at the location where the jar was copied inside your custom image:

# FlinkDeployment
spec:
  job:
    jarURI: local:///opt/flink/jars/statemachine.jar
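
The number of slashes in the `jarURI` value matters. As a minimal sketch (plain JDK code, not part of Flink or the operator; the class name `JarUriParts` is made up for illustration), this shows how `java.net.URI` splits the value above, and how dropping one slash silently turns the first path segment into a host:

```java
import java.net.URI;

class JarUriParts {
    /** Summarizes the scheme, host and path that java.net.URI extracts from a jar URI. */
    public static String describe(String jarUri) {
        URI uri = URI.create(jarUri);
        return "scheme=" + uri.getScheme() + " host=" + uri.getHost() + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // Three slashes: empty authority, absolute path inside the image.
        System.out.println(describe("local:///opt/flink/jars/statemachine.jar"));
        // prints "scheme=local host=null path=/opt/flink/jars/statemachine.jar"

        // Two slashes: "opt" is parsed as the host and the path loses its first segment.
        System.out.println(describe("local://opt/flink/jars/statemachine.jar"));
        // prints "scheme=local host=opt path=/flink/jars/statemachine.jar"
    }
}
```

The path printed for the three-slash form is the one that has to exist inside the image, so it should match the destination of the `COPY` line in the Dockerfile.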

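If that doesn't work, consider posting some additional details about your FlinkDeployment, Dockerfile, and so on. Since your job targets a custom image, you could also run that image with Docker, or exec directly into a pod running it, to explore the filesystem and confirm that your jar file is where you expect it to be.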
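
It may also help to read the innermost cause in the stack trace from the question. The `MalformedURLException` is thrown by `java.net.URL` while Flink decodes a list of jar locations (`ConfigUtils.decodeListFromConfig`, called from `ExecutionConfigAccessor.getJars`). The sketch below is plain JDK code, not Flink code, and the class name `LocalSchemeDemo` is made up for illustration. It only reproduces the JDK behaviour behind the message: `java.net.URL` has no handler for the `local` scheme, while `java.net.URI` parses the same string without complaint. It does not explain why the custom job reaches this code path with a `local://` value when the example job does not.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class LocalSchemeDemo {
    /** Returns null if java.net.URL accepts the string, otherwise the exception message. */
    public static String urlError(String location) {
        try {
            new URL(location);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String jar = "local:///opt/flink/jars/statemachine.jar";

        // java.net.URL rejects the scheme because no protocol handler is registered for it.
        System.out.println(urlError(jar)); // prints "unknown protocol: local"

        // java.net.URI only parses the syntax, so the same string is accepted.
        System.out.println(URI.create(jar).getPath()); // prints "/opt/flink/jars/statemachine.jar"

        // A scheme the JDK knows, such as file, passes the URL constructor.
        System.out.println(urlError("file:///opt/flink/jars/statemachine.jar")); // prints "null"
    }
}
```

Given that, one thing worth checking is where a `local://` jar location ends up in the configuration the application sees at `Application.scala:50`, for example through `flinkConfiguration` in the FlinkDeployment or through configuration the application builds itself.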
