Submitting a Spring Boot Spark job via the Spark Kubernetes Operator

Problem description (votes: 0, answers: 1)

On my local server I have a Spark cluster running in standalone mode, and I have a Spring Boot Spark job that I submit with the following command:

spark-submit \
  --conf "spark.driver.userClassPathFirst=true" \
  --conf "spark.executor.userClassPathFirst=true" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///log4j2.xml -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m --add-opens=java.base/sun.nio.ch=ALL-UNNAMED -Dlog4j.debug" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j2.xml --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlog4j.debug" \
  --driver-java-options "-Xms4096m -XX:+UseG1GC -XX:G1HeapRegionSize=32M --add-opens=java.base/sun.nio.ch=ALL-UNNAMED" \
  --master spark://localhost:7077 \
  --deploy-mode cluster \
  --num-executors 1 \
  --executor-cores 4 \
  --executor-memory 4096m \
  --driver-memory 4096m \
  --conf "spark.driver.memory=4096m" \
  --conf "spark.dynamicAllocation.enabled=true" \
  operatordc1-0.0.1-SNAPSHOT.jar

It runs fine and the job completes successfully. I then created a Spark Operator in my Kubernetes cluster using this image: ghcr.io/kubeflow/spark-operator:v1beta2-1.4.3-3.5.0, and the operator was created successfully.
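For reference, an operator deployment like this is typically installed with the operator's Helm chart. The following is a minimal sketch assuming the usual kubeflow/spark-operator chart, repository URL, and value keys; these names are assumptions, not details taken from the question:

# Hypothetical install sketch; chart name, repo URL, and value keys are assumptions
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update
helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator --create-namespace \
  --set image.tag=v1beta2-1.4.3-3.5.0

Now I want to run the same job on Kubernetes using the Spark Operator. To do that, I created this YAML file: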

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: operatordc1
  namespace: spark-operator
spec:
  type: Java
  mode: cluster
  image: "focode/spark-custom:release-1.0"
  imagePullPolicy: Always
  mainApplicationFile: "local:///opt/spark/examples/jars/operatordc1-0.0.1-SNAPSHOT.jar"
  sparkVersion: "3.4.2"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    coreLimit: "1000m"
    memory: "1024m"
    javaOptions: >-
      -Dlog4j.configuration=file:///log4j2.xml
      --add-opens=java.base/java.lang=ALL-UNNAMED
      --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
      --add-opens=java.base/java.nio=ALL-UNNAMED
      --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
      --add-opens=java.base/java.util=ALL-UNNAMED
      --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
      --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED
      -XX:+UseG1GC
      -XX:G1HeapRegionSize=32M
      -XX:ReservedCodeCacheSize=100M
      -XX:MaxMetaspaceSize=256m
      -XX:CompressedClassSpaceSize=256m
      -Xms1024m
      -Dlog4j.debug
    labels:
      version: "3.4.2"
    serviceAccount: default
  executor:
    cores: 4
    instances: 1
    memory: "1024m"
    javaOptions: >-
      -Dlog4j.configuration=file:///log4j2.xml
      -XX:ReservedCodeCacheSize=100M
      -XX:MaxMetaspaceSize=256m
      -XX:CompressedClassSpaceSize=256m
      --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
      -Dlog4j.debug
    labels:
      version: "3.4.2"
    serviceAccount: default
  sparkConf:
    "spark.driver.userClassPathFirst": "true"
    "spark.executor.userClassPathFirst": "true"
    "spark.driver.memory": "1024m"
    "spark.executor.memory": "1024m"
    "spark.dynamicAllocation.enabled": "true"

I run it with the following command: kubectl apply -f spark-job-poc-1.yaml -n spark-operator. Unfortunately, this gives me the error below:

failed to submit SparkApplication operatordc1: failed to run spark-submit for SparkApplication spark-operator/operatordc1:
24/04/27 10:19:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'No FileSystem for scheme "local"'. Please specify one with --class.
	at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:1047)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:528)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

This means it cannot pick up the main class of my Spring Boot job: since no class is specified, spark-submit apparently tries to open the application JAR to read the main class from its manifest, and it cannot resolve the local:// scheme when doing so.

I have uploaded the source code of the Spring Boot job here: https://github.com/focode/operatordc1

I tried providing the main class with this key: mainClass: "com.dcpoc1.operator.operatordc1.Operatordc1Application" along with mainApplicationFile, but it failed for me with the same exception. What also concerns me is that when I submitted the job to the standalone Spark cluster with the spark-submit command, I did not specify a class parameter, yet the Kubernetes operator forces me to add one for the Spring Boot job.
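For a Spring Boot repackaged JAR, the manifest's Main-Class is Spring Boot's launcher rather than the application class, which is recorded separately under Start-Class; that is why pointing Spark directly at the application class does not behave like it would for a plain JAR. A quick way to inspect the manifest (the entries shown are the typical Spring Boot layout, assumed here rather than copied from this project):

# Print the manifest of the repackaged JAR to stdout
unzip -p operatordc1-0.0.1-SNAPSHOT.jar META-INF/MANIFEST.MF
# Typical entries for a Spring Boot executable JAR (assumed):
#   Main-Class: org.springframework.boot.loader.JarLauncher
#   Start-Class: com.dcpoc1.operator.operatordc1.Operatordc1Application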

java spring-boot apache-spark kubernetes kubernetes-operator

1 Answer
0 votes

I got the solution: I used this as the main class: mainClass: "org.springframework.boot.loader.JarLauncher"
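Applied to the spec from the question, this amounts to adding a single field; a minimal sketch repeating only the relevant lines:

spec:
  type: Java
  mode: cluster
  image: "focode/spark-custom:release-1.0"
  # Spring Boot's JarLauncher is the JAR's real entry point; at runtime it
  # locates the application class via the Start-Class manifest entry.
  mainClass: "org.springframework.boot.loader.JarLauncher"
  mainApplicationFile: "local:///opt/spark/examples/jars/operatordc1-0.0.1-SNAPSHOT.jar"

With mainClass set explicitly, spark-submit no longer has to open the local:// JAR to discover the entry point, which is exactly what the 'No FileSystem for scheme "local"' error message asked for ("Please specify one with --class").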
