I have the following settings in Spark running on an AWS EMR cluster. Based on these settings, Spark should allocate up to 10 executors for my job. But even when no other jobs are running, I only see 2 executors allocated, and a lot of processor capacity sits unused. What could be the problem, and how do I diagnose issues like this?
spark.app.id ABC
spark.app.name XYZ
spark.app.startTime 1699623174356
spark.blacklist.decommissioning.enabled true
spark.blacklist.decommissioning.timeout 1h
spark.decommissioning.timeout.threshold 20
spark.default.parallelism 100
spark.driver.cores 4
spark.driver.defaultJavaOptions -XX:OnOutOfMemoryError='kill -9 %p'
spark.driver.extraClassPath ****
spark.driver.extraJavaOptions *********(redacted)
spark.driver.extraLibraryPath *****
spark.driver.host ****
spark.driver.memory 20g
spark.driver.port 41783
spark.driver.userClassPathFirst false
spark.dummy.for.ops 548510a92b094285a4d3568ada11c44a
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.maxExecutors 10
spark.dynamicAllocation.minExecutors 2
spark.emr.default.executor.cores 4
spark.emr.default.executor.memory 18971M
spark.eventLog.dir *****
spark.eventLog.enabled true
spark.executor.cores 4
spark.executor.defaultJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
spark.executor.extraClassPath /lib/spark/jars/*:/home/hadoop/extrajars/*:/var/log/log4j2-config.xml
spark.executor.extraJavaOptions *********(redacted)
spark.executor.extraLibraryPath *********
spark.executor.id driver
spark.executor.memory 20g
spark.executor.memoryOverhead 1500
spark.executor.userClassPathFirst false
spark.executorEnv.AAA_APPLICATION_NAME DoomSpark
spark.executorEnv.AAA_AWS_CLIENT_RELATIONSHIP_KEY_BUCKET_NAME aaastack-prod-na-syncbucketcfnbucketc68d5f32-1mafjvqn9i9gj
spark.executorEnv.CORAL_CONFIG_PATH /home/hadoop/.config/coral-config
spark.executorEnv.DOMAIN prod
spark.executorEnv.ENVROOT /home/hadoop
spark.executorEnv.REALM USAmazon
spark.executorEnv.REDIS_AUTH_TOKEN *********(redacted)
spark.executorEnv.REDIS_PORT 6379
spark.executorEnv.REDIS_URL master.rediscfnreplicationgroup-prod-na.f12qkp.use1.cache.amazonaws.com
spark.files.fetchFailure.unRegisterOutputOnHost true
spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds 2000
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem 2
spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem true
spark.hadoop.mapreduce.output.fs.optimized.committer.enabled true
spark.hadoop.yarn.timeline-service.enabled false
spark.history.fs.logDirectory hdfs:///var/log/spark/apps
spark.history.ui.port 18080
spark.master yarn
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS ip-10-0-52-16.ec2.internal
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES http://ip-10-0-52-16.ec2.internal:20888/proxy/application_1699353854257_0781
spark.resourceManager.cleanupExpiredHost true
spark.scheduler.mode FAIR
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled true
spark.sql.emr.internal.extensions com.amazonaws.emr.spark.EmrSparkSessionExtensions
spark.sql.hive.metastore.sharedPrefixes com.amazonaws.services.dynamodbv2
spark.sql.parquet.fs.optimized.committer.optimization-enabled true
spark.sql.parquet.output.committer.class com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter
spark.sql.warehouse.dir hdfs:///user/spark/warehouse
spark.stage.attempt.ignoreOnDecommissionFetchFailure true
spark.submit.deployMode cluster
spark.submit.pyFiles
spark.ui.filters org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
spark.ui.port 0
spark.yarn.app.container.log.dir /var/log/hadoop-yarn/containers/application_1699353854257_0781/container_1699353854257_0781_01_000001
spark.yarn.app.id application_1699353854257_0781
spark.yarn.appMasterEnv.DOMAIN prod
spark.yarn.appMasterEnv.REALM USAmazon
spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS $(hostname -f)
spark.yarn.executor.memoryOverheadFactor 0.1875
spark.yarn.heterogeneousExecutors.enabled false
spark.yarn.historyServer.address ip-XXXXX.internal:XXXX
spark.yarn.submit.waitAppCompletion false
spark.yarn.tags livy-batch-780-2iPUo7av
You have dynamic allocation enabled in your Spark configuration:
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.maxExecutors 10
spark.dynamicAllocation.minExecutors 2
Because of dynamic allocation, your application has been scaled down to the minExecutors count you set in your Spark configuration. When there is demand, i.e. a backlog of pending tasks, Spark will request more executors from YARN, up to maxExecutors, and release idle ones again once they are no longer needed.
To diagnose this, open the Spark UI and look at the number of tasks in each stage: with spark.executor.cores 4, two executors can already run 8 tasks concurrently, so a stage with 8 or fewer tasks gives Spark no reason to request a third executor, even when the cluster has spare capacity. Also check the YARN ResourceManager UI to rule out the queue simply having no headroom to grant more containers.
For more information, see https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
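As a rough mental model (a deliberate simplification of Spark's ExecutorAllocationManager, not its actual code), dynamic allocation targets just enough executors to run all currently pending tasks at once, clamped between minExecutors and maxExecutors:

```python
import math

def target_executors(pending_tasks, cores_per_executor,
                     min_executors=2, max_executors=10):
    """Simplified sizing model for Spark dynamic allocation:
    enough executors to run every pending task concurrently,
    clamped to the configured min/max bounds."""
    needed = math.ceil(pending_tasks / cores_per_executor)
    return max(min_executors, min(max_executors, needed))

# With spark.executor.cores=4: a stage with only 8 tasks never
# asks for more than the 2-executor minimum...
print(target_executors(8, 4))    # → 2
# ...while a 100-task stage would hit the maxExecutors cap.
print(target_executors(100, 4))  # → 10
```

Under this model, if you only ever see 2 executors, the first thing to check is how many tasks (partitions) your stages actually have; a job whose stages never exceed 8 concurrent tasks will never grow past the minimum, regardless of idle cluster capacity.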