Spark job runs for a very long time on a tiny dataset

Problem description

The Spark code I am running on the master node is as follows:

from pyspark import SparkContext

# Create a SparkContext, parallelize a tiny list, and collect it back to the driver.
sc = SparkContext()
nums = sc.parallelize([1, 2, 3, 4])
nums.collect()

My cluster configuration: standalone / client mode

3 nodes (1 master + 2 slaves):
Master: 600 MB RAM, 1 CPU
Slave1: 600 MB RAM, 1 CPU
Slave2: 16 GB RAM, 4 CPUs

When I submit the job with the following command, it runs for a very long time:

spark-submit --master spark://<MASTER_IP>:7077 --num-executors=6 --conf spark.driver.memory=500M --conf spark.executor.memory=6G --deploy-mode client test.py

Log output:

20/05/11 19:43:09 INFO BlockManagerMaster: Removal of executor 105 requested
20/05/11 19:43:09 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200511193954-0001/106 on worker-20200511192038--MASTER_IP:44249 (MASTER_IP:44249) with 4 core(s)
20/05/11 19:43:09 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 105
20/05/11 19:43:09 INFO BlockManagerMasterEndpoint: Trying to remove executor 105 from BlockManagerMaster.
20/05/11 19:43:10 INFO StandaloneSchedulerBackend: Granted executor ID app-20200511193954-0001/106 on hostPort MASTER_IP:44249 with 4 core(s), 6.0 GB RAM
^C20/05/11 19:43:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Attempted solution:

Because of the above error about insufficient resources, I tried adding a new node (Slave3) to the cluster, but the error persists even after scaling out.

Is it because the master node has too little memory? Any suggestions here?


apache-spark pyspark distributed-computing distributed-system
1 Answer

Try running with the minimum resource requirements first. Also change the deploy mode to cluster so the job actually runs on the worker nodes. See https://spark.apache.org/docs/latest/submitting-applications.html for more information.

spark-submit --master spark://<MASTER_IP>:7077 --num-executors=2 --conf spark.driver.memory=100M  --conf spark.executor.memory=200M --deploy-mode cluster test.py
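For reference, a minimal sketch of how a similarly small executor-memory request could be set from inside test.py via SparkConf instead of the command line. The 200m value simply mirrors the command above and is illustrative, not tuned; the app name is hypothetical, and driver memory should still be passed via spark-submit because the driver JVM is already running by the time the script executes.

from pyspark import SparkConf, SparkContext

# Illustrative sketch: keep the executor request well below the smallest
# worker's 600 MB so the scheduler can actually place an executor.
conf = (
    SparkConf()
    .setAppName("tiny-collect-test")        # hypothetical app name
    .set("spark.executor.memory", "200m")   # mirrors --conf spark.executor.memory=200M
)

sc = SparkContext(conf=conf)
nums = sc.parallelize([1, 2, 3, 4])
print(nums.collect())  # expected: [1, 2, 3, 4]
sc.stop()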