Programmatically submitting a PySpark job from Python, without using spark-submit

Problem description

I want to submit Spark jobs from my local system to a remote server running Cloudera (Spark on YARN). I have tried every approach I could find.

I tried creating a SparkSession and a SparkContext directly.

The code is below:

1) Creating a SparkSession directly:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

appName = 'TEST_ON_SPARK'
masterUrl = 'yarn'

spark = None
try:
    # Point the session at the remote HDFS namenode and YARN ResourceManager.
    spark = SparkSession.builder.appName(appName).master(masterUrl) \
        .config("spark.hadoop.fs.defaultFS", "hdfs://192.168.XX.XX:8020") \
        .config("spark.hadoop.yarn.resourcemanager.address", "192.168.XX.XX:8032") \
        .getOrCreate()
except Exception as e:
    print(e)
    raise
finally:
    if spark is not None:
        spark.stop()
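
For context, a likely missing piece here is that a master('yarn') client resolves the cluster through Hadoop's client configuration files, not only through spark.hadoop.* overrides. A minimal sketch of that setup, assuming the cluster's core-site.xml and yarn-site.xml have been copied from the Cloudera node into a local directory (the path below is hypothetical):

import os
from pyspark.sql import SparkSession

# Assumption: /opt/remote-cluster-conf holds core-site.xml and yarn-site.xml
# fetched from the Cloudera cluster (hypothetical local path). These must be
# set before the JVM starts, i.e. before the first SparkSession is created.
os.environ["HADOOP_CONF_DIR"] = "/opt/remote-cluster-conf"
os.environ["YARN_CONF_DIR"] = "/opt/remote-cluster-conf"

spark = SparkSession.builder.appName("TEST_ON_SPARK").master("yarn").getOrCreate()
print(spark.range(10).count())  # trivial action to confirm executors actually launch
spark.stop()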

2) I also tried building the configuration explicitly with SparkConf:


spark = None
sparkContext = None
try:
    conf = SparkConf().setAppName(appName).setMaster(masterUrl) \
        .set("spark.hadoop.fs.defaultFS", "hdfs://192.168.XX.XX") \
        .set("spark.hadoop.yarn.resourcemanager.hostname", "192.168.XX.XX") \
        .set("spark.submit.deployMode", "cluster") \
        .set("spark.hadoop.yarn.resourcemanager.address", "192.168.XX.XX:8032") \
        .set("spark.yarn.access.namenodes", "hdfs://192.168.XX.XX:8020,hdfs://192.168.XX.XX:8020") \
        .set("spark.yarn.stagingDir", "hdfs://192.168.XX.XX:8020/user/username.surname/")
    # Print the effective configuration for debugging.
    for i in conf.getAll():
        print(i)
    sparkContext = SparkContext(conf=conf)
    spark = SparkSession.builder.config(conf=conf).getOrCreate()
except Exception as e:
    print(e)
    raise
finally:
    # stop() on either handle tears down the underlying context; the second
    # call is then a no-op.
    if spark is not None:
        spark.stop()
    if sparkContext is not None:
        sparkContext.stop()
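
Since the target is a Cloudera cluster, one programmatic alternative to an in-process SparkContext is Apache Livy, a REST service for submitting Spark jobs that is commonly deployed alongside Cloudera clusters. A sketch of that route, assuming Livy runs on its default port 8998 on a cluster edge node and the job script is already in HDFS (the host and path below are hypothetical):

import json
import requests

LIVY_URL = "http://192.168.XX.XX:8998"  # hypothetical Livy endpoint

# POST /batches submits a batch job; "file" is the application to run.
payload = {
    "file": "hdfs:///user/username.surname/jobs/my_job.py",  # hypothetical HDFS path
    "name": "TEST_ON_SPARK",
}
resp = requests.post(
    LIVY_URL + "/batches",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, resp.json())  # 201 plus the new batch's id/state on success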
Tags: pyspark, yarn, cloudera, spark-submit
1 Answer

At the moment, spark-submit is the way to go for submitting jobs programmatically. In particular, a SparkContext created in-process cannot use deployMode=cluster on YARN; cluster mode is only available through spark-submit.
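
A minimal sketch of that approach, driving the spark-submit CLI from Python via subprocess; it assumes a local Spark client installation whose configuration points at the remote cluster, and the script path is hypothetical:

import subprocess

cmd = [
    "spark-submit",
    "--master", "yarn",
    "--deploy-mode", "cluster",
    "--name", "TEST_ON_SPARK",
    "/path/to/my_job.py",  # hypothetical local script; YARN uploads it to the cluster
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.returncode)
print(result.stderr)  # spark-submit reports its progress on stderr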
