Error in sparkCheckInstall(sparkHome, master, deployMode)


After running the following:

devtools::install_github('apache/spark@v3.3.0', subdir='R/pkg', force = TRUE)
library(SparkR)

I ran this to convert my data to a Spark DataFrame:

as.DataFrame(value1)

However, I got the following error message:

Error in getSparkSession() : SparkSession not initialized

So I ran this:

sparkR.session()

which gave this prompt:

Will you download and install (or reuse if it exists) Spark package under the cache [/home/analytics/.cache/spark]? (y/n):

If I answer n, I get this:

 Error in sparkCheckInstall(sparkHome, master, deployMode) : 
  Please make sure Spark package is installed in this machine.
- If there is one, set the path in sparkHome parameter or environment variable SPARK_HOME.
- If not, you may run install.spark function to do the job.
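In other words, it wants one of these, which I can't provide because Spark isn't installed yet (/opt/spark below is just a placeholder path):

# Option 1: point the SPARK_HOME environment variable at an existing install
Sys.setenv(SPARK_HOME = '/opt/spark')  # placeholder path
sparkR.session()

# Option 2: pass the install path directly through the sparkHome argument
sparkR.session(sparkHome = '/opt/spark')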

But if I answer y, I get this long message:

Spark not found in the cache directory. Installation will start.
MirrorUrl not provided.
Looking for preferred site from apache website...
Preferred mirror site found: https://dlcdn.apache.org/spark
Downloading spark-3.3.0 for Hadoop 2.7 from:
- https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz
trying URL 'https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz'
simpleWarning in download.file(remotePath, localPath): cannot open URL 'https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz': HTTP status was '404 Not Found'


To use backup site...
Downloading spark-3.3.0 for Hadoop 2.7 from:
- http://www-us.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz
trying URL 'http://www-us.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz'
simpleWarning in download.file(remotePath, localPath): URL 'http://www-us.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz': status was 'Couldn't resolve host name'


- Unable to download from default mirror site: http://www-us.apache.org/dist/spark
Error in robustDownloadTar(mirrorUrl, version, hadoopVersion, packageName,  : 
  Unable to download Spark spark-3.3.0 for Hadoop 2.7. Please check network connection, Hadoop version, or provide other mirror sites.

How do I get rid of this error?

r sparkr
1 Answer

From my understanding, you also need the Spark package installed on your machine.

Spark can be installed using these links: Download Spark 3.3.0, Download Hadoop 3.0.0, and Java OpenJDK 11.0.13 LTS.

Set the system environment variable SPARK_HOME to the Spark 3.3.0 directory downloaded earlier, and similarly set HADOOP_HOME and JAVA_HOME.
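These can also be set from inside R; a minimal sketch, with placeholder paths that need to match wherever you actually unpacked the downloads:

# Placeholder paths -- point these at your actual install directories
Sys.setenv(SPARK_HOME  = '/opt/spark-3.3.0-bin-hadoop3',
           HADOOP_HOME = '/opt/hadoop-3.0.0',
           JAVA_HOME   = '/usr/lib/jvm/java-11-openjdk')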

Then load the SparkR library with the R line below, after updating <spark-lib-path> to the directory where the downloaded Spark installation was extracted.

library(SparkR, lib.loc = .libPaths(c(file.path('<spark-lib-path>', 'R', 'lib'), .libPaths())))
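Once that loads, the conversion from the question should work; a quick check, assuming value1 is an ordinary R data.frame:

sparkR.session(master = 'local[*]')  # start a local session using SPARK_HOME
df <- as.DataFrame(value1)           # value1 is your data from the question
head(df)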

These steps worked for me when I tried them earlier, using Spark 3.1.2 with Hadoop 2.7.4.
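As a side note on the failed download: install.spark requests a Hadoop 2.7 build by default, but Spark 3.3.0 is only published for Hadoop 2 and Hadoop 3, hence the 404. If you would rather let SparkR download Spark itself, something like this should target an artifact that exists (the archive mirror here is my suggestion, not a SparkR default):

# Ask for the Hadoop 3 build and name a mirror explicitly
SparkR::install.spark(hadoopVersion = '3',
                      mirrorUrl = 'https://archive.apache.org/dist/spark')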
