Error downloading PySpark on Google Colab

Problem description

I'm very new to Python and PySpark. I have a project to do, and we're using PySpark on Google Colab. I've been using the code below, but as of today I can't seem to install Spark anymore. If anyone can help, I'd really appreciate it!

This is the code I'm using:

!wget -q https://dlcdn.apache.org/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz
!tar -xzf spark-3.5.0-bin-hadoop3.tgz
!pip install -q findspark


import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.5.0-bin-hadoop3"

import findspark
findspark.init()

This is the error I'm now getting:

tar (child): spark-3.5.0-bin-hadoop3.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
    158         try:
--> 159             py4j = glob(os.path.join(spark_python, "lib", "py4j-*.zip"))[0]
    160         except IndexError:

IndexError: list index out of range

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
1 frames
<ipython-input-20-a05079ca17b7> in <cell line: 12>()
     10 
     11 import findspark
---> 12 findspark.init()

/usr/local/lib/python3.10/dist-packages/findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
    159             py4j = glob(os.path.join(spark_python, "lib", "py4j-*.zip"))[0]
    160         except IndexError:
--> 161             raise Exception(
    162                 "Unable to find py4j in {}, your SPARK_HOME may not be configured correctly".format(
    163                     spark_python

Exception: Unable to find py4j in /content/spark-3.5.0-bin-hadoop3/python, your SPARK_HOME may not be configured correctly
apache-spark pyspark google-colaboratory
1 Answer

After doing some more research, I found that there is a newer release and the download URL I was using had suddenly become invalid: `wget -q` failed silently on the 404, `tar` then had nothing to extract, and findspark could not find py4j because the SPARK_HOME directory was never created. Here is the new code that worked for me:

!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://downloads.apache.org/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
!tar xf spark-3.5.1-bin-hadoop3.tgz
!pip install -q findspark

import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.5.1-bin-hadoop3"

import findspark
findspark.init()
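
Note that the same breakage can happen again, since the main Apache download mirrors only keep the latest releases, while archive.apache.org/dist/spark keeps every past version. Below is a minimal sketch of a more defensive Colab cell, assuming the archive keeps its current layout; the SPARK_VERSION pin, the assert, and the final SparkSession smoke test are illustrative additions, not part of the original answer:

# A minimal sketch of a pinned, fail-fast install, assuming
# archive.apache.org keeps its current /dist/spark layout.
import os

SPARK_VERSION = "3.5.1"  # hypothetical pin; change to the release you need
tgz = f"spark-{SPARK_VERSION}-bin-hadoop3.tgz"
url = f"https://archive.apache.org/dist/spark/spark-{SPARK_VERSION}/{tgz}"

!apt-get install openjdk-8-jdk-headless -qq > /dev/null

# No -q flag on wget: if the URL ever 404s again, wget reports it
# instead of leaving tar and findspark to fail with the confusing
# errors shown above.
!wget -nc {url}
assert os.path.exists(tgz), f"download failed: {url}"

!tar -xzf {tgz}
!pip install -q findspark

os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = f"/content/spark-{SPARK_VERSION}-bin-hadoop3"

import findspark
findspark.init()

# Smoke test: printing a version string means the install worked.
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
print(spark.version)

Pinning to the archive trades speed for reproducibility: archive.apache.org tends to be slower than the mirrors, but it does not delete old releases, so the notebook keeps working after a new Spark version ships.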
