I am using Amazon EMR 7.x, which ships with Python 3.9 by default.
Following an existing guide, I added a custom Python 3.11 build through this bootstrap script:
#!/usr/bin/env bash
set -e

PYTHON_VERSION=3.11.7

# Install the build dependencies needed to compile CPython from source
sudo yum --assumeyes install \
    bzip2-devel \
    expat-devel \
    gcc \
    libffi-devel \
    make \
    systemtap-sdt-devel \
    tar \
    zlib-devel

# Download and unpack the CPython source tarball
curl --silent --fail --show-error --location "https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tar.xz" | tar -x -J -v
cd "Python-${PYTHON_VERSION}"

export CFLAGS="-march=native"
./configure \
    --enable-loadable-sqlite-extensions \
    --with-dtrace \
    --with-lto \
    --enable-optimizations \
    --with-system-expat \
    --prefix="/usr/local/python${PYTHON_VERSION}"
# altinstall does not replace the system python3 binary
sudo make altinstall
# ${PYTHON_VERSION%.*} drops the patch version, giving python3.11
sudo "/usr/local/python${PYTHON_VERSION}/bin/python${PYTHON_VERSION%.*}" -m pip install --upgrade pip

echo "# Install my Amazon EMR cluster-scoped dependencies"
sudo curl --silent --fail --show-error --location --remote-name --output-dir /usr/lib/spark/jars/ https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.4_2.12/1.5.0/sedona-spark-shaded-3.4_2.12-1.5.0.jar
sudo curl --silent --fail --show-error --location --remote-name --output-dir /usr/lib/spark/jars/ https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.5.0-28.2/geotools-wrapper-1.5.0-28.2.jar
"/usr/local/python${PYTHON_VERSION}/bin/python${PYTHON_VERSION%.*}" -m pip install \
    apache-sedona[spark]==1.5.0
I have a step that verifies the Python version:
import sys
from pyspark.sql import SparkSession
SparkSession.builder.getOrCreate()
print(sys.version_info)
# sys.version_info(major=3, minor=11, micro=7, releaselevel='final', serial=0)
assert (sys.version_info.major, sys.version_info.minor) == (3, 11)
That step succeeded. If I change the assertion to compare against (3, 9) instead, it fails, so I know the check really works.
When I SSH into the EMR master node, I can see the folder /usr/local/python3.11.7:
[hadoop@ip-172-31-177-28 ~]$ cd /usr/local
[hadoop@ip-172-31-177-28 local]$ ls
bin etc games include lib lib64 libexec man python3.11.7 sbin share src
However, in JupyterLab, when I select the PySpark kernel, the script below shows that I am using Python 3.9:
import sys
print(sys.version_info)
# sys.version_info(major=3, minor=9, micro=16, releaselevel='final', serial=0)
If I open a terminal in JupyterLab on this EMR cluster, it shows:
[notebook@ip-10-131-38-159 /]$ cd /usr/local/
[notebook@ip-10-131-38-159 local]$ ls
bin etc games include lib lib64 libexec sbin share src
So I suspect that this JupyterLab is running as a Docker service.
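One way to check this from inside a notebook cell is to print which interpreter the kernel is actually running. This is only a minimal sketch: `describe_interpreter` is a hypothetical helper name, and the default path is the install prefix from the bootstrap script above.

```python
import os
import sys


def describe_interpreter(custom_prefix="/usr/local/python3.11.7"):
    """Report which Python this process runs and whether the custom build is visible."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": tuple(sys.version_info[:3]),
        # True only if the bootstrap install directory exists in this filesystem view
        "custom_build_visible": os.path.isdir(custom_prefix),
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If `executable` points somewhere other than the custom prefix and `custom_build_visible` is False, the kernel is running from a separate environment with its own filesystem view, which would be consistent with the terminal listing above.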
How can I add Python 3.11 to JupyterLab? Thanks!
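I found out that JupyterLab's Python is separate from the cluster's Python. I first need to create a new conda environment with Python 3.11 for JupyterLab, and then register it as a new kernel.
Here is my updated bootstrap script: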
#!/usr/bin/env bash
set -e

PYTHON_VERSION=3.11.7

# Install the build dependencies needed to compile CPython from source
sudo yum --assumeyes install \
    bzip2-devel \
    expat-devel \
    gcc \
    libffi-devel \
    make \
    systemtap-sdt-devel \
    tar \
    zlib-devel

# Download and unpack the CPython source tarball
curl --silent --fail --show-error --location "https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tar.xz" | tar -x -J -v
cd "Python-${PYTHON_VERSION}"

export CFLAGS="-march=native"
./configure \
    --enable-loadable-sqlite-extensions \
    --with-dtrace \
    --with-lto \
    --enable-optimizations \
    --with-system-expat \
    --prefix="/usr/local/python${PYTHON_VERSION}"
# altinstall does not replace the system python3 binary
sudo make altinstall
# ${PYTHON_VERSION%.*} drops the patch version, giving python3.11
sudo "/usr/local/python${PYTHON_VERSION}/bin/python${PYTHON_VERSION%.*}" -m pip install --upgrade pip

echo "# Install my Amazon EMR cluster-scoped dependencies"
sudo curl --silent --fail --show-error --location --remote-name --output-dir /usr/lib/spark/jars/ https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.4_2.12/1.5.0/sedona-spark-shaded-3.4_2.12-1.5.0.jar
sudo curl --silent --fail --show-error --location --remote-name --output-dir /usr/lib/spark/jars/ https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.5.0-28.2/geotools-wrapper-1.5.0-28.2.jar
"/usr/local/python${PYTHON_VERSION}/bin/python${PYTHON_VERSION%.*}" -m pip install \
    apache-sedona[spark]==1.5.0

echo "# Install my JupyterLab-scoped dependencies"
# Create a separate conda environment for JupyterLab with the same Python version
sudo /emr/notebook-env/bin/conda create --name="python${PYTHON_VERSION}" python=${PYTHON_VERSION} --yes
sudo "/emr/notebook-env/envs/python${PYTHON_VERSION}/bin/python" -m pip install \
    apache-sedona[spark]==1.5.0 \
    attrs==23.1.0 \
    descartes==1.1.0 \
    ipykernel==6.28.0 \
    matplotlib==3.8.2 \
    pandas==2.1.4 \
    shapely==2.0.2

echo "# Add JupyterLab kernel"
sudo "/emr/notebook-env/envs/python${PYTHON_VERSION}/bin/python" -m ipykernel install --name="python${PYTHON_VERSION}"
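For context, the final `ipykernel install` command registers the kernel by writing a `kernel.json` kernelspec that tells Jupyter which interpreter to launch. The sketch below writes a simplified file of the same general shape into a temporary directory, purely for illustration. `write_kernelspec` is a hypothetical helper, not part of ipykernel; the real command may add further fields and resource files, and the interpreter path assumes the conda environment location used in the script above.

```python
import json
import tempfile
from pathlib import Path


def write_kernelspec(kernels_dir, name, python_path):
    """Write a minimal kernel.json that launches ipykernel with python_path."""
    spec = {
        # Jupyter replaces {connection_file} when it starts the kernel
        "argv": [python_path, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": name,
        "language": "python",
    }
    kernel_dir = Path(kernels_dir) / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    spec_path = kernel_dir / "kernel.json"
    spec_path.write_text(json.dumps(spec, indent=1))
    return spec_path


# Illustration only: write into a throwaway directory, not a real Jupyter path.
with tempfile.TemporaryDirectory() as tmp:
    path = write_kernelspec(
        tmp, "python3.11.7", "/emr/notebook-env/envs/python3.11.7/bin/python"
    )
    print(path.read_text())
```

Because the kernelspec stores the absolute path of the interpreter, the kernel keeps using the conda environment's Python 3.11 regardless of which Python runs JupyterLab itself.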
Now I have a new Python 3.11 kernel in JupyterLab.