chmod error when writing output with Spark on Kubernetes

Question

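I'm developing a POC that sets up a Spark cluster using Kubernetes for resource management through AKS (Azure Kubernetes Service). I'm using spark-submit to submit pyspark applications to k8s in cluster mode, and I have successfully got the applications running.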

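I set up an Azure file share to store the application scripts, plus a persistent volume and a persistent volume claim pointing to this file share, so that Spark can access the scripts from Kubernetes. This works well for applications that don't write any output, such as the pi.py example from the Spark source, but writing any kind of output fails in this setup. I tried running a script that counts words and lines, and the call

wordCounts.saveAsTextFile(f"./output/counts")

where wordCounts is an RDD, raises an exception: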

Traceback (most recent call last):
  File "/opt/spark/work-dir/wordcount2.py", line 14, in <module>
    wordCounts.saveAsTextFile(f"./output/counts")
  File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1570, in saveAsTextFile
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o65.saveAsTextFile.
: ExitCodeException exitCode=1: chmod: changing permissions of '/opt/spark/work-dir/output/counts': Operation not permitted

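The directory "counts" is created by the Spark application without any problem, so it seems the required permissions are in place, but the chmod that Spark then tries to run internally fails. I can't figure out why, or which configuration is missing from my command. Any help would be appreciated.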

The kubectl version I'm using is:

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"881d4a5a3c0f4036c714cfb601b377c4c72de543", GitTreeState:"clean", BuildDate:"2021-10-21T05:13:01Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

The Spark version is 2.4.5, and the command I'm using is:

<SPARK_PATH>/bin/spark-submit --master k8s://<HOST>:443  \
--deploy-mode cluster  \
--name spark-pi3 \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=docker.io/datamechanics/spark:2.4.5-hadoop-3.1.0-java-8-scala-2.11-python-3.7-dm14  \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
--verbose /opt/spark/work-dir/wordcount2.py
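The four volume flags above all follow one pattern: spark.kubernetes.<driver|executor>.volumes.persistentVolumeClaim.<volumeName>.options.claimName and ...mount.path. As a small sketch, the flags can be generated instead of typed by hand, which keeps the driver and executor mounts identical. The helper name pvc_volume_confs is hypothetical and not part of Spark; it only builds the strings.

```python
from typing import List, Sequence


def pvc_volume_confs(
    volume_name: str,
    claim_name: str,
    mount_path: str,
    roles: Sequence[str] = ("driver", "executor"),
) -> List[str]:
    """Hypothetical helper: build ['--conf', 'key=value', ...] PVC mount flags per pod role."""
    args: List[str] = []
    for role in roles:
        prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{volume_name}"
        # Which PVC to use for this volume name.
        args += ["--conf", f"{prefix}.options.claimName={claim_name}"]
        # Where to mount it inside the driver/executor container.
        args += ["--conf", f"{prefix}.mount.path={mount_path}"]
    return args


if __name__ == "__main__":
    flags = pvc_volume_confs("azure-fileshare-pvc", "azure-fileshare-pvc", "/opt/spark/work-dir")
    # Print one "--conf key=value" pair per line, shell-continuation style.
    print(" \\\n".join(" ".join(flags[i:i + 2]) for i in range(0, len(flags), 2)))
```

The returned list can be spliced into a subprocess argument list for spark-submit, so the volume name, claim name and mount path are each written only once.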

The PV and PVC are very basic. The PV YAML is:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: azure-fileshare-pv
  labels:
    usage: azure-fileshare-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  azureFile:
    secretName: azure-storage-secret
    shareName: dssparktestfs
    readOnly: false
    secretNamespace: spark-operator

The PVC YAML is:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: azure-fileshare-pvc
  # Set this annotation to NOT let Kubernetes automatically create
  # a persistent volume for this volume claim.
  annotations:
    volume.beta.kubernetes.io/storage-class: ""
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  selector:
    # To make sure we match the claim with the exact volume, match the label
    matchLabels:
      usage: azure-fileshare-pv
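The comments in the PVC describe static binding: an empty storage class, plus a label selector pointing at one specific PV. The sketch below is a deliberately simplified model of that matching (labels, storage class, access modes, capacity), written as an assumption for illustration; it is not the real Kubernetes binder, and the dict layout and function names are hypothetical.

```python
# Simplified model (not the real Kubernetes binder) of why the PVC above
# binds to the PV above.

_UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4}


def parse_quantity(q: str) -> int:
    """Parse binary-suffixed quantities such as '10Gi' into bytes (subset of k8s syntax)."""
    for suffix, factor in _UNITS.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)


def pvc_can_bind(pvc: dict, pv: dict) -> bool:
    """True if the PV satisfies the claim's selector, class, modes and size."""
    labels = pv.get("labels", {})
    if any(labels.get(k) != v for k, v in pvc.get("matchLabels", {}).items()):
        return False
    if pvc.get("storageClassName", "") != pv.get("storageClassName", ""):
        return False
    if not set(pvc["accessModes"]) <= set(pv["accessModes"]):
        return False
    return parse_quantity(pv["capacity"]) >= parse_quantity(pvc["request"])


pv = {
    "labels": {"usage": "azure-fileshare-pv"},
    "capacity": "10Gi",
    "accessModes": ["ReadWriteMany"],
}
pvc = {
    "matchLabels": {"usage": "azure-fileshare-pv"},
    "request": "10Gi",
    "accessModes": ["ReadWriteMany"],
    # Mirrors the volume.beta.kubernetes.io/storage-class: "" annotation.
    "storageClassName": "",
}

if __name__ == "__main__":
    print(pvc_can_bind(pvc, pv))  # True
```

Binding only decides which volume the pods get; it says nothing about file ownership inside the share, which is where the chmod error comes from.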

Let me know if more information is needed.

Answer 1

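The owner and user are root.

It looks like you have mounted the volume as root. Your problem:

chmod: changing permissions of '/opt/spark/work-dir/output/counts': Operation not permitted

occurs because you are trying to change the permissions of a file you do not own. So you need to change the owner of the file first.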
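To check this diagnosis from inside the driver or executor pod, a sketch like the following prints who owns the output path and which user the process runs as. The function names are hypothetical, and may_chmod encodes only the basic POSIX rule (owner or root). It ignores Linux capabilities such as CAP_FOWNER, and network filesystems such as SMB/CIFS mounts may refuse chmod regardless of what this check says.

```python
import os
import stat
import tempfile


def describe_owner(path: str) -> str:
    """Show who owns `path` and which user the current process runs as."""
    st = os.stat(path)
    return (
        f"{path}: uid={st.st_uid} gid={st.st_gid} "
        f"mode={stat.filemode(st.st_mode)}; process euid={os.geteuid()}"
    )


def may_chmod(path: str) -> bool:
    """Basic POSIX rule of thumb: only the owner (or root) may chmod a file."""
    euid = os.geteuid()
    return euid == 0 or os.stat(path).st_uid == euid


if __name__ == "__main__":
    target = "/opt/spark/work-dir/output/counts"  # path from the error above
    if os.path.exists(target):
        print(describe_owner(target), "| may_chmod:", may_chmod(target))
    else:
        # Fallback demo: a file this process just created is owned by it.
        with tempfile.NamedTemporaryFile() as tmp:
            print(describe_owner(tmp.name), "| may_chmod:", may_chmod(tmp.name))
```

If the path shows uid=0 while the process euid is a non-root Spark user, that matches the explanation above.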

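The simplest solution is to chown the resource you want to access. However, this is often not feasible, because it could lead to privilege escalation, and the container image itself may prevent it. In that case you can create a security context.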

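A security context defines privilege and access control settings for a Pod or Container. Security context settings include, but are not limited to:

- Discretionary Access Control: permission to access an object, like a file, is based on user ID (UID) and group ID (GID).
- Security Enhanced Linux (SELinux): objects are assigned security labels.
- Running as privileged or unprivileged.
- Linux Capabilities: give a process some privileges, but not all the privileges of the root user.
- AppArmor: use program profiles to restrict the capabilities of individual programs.
- Seccomp: filter a process's system calls.
- allowPrivilegeEscalation: controls whether a process can gain more privileges than its parent process. This bool directly controls whether the no_new_privs flag gets set on the container process. allowPrivilegeEscalation is always true when the container 1) runs as privileged, or 2) has CAP_SYS_ADMIN.
- readOnlyRootFilesystem: mounts the container's root filesystem as read-only.

The above bullets are not a complete set of security context settings; see SecurityContext for a comprehensive list.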

For more information about security mechanisms in Linux, see Overview of Linux Kernel Security Features.

You can configure a volume permission and ownership change policy for Pods.

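By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup. You can use the fsGroupChangePolicy field inside a securityContext to control the way that Kubernetes checks and manages ownership and permissions for a volume.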
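The difference between the two policies can be sketched as a decision function: "Always" walks the whole volume on every mount, while "OnRootMismatch" first inspects only the volume's root directory. The exact root checks below (group equals fsGroup, group read/write plus execute bits, setgid bit on the directory) are an assumption modelled loosely on kubelet behaviour and may differ between Kubernetes versions; the function names are hypothetical.

```python
import stat

# Assumed permission masks for the root-directory check.
RW_MASK = 0o660
RO_MASK = 0o440
EXEC_MASK = 0o110


def root_mismatch(st_gid: int, st_mode: int, fs_group: int, read_only: bool = False) -> bool:
    """Return True if the volume root does not already look prepared for fs_group."""
    if st_gid != fs_group:
        return True
    if not stat.S_ISDIR(st_mode):
        return True
    required = (RO_MASK if read_only else RW_MASK) | EXEC_MASK
    if (stat.S_IMODE(st_mode) & required) != required:
        return True
    # New files should inherit the group, hence the setgid bit on the directory.
    return not (st_mode & stat.S_ISGID)


def needs_recursive_change(policy: str, st_gid: int, st_mode: int, fs_group: int) -> bool:
    """'Always' (the default) always walks the volume; 'OnRootMismatch' checks the root first."""
    if policy != "OnRootMismatch":
        return True
    return root_mismatch(st_gid, st_mode, fs_group)
```

On a real mount you would feed it os.stat(root).st_gid and os.stat(root).st_mode. Note that this only affects how much work the kubelet does; whether a volume type honours fsGroup at all depends on its volume plugin.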

Here is an example:

securityContext:
  runAsUser: 1000
  runAsGroup: 3000
  fsGroup: 2000
  fsGroupChangePolicy: "OnRootMismatch"
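For context, fsGroup and fsGroupChangePolicy belong to the Pod-level securityContext (spec.securityContext), while runAsUser and runAsGroup can also be set per container. A minimal sketch of where the snippet sits in a full Pod manifest, built as a Python dict and emitted as JSON (which is also valid YAML). The pod name, container name, image, command and mount path are placeholders; only the claim name comes from the question.

```python
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "security-context-demo"},  # placeholder name
    "spec": {
        # Pod-level security context: applies to all containers and volumes.
        "securityContext": {
            "runAsUser": 1000,
            "runAsGroup": 3000,
            "fsGroup": 2000,
            "fsGroupChangePolicy": "OnRootMismatch",
        },
        "containers": [
            {
                "name": "app",
                "image": "busybox",  # placeholder image
                # Print the effective ids and the mount's ownership, then idle.
                "command": ["sh", "-c", "id && ls -ld /data && sleep 3600"],
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }
        ],
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": "azure-fileshare-pvc"}}
        ],
    },
}

if __name__ == "__main__":
    # JSON is valid YAML, so this output can be piped to: kubectl apply -f -
    print(json.dumps(pod, indent=2))
```

Running a throwaway pod like this and reading its logs shows quickly whether the mounted share's ownership actually changes for a given volume type.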

See also this similar question.

Answer 2

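I have the same problem. The difference is that I'm working in an OpenShift cluster. The same error occurs when I try to write a DataFrame to the mount path of a persistent volume claim.

I tried the solution suggested by @WytrzymałyWiktor (creating a securityContext), and the result is the same.

Has anyone solved this?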
