Kubernetes Airflow HiveOperator error: [Errno 13] Permission denied: 'hive'

Problem description (votes: 0, answers: 1)

I'm running into an issue when trying to run a HiveOperator task in Apache Airflow using the Kubernetes Executor.

I have a Dockerfile in which I install the necessary dependencies, including apache-airflow-providers-apache-hive==6.4.1:

Dockerfile

FROM apache/airflow:2.8.2
COPY requirements.txt /
RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /requirements.txt
RUN umask 0002; \
    mkdir -p /tmp

In my Airflow DAG, I define a HiveOperator as follows:

from datetime import timedelta

from airflow.providers.apache.hive.operators.hive import HiveOperator

hive_select = HiveOperator(
    task_id='hive_select',
    hive_cli_conn_id='hive_conn',
    hql="select * from table LIMIT 10",
    execution_timeout=timedelta(minutes=30),
)

However, when I try to execute this task, I get the following error:

[2024-02-29, 09:05:15 UTC] {hive.py:275} INFO - hive -hiveconf airflow.ctx.dag_id=hive_con_test -hiveconf airflow.ctx.task_id=hive_select -hiveconf airflow.ctx.execution_date=2024-02-29T09:02:14.534414+00:00 -hiveconf airflow.ctx.try_number=4 -hiveconf airflow.ctx.dag_run_id=manual__2024-02-29T09:02:14.534414+00:00 -hiveconf airflow.ctx.dag_owner=airflow -hiveconf airflow.ctx.dag_email= -hiveconf mapred.job.name=Airflow HiveOperator task for hive-con-test-hive-select-g7oo5n2s.hive_con_test.hive_select.2024-02-29T09:02:14.534414+00:00 -f /tmp/airflow_hiveop_wg4va81k/tmpwyi3r0ow
[2024-02-29, 09:05:15 UTC] {taskinstance.py:2728} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 439, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 414, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/apache/hive/operators/hive.py", line 172, in execute
    self.hook.run_cli(hql=self.hql, schema=self.schema, hive_conf=self.hiveconfs)
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/apache/hive/hooks/hive.py", line 276, in run_cli
    sub_process: Any = subprocess.Popen(
                       ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.11/subprocess.py", line 1953, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'hive'
[2024-02-29, 09:05:15 UTC] {taskinstance.py:1149} INFO - Marking task as FAILED. dag_id=hive_con_test, task_id=hive_select, execution_date=20240229T090214, start_date=20240229T090514, end_date=20240229T090515
[2024-02-29, 09:05:15 UTC] {standard_task_runner.py:107} ERROR - Failed to execute job 214 for task hive_select ([Errno 13] Permission denied: 'hive'; 28)
[2024-02-29, 09:05:15 UTC] {local_task_job_runner.py:234} INFO - Task exited with return code 1
[2024-02-29, 09:05:15 UTC] {taskinstance.py:3309} INFO - 0 downstream tasks scheduled from follow-on schedule check

PermissionError: [Errno 13] Permission denied: 'hive'

There seems to be a permission problem related to executing the Hive CLI. I tried setting the permissions on the /tmp/ folder as suggested in some resources, but I'm not sure I did it correctly.
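For reference, below is a minimal diagnostic sketch (illustrative only, not part of my actual DAG) that could be run inside the worker pod to check whether the hive binary that HiveOperator shells out to is on PATH and executable for the airflow user:

import os
import shutil

def check_hive_cli() -> None:
    # Look up the `hive` executable on PATH, as subprocess.Popen would.
    path = shutil.which("hive")
    if path is None:
        raise RuntimeError("'hive' is not on PATH inside this container")
    # [Errno 13] can also mean the file exists but lacks the execute bit.
    if not os.access(path, os.X_OK):
        raise RuntimeError(f"'hive' was found at {path} but is not executable")
    print(f"hive CLI looks usable at {path}")

if __name__ == "__main__":
    check_hive_cli()

If this check fails, the fix probably belongs in the image itself (installing the Hive client or adjusting the file mode) rather than in the /tmp permissions.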

Any insight into how to resolve this permission issue and successfully run the HiveOperator task would be greatly appreciated.

python kubernetes hive airflow
1 Answer
0 votes

FROM apache/airflow:2.8.2
COPY requirements.txt /
RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /requirements.txt
RUN umask 0002; \
    mkdir -p /tmp
