Ambari unable to run custom hook for modifying user hive


Attempting to add a client node to the cluster via Ambari (v2.7.3.0) (HDP 3.1.0.0-78) and seeing an odd error:

stderr: 
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 38, in <module>
    BeforeAnyHook().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 31, in hook
    setup_users()
  File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/shared_initialization.py", line 51, in setup_users
    fetch_nonlocal_groups = params.fetch_nonlocal_groups,
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/accounts.py", line 90, in action_create
    shell.checked_call(command, sudo=True)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']2019-11-25 13:07:58,000 - Reporting component version failed
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 363, in execute
    self.save_component_version_to_structured_out(self.command_name)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 223, in save_component_version_to_structured_out
    stack_select_package_name = stack_select.get_package_name()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 109, in get_package_name
    package = get_packages(PACKAGE_SCOPE_STACK_SELECT, service_name, component_name)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 223, in get_packages
    supported_packages = get_supported_packages()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages
    raise Fail("Unable to query for supported packages using {0}".format(stack_selector_path))
Fail: Unable to query for supported packages using /usr/bin/hdp-select



 stdout:
2019-11-25 13:07:57,644 - Stack Feature Version Info: Cluster Stack=3.1, Command Stack=None, Command Version=None -> 3.1
2019-11-25 13:07:57,651 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2019-11-25 13:07:57,652 - Group['livy'] {}
2019-11-25 13:07:57,654 - Group['spark'] {}
2019-11-25 13:07:57,654 - Group['ranger'] {}
2019-11-25 13:07:57,654 - Group['hdfs'] {}
2019-11-25 13:07:57,654 - Group['zeppelin'] {}
2019-11-25 13:07:57,655 - Group['hadoop'] {}
2019-11-25 13:07:57,655 - Group['users'] {}
2019-11-25 13:07:57,656 - User['yarn-ats'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2019-11-25 13:07:57,658 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2019-11-25 13:07:57,659 - Modifying user hive
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
2019-11-25 13:07:57,971 - The repository with version 3.1.0.0-78 for this command has been marked as resolved. It will be used to report the version of the component which was installed
2019-11-25 13:07:58,000 - Reporting component version failed
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 363, in execute
    self.save_component_version_to_structured_out(self.command_name)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 223, in save_component_version_to_structured_out
    stack_select_package_name = stack_select.get_package_name()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 109, in get_package_name
    package = get_packages(PACKAGE_SCOPE_STACK_SELECT, service_name, component_name)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 223, in get_packages
    supported_packages = get_supported_packages()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages
    raise Fail("Unable to query for supported packages using {0}".format(stack_selector_path))
Fail: Unable to query for supported packages using /usr/bin/hdp-select

Command failed after 1 tries

The problem seems to be

resource_management.core.exceptions.ExecutionFailed: Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd

stemming from

2019-11-25 13:07:57,659 - Modifying user hive
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']

This is further reinforced by the fact that manually adding the ambari-hdp-1.repo and yum-installing hdp-select before adding the host to the cluster shows the same error message, just truncated to the stdout/stderr sections shown here.
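
(For reference, the manual install I tried looked roughly like this; the repo file is copied from an existing cluster node (HW001 here just as an example) and the exact repo contents will differ per environment, so treat this as a sketch rather than exact steps:)

# copy the HDP repo definition from a node that already has it, then install the selector tool
scp root@HW001:/etc/yum.repos.d/ambari-hdp-1.repo /etc/yum.repos.d/ambari-hdp-1.repo
yum install -y hdp-select
/usr/bin/hdp-select versions   # should report 3.1.0.0-78 if the repo is correct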

When running

[root@HW001 .ssh]# /usr/bin/hdp-select versions
3.1.0.0-78

from the Ambari server node, I can see the command works fine.

Looking at what the hook script is trying to run / access:

[root@client001~]# ls -lha /var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py
-rw-r--r-- 1 root root 1.2K Nov 25 10:51 /var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py
[root@client001~]# ls -lha /var/lib/ambari-agent/data/command-632.json
-rw------- 1 root root 545K Nov 25 13:07 /var/lib/ambari-agent/data/command-632.json
[root@client001~]# ls -lha /var/lib/ambari-agent/cache/stack-hooks/before-ANY
total 0
drwxr-xr-x 4 root root  34 Nov 25 10:51 .
drwxr-xr-x 8 root root 147 Nov 25 10:51 ..
drwxr-xr-x 2 root root  34 Nov 25 10:51 files
drwxr-xr-x 2 root root 188 Nov 25 10:51 scripts
[root@client001~]# ls -lha /var/lib/ambari-agent/data/structured-out-632.json
ls: cannot access /var/lib/ambari-agent/data/structured-out-632.json: No such file or directory
[root@client001~]# ls -lha /var/lib/ambari-agent/tmp
total 96K
drwxrwxrwt  3 root root 4.0K Nov 25 13:06 .
drwxr-xr-x 10 root root  267 Nov 25 10:50 ..
drwxr-xr-x  6 root root 4.0K Nov 25 13:06 ambari_commons
-rwx------  1 root root 1.4K Nov 25 13:06 ambari-sudo.sh
-rwxr-xr-x  1 root root 1.6K Nov 25 13:06 create-python-wrap.sh
-rwxr-xr-x  1 root root 1.6K Nov 25 10:50 os_check_type1574715018.py
-rwxr-xr-x  1 root root 1.6K Nov 25 11:12 os_check_type1574716360.py
-rwxr-xr-x  1 root root 1.6K Nov 25 11:29 os_check_type1574717391.py
-rwxr-xr-x  1 root root 1.6K Nov 25 13:06 os_check_type1574723161.py
-rwxr-xr-x  1 root root  16K Nov 25 10:50 setupAgent1574715020.py
-rwxr-xr-x  1 root root  16K Nov 25 11:12 setupAgent1574716361.py
-rwxr-xr-x  1 root root  16K Nov 25 11:29 setupAgent1574717392.py
-rwxr-xr-x  1 root root  16K Nov 25 13:06 setupAgent1574723163.py

Note the ls: cannot access /var/lib/ambari-agent/data/structured-out-632.json: No such file or directory. Not sure whether that is normal, though.

Does anyone know what could be causing this, or have any debugging hints?
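
(One way to iterate on this without going back through the Ambari UI each time is to re-run the exact hook command from the error message by hand as root on the client node; the arguments below are copied verbatim from the error above and assume command-632.json still exists under /var/lib/ambari-agent/data:)

# reproduce the failing before-ANY hook outside of Ambari
/usr/bin/python /var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py \
  ANY \
  /var/lib/ambari-agent/data/command-632.json \
  /var/lib/ambari-agent/cache/stack-hooks/before-ANY \
  /var/lib/ambari-agent/data/structured-out-632.json \
  INFO /var/lib/ambari-agent/tmp PROTOCOL_TLSv1_2 ''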


UPDATE 01: After adding some log print lines near the offending last line in the error trace, File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages, to print the return code and stdout, I get:

2
ambari-python-wrap: can't open file '/usr/bin/hdp-select': [Errno 2] No such file or directory

So what is that about? It expects hdp-select to already exist, yet the Ambari add-host UI complains if I manually install that binary beforehand. When I do install it manually (using the same repo file as the other, existing cluster nodes), all I see is...

0
Packages:
  accumulo-client
  accumulo-gc
  accumulo-master
  accumulo-monitor
  accumulo-tablet
  accumulo-tracer
  atlas-client
  atlas-server
  beacon
  beacon-client
  beacon-server
  druid-broker
  druid-coordinator
  druid-historical
  druid-middlemanager
  druid-overlord
  druid-router
  druid-superset
  falcon-client
  falcon-server
  flume-server
  hadoop-client
  hadoop-hdfs-client
  hadoop-hdfs-datanode
  hadoop-hdfs-journalnode
  hadoop-hdfs-namenode
  hadoop-hdfs-nfs3
  hadoop-hdfs-portmap
  hadoop-hdfs-secondarynamenode
  hadoop-hdfs-zkfc
  hadoop-httpfs
  hadoop-mapreduce-client
  hadoop-mapreduce-historyserver
  hadoop-yarn-client
  hadoop-yarn-nodemanager
  hadoop-yarn-registrydns
  hadoop-yarn-resourcemanager
  hadoop-yarn-timelinereader
  hadoop-yarn-timelineserver
  hbase-client
  hbase-master
  hbase-regionserver
  hive-client
  hive-metastore
  hive-server2
  hive-server2-hive
  hive-server2-hive2
  hive-webhcat
  hive_warehouse_connector
  kafka-broker
  knox-server
  livy-client
  livy-server
  livy2-client
  livy2-server
  mahout-client
  oozie-client
  oozie-server
  phoenix-client
  phoenix-server
  pig-client
  ranger-admin
  ranger-kms
  ranger-tagsync
  ranger-usersync
  shc
  slider-client
  spark-atlas-connector
  spark-client
  spark-historyserver
  spark-schema-registry
  spark-thriftserver
  spark2-client
  spark2-historyserver
  spark2-thriftserver
  spark_llap
  sqoop-client
  sqoop-server
  storm-client
  storm-nimbus
  storm-slider-client
  storm-supervisor
  superset
  tez-client
  zeppelin-server
  zookeeper-client
  zookeeper-server
Aliases:
  accumulo-server
  all
  client
  hadoop-hdfs-server
  hadoop-mapreduce-server
  hadoop-yarn-server
  hive-server

Command failed after 1 tries

UPDATE 02: Printing some custom logging from File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 322 (printing the values of err_msg, code, out, and err), i.e.

....
    312   if throw_on_failure and not code in returns:
    313     err_msg = Logger.filter_text("Execution of '{0}' returned {1}. {2}".format(command_alias, code, all_output))
    314
    315     #TODO remove
    316     print("\n----------\nMY LOGS\n----------\n")
    317     print(err_msg)
    318     print(code)
    319     print(out)
    320     print(err)
    321
    322     raise ExecutionFailed(err_msg, code, out, err)
    323
    324   # if separate stderr is enabled (by default it's redirected to out)
    325   if stderr == subprocess32.PIPE:
    326     return code, out, err
    327
    328   return code, out
....

I get

Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd
6
usermod: user 'hive' does not exist in /etc/passwd

Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-816.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-816.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
2019-11-26 10:25:46,928 - The repository with version 3.1.0.0-78 for this command has been marked as resolved. It will be used to report the version of the component which was installed

So it seems the hive user cannot be created (even though creating the yarn-ats user just before it appears to have no problem).

ambari

1 Answer

After just giving in and trying to manually create the hive user myself, I see

[root@airflowetl ~]# useradd -g hadoop -s /bin/bash hive
useradd: user 'hive' already exists
[root@airflowetl ~]# cat /etc/passwd | grep hive
<nothing>
[root@airflowetl ~]# id hive
uid=379022825(hive) gid=379000513(domain users) groups=379000513(domain users)

The fact that this existing user's uid looks the way it does and that the user is not in the /etc/passwd file made me think that some existing Active Directory user (which this client node syncs with via its installed SSSD) already has the name hive. Checking our AD users, this turned out to be the case.
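
(A quick way to see where a username is actually coming from, local files vs. SSSD/AD, is to compare the NSS view with the local passwd file and check which sources nsswitch.conf consults; sketch, with illustrative output in the comments:)

getent passwd hive                  # goes through NSS, so it also returns SSSD/AD users
grep '^hive:' /etc/passwd           # local users only; empty here, hence the usermod failure
grep '^passwd:' /etc/nsswitch.conf  # e.g. "passwd: files sss" when SSSD is in the lookup chain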

Temporarily stopping the SSSD service to pause the sync with AD (service sssd stop) (since I'm not sure whether the server can be made to ignore AD sync for just a single user), then re-running the add of the client host in Ambari, fixed the problem for me.
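
(Roughly, the workaround as a sketch; the service commands assume a RHEL/CentOS-style node like this one:)

service sssd stop       # or: systemctl stop sssd; pause AD lookups so 'hive' no longer resolves remotely
# retry the failed "Add Host" / component install in the Ambari UI;
# the hook can now create a local hive user in /etc/passwd
service sssd start      # or: systemctl start sssd; re-enable AD sync afterwards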
