How to connect from Databricks to an HDInsight Hadoop cluster

Problem description · 0 votes · 2 answers

Could you help me find the correct way to connect from a Databricks notebook to an HDInsight Hadoop cluster (to start with, just to interact with HDFS)?

Right now I am trying to use the pyarrow library like this:

import pyarrow as pa

hdfs1 = pa.hdfs.connect(host=host, port=8020, extra_conf=conf, driver='libhdfs3')

host is my name node;

conf is a dictionary built from the HDFS_CLIENT hdfs-site.xml.
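
For reference, this is roughly how I build conf and make the call. The file path and host name below are placeholders rather than my real values, so treat it as a minimal sketch of my setup:

import xml.etree.ElementTree as ET
import pyarrow as pa

# Placeholder locations -- replace with your own cluster details.
HDFS_SITE = "/dbfs/FileStore/configs/hdfs-site.xml"   # hdfs-site.xml copied from the HDInsight head node
NAMENODE = "hn0-mycluster.internal.cloudapp.net"      # name node host (placeholder)

def hdfs_site_to_dict(path):
    """Turn the <property><name>/<value> pairs of hdfs-site.xml into a plain dict."""
    root = ET.parse(path).getroot()
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.findall("property")}

conf = hdfs_site_to_dict(HDFS_SITE)
hdfs1 = pa.hdfs.connect(host=NAMENODE, port=8020, extra_conf=conf, driver='libhdfs3')
print(hdfs1.ls("/"))   # quick check: list the HDFS root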

Thanks in advance for your help!

python hadoop databricks hdinsight pyarrow
2 Answers

0 votes

I am getting this error message:

ArrowIOError: HDFS connection failed
---------------------------------------------------------------------------
ArrowIOError                              Traceback (most recent call last)
<command-3476367505086664> in <module>
      1 hdfs1 = pa.hdfs.connect(host=host, port=8020, 
----> 2                         extra_conf=conf, driver='libhdfs3')
/databricks/python/lib/python3.7/site-packages/pyarrow/hdfs.py in connect(host, port, user, kerb_ticket, driver, extra_conf)
    209     fs = HadoopFileSystem(host=host, port=port, user=user,
    210                           kerb_ticket=kerb_ticket, driver=driver,
--> 211                           extra_conf=extra_conf)
    212     return fs
/databricks/python/lib/python3.7/site-packages/pyarrow/hdfs.py in __init__(self, host, port, user, kerb_ticket, driver, extra_conf)
     36             _maybe_set_hadoop_classpath()
     37 
---> 38         self._connect(host, port, user, kerb_ticket, driver, extra_conf)
     39 
     40     def __reduce__(self):
/databricks/python/lib/python3.7/site-packages/pyarrow/io-hdfs.pxi in pyarrow.lib.HadoopFileSystem._connect()
/databricks/python/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowIOError: HDFS connection failed

0 votes

I am also not quite clear about the environment variables mentioned in the documentation:

HADOOP_HOME: the root of your installed Hadoop distribution. Often contains lib/native/libhdfs.so. In my case, Hadoop should live on the HDInsight cluster, while libhdfs.so was installed on Databricks together with pyarrow, and since I have no access to HDFS I cannot see any HADOOP_HOME path.
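
From what I can tell from the pyarrow documentation, the variables below are what the JNI-based libhdfs driver expects to be set before connect() is called, while driver='libhdfs3' is a standalone C++ client that, as far as I understand, should not need HADOOP_HOME or CLASSPATH at all. A minimal sketch with assumed paths (I do not know the real locations on a Databricks driver node):

import os
import subprocess
import pyarrow as pa

# Assumed locations for illustration only -- the real paths depend on the
# Databricks runtime image and on where Hadoop/libhdfs are installed.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"     # assumed JVM location
os.environ["HADOOP_HOME"] = "/usr/local/hadoop"                   # assumed Hadoop install root
os.environ["ARROW_LIBHDFS_DIR"] = "/usr/local/hadoop/lib/native"  # directory that contains libhdfs.so

# The JNI driver also needs the Hadoop jars on the CLASSPATH;
# "hadoop classpath --glob" prints that list if the hadoop CLI is present.
os.environ["CLASSPATH"] = subprocess.check_output(
    [os.path.join(os.environ["HADOOP_HOME"], "bin", "hadoop"), "classpath", "--glob"]
).decode().strip()

# Placeholder name node host; 8020 is the usual HDFS RPC port.
fs = pa.hdfs.connect(host="hn0-mycluster.internal.cloudapp.net", port=8020, driver="libhdfs")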
