How do I get the list of subfolders in an HDFS folder?


Suppose my parquet files are stored as follows:

hdfs://root/folder1/pqt1.pqt
hdfs://root/folder2/pqt2.pqt
hdfs://root/folder3/pqt3.pqt
hdfs://root/folder4/part1/pqt4part1.pqt
hdfs://root/folder4/part2/pqt4part1.pqt
...

How can I list the subfolders of 'hdfs://root' from R using sparklyr? The desired output (non-recursive) is:

hdfs://root/folder1/
hdfs://root/folder2/
hdfs://root/folder3/
hdfs://root/folder4/
...

And recursively:

hdfs://root/folder1/
hdfs://root/folder2/
hdfs://root/folder3/
hdfs://root/folder4/
hdfs://root/folder4/part1/
hdfs://root/folder4/part2/
...
r apache-spark sparklyr
1 Answer

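Base R may already be enough, provided the HDFS tree is reachable as a local path (for example through a mounted filesystem). list.dirs reads the local filesystem and does not itself understand hdfs:// URIs.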

list.dirs(path = "hdfs://root", full.names = TRUE, recursive = TRUE)
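
If the path is only reachable through Spark, one possible approach is to call the Hadoop FileSystem API through sparklyr's invoke interface. The following is a minimal sketch, not a tested solution: list_hdfs_dirs is a hypothetical helper name (it is not part of sparklyr), and sc is assumed to be an already open connection created with spark_connect(). Returned paths come from Hadoop's Path.toString() and will not carry a trailing slash.

```r
# Sketch only: `list_hdfs_dirs` is a hypothetical helper, not a sparklyr function.
# `sc` is assumed to be an open connection from sparklyr::spark_connect().
list_hdfs_dirs <- function(sc, path, recursive = FALSE) {
  # Wrap the path string in a Hadoop Path object on the JVM side
  hpath <- sparklyr::invoke_new(sc, "org.apache.hadoop.fs.Path", path)
  # Resolve the FileSystem for this path using Spark's Hadoop configuration
  conf <- sparklyr::invoke(sparklyr::spark_context(sc), "hadoopConfiguration")
  fs <- sparklyr::invoke(hpath, "getFileSystem", conf)

  # List the directories directly under `p`, descending only if `recursive`
  walk <- function(p) {
    statuses <- sparklyr::invoke(fs, "listStatus", p)
    out <- character(0)
    for (s in statuses) {
      # Skip plain files such as the .pqt files
      if (!sparklyr::invoke(s, "isDirectory")) next
      child <- sparklyr::invoke(s, "getPath")
      out <- c(out, sparklyr::invoke(child, "toString"))
      if (recursive) out <- c(out, walk(child))
    }
    out
  }

  walk(hpath)
}
```

With a live connection, list_hdfs_dirs(sc, "hdfs://root") would give the non-recursive listing and list_hdfs_dirs(sc, "hdfs://root", recursive = TRUE) the recursive one. The order of entries depends on what the underlying filesystem returns from listStatus.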
