I'm looking for a way to identify the dist.xml files that sit in the topmost directories.
For example, given this directory listing:
/opt/pictures/dist.xml
/opt/docs_old/dist.xml
/opt/public/dist.xml
/opt/documents/server/dist.xml
/opt/documents/dist.xml
/opt/documents/web/dist.xml
/opt/documents/class/dist.xml
/opt/documents/lessons/1/dist.xml
/opt/documents/lessons/2/dist.xml
/opt/documents/lessons/3/dist.xml
/opt/documents/lessons/4/dist.xml
/opt/documents/lessons/5/dist.xml
/opt/music/service/day/dist.xml
/opt/music/service/week/dist.xml
/opt/music/service/month/dist.xml
/opt/music/service/month/1/dist.xml
/opt/music/service/month/2/dist.xml
I'm looking for this output:
/opt/pictures/dist.xml
/opt/docs_old/dist.xml
/opt/public/dist.xml
/opt/documents/dist.xml
/opt/music/service/day/dist.xml
/opt/music/service/week/dist.xml
/opt/music/service/month/dist.xml
The code below seems to do the job; I'd like to know whether it can be made faster or cleaner:
from pathlib import Path
paths = ['/opt/pictures/dist.xml', '/opt/docs_old/dist.xml', '/opt/public/dist.xml', '/opt/documents/server/dist.xml', '/opt/documents/dist.xml', '/opt/documents/web/dist.xml', '/opt/documents/class/dist.xml', '/opt/documents/lessons/1/dist.xml', '/opt/documents/lessons/2/dist.xml', '/opt/documents/lessons/3/dist.xml', '/opt/documents/lessons/4/dist.xml', '/opt/documents/lessons/5/dist.xml', '/opt/music/service/day/dist.xml', '/opt/music/service/week/dist.xml', '/opt/music/service/month/dist.xml', '/opt/music/service/month/1/dist.xml', '/opt/music/service/month/2/dist.xml']
paths = list(set(paths))  # drop duplicate entries
paths_folder = [str(Path(p).parent) for p in paths]
to_remove = []
for idx, val in enumerate(paths_folder):
    # Mark a folder for removal if any of its ancestors also holds a dist.xml.
    for b in Path(val).parents:
        if str(b) in paths_folder:
            to_remove.append(idx)
paths_folder = [i for j, i in enumerate(paths_folder) if j not in to_remove]
paths_folder = [p + '/dist.xml' for p in paths_folder]
print(paths_folder)
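This is a cleaner approach, since it avoids index tracking and the like:
First, sort all the folders in paths_folder so that the topmost folders come first. Then do as you did, but use the all() built-in to check that none of a folder's parents is already in the list of "top folders"; all() returns True only when the condition holds for every item. When it does, add the folder to the final list straight away: because of the earlier sort, anything that follows is either an unrelated folder or a child of a folder that has already been seen.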
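The sort-then-all() idea can be sketched roughly as follows. This is a minimal illustration rather than the answerer's exact code: the function name topmost_dist_files, the filename parameter, and the choice to sort by path depth are assumptions made for the example, and PurePosixPath is swapped in for Path so the paths are treated as POSIX-style on any platform.

```python
from pathlib import PurePosixPath


def topmost_dist_files(paths, filename="dist.xml"):
    """Return the files whose folder has no ancestor that also holds one.

    Hypothetical helper for illustration; the name and signature are not
    taken from the original post.
    """
    # De-duplicate the folders, then sort so that shallower folders come
    # first (ties broken alphabetically to keep the order deterministic).
    folders = sorted(
        {PurePosixPath(p).parent for p in paths},
        key=lambda f: (len(f.parts), str(f)),
    )

    top_folders = set()
    for folder in folders:
        # Keep the folder only if none of its ancestors was already kept.
        # Every ancestor is shallower, so it has been processed by now.
        if all(parent not in top_folders for parent in folder.parents):
            top_folders.add(folder)

    return sorted(str(folder / filename) for folder in top_folders)


if __name__ == "__main__":
    sample = [
        "/opt/documents/server/dist.xml",
        "/opt/documents/dist.xml",
        "/opt/music/service/month/dist.xml",
        "/opt/music/service/month/1/dist.xml",
    ]
    for path in topmost_dist_files(sample):
        print(path)
```

Keeping top_folders as a set makes each membership test constant time on average, whereas the list lookup in the question's version scans the whole list for every ancestor. Note that this sketch returns the results sorted alphabetically rather than in input order.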