Simplify/enhance a Python filtering algorithm


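I am looking for a way to identify the dist.xml files that sit in the topmost directories, that is, directories with no ancestor directory that also contains a dist.xml.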

For example, I have this list of paths:

/opt/pictures/dist.xml
/opt/docs_old/dist.xml
/opt/public/dist.xml
/opt/documents/server/dist.xml
/opt/documents/dist.xml
/opt/documents/web/dist.xml
/opt/documents/class/dist.xml
/opt/documents/lessons/1/dist.xml
/opt/documents/lessons/2/dist.xml
/opt/documents/lessons/3/dist.xml
/opt/documents/lessons/4/dist.xml
/opt/documents/lessons/5/dist.xml
/opt/music/service/day/dist.xml
/opt/music/service/week/dist.xml
/opt/music/service/month/dist.xml
/opt/music/service/month/1/dist.xml
/opt/music/service/month/2/dist.xml

And I am looking for this output:

/opt/pictures/dist.xml
/opt/docs_old/dist.xml
/opt/public/dist.xml
/opt/documents/dist.xml
/opt/music/service/day/dist.xml
/opt/music/service/week/dist.xml
/opt/music/service/month/dist.xml

The code below seems to do the job; I'd like to know whether it can be sped up or cleaned up:

from pathlib import Path

paths = ['/opt/pictures/dist.xml', '/opt/docs_old/dist.xml', '/opt/public/dist.xml', '/opt/documents/server/dist.xml', '/opt/documents/dist.xml', '/opt/documents/web/dist.xml', '/opt/documents/class/dist.xml', '/opt/documents/lessons/1/dist.xml', '/opt/documents/lessons/2/dist.xml', '/opt/documents/lessons/3/dist.xml', '/opt/documents/lessons/4/dist.xml', '/opt/documents/lessons/5/dist.xml', '/opt/music/service/day/dist.xml', '/opt/music/service/week/dist.xml', '/opt/music/service/month/dist.xml', '/opt/music/service/month/1/dist.xml', '/opt/music/service/month/2/dist.xml']

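# Deduplicate, then reduce each file path to its containing folder.
paths = list(set(paths))
paths_folder = [str(Path(p).parent) for p in paths]

# Collect the index of every folder that has an ancestor in the list.
to_remove = set()
for idx, val in enumerate(paths_folder):
  for b in Path(val).parents:
    if str(b) in paths_folder:
      to_remove.add(idx)
      break

# Keep only the topmost folders, then re-append the file name.
paths_folder = [i for j, i in enumerate(paths_folder) if j not in to_remove]

paths_folder = [p + '/dist.xml' for p in paths_folder]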

print(paths_folder)
1 Answer

Here is a cleaner approach, since it avoids index tracking and the like:

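First sort paths_folder so that the topmost folders come first. Then do as you did, but use the all() built-in to check, for each folder, that none of its parent folders is already in the list of "top folders"; all() returns True only if the condition holds for every item. If the check passes, add the folder to the final list immediately: because the list was sorted beforehand, every item that follows is either an unrelated folder or a subfolder of the current one.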
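The code from this answer did not survive on the page, so what follows is only a minimal sketch of the approach described above, not the answerer's original code. The function name top_level_files and its filename parameter are made up for illustration, and PurePosixPath is used on the assumption that the inputs are POSIX-style paths whose file name is always dist.xml.

```python
from pathlib import PurePosixPath


def top_level_files(paths, filename="dist.xml"):
    """Return the files whose folder has no ancestor folder in the input.

    Hypothetical helper: assumes every input path ends in `filename`.
    """
    # Deduplicate and reduce each path to its folder. Sorting puts a
    # parent folder before all of its subfolders.
    folders = sorted({PurePosixPath(p).parent for p in paths})

    top_folders = set()  # folders kept so far (set gives O(1) lookups)
    result = []
    for folder in folders:
        # all() is True only if no ancestor has already been kept.
        if all(parent not in top_folders for parent in folder.parents):
            top_folders.add(folder)
            result.append(str(folder / filename))
    return result


if __name__ == "__main__":
    sample = [
        "/opt/documents/server/dist.xml",
        "/opt/documents/dist.xml",
        "/opt/music/service/month/dist.xml",
        "/opt/music/service/month/1/dist.xml",
        "/opt/pictures/dist.xml",
    ]
    for path in top_level_files(sample):
        print(path)
```

Checking only against the folders already kept is enough: if an ancestor was itself dropped, that happened because one of its own ancestors was kept, and that folder is an ancestor of the current one too. Note that the result comes back in sorted order rather than in the input order.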
