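How do I list all files of a directory in Python and add them to a list?
os.listdir() will give you everything that is in a directory: files and directories.
If you want only files, you can filter the result with os.path: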
from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
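Or you could use os.walk(), which yields two lists for each directory it visits, splitting the entries into files and directories for you. If you only want the top directory, you can break the first time it yields:
from os import walk
f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break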
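A minimal, self-contained sketch comparing the two approaches above. The helper names top_level_files_listdir and top_level_files_walk are hypothetical and only serve the illustration.

```python
import os


def top_level_files_listdir(path):
    """Names of regular files directly inside path, via os.listdir()."""
    return sorted(
        name for name in os.listdir(path)
        if os.path.isfile(os.path.join(path, name))
    )


def top_level_files_walk(path):
    """Same idea via the first tuple yielded by os.walk()."""
    for _dirpath, _dirnames, filenames in os.walk(path):
        return sorted(filenames)  # stop after the top directory
    return []  # os.walk() yields nothing if path cannot be listed
```

The two can disagree on unusual entries: os.walk() puts every non-directory entry into filenames (a broken symbolic link, for instance), while os.path.isfile() returns False for it.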
import os
def list_files(path):
    # returns a list of names (with extension, without full path) of all files
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files
If you are looking for a Python implementation of find, this is a recipe I use rather frequently:
from findtools.find_files import (find_files, Match)
# Recursively find all *.sh files in **/usr/bin**
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)
for found_file in found_files:
    print(found_file)
So I made a PyPI package out of it, and there is also a GitHub repository. I hope someone finds it useful.
Returns a list of absolute file paths; does not recurse into subdirectories:
L = [os.path.join(os.getcwd(),f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(),f))]
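For better results, you can use the listdir() method of the os module together with a generator (a generator is a powerful iterator that keeps its state, remember?). The following code works with both Python 2 and Python 3.
Here is the code:
import os
def files(path):
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file
for file in files("."):
    print(file)
The listdir() method returns the list of entries in the given directory. The os.path.isfile() method returns True if the given entry is a file. The yield statement leaves the function but keeps its current state, and it returns only the names of the entries detected as files. All of this lets us loop over the generator function.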
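To make the point about state concrete, here is a small sketch that takes only the first few names from such a generator. The helper first_n_files is a hypothetical name added for illustration. Note that os.listdir() itself still builds the full list of entries; only the isfile() filtering is lazy.

```python
import os
from itertools import islice


def files(path):
    """Yield the names of regular files in path, one at a time."""
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            yield name


def first_n_files(path, n):
    """Take at most n names; isfile() is not called for the remaining entries."""
    return list(islice(files(path), n))
```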
import os
import os.path
def get_files(target_dir):
    item_list = os.listdir(target_dir)
    file_list = list()
    for item in item_list:
        item_dir = os.path.join(target_dir,item)
        if os.path.isdir(item_dir):
            file_list += get_files(item_dir)
        else:
            file_list.append(item_dir)
    return file_list
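Here I use a recursive structure.
A wise teacher told me once that:
When there are several established ways to do something, none of them is good for all cases.
I will thus add a solution for a subset of the problem: quite often, we only want to check whether a file matches a start string and an end string, without going into subdirectories. We would thus like a function that returns a list of filenames, like: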
filenames = dir_filter('foo/baz', radical='radical', extension='.txt')
If you care to declare two functions first, this can be done:
import os
def file_filter(filename, radical='', extension=''):
    "Check if a filename matches a radical and extension"
    if not filename:
        return False
    filename = filename.strip()
    return(filename.startswith(radical) and filename.endswith(extension))
def dir_filter(dirname='', radical='', extension=''):
    "Filter filenames in directory according to radical and extension"
    if not dirname:
        dirname = '.'
    return [filename for filename in os.listdir(dirname)
            if file_filter(filename, radical, extension)]
This solution could easily be generalized with regular expressions (and you might want to add a pattern argument if you do not want your patterns to always stick to the start or end of the filename).
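The regular-expression generalization mentioned above could look roughly like this. It is a sketch only: the function name dir_filter_regex and its pattern parameter are assumptions, not part of the original answer.

```python
import os
import re


def dir_filter_regex(dirname=".", pattern=""):
    """Return the names in dirname that contain a match for pattern.

    Hypothetical helper: the pattern may match anywhere in the name;
    anchor it with ^ and $ to pin it to the start or the end.
    """
    regex = re.compile(pattern)
    return sorted(name for name in os.listdir(dirname) if regex.search(name))
```

For example, dir_filter_regex('foo/baz', r'^radical.*\.txt$') would reproduce the startswith/endswith behaviour, while an empty pattern matches every entry.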
Using generators
import os
def get_files(search_path):
    for (dirpath, _, filenames) in os.walk(search_path):
        for filename in filenames:
            yield os.path.join(dirpath, filename)
list_files = get_files('.')
for filename in list_files:
    print(filename)
Another very readable variant for Python 3.4+ is using pathlib.Path.glob:
from pathlib import Path
folder = '/foo'
[f for f in Path(folder).glob('*') if f.is_file()]
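It is just as simple to be more specific, e.g. to look only for Python source files that are not symbolic links, in all subdirectories as well: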
[f for f in Path(folder).glob('**/*.py') if not f.is_symlink()]
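As a sketch, the same recursive search can be wrapped in a small function using Path.rglob(), which is equivalent to glob() with a leading '**/'. The function name python_sources is made up for this example.

```python
from pathlib import Path


def python_sources(folder):
    """Non-symlink *.py files under folder, searched recursively."""
    return sorted(
        p for p in Path(folder).rglob("*.py")
        if p.is_file() and not p.is_symlink()
    )
```

The result is a list of Path objects; use str(p) or p.name if plain strings are needed.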
Here is my general-purpose function for this. It returns a list of file paths rather than filenames, since I found that to be more useful. It has a few optional arguments that make it versatile. For instance, I often use it with arguments like pattern='*.txt' or subfolders=True.
import os
import fnmatch
def list_paths(folder='.', pattern='*', case_sensitive=False, subfolders=False):
    """Return a list of the file paths matching the pattern in the specified
    folder, optionally including files inside subfolders.
    """
    match = fnmatch.fnmatchcase if case_sensitive else fnmatch.fnmatch
    walked = os.walk(folder) if subfolders else [next(os.walk(folder))]
    return [os.path.join(root, f)
            for root, dirnames, filenames in walked
            for f in filenames if match(f, pattern)]
For Python 2: pip install rglob
import rglob
file_list=rglob.rglob("/home/base/dir/", "*")
print file_list
I prefer using the glob module, as it does pattern matching and expansion.
import glob
print(glob.glob("/home/adam/*.txt"))
It will return a list with the queried files:
['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]
dircache is "Deprecated since version 2.6: The dircache module has been removed in Python 3.0."
import dircache
list = dircache.listdir(pathname)
i = 0
check = len(list[0])
temp = []
count = len(list)
while count != 0:
    if len(list[i]) != check:
        temp.append(list[i-1])
        check = len(list[i])
    else:
        i = i + 1
        count = count - 1
print temp
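I will provide a sample one-liner that takes the source path and file type as input. The code returns a list of filenames with the csv extension; change the pattern if all files need to be returned. It also scans subdirectories recursively.
import os
from glob import glob
[y for x in os.walk(sourcePath) for y in glob(os.path.join(x[0], '*.csv'))]
Modify the file extension and source path as needed.
Get all files from the specified folder, including subdirectories.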
import glob
import os
print([entry for entry in glob.iglob("{}/**".format("DIRECTORY_PATH"), recursive=True) if os.path.isfile(entry) == True])
import os
os.listdir("somedirectory")
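will return a list of all files and directories in "somedirectory".
I also made a short video here: Python: how to get a list of file in a directory
os.listdir()
Or... how to get all the files (and directories) in the current directory (Python 3)
The simplest way to get the files in the current directory in Python 3 is this. It is really simple: use the os module and the listdir() function and you will have the files in that directory (and any folders in it, but not the files in subdirectories; for that you can use walk, which I will talk about later).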
>>> import os
>>> arr = os.listdir()
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
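Using glob
I find glob easier for selecting files of the same type or with something in common. Look at the following example:
import glob
txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)
Using a list comprehension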
import glob
mylist = [f for f in glob.glob("*.txt")]
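As you noticed, the code above does not give you the full path of the file. If you need the absolute path, you can use another function of the os.path module, abspath, passing it the file names you got from os.listdir(). There are other ways to get the full path, as we will check later (I replaced _getfullpathname with abspath, as suggested by mexmex).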
>>> import os
>>> files_path = [os.path.abspath(x) for x in os.listdir()]
>>> files_path
['F:\\documenti\\applications.txt', 'F:\\documenti\\collections.txt']
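One caveat worth sketching: os.path.abspath(name) resolves a bare name against the current working directory, so the comprehension above is only correct when the listed directory is the current one. For any other directory, join the names with that directory first. The helper name absolute_paths is an assumption made for this example.

```python
import os


def absolute_paths(directory="."):
    """Absolute paths of all entries directly inside directory."""
    base = os.path.abspath(directory)
    # Join with the listed directory, not with the current working directory.
    return sorted(os.path.join(base, name) for name in os.listdir(base))
```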
walk
I find this very useful for finding things in many directories, and it helped me find a file whose name I didn't remember:
import os
# Getting the current work directory (cwd)
thisdir = os.getcwd()
# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if ".docx" in file:
            print(os.path.join(r, file))
os.listdir(): get the files in the current directory (Python 2)
In Python 2, if you want the list of files in the current directory, you have to pass '.' or os.getcwd() as the argument to os.listdir.
>>> import os
>>> arr = os.listdir('.')
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
>>> # Method 1
>>> x = os.listdir('..')
# Method 2
>>> x= os.listdir('/')
>>> import os
>>> arr = os.listdir('F:\\python')
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
import os
x = os.listdir("./content")
>>> import os
>>> arr = next(os.walk('.'))[2]
>>> arr
['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']
import glob
print(glob.glob("*"))
out:['content', 'start.py']
>>> import os
>>> arr = []
>>> r, d, f = next(os.walk("F:\\_python"))
>>> for file in f:
...     arr.append(os.path.join(r, file))
...
>>> for f in arr:
...     print(f)
...
F:\_python\dict_class.py
F:\_python\programmi.txt
>>> [os.path.join(r, file) for r, d, f in [next(os.walk("F:\\_python"))] for file in f]
['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']
os.walk: get the full path of all files, including those in subdirectories
x = [os.path.join(r,file) for r,d,f in os.walk("F:\\_python") for file in f]
>>>x
['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']
>>> arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
>>> print(arr_txt)
['work.txt', '3ebooks.txt']
>>> import glob
>>> x = glob.glob("*.txt")
>>> x
['ale.txt', 'alunni2015.txt', 'assenze.text.txt', 'text2.txt', 'untitled.txt']
If I need the absolute path of the files:
>>> from path import path
>>> from glob import glob
>>> x = [path(f).abspath() for f in glob("F:\\*.txt")]
>>> for f in x:
...     print(f)
...
F:\acquistionline.txt
F:\acquisti_2018.txt
F:\bootstrap_jquery_ecc.txt
If I want all the files in the directory:
>>> x = glob.glob("*")
import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)
> output
['a simple game.py', 'data.txt', 'decorator.py']
>>> import pathlib
>>> flist = []
>>> for p in pathlib.Path('.').iterdir():
...     if p.is_file():
...         print(p)
...         flist.append(p)
...
error.PNG
exemaker.bat
guiprova.mp3
setup.py
speak_gui2.py
thumb.PNG
If you want to use a list comprehension:
>>> flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]
* You can also use just pathlib.Path() instead of pathlib.Path(".")
import pathlib
py = pathlib.Path().glob("*.py")
for file in py:
    print(file)
Output:
stack_overflow_list.py
stack_overflow_list_tkinter.py
import os
x = [i[2] for i in os.walk('.')]
y=[]
for t in x:
    for f in t:
        y.append(f)
>>> y
['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']
>>> import os
>>> x = next(os.walk('F://python'))[2]
>>> x
['calculator.bat','calculator.py']
>>> import os
>>> next(os.walk('F://python'))[1] # for the current dir use ('.')
['python3','others']
walk
>>> for r,d,f in os.walk("F:\\_python"):
...     for dirs in d:
...         print(dirs)
...
.vscode
pyexcel
pyschool.py
subtitles
_metaprogramming
.ipynb_checkpoints
>>> import os
>>> x = [f.name for f in os.scandir() if f.is_file()]
>>> x
['calculator.bat','calculator.py']
# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.
>>> import os
>>> with os.scandir() as i:
...     for entry in i:
...         if entry.is_file():
...             print(entry.name)
...
ebookmaker.py
error.PNG
exemaker.bat
guiprova.mp3
setup.py
speakgui4.py
speak_gui2.py
speak_gui3.py
thumb.PNG
>>>
In this example, we count the number of files contained in a directory and all of its subdirectories.
import os
def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + " files"
print(count("F:\\python"))
> output
> F:\python : 12057 files
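A more compact way to get the same number, sketched under the assumption that only the count is needed and not the formatted string. The name count_files is hypothetical.

```python
import os


def count_files(top):
    """Number of files in top and all of its subdirectories."""
    return sum(len(filenames) for _root, _dirs, filenames in os.walk(top))
```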
A script that searches the computer for all files of a given type (default: pptx) and copies them into a new folder.
import os
import shutil
destination = "F:\\file_copied"
# create the destination folder if it does not exist yet
os.makedirs(destination, exist_ok=True)
def copyfile(dir, filetype='pptx', counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = os.path.join(pack[0], f)
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print("------------------------")
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")
for dir in os.listdir():
    # searches for folders that start with `_`
    if dir[0] == '_':
        # copyfile(dir, filetype='pdf')
        copyfile(dir, filetype='txt')
> Output
_compiti18\Compito Contabilità 1\conti.txt
_compiti18\Compito Contabilità 1\modula4.txt
_compiti18\Compito Contabilità 1\moduloa4.txt
------------------------
==> Found in: `_compiti18` : 3 files
If you want to create a txt file with all the file names:
import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)
"""We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""
import os
# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding='utf-8') as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\\" + file)
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")
with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")
os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")
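This is a shorter version of the previous code. Change the folder where the search starts if you need to start from another location. On my computer this code generated a 50 MB text file with a little under 500,000 lines, each holding the full path of a file.
import os
with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            # os.path.join inserts the separator between folder and file name
            filewrite.write(os.path.join(r, file) + "\n")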
With this function you can create a txt file named after the type of file you are looking for (e.g. pngfile.txt), containing the full paths of all files of that type. I think it can be useful sometimes.
import os
def searchfiles(extension='.ttf', folder='H:\\'):
    "Create a txt file with all the file of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        # walk the folder passed in, not a hardcoded drive
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(os.path.join(r, file) + "\n")
# looking for png files in the hard disk H:\
searchfiles('.png', 'H:\\')
H:\4bs_18\Dolphins5.png
H:\4bs_18\Dolphins6.png
H:\4bs_18\Dolphins7.png
H:\5_18\marketing html\assets\imageslogo2.png
H:\7z001.png
H:\7z002.png
A one-line solution to get only the list of files (no subdirectories):
filenames = next(os.walk(path))[2]
or with the directory path prepended to each name:
paths = [os.path.join(path,fn) for fn in next(os.walk(path))[2]]
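One thing to keep in mind with the one-liners above: os.walk() ignores listing errors by default, so for a missing or unreadable directory it yields nothing and a bare next() raises StopIteration. A hedged sketch that passes a default to next() instead (top_level_filenames is a made-up name):

```python
import os


def top_level_filenames(path):
    """File names directly inside path; [] if path cannot be walked."""
    # The second argument of next() is returned when os.walk() yields nothing.
    _root, _dirs, filenames = next(os.walk(path), (path, [], []))
    return filenames
```

Whether silently returning an empty list is preferable to an exception depends on the caller; this is only one possible design.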
Getting full file paths from a directory and all its subdirectories
import os
def get_filepaths(directory):
    """
    This function will generate the file names in a directory
    tree by walking the tree either top-down or bottom-up. For each
    directory in the tree rooted at directory top (including top itself),
    it yields a 3-tuple (dirpath, dirnames, filenames).
    """
    file_paths = []  # List which will store all of the full filepaths.
    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.
    return file_paths  # Self-explanatory.
# Run the above function and store its results in a variable.
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")
print(full_file_paths)
which will print the list:
['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']
If you'd like, you can open and read the contents, or focus only on files with the extension ".dat", as in the code below:
for f in full_file_paths:
    if f.endswith(".dat"):
        print(f)
/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat
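Starting with version 3.4 there are builtin iterators for this which are a lot more efficient than os.listdir():
pathlib: New in version 3.4.
>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]
According to PEP 428, the aim of the pathlib library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.
os.scandir(): New in version 3.5.
>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]
Note that os.walk() uses os.scandir() instead of os.listdir() from version 3.5, and according to PEP 471 its speed increased by 2-20 times.
Let me also recommend reading ShadowRanger's comment below.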
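To illustrate why os.scandir() helps when file attributes are needed as well, here is a small sketch that collects file sizes from the DirEntry objects. The function name file_sizes is an assumption for this example, and using os.scandir() in a with statement requires Python 3.6 or later.

```python
import os


def file_sizes(path="."):
    """Map file name -> size in bytes for regular files directly in path."""
    sizes = {}
    with os.scandir(path) as entries:  # context-manager support: Python 3.6+
        for entry in entries:
            # Symbolic links are skipped rather than followed in this sketch.
            if entry.is_file(follow_symlinks=False):
                sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes
```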
>>> import sys
>>> sys.version
'2.7.10 (default, Mar 8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
>>> m = map(lambda x: x, [1, 2, 3]) # Just a dummy lambda function
>>> m, type(m)
([1, 2, 3], <type 'list'>)
>>> len(m)
3
>>> import sys
>>> sys.version
'3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
>>> m = map(lambda x: x, [1, 2, 3])
>>> m, type(m)
(<map object at 0x000001B4257342B0>, <class 'map'>)
>>> len(m)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'map' has no len()
>>> lm0 = list(m) # Build a list from the generator
>>> lm0, type(lm0)
([1, 2, 3], <class 'list'>)
>>>
>>> lm1 = list(m) # Build a list from the same generator
>>> lm1, type(lm1) # Empty list now - generator already consumed
([], <class 'list'>)
E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
Folder PATH listing for volume Work
Volume serial number is 00000029 3655:6FED
E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
¦   file0
¦   file1
¦
+---dir0
¦   +---dir00
¦   ¦   ¦   file000
¦   ¦   ¦
¦   ¦   +---dir000
¦   ¦           file0000
¦   ¦
¦   +---dir01
¦   ¦       file010
¦   ¦       file011
¦   ¦
¦   +---dir02
¦       +---dir020
¦           +---dir0200
+---dir1
¦       file10
¦       file11
¦       file12
¦
+---dir2
¦   ¦   file20
¦   ¦
¦   +---dir20
¦           file200
¦
+---dir3
... '.' and '..' ...
>>> import os
>>> root_dir = "root_dir" # Path relative to current dir (os.getcwd())
>>>
>>> os.listdir(root_dir) # List all the items in root_dir
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))] # Filter items and only keep files (strip out directories)
['file0', 'file1']
A more elaborate example (code_os_listdir.py):
import os
from pprint import pformat
def _get_dir_content(path, include_folders, recursive):
    entries = os.listdir(path)
    for entry in entries:
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                yield entry_with_path
            if recursive:
                for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                    yield sub_entry
        else:
            yield entry_with_path
def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
    path_len = len(path) + len(os.path.sep)
    for item in _get_dir_content(path, include_folders, recursive):
        yield item if prepend_folder_name else item[path_len:]
def _get_dir_content_old(path, include_folders, recursive):
    entries = os.listdir(path)
    ret = list()
    for entry in entries:
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                ret.append(entry_with_path)
            if recursive:
                ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
        else:
            ret.append(entry_with_path)
    return ret
def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
    path_len = len(path) + len(os.path.sep)
    return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]
def main():
    root_dir = "root_dir"
    ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
    lret0 = list(ret0)
    print(ret0, len(lret0), pformat(lret0))
    ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
    print(len(ret1), pformat(ret1))
if __name__ == "__main__":
    main()
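Notes:
There are two implementations:
    One that uses generators (of course here it seems useless, since I immediately convert the result to a list)
    The classic one (function names ending in _old)
Recursion is used (to get into subdirectories)
For each implementation there are two functions:
    One that starts with an underscore (_): "private" (should not be called directly), which does all the work
    The public one (a wrapper over the previous): it just strips off the initial path (if required) from the returned entries. It's an ugly implementation, but it's the only idea that I could come up with at this point
In terms of performance, generators are generally a little bit faster (considering both creation and iteration times), but I didn't test them in recursive functions, and I am also iterating over inner generators inside the function; I don't know how performance-friendly that is
Play with the arguments to get different results
Output: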
(py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" "code_os_listdir.py"
<generator object get_dir_content at 0x000001BDDBB3DF10> 22 ['root_dir\\dir0',
'root_dir\\dir0\\dir00',
'root_dir\\dir0\\dir00\\dir000',
'root_dir\\dir0\\dir00\\dir000\\file0000',
'root_dir\\dir0\\dir00\\file000',
'root_dir\\dir0\\dir01',
'root_dir\\dir0\\dir01\\file010',
'root_dir\\dir0\\dir01\\file011',
'root_dir\\dir0\\dir02',
'root_dir\\dir0\\dir02\\dir020',
'root_dir\\dir0\\dir02\\dir020\\dir0200',
'root_dir\\dir1',
'root_dir\\dir1\\file10',
'root_dir\\dir1\\file11',
'root_dir\\dir1\\file12',
'root_dir\\dir2',
'root_dir\\dir2\\dir20',
'root_dir\\dir2\\dir20\\file200',
'root_dir\\dir2\\file20',
'root_dir\\dir3',
'root_dir\\file0',
'root_dir\\file1']
11 ['dir0\\dir00\\dir000\\file0000',
'dir0\\dir00\\file000',
'dir0\\dir01\\file010',
'dir0\\dir01\\file011',
'dir1\\file10',
'dir1\\file11',
'dir1\\file12',
'dir2\\dir20\\file200',
'dir2\\file20',
'file0',
'file1']
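... '.' and '..'.
Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix, but only requires one for symbolic links on Windows.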
>>> import os
>>> root_dir = os.path.join(".", "root_dir") # Explicitly prepending current directory
>>> root_dir
'.\\root_dir'
>>>
>>> scandir_iterator = os.scandir(root_dir)
>>> scandir_iterator
<nt.ScandirIterator object at 0x00000268CF4BC140>
>>> [item.path for item in scandir_iterator]
['.\\root_dir\\dir0', '.\\root_dir\\dir1', '.\\root_dir\\dir2', '.\\root_dir\\dir3', '.\\root_dir\\file0', '.\\root_dir\\file1']
>>>
>>> [item.path for item in scandir_iterator] # Will yield an empty list as it was consumed by previous iteration (automatically performed by the list comprehension)
[]
>>>
>>> scandir_iterator = os.scandir(root_dir) # Reinitialize the generator
>>> for item in scandir_iterator :
...     if os.path.isfile(item.path):
...         print(item.name)
...
file0
file1
Notes:
It's similar to os.listdir
But it's also more flexible (and offers more functionality), more Pythonic (and in some cases, faster)
... (dirpath, dirnames, filenames).
>>> import os
>>> root_dir = os.path.join(os.getcwd(), "root_dir") # Specify the full path
>>> root_dir
'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
>>>
>>> walk_generator = os.walk(root_dir)
>>> root_dir_entry = next(walk_generator) # First entry corresponds to the root dir (passed as an argument)
>>> root_dir_entry
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
>>>
>>> root_dir_entry[1] + root_dir_entry[2] # Display dirs and files (direct descendants) in a single list
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]] # Display all the entries in the previous list by their full path
['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
>>>
>>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
...     print(entry)
...
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])
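Notes:
Under the scenes, it uses os.scandir (os.listdir on older versions)
It does the heavy lifting by recursing into subfolders
... absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).
...
Changed in version 3.5: Support for recursive globs using "**".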
>>> import glob, os
>>> wildcard_pattern = "*"
>>> root_dir = os.path.join("root_dir", wildcard_pattern) # Match every file/dir name
>>> root_dir
'root_dir\\*'
>>>
>>> glob_list = glob.glob(root_dir)
>>> glob_list
['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
>>>
>>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list] # Strip the dir name and the path separator from begining
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> for entry in glob.iglob(root_dir + "*", recursive=True):
...     print(entry)
...
root_dir\
root_dir\dir0
root_dir\dir0\dir00
root_dir\dir0\dir00\dir000
root_dir\dir0\dir00\dir000\file0000
root_dir\dir0\dir00\file000
root_dir\dir0\dir01
root_dir\dir0\dir01\file010
root_dir\dir0\dir01\file011
root_dir\dir0\dir02
root_dir\dir0\dir02\dir020
root_dir\dir0\dir02\dir020\dir0200
root_dir\dir1
root_dir\dir1\file10
root_dir\dir1\file11
root_dir\dir1\file12
root_dir\dir2
root_dir\dir2\dir20
root_dir\dir2\dir20\file200
root_dir\dir2\file20
root_dir\dir3
root_dir\file0
root_dir\file1
Notes:
Uses os.listdir
For large trees (especially if recursive is on), iglob is preferred
Allows advanced filtering based on name (due to the wildcard)
>>> import pathlib
>>> root_dir = "root_dir"
>>> root_dir_instance = pathlib.Path(root_dir)
>>> root_dir_instance
WindowsPath('root_dir')
>>> root_dir_instance.name
'root_dir'
>>> root_dir_instance.is_dir()
True
>>>
>>> [item.name for item in root_dir_instance.glob("*")] # Wildcard searching for all direct descendants
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()] # Display paths (including parent) for files only
['root_dir\\file0', 'root_dir\\file1']
Notes:
It is one way of achieving our goal
It's the OOP style of handling paths
Offers lots of functionalities
... a (thin) wrapper over os.listdir, with caching
def listdir(path):
    """List directory contents, using cache."""
    try:
        cached_mtime, list = cache[path]
        del cache[path]
    except KeyError:
        cached_mtime, list = -1, []
    mtime = os.stat(path).st_mtime
    if mtime != cached_mtime:
        list = os.listdir(path)
        list.sort()
    cache[path] = mtime, list
    return list
#!/usr/bin/env python3
import sys
from ctypes import Structure, \
    c_ulonglong, c_longlong, c_ushort, c_ubyte, c_char, c_int, \
    c_char_p, c_void_p, \
    CDLL, POINTER, \
    get_errno, set_errno, cast
DT_DIR = 4
DT_REG = 8
char256 = c_char * 256
class LinuxDirent64(Structure):
    _fields_ = [
        ("d_ino", c_ulonglong),
        ("d_off", c_longlong),
        ("d_reclen", c_ushort),
        ("d_type", c_ubyte),
        ("d_name", char256),
    ]
LinuxDirent64Ptr = POINTER(LinuxDirent64)
libc_dll = this_process = CDLL(None, use_errno=True)
# ALWAYS set argtypes and restype for functions, otherwise it's UB!!!
opendir = libc_dll.opendir
opendir.argtypes = [c_char_p]
opendir.restype = c_void_p  # DIR* (None when NULL)
readdir = libc_dll.readdir
readdir.argtypes = [c_void_p]
readdir.restype = c_void_p  # struct dirent* (None when NULL)
closedir = libc_dll.closedir
closedir.argtypes = [c_void_p]
closedir.restype = c_int
def get_dir_content(path):
    ret = [path, list(), list()]
    dir_stream = opendir(path.encode())
    if not dir_stream:
        print("opendir returned NULL (errno: {:d})".format(get_errno()))
        return ret
    set_errno(0)
    dirent_addr = readdir(dir_stream)
    while dirent_addr:
        dirent_ptr = cast(dirent_addr, LinuxDirent64Ptr)
        dirent = dirent_ptr.contents
        name = dirent.d_name.decode()
        # d_type holds one DT_ value, so compare for equality
        if dirent.d_type == DT_DIR:
            if name not in (".", ".."):
                ret[1].append(name)
        elif dirent.d_type == DT_REG:
            ret[2].append(name)
        dirent_addr = readdir(dir_stream)
    if get_errno():
        print("readdir returned NULL (errno: {:d})".format(get_errno()))
    closedir(dir_stream)
    return ret
def main():
    print("{:s} on {:s}\n".format(sys.version, sys.platform))
    root_dir = "root_dir"
    entries = get_dir_content(root_dir)
    print(entries)
if __name__ == "__main__":
    main()
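Notes:
It loads the three functions from libc (loaded in the current process) and calls them (for more details check [SO]: How do I check whether a file exists without exceptions? (@CristiFati's answer), last note from item #4). That would place this approach very close to the Python / C edge
LinuxDirent64 is the ctypes representation of struct dirent64 from [man7]: dirent.h(0P) (so are the DT_ constants) from my machine: Ubtu 16 x64 (4.10.0-40-generic and libc6-dev:amd64). On other flavors/versions, the struct definition might differ, and if so, the ctypes alias should be updated, otherwise it will yield Undefined Behavior
It returns data in the os.walk format. I didn't bother to make it recursive, but starting from the existing code, that would be a fairly trivial task
Everything is doable on Win as well; the data (libraries, functions, structs, constants, ...) differ
Output: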
[cfati@cfati-ubtu16x64-0:~/Work/Dev/StackOverflow/q003207219]> ./code_ctypes.py
3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
['root_dir', ['dir2', 'dir1', 'dir3', 'dir0'], ['file1', 'file0']]
>>> import os, win32file, win32con
>>> root_dir = "root_dir"
>>> wildcard = "*"
>>> root_dir_wildcard = os.path.join(root_dir, wildcard)
>>> entry_list = win32file.FindFilesW(root_dir_wildcard)
>>> len(entry_list) # Don't display the whole content as it's too long
8
>>> [entry[-2] for entry in entry_list] # Only display the entry names
['.', '..', 'dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [entry[-2] for entry in entry_list if entry[0] & win32con.FILE_ATTRIBUTE_DIRECTORY and entry[-2] not in (".", "..")] # Filter entries and only display dir names (except self and parent)
['dir0', 'dir1', 'dir2', 'dir3']
>>>
>>> [os.path.join(root_dir, entry[-2]) for entry in entry_list if entry[0] & (win32con.FILE_ATTRIBUTE_NORMAL | win32con.FILE_ATTRIBUTE_ARCHIVE)] # Only display file "full" names
['root_dir\\file0', 'root_dir\\file1']
Notes:
win32file.FindFilesW is part of [GitHub]: mhammond/pywin32 - Python for Windows (pywin32) Extensions, which is a Python wrapper over WINAPIs
The documentation link is from ActiveState, as I didn't find any PyWin32 official documentation
Notes:
os.listdir and os.scandir use opendir / readdir / closedir ([MS.Docs]: FindFirstFileW function / [MS.Docs]: FindNextFileW function / [MS.Docs]: FindClose function) (via [GitHub]: python/cpython - (master) cpython/Modules/posixmodule.c)
win32file.FindFilesW uses those (Win specific) functions as well (via [GitHub]: mhammond/pywin32 - (master) pywin32/win32/src/win32file.i)
... filter_func=lambda x: True (this doesn't strip out anything) and inside _get_dir_content something like: if not filter_func(entry_with_path): continue (if the function fails for one entry, it will be skipped), but the more complex the code becomes, the longer it will take to execute
... (grep / findstr) or output formatting, but I'm not going to insist on it. Also, I deliberately used os.system instead of subprocess.Popen.
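The filter_func idea from the notes above could be sketched as follows. This is a simplified, hypothetical variant (the name get_dir_content_filtered is made up): it always recurses and applies the filter to files only, which is an assumption rather than the author's exact design.

```python
import os


def get_dir_content_filtered(path, filter_func=lambda p: True):
    """Yield file paths under path (recursively) accepted by filter_func."""
    for entry in sorted(os.listdir(path)):
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            # Directories are always descended into in this sketch.
            for sub_entry in get_dir_content_filtered(entry_with_path, filter_func):
                yield sub_entry
        elif filter_func(entry_with_path):
            yield entry_with_path
```

With the default filter nothing is stripped out; passing, say, lambda p: p.endswith(".txt") keeps only text files.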
(py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
dir0
dir1
dir2
dir3
file0
file1
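In general this approach is to be avoided, since if some command output format differs slightly between OS versions/flavors, the parsing code has to be adapted as well (not to mention differences between locales).
I really liked adamk's answer, suggesting that you use glob() from the module of the same name. This allows you to do pattern matching with *s.
But as other people pointed out in the comments, glob() can get tripped up over inconsistent slash directions. To help with that, I suggest you use the join() and expanduser() functions in the os.path module, and perhaps the getcwd() function in the os module as well.
As examples: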
from glob import glob
# Return everything under C:\Users\admin that contains a folder called wlp.
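glob(r'C:\Users\admin\*\wlp')
The above is terrible: the path has been hardcoded and will only ever work on Windows, because both the drive name and the \s are hardcoded into the path.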
from glob import glob
from os.path import join
# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))
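The above works better, but it relies on the folder name Users, which is often found on Windows and not so often found on other OSs. It also relies on the user having a specific name, admin.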
from glob import glob
from os.path import expanduser, join
# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))
This works perfectly across all platforms.
Another great example that works perfectly across platforms and does something a bit different:
from glob import glob
from os import getcwd
from os.path import join
# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))
Hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.
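The pattern used in the examples above can be wrapped in a small helper. This is a sketch under assumptions: find_named_subfolders is a made-up name, and glob.escape() (available since Python 3.4) is added so that special characters in the base directory are not treated as wildcards.

```python
import glob
import os


def find_named_subfolders(base, name):
    """Paths matching base/*/name, built in a platform-independent way."""
    pattern = os.path.join(glob.escape(base), "*", name)
    return sorted(glob.glob(pattern))
```

For instance, find_named_subfolders(os.path.expanduser('~'), 'wlp') corresponds to the expanduser() example, and find_named_subfolders(os.getcwd(), 'wlp') to the getcwd() one.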