Use a regular expression to filter a directory and write the filtered files to another directory


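I'm trying to create a Python 3 program that runs through all .sql files in a specific directory and applies my regex, which adds ; after a certain instance. It should then write the modified files to a separate directory, keeping each file's name.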

So if I have file1.sql and file2.sql in the "/home/files" directory, then after running the program both files should be written to "/home/new_files", and the original files should stay unchanged.

Here is my code:

import glob
import re
folder_path = "/home/files/d_d"
file_pattern = "/*sql"
folder_contents = glob.glob(folder_path + file_pattern)


for file in folder_contents:
    print("Checking", file)
for file in folder_contents:
    read_file = open(file, 'rt',encoding='latin-1').read()
    #words=read_file.split()
    with open(read_file,"w") as output:
        output.write(re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', f, flags=re.DOTALL))
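For reference, here is what the re.sub call above does when it receives the file contents (rather than the undefined name f). This is a minimal sketch on a made-up two-statement sample, not data from the question. The non-greedy .*? with re.DOTALL captures everything from "TBLPROPERTIES (" up to the first ")", even across line breaks. The replacement r'\1;' then writes that capture back followed by a semicolon.

```python
import re

# Hypothetical sample DDL, not taken from the question's files
sql = (
    "CREATE EXTERNAL TABLE t1 (id INT)\n"
    "TBLPROPERTIES ('skip.header.line.count'='1')\n"
    "CREATE EXTERNAL TABLE t2 (id INT)\n"
    "TBLPROPERTIES (\n  'a'='1')\n"
)

# Group 1 = "TBLPROPERTIES (" ... first ")"; DOTALL lets "." match newlines too.
# r'\1;' re-emits the captured clause followed by ";".
fixed = re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', sql, flags=re.DOTALL)
print(fixed)
```

There are two caveats. A ")" inside the TBLPROPERTIES values would end the match early. Also, running the substitution again on already-processed text turns ");" into ");;", which is one more reason to write the results to a separate directory instead of overwriting the originals.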

I get a "File name too long" error: "CREATE EXTERNAL TABLe", and I also don't know where in my code to put the output path (/home/files/new_dd).
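One way to decide where the output path goes is to use the input folder only for glob. Each output file name is then built by joining the output folder with the input file's base name. This is a minimal sketch: output_path_for is a hypothetical helper name, and the paths are the ones mentioned in the question.

```python
import os

def output_path_for(src_path: str, output_dir: str) -> str:
    """Hypothetical helper: same file name as src_path, but inside output_dir."""
    # os.path.basename drops the folder part, e.g. "/home/files/d_d/file1.sql" -> "file1.sql"
    return os.path.join(output_dir, os.path.basename(src_path))

print(output_path_for("/home/files/d_d/file1.sql", "/home/files/new_dd"))
# /home/files/new_dd/file1.sql on Linux/macOS
```

Inside the loop, the result would be passed to open(..., "w") in place of read_file.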

Any ideas or suggestions?

python regex python-3.x glob os.path
1 Answer

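With read_file = open(file, 'rt', encoding='latin-1').read(), read_file holds the entire contents of the file. open(read_file, "w") then tries to use those contents as a file name, which is why you get "File name too long". (The re.sub call also passes f, which is never defined; it should receive the file contents.) The code below iterates over the file names matched by the glob.glob pattern. It opens each file for reading, processes the data, and opens a file with the same name in the output folder for writing. It assumes the folder newfile_sqls already exists; if it doesn't, the code raises FileNotFoundError: [Errno 2] No such file or directory.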
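If you'd rather not create newfile_sqls by hand, you could call os.makedirs(output_path, exist_ok=True) before the loop. This is an addition, not part of the answer's code. It creates the folder, including any missing parents, and does nothing if the folder already exists. The sketch below uses a throwaway temporary folder to show the FileNotFoundError first and then the fix:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical output folder that does not exist yet
    output_path = os.path.join(base, "newfile_sqls")
    target = os.path.join(output_path, "file1.sql")

    # Writing into a missing folder fails with FileNotFoundError
    try:
        with open(target, "w") as output:
            output.write("-- never written")
        missing_folder_error = False
    except FileNotFoundError:
        missing_folder_error = True

    # Create the folder (no error if it already exists), then writing succeeds
    os.makedirs(output_path, exist_ok=True)
    os.makedirs(output_path, exist_ok=True)  # calling it twice is harmless
    with open(target, "w") as output:
        output.write("-- written")
    written_ok = os.path.isfile(target)

print(missing_folder_error, written_ok)  # True True
```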

import glob
import os
import re

folder_path = "original_sqls"
# e.g. original_sqls/file1.sql, original_sqls/file2.sql, original_sqls/file3.sql
file_pattern = "*.sql"
# folder for the new/modified files (must already exist)
output_path = "newfile_sqls"

folder_contents = glob.glob(os.path.join(folder_path, file_pattern))

# iterate over the bare file names (without the folder part)
for file_ in [os.path.basename(f) for f in folder_contents]:

    # open the original file for reading (same encoding as in the question)
    with open(os.path.join(folder_path, file_), "r", encoding="latin-1") as inputf:
        read_file = inputf.read()

    # 'read_file' holds the file's text; append ";" after each TBLPROPERTIES (...) clause
    tmp = re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', read_file, flags=re.DOTALL)

    # write the result under the same name into the (previously created) output folder
    with open(os.path.join(output_path, file_), "w", encoding="latin-1") as output:
        output.write(tmp)
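Here is a variant of the same loop using pathlib, sketched under a few assumptions. process_sql_dir is a hypothetical name. The output folder is created if it is missing, and only files ending in .sql are matched. Latin-1 is used for both reading and writing because that is the encoding the question used. The originals are only read, never written.

```python
import re
import tempfile
from pathlib import Path

# Same pattern as in the answer: append ";" after each TBLPROPERTIES (...) clause
TBLPROPERTIES_RE = re.compile(r'(TBLPROPERTIES \(.*?\))', flags=re.DOTALL)

def process_sql_dir(src_dir, dst_dir, encoding="latin-1"):
    """Hypothetical helper: write a modified copy of every *.sql file in src_dir
    into dst_dir under the same name. Returns the list of written paths."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # avoids the FileNotFoundError case
    written = []
    for path in sorted(src.glob("*.sql")):
        text = path.read_text(encoding=encoding)
        out = dst / path.name  # same file name, different folder
        out.write_text(TBLPROPERTIES_RE.sub(r'\1;', text), encoding=encoding)
        written.append(out)
    return written

# Usage with throwaway folders standing in for /home/files and /home/new_files
with tempfile.TemporaryDirectory() as base:
    src = Path(base, "files")
    src.mkdir()
    (src / "file1.sql").write_text(
        "CREATE EXTERNAL TABLE t (id INT)\nTBLPROPERTIES ('a'='1')\n", encoding="latin-1"
    )
    (src / "notes.txt").write_text("ignored", encoding="latin-1")  # not matched by *.sql
    outputs = process_sql_dir(src, Path(base, "new_files"))
    names = [p.name for p in outputs]
    result = outputs[0].read_text(encoding="latin-1")
    original = (src / "file1.sql").read_text(encoding="latin-1")

print(names)  # ['file1.sql']
print(result)
```

Path.glob, read_text and write_text cover the same steps as glob.glob, os.path.join and open in the answer. Whether that reads better is mostly a matter of taste.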