How do I loop through a folder to run a script on each file in it?


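I have a script that takes a sample from an Excel file and writes that sample out as a CSV. How can I loop over a folder containing multiple Excel files so that I don't have to change the file name every time I run the script? I believe I could use glob, but that seems to just merge all the Excel files together.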

import pandas as pd
df = pd.read_excel(r"C:\Users\bryanmccormack\Desktop\Test.xlsm")
df2 = df.loc[(df['Track Item']=='Y')]

def sample_per(df2):
    if len(df2) <= 10000:
        return df2.sample(frac=0.05)
    elif len(df2) >= 15000:
        return df2.sample(frac=0.03)
    else:
        return df2.sample(frac=0.01)

def create_dataframe(data):
    dataframe = pd.DataFrame(data)
    return sample_per(df2)

final = sample_per(df2)

df.loc[df['Retailer Item ID'].isin(final['Retailer Item ID']), 'Track Item'] = 'Audit'

df.to_csv('Test.csv',index=False)

This is my glob/folder code, but it does not work:

import glob
import pandas as pd

def DataFrameCreator(folder):
    all_files = glob.glob(folder + "/*.xlsx")
    df_master = pd.DataFrame()
    list_ = []
    for file_ in all_files:
        df = pd.read_csv(file_)
        list_.append(df)
    df_master = pd.concat(list_, sort=True)
    return df_master

test_folder = (r"C:\Users\*******\Desktop\Test_Folder")

DataFrameCreator(test_folder)
python pandas glob
2 Answers
0
votes

This returns a list of all files in the directory, including those in subdirectories, that you can then iterate over:

from os import walk
from os.path import join

def retrieve_file_paths(dirName):       #Declare the function to return all file paths of the particular directory
    filepaths = []                      #setup file paths variable
    for root, directories, files in walk(dirName):   #Read all directory, subdirectories and file lists
        for filename in files:
            filepath = join(root, filename)     #Create the full filepath by using os module.
            filepaths.append(filepath)

    return filepaths      #return all paths
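
Because the function above returns every file under the directory, you would usually filter the list before processing it. Here is a minimal sketch, assuming you only want Excel workbooks. The helper name `excel_paths` and the extension set are illustrative and not part of the answer above:

```python
import os

# Assumed set of extensions to treat as Excel workbooks (illustrative).
EXCEL_EXTENSIONS = {".xlsx", ".xlsm", ".xls"}


def excel_paths(dir_name):
    """Return sorted paths of Excel files under dir_name, including subdirectories."""
    matches = []
    for root, _directories, files in os.walk(dir_name):
        for filename in files:
            # Skip the temporary lock files Excel creates, whose names start with "~$".
            if filename.startswith("~$"):
                continue
            extension = os.path.splitext(filename)[1].lower()
            if extension in EXCEL_EXTENSIONS:
                matches.append(os.path.join(root, filename))
    return sorted(matches)
```

Each returned path can then be passed to `pd.read_excel` inside a `for` loop, so every workbook is handled on its own.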

0
votes

You're on the right track, but the call to pd.concat() is what merges the Excel files together. This snippet should help:

import glob
import os

import pandas as pd

# glob pattern (shell-style wildcard, not a regex) matching every xlsx file
root_dir = r"excel/*.xlsx"
# this call of glob only gives the xlsx files directly inside the excel folder
excel_files = glob.glob(root_dir)

# iterate over the files one at a time instead of concatenating them
for xls in excel_files:
    # read
    df_excel = pd.read_excel(xls)
    # manipulate as you wish here
    df_new = df_excel.sample(frac=0.1)
    # store next to the source file, swapping only the file extension
    csv_path = os.path.splitext(xls)[0] + ".csv"
    df_new.to_csv(csv_path)
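
To apply the question's own audit logic to each workbook instead of a plain 10% sample, the per-file work can be wrapped in a function and called inside the loop. This is a sketch and not a tested drop-in. The function names `mark_audit_sample` and `process_folder` are hypothetical, the column names and sampling thresholds are copied from the question as written, and the `*.xlsm` default pattern is an assumption based on the question's `Test.xlsm`:

```python
import glob
import os

import pandas as pd


def sample_per(tracked):
    """Sample a fraction of rows; thresholds copied from the question as written."""
    if len(tracked) <= 10000:
        return tracked.sample(frac=0.05)
    elif len(tracked) >= 15000:
        return tracked.sample(frac=0.03)
    return tracked.sample(frac=0.01)


def mark_audit_sample(df):
    """Flag a random sample of tracked rows as 'Audit'; modifies df in place."""
    tracked = df.loc[df["Track Item"] == "Y"]
    final = sample_per(tracked)
    sampled = df["Retailer Item ID"].isin(final["Retailer Item ID"])
    df.loc[sampled, "Track Item"] = "Audit"
    return df


def process_folder(folder, pattern="*.xlsm"):
    """Write one CSV per matching workbook in folder; nothing is concatenated."""
    written = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        df = mark_audit_sample(pd.read_excel(path))
        # Same base name as the workbook, with a .csv extension.
        csv_path = os.path.splitext(path)[0] + ".csv"
        df.to_csv(csv_path, index=False)
        written.append(csv_path)
    return written
```

Calling `process_folder` with the folder path, and with `pattern="*.xlsx"` if the workbooks use that extension, should then produce one CSV per workbook without any manual file-name changes.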