How to merge all CSV files into one file and stack the data under the original header?


I'm new to Python and am trying to learn data manipulation.

I have a folder containing many files, some of which are CSVs. I want to merge all of the CSVs (about 400 of them) into a single CSV, with all the data stacked.

For example, if the first CSV contains this DataFrame:

transcript  confidence  from  to    speaker  Negative  Neutral  Positive  compound
thank you   0.85        1.39  1.65  0        0         0.754    0.246     0.7351

and the second contains this one:

transcript  confidence  from  to    speaker  Negative  Neutral  Positive  compound
welcome     0.95        1.39  1.65  0        0         0.754    0.201     0.8351

then I want the final DataFrame to look like this:

transcript  confidence  from  to    speaker  Negative  Neutral  Positive  compound
thank you   0.85        1.39  1.65  0        0         0.754    0.246     0.7351
welcome     0.95        1.39  1.65  0        0         0.754    0.201     0.8351
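In pandas terms, the target output is a row-wise concatenation of DataFrames that share the same columns. A minimal in-memory sketch using the two example rows above (no files involved yet; the variable names are just for illustration):

```python
import pandas as pd

cols = ["transcript", "confidence", "from", "to", "speaker",
        "Negative", "Neutral", "Positive", "compound"]

# The two example frames from the question, one row each
first = pd.DataFrame([["thank you", 0.85, 1.39, 1.65, 0, 0, 0.754, 0.246, 0.7351]], columns=cols)
second = pd.DataFrame([["welcome", 0.95, 1.39, 1.65, 0, 0, 0.754, 0.201, 0.8351]], columns=cols)

# pd.concat along axis=0 (the default) stacks rows and aligns columns by name,
# so both rows end up under the single shared header.
stacked = pd.concat([first, second], ignore_index=True)
print(stacked)
```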

I tried this:

import glob
import pandas as pd

# Folder containing the .csv files to merge
file_path = "C:\\Users\\Desktop"

# This pattern \\* selects all files in a directory
pattern = file_path + "\\*"
files = glob.glob(pattern)

# Import first file to initiate the dataframe
df = pd.read_csv(files[0],encoding = "utf-8", delimiter = ",")

# Append all the files as dataframes to the first one
for file in files[1:len(file_list)]:
    df_csv = pd.read_csv(file,encoding = "utf-8", delimiter = ",")
    df = df.append(df_csv)

But it didn't work. How can I fix this?
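For reference, the attempt above fails for a few concrete reasons: 'file_list' is never defined, so files[1:len(file_list)] raises a NameError; the pattern "\\*" matches every file in the folder, not just the CSVs, so read_csv can be handed non-CSV files; and DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. A minimal corrected sketch of the same loop, run against a throwaway folder (the temp folder and the file names a.csv, b.csv and notes.txt are made up for the demo):

```python
import glob
import os
import tempfile

import pandas as pd

# Throwaway folder that mimics the question: two CSVs plus a non-CSV file.
# (Stand-in for "C:\\Users\\Desktop"; these file names are hypothetical.)
folder = tempfile.mkdtemp()
header = "transcript,confidence,from,to,speaker,Negative,Neutral,Positive,compound\n"
with open(os.path.join(folder, "a.csv"), "w", encoding="utf-8") as f:
    f.write(header + "thank you,0.85,1.39,1.65,0,0,0.754,0.246,0.7351\n")
with open(os.path.join(folder, "b.csv"), "w", encoding="utf-8") as f:
    f.write(header + "welcome,0.95,1.39,1.65,0,0,0.754,0.201,0.8351\n")
with open(os.path.join(folder, "notes.txt"), "w", encoding="utf-8") as f:
    f.write("not a csv\n")

# Fix 1: match only *.csv, so notes.txt is never passed to read_csv.
files = sorted(glob.glob(os.path.join(folder, "*.csv")))

# Fix 2: no undefined 'file_list' slice and no DataFrame.append;
# read every file into a list and concatenate once at the end.
frames = [pd.read_csv(file, encoding="utf-8", delimiter=",") for file in files]
df = pd.concat(frames, ignore_index=True)

print(df)
```

Collecting the frames in a list and calling pd.concat once also avoids copying the growing DataFrame on every iteration, which adds up with around 400 files.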

python pandas numpy glob
3 Answers
1 vote

This should help:

import pandas as pd
import glob
import os.path

file_path = "C:/Users/Desktop"

data = []
for csvfile in glob.glob(os.path.join(file_path, "*.csv")):
    df = pd.read_csv(csvfile, encoding="utf-8", delimiter=",")
    data.append(df)

data = pd.concat(data, ignore_index=True)
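The ignore_index=True argument matters here: every DataFrame returned by pd.read_csv gets its own index starting at 0, so without it the merged frame repeats index labels. A small sketch with two one-row frames (only two of the question's columns, for brevity):

```python
import pandas as pd

a = pd.DataFrame({"transcript": ["thank you"], "compound": [0.7351]})
b = pd.DataFrame({"transcript": ["welcome"], "compound": [0.8351]})

# Without ignore_index, each frame keeps its own RangeIndex, so labels repeat.
kept = pd.concat([a, b])
# With ignore_index=True, the result gets a fresh 0..n-1 index.
renumbered = pd.concat([a, b], ignore_index=True)

print(list(kept.index))        # [0, 0]
print(list(renumbered.index))  # [0, 1]
```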

1 vote

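Note: I'd suggest not pulling every CSV file straight from your Desktop. Put them in a dedicated directory instead; that also helps if you want to analyze that particular dataset again later.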

Prerequisite for this solution: all the CSV files you want to merge must be in the same directory.

# Import the required libraries

# 'os' provides portable access to operating-system-dependent functionality, such as building file paths
import os

# 'glob' finds all pathnames that match a given pattern, here '*.csv' to find every CSV file
import glob

# 'pandas' is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool
import pandas as pd

# Directory that contains the CSV files to merge
path = "C:/Users/Desktop"

# Collect the paths of all files in 'path' that match '*.csv'
all_files = glob.glob(os.path.join(path, "*.csv"))

# Lazily read each CSV file into a DataFrame (generator expression)
df_from_each_file = (pd.read_csv(csvfile) for csvfile in all_files)
# If your files use a different separator, pass it explicitly, e.g. pd.read_csv(csvfile, sep=";")

# Concatenate all the DataFrames row-wise and renumber the index with pd.concat()
df_merged = pd.concat(df_from_each_file, ignore_index=True)

# Write the merged data to 'merged.csv'; index=False stops the row index from being written as an extra column
df_merged.to_csv("merged.csv", index=False)
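One caveat with stacking "under the original header": pd.concat aligns columns by name, so a file whose header differs even slightly (a typo, stray whitespace) adds a new column and leaves NaN in the others instead of raising an error. A sketch of a pre-check that reads only each file's header row with nrows=0; the helper name mismatched_headers and the demo files are hypothetical:

```python
import glob
import os
import tempfile

import pandas as pd


def mismatched_headers(paths):
    """Hypothetical helper: return the paths whose header differs from the first file's."""
    expected = None
    bad = []
    for path in paths:
        # nrows=0 parses only the header row, so this stays cheap for many files
        cols = list(pd.read_csv(path, nrows=0).columns)
        if expected is None:
            expected = cols
        elif cols != expected:
            bad.append(path)
    return bad


# Demo folder: a.csv and b.csv share a header, c.csv misspells one column.
folder = tempfile.mkdtemp()
for name, header in [("a.csv", "transcript,compound"),
                     ("b.csv", "transcript,compound"),
                     ("c.csv", "transcript,compund")]:
    with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
        f.write(header + "\nhello,0.5\n")

paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
bad = mismatched_headers(paths)
print([os.path.basename(p) for p in bad])  # ['c.csv']

# Concatenating anyway keeps both spellings as separate columns, padded with NaN.
merged = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
print(sorted(merged.columns))  # ['compound', 'compund', 'transcript']
```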

0 votes

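I took @JayPatel's solution and turned it into an automated.py script. It works when you add more CSV files and a merged file from an earlier run already exists. Nothing fancy, but it does the job.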

    import os
    import glob
    import pandas as pd

    # Keep asking until a fresh merged.csv has been written or the user declines
    stop = False
    while not stop:
        if not os.path.exists('merged.csv'):
            # Merge every CSV in the current directory (merged.csv does not exist yet, so it is not picked up)
            all_files = glob.glob('*.csv')
            df_from_each_file = (pd.read_csv(csvfile) for csvfile in all_files)
            df_merged = pd.concat(df_from_each_file, ignore_index=True)
            df_merged.to_csv('merged.csv', index=False)
            print('The merged.csv file was created successfully')
            stop = True
        else:
            print('You need to delete the previous merged.csv file first')
            delete = input('Do you want to delete it? (Y/N): ')
            # Accept 'y' as well as 'Y'
            if delete.strip().upper() == 'Y':
                os.remove('merged.csv')
            else:
                stop = True
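An alternative sketch that avoids the delete prompt altogether: skip the output file when globbing, so rerunning simply rebuilds merged.csv from the current inputs. merge_csvs is a hypothetical helper name, and the demo runs in a throwaway temp folder rather than your real data directory:

```python
import glob
import os
import tempfile

import pandas as pd


def merge_csvs(folder=".", output="merged.csv"):
    """Hypothetical helper: merge every *.csv in 'folder' except the output file itself.

    Rerunning it overwrites the previous merge, so merged.csv never has to be
    deleted by hand.
    """
    out_path = os.path.abspath(os.path.join(folder, output))
    inputs = sorted(
        p for p in glob.glob(os.path.join(folder, "*.csv"))
        if os.path.abspath(p) != out_path
    )
    if not inputs:
        # pd.concat raises ValueError on an empty sequence; fail with a clearer message
        raise FileNotFoundError(f"no input CSV files found in {folder!r}")
    merged = pd.concat((pd.read_csv(p) for p in inputs), ignore_index=True)
    merged.to_csv(out_path, index=False)
    return merged


# Demo in a throwaway folder: the second run skips the merged.csv written
# by the first run, so both runs return the same two rows.
demo = tempfile.mkdtemp()
for name, word in [("a.csv", "thank you"), ("b.csv", "welcome")]:
    with open(os.path.join(demo, name), "w", encoding="utf-8") as f:
        f.write("transcript,compound\n" + word + ",0.5\n")

first_run = merge_csvs(demo)
second_run = merge_csvs(demo)
print(len(first_run), len(second_run))  # 2 2
```

The trade-off is that an existing merged.csv is overwritten silently; keep the prompt from the script above if you want a confirmation step.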
© www.soinside.com 2019 - 2024. All rights reserved.