How to append pandas DataFrames in a loop?


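I am using the following script to analyze some data from an experiment. At the end, I want to save some of the results as a DataFrame in .csv format so I can keep working with them, but I can't figure out where in the code to build this DataFrame.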

import os
import numpy as np
import pandas as pd
import pylab as plt
import scipy
from scipy.optimize import curve_fit


DATE = '2020-02-18'
SAMPLE_NAME = 'TEST'

os.chdir('XXX/' + DATE+ '/' + SAMPLE_NAME)
MainFolder = 'XXX/' + DATE + '/' + SAMPLE_NAME
print('\n' + 'You are working on this directory: \n', os.getcwd(), '\n')

def LoadData(files, subfolders):
    ''' This function loads the data from the files '''
    print('The sweeps in the folder are:')
    df = []
    for file_name in files:
        if file_name.endswith('.csv'):
            print('      ' + os.path.sep + file_name)
            df.append(pd.read_csv(subfolders + os.path.sep + file_name, delimiter=','))
    return df

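def fit_sin_LD(t_LD, y_LD):
    ...  # body omitted in the question; returns a dict with a "phase_LD" entry

def fit_sin_APD(t_APD, y_APD):
    ...  # body omitted in the question; returns a dict with a "phase_APD" entry

def Plots():
    ...  # body omitted in the question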


for root, dirs, files in os.walk(MainFolder, topdown=True):
    for subfolders in dirs:
        print(os.path.sep + subfolders)
        for subpath, subdirs, sweepfiles in os.walk(MainFolder + os.path.sep + subfolders + os.path.sep, topdown=True):
            counter = 1
            for dataFromFiles in LoadData(sweepfiles, subfolders):
                print()
                t_LD  = dataFromFiles['TimeMATH']
                y_LD  = dataFromFiles['VoltsMATH']
                fitting_LD = fit_sin_LD(t_LD, y_LD)
                t_APD = dataFromFiles['TimeCH4']
                y_APD = dataFromFiles['VoltsCH4']
                fitting_APD = fit_sin_APD(t_APD, y_APD)
                Phi = np.array([[fitting_LD["phase_LD"],fitting_APD["phase_APD"]]])
                df = pd.DataFrame(Phi, columns=["phase_LD","phase_APD"])
                print(df)
                print('_'*50)
                while not sweepfiles[counter].endswith('.csv'):
                    counter = counter + 1
                print('The sweepfile is:', sweepfiles[counter])
                counter = counter + 1
                print('Phase_Shift:', fitting_LD["phase_LD"]-fitting_APD["phase_APD"])
                print('='*30)
                Plots()

I get this output:

You are working on this directory: 
 'XXX/' + DATE + '/' + TEST 

\Run_2
The sweeps in the folder are:
      \TEST_sweep_1.csv
      \TEST_sweep_2.csv
      \TEST_sweep_3.csv
      \TEST_sweep_4.csv
      \TEST_sweep_5.csv

   phase_LD  phase_APD
0  0.799186   0.787802
__________________________________________________
The sweepfile is: TEST_sweep_1.csv
Phase_Shift: 0.01138438229758243
==============================

   phase_LD  phase_APD
0  0.826551   0.810993
__________________________________________________
The sweepfile is: TEST_sweep_2.csv
Phase_Shift: 0.015558041120443344
==============================

   phase_LD  phase_APD
0  0.834952   0.811156
__________________________________________________
The sweepfile is: TEST_sweep_3.csv
Phase_Shift: 0.023795986346148656
==============================

   phase_LD  phase_APD
0  0.856211   0.842482
__________________________________________________
The sweepfile is: TEST_sweep_4.csv
Phase_Shift: 0.013728505278350567
==============================

   phase_LD  phase_APD
0  0.856638   0.833881
__________________________________________________
The sweepfile is: TEST_sweep_5.csv
Phase_Shift: 0.022756757048449816
==============================

After the loop, I want a single DataFrame that collects all the data (concatenated/appended), so that I can work with it more easily. I mean something like this:

       phase_LD  phase_APD
0      0.799186   0.787802
1      0.826551   0.810993
2      0.834952   0.811156
3      0.856211   0.842482
4      0.856638   0.833881

Any tips? Thanks!

Tags: python pandas loops dataframe

Answer

I would store the partial DataFrames in a list and then concatenate them all at once after the loop:

...
elts = []
for root, dirs, files in os.walk(MainFolder, topdown=True):
    for subfolders in dirs:
        ...
        for dataFromFiles in LoadData(sweepfiles, subfolders):
            ...
            df = pd.DataFrame(Phi, columns=["phase_LD","phase_APD"])
            elts.append(df)
            ...
final_df = pd.concat(elts, ignore_index=True)  # renumber the rows 0..n-1
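Below is a minimal, self-contained sketch of this pattern. It assumes the fitting step has already produced one (phase_LD, phase_APD) pair per sweep: the five pairs from the question's output stand in for the `os.walk`/`LoadData` loop, and the output file name `phases.csv` is a hypothetical choice. Passing `ignore_index=True` matters here. Every partial DataFrame carries the single index 0, so a plain `pd.concat(elts)` would label all five rows 0 instead of 0 to 4.

```python
import numpy as np
import pandas as pd

# Stand-in values (taken from the question's printed output); in the real
# script each pair comes from fit_sin_LD / fit_sin_APD for one sweep file.
fitted_phases = [
    (0.799186, 0.787802),
    (0.826551, 0.810993),
    (0.834952, 0.811156),
    (0.856211, 0.842482),
    (0.856638, 0.833881),
]

elts = []  # one single-row DataFrame per sweep
for phase_LD, phase_APD in fitted_phases:
    Phi = np.array([[phase_LD, phase_APD]])
    df = pd.DataFrame(Phi, columns=["phase_LD", "phase_APD"])
    elts.append(df)  # each df has index [0] at this point

# Concatenate once, after the loop; ignore_index renumbers the rows 0..n-1
final_df = pd.concat(elts, ignore_index=True)
print(final_df)

# Save for later use (hypothetical file name); index=False drops the row labels
final_df.to_csv("phases.csv", index=False)
```

Collecting the pieces in a list and calling `pd.concat` once after the loop is also the usual replacement for growing a DataFrame row by row inside the loop; `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0.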