Looping a function over a series of CSV files

Question

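I have code that finds the average size and standard deviation of particular peaks in an intensity plot. It works on a single file, but I would like to run it on several files at once and combine the per-file averages and standard deviations into one average and one standard deviation. I have not been able to get this working with a directory. I also cannot concatenate the files into one large data file, because the overlap would mess up my data, so I need to compute each file separately and merge the results afterwards. Any help would be greatly appreciated! My code is below.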

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math
from shapely.geometry import LineString


df = pd.read_csv("C:\\Users\\Kyle\\Desktop\\Plot_Values1.csv", skiprows=0) # path to your intensity-plot CSV; skiprows=0 skips nothing, the first line is used as the header


Distance = df.iloc[:, 0] # calls the 1st column 
intensity = df.iloc[:, 1] # calls the 2nd column 
intensity_norm = (intensity - intensity.min())/ (intensity.max() - intensity.min()) # normalizing of the intensity

  
x = Distance
y = intensity_norm

#y2= np.full_like(intensity_norm, 1 / math.exp(2))  # array made to be the same size as the intensity filled with the scalar value 1/e^2
y2= np.full_like(intensity_norm, 0.5)  # array made to be the same size as the intensity filled with the scalar value 0.5 (the full width half max)

plt.plot(x, intensity_norm) # intensity plot
#plt.axhline(y=1/math.exp(2), color='r', linestyle='-') # horizontal line at 1/e^2
plt.axhline(y=0.5, color='r', linestyle='-') # horizontal line at 0.5 (the full width half max)

first_line = LineString(np.column_stack((x, y2))) # makes the first linestring based off of the horizontal line
second_line = LineString(np.column_stack((x, y))) # makes line string based off of the normalized intensity plot
intersection = first_line.intersection(second_line) # finds the intersecting points between first and second linestring
xValues = [p.x for p in intersection.geoms] # calls only the X values of the intersecting points
#print(xValues)


diff_list = []
for i in range(1,len(xValues)): # for every index except the first, because there is no value before it
    xV = xValues[i] - xValues[i-1] # distance between each crossing and the previous one
    diff_list.append(xV) # collects the resulting distances in a new list


size=[]
for d in diff_list:     # use d, not x, so the Distance column stored in x is not overwritten
    if 35 < d < 70:     # keep only widths larger than a lower bound and smaller than an upper bound that you choose
        size.append(d)  # size holds the filtered data set
        
average = np.average(size) # average all the sizes 
stdev = np.std(size) # stdev of all the sizes

print(average, stdev)

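I tried combining them, but that causes problems and the results are not accurate for what I'm looking for. I also tried putting the files in a directory and running the code on that, but that did not work either.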

python pandas function loops directory
1 Answer

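You need to learn to use functions: wrap your code in a function that works on a single file. Here N is the number of sizes that went into that file's average (len(size) in your code).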

def process_file(file_name):
    ...
    return average, std_dev, N
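
One way the elided body might be filled in is sketched below. It is only an illustration under stated assumptions: the helper `crossings` and the parameters `level`, `min_size` and `max_size` are hypothetical names, and the level crossings are found by plain linear interpolation with NumPy rather than with the shapely `LineString` intersection used in the question, so samples that lie exactly on the level are not counted.

```python
import numpy as np
import pandas as pd


def crossings(x, y, level):
    """Return the x positions where the polyline (x, y) crosses y == level.

    Hypothetical helper: stands in for the shapely intersection in the
    question. Samples lying exactly on the level are ignored.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(y, dtype=float) - level
    # indices i where the sign changes between sample i and sample i + 1
    idx = np.where(d[:-1] * d[1:] < 0)[0]
    # linear interpolation between the two samples around each crossing
    return x[idx] - d[idx] * (x[idx + 1] - x[idx]) / (d[idx + 1] - d[idx])


def process_file(file_name, level=0.5, min_size=35, max_size=70):
    """Return (average, std_dev, N) of the filtered widths for one CSV."""
    df = pd.read_csv(file_name)
    distance = df.iloc[:, 0]   # 1st column
    intensity = df.iloc[:, 1]  # 2nd column
    norm = (intensity - intensity.min()) / (intensity.max() - intensity.min())

    # distances between consecutive crossings, as in the question's diff_list
    widths = np.diff(crossings(distance, norm, level))
    size = widths[(widths > min_size) & (widths < max_size)]

    if len(size) == 0:
        # nothing survived the filter; report N = 0 so the caller can skip it
        return float("nan"), float("nan"), 0
    return np.average(size), np.std(size), len(size)
```

The plotting calls are left out on purpose, since a function that runs over many files usually should not open one figure per file.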

Then you need to run it on multiple files:

files = ['file1.csv', 'file2.csv']
results = []
for file in files:
    results.append(process_file(file))
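
Since the question mentions keeping the files in a directory, the hard-coded list could instead be built from a folder. The sketch below uses `pathlib` from the standard library; the function names `list_csv_files` and `run_on_folder` are made up for illustration, and `process_file` is passed in as an argument so the snippet does not depend on any particular implementation of it.

```python
from pathlib import Path


def list_csv_files(folder, pattern="*.csv"):
    """Return the files in `folder` matching `pattern`, sorted by name."""
    return [str(p) for p in sorted(Path(folder).glob(pattern))]


def run_on_folder(folder, process_file):
    """Call process_file on every CSV in `folder` and collect the results."""
    return [process_file(name) for name in list_csv_files(folder)]
```

A call would then look like `results = run_on_folder("path/to/your/folder", process_file)`, where the folder path is a placeholder for your own.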

Finally, you need to combine the results:

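from math import sqrt

total_average = 0
total_var = 0
total_N = 0
for (average, std_dev, N) in results:
    total_average += average * N
    # accumulate the sum of squares, N * (variance + mean**2), so that
    # differences between the per-file means are included in the spread
    # (valid for the population std that np.std returns by default)
    total_var += (std_dev**2 + average**2) * N
    total_N += N
total_average = total_average / total_N  # divide by the total count, not the last file's N
total_var = max(total_var / total_N - total_average**2, 0.0)  # guard against rounding below zero
total_std_dev = sqrt(total_var)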
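
The merge step can also be wrapped in a function and checked against the statistics of the pooled values. The sketch below assumes each triple holds a population standard deviation (what `np.std` returns by default) and accumulates sums of squares, so that differences between the per-file means are included in the combined spread; the name `combine` is hypothetical.

```python
from math import sqrt


def combine(results):
    """Merge (average, std_dev, N) triples into one mean and population std."""
    # skip files where nothing passed the filter (N == 0, average is nan)
    results = [r for r in results if r[2] > 0]
    total_n = sum(n for _, _, n in results)
    total_sum = sum(avg * n for avg, _, n in results)
    # N * (variance + mean**2) equals the sum of squared values in that file
    total_sq = sum((sd**2 + avg**2) * n for avg, sd, n in results)
    mean = total_sum / total_n
    var = max(total_sq / total_n - mean**2, 0.0)  # guard against rounding below zero
    return mean, sqrt(var), total_n
```

Because only the per-file mean, standard deviation and count are needed, the raw files never have to be concatenated, which is the constraint stated in the question.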