I am using deep learning to classify sounds. My problem is that I run out of memory when I convert the .wav files to spectrograms, loading each file with librosa's lb.load().
from os import listdir
from os.path import isfile, join

split = ['train','val']
categories=['URTI', 'Healthy', 'Asthma', 'COPD', 'LRTI', 'Bronchiectasis','Pneumonia', 'Bronchiolitis']
files_loc = "path"
i=0
for s in split:
    for cat in categories:
        print('-' * 100)
        print('working on ' + cat +" "+str(s)+" "+ '...')
        print('-' * 100)
        # is_wav() is a helper that is not shown here
        files = [f for f in listdir(files_loc + s + '/' + cat + '/') if isfile(join(files_loc + s + '/' + cat + '/', f)) and is_wav(f)]
        for f in files:
            convert_to_spec_image(file_loc = files_loc, category=cat, filename=f, is_train=(s == 'train'), verbose=False)
            i=i+1
            print("We have processed: "+str(i)+" "+ str((i/773*100))+" % "+" so far")
The function convert_to_spec_image looks like this:
import librosa as lb
import matplotlib.pyplot as plt
from librosa.display import specshow

#create images using librosa spectrogram
def convert_to_spec_image(file_loc, filename, category, is_train=False, verbose=False):
    '''
    Converts audio file to spec image
    Input file includes path
    Saves the file to a png image in the save_directory
    '''
    train_ = 'train/'
    val_ = 'val/'
    loc = file_loc + train_ + category + '/' + filename
    if is_train == False:
        loc = file_loc + val_ + category + '/' + filename
    if verbose == True:
        print('reading and converting ' + filename + '...')
    y, sr = lb.load(loc)
    #Plot signal in
    plt.figure(figsize=(10,3))
    src_ft = lb.stft(y)
    src_db = lb.amplitude_to_db(abs(src_ft))
    specshow(src_db, sr=sr, x_axis='time', y_axis='hz')
    plt.ylim(0, 5000)
    save_directory = "C:/Users/raulf/Desktop/espectograms2/"
    filename_img = filename.split('.wav')[0]
    save_loc = save_directory + train_ + category + '/' + filename_img + '.png'
    if is_train == False:
        save_loc = save_directory + val_ + category + '/' + filename_img + '.png'
    plt.savefig(save_loc)
    plt.close('all')
    if verbose == True:
        print(filename + ' converted!')
    plt.close('all')
I am trying to reuse the code from this Kaggle Notebook: https://www.kaggle.com/danaelisanicolas/cnn-part-3-create-spectrogram-images
Thanks in advance.
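I happened to run into the same problem, so I am adding my findings.
librosa.load may have a memory leak problem, but that is not your case (see the reference link).
Your problem comes from specshow, which draws through matplotlib. Switch matplotlib to the non-interactive Agg backend: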
import matplotlib
matplotlib.use('Agg')
Then close all figure resources after each file, for example:
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# audio_path is the path to one .wav file
y, sr = librosa.load(audio_path)
fig, ax = plt.subplots(nrows=1, ncols=1)
spec = np.abs(librosa.stft(y, hop_length=512))
spec = librosa.amplitude_to_db(spec, ref=np.max)
color_mesh = librosa.display.specshow(spec, sr=sr, x_axis='time', y_axis='log', ax=ax)
ax.set(title='Spectrogram')
fig.colorbar(color_mesh, format='%+2.0f dB', ax=ax)
fig.savefig('Spectrogram.jpg')
# release everything matplotlib still holds for this figure
fig.clear()
plt.cla()
plt.clf()
plt.close('all')
del color_mesh
del ax
del fig
Reference link