I am currently running a script that takes an entire audio file and saves it using the Python audiofile library (which in turn uses the soundfile library).
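I am trying to mimic the behavior of audiofile.read(): I give it an offset and a duration (in seconds), and it returns only the numpy array for that interval of the sound. The only difference is that I already have the entire audio file as a numpy array and need to extract the correct start and end from it, rather than passing in a .wav file as the library requires.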
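I tried replicating the logic that computes the start and end and then slicing the numpy array with sound_file[start:end], but this does not seem to work. I am not very familiar with how signal processing handles audio files, so I am a bit lost here. Any help would be greatly appreciated!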
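I want it to accept a numpy array and return the same numpy array, sliced to contain only the specified start time + duration. All the files I load were originally 96 kHz, then resampled to 16 kHz and saved as numpy arrays.
Here is my code: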
import numpy as np
import audmath
from audiofile.core.utils import duration_in_seconds


def read_from_np(
    file,
    duration,
    offset,
    sampling_rate=16000,
):
    if duration is not None:
        duration = duration_in_seconds(duration, sampling_rate)
        if np.isnan(duration):
            duration = None
    if offset is not None and offset != 0:
        offset = duration_in_seconds(offset, sampling_rate)
        if np.isnan(offset):
            offset = None

    # Support for negative offset/duration values
    # by counting them from end of signal
    #
    if offset is not None and offset < 0 or duration is not None and duration < 0:
        # Import duration here to avoid circular imports
        from audiofile.core.info import duration as get_duration

        signal_duration = get_duration(file)
    # offset | duration
    # None   | < 0
    if offset is None and duration is not None and duration < 0:
        offset = max([0, signal_duration + duration])
        duration = None
    # None   | >= 0
    if offset is None and duration is not None and duration >= 0:
        if np.isinf(duration):
            duration = None
    # >= 0   | < 0
    elif offset is not None and offset >= 0 and duration is not None and duration < 0:
        if np.isinf(offset) and np.isinf(duration):
            offset = 0
            duration = None
        elif np.isinf(offset):
            duration = 0
        else:
            if np.isinf(duration):
                offset = min([offset, signal_duration])
                duration = np.sign(duration) * signal_duration
            orig_offset = offset
            offset = max([0, offset + duration])
            duration = min([-duration, orig_offset])
    # >= 0   | >= 0
    elif offset is not None and offset >= 0 and duration is not None and duration >= 0:
        if np.isinf(offset):
            duration = 0
        elif np.isinf(duration):
            duration = None
    # < 0    | None
    elif offset is not None and offset < 0 and duration is None:
        offset = max([0, signal_duration + offset])
    # >= 0   | None
    elif offset is not None and offset >= 0 and duration is None:
        if np.isinf(offset):
            duration = 0
    # < 0    | > 0
    elif offset is not None and offset < 0 and duration is not None and duration > 0:
        if np.isinf(offset) and np.isinf(duration):
            offset = 0
            duration = None
        elif np.isinf(offset):
            duration = 0
        elif np.isinf(duration):
            duration = None
        else:
            offset = signal_duration + offset
            if offset < 0:
                duration = max([0, duration + offset])
            else:
                duration = min([duration, signal_duration - offset])
            offset = max([0, offset])
    # < 0    | < 0
    elif offset is not None and offset < 0 and duration is not None and duration < 0:
        if np.isinf(offset):
            duration = 0
        elif np.isinf(duration):
            duration = -signal_duration
        else:
            orig_offset = offset
            offset = max([0, signal_duration + offset + duration])
            duration = min([-duration, signal_duration + orig_offset])
            duration = max([0, duration])

    # Convert to samples
    #
    # Handle duration first
    # and returned immediately
    # if duration == 0
    if duration is not None and duration != 0:
        duration = audmath.samples(duration, sampling_rate)
    if duration == 0:
        from audiofile.core.info import channels as get_channels

        channels = get_channels(file)
        if channels > 1 or always_2d:
            signal = np.zeros((channels, 0))
        else:
            signal = np.zeros((0,))
        return signal, sampling_rate
    if offset is not None and offset != 0:
        offset = audmath.samples(offset, sampling_rate)
    else:
        offset = 0

    start = offset
    # duration == 0 is handled further above with immediate return
    if duration is not None:
        stop = duration + start
    return np.expand_dims(file[0, start:stop], 0)
Your code boils down to
return np.expand_dims(file[0, start:stop], 0)
which is correct.
So if you are unhappy with the result, it is because the wrong (start, stop) pair was computed, that is, the wrong (offset, duration) pair.
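To make the (start, stop) computation concrete, here is a minimal sketch of the seconds-to-samples arithmetic. It assumes offset and duration are plain non-negative numbers given in seconds; the helper name seconds_to_slice is hypothetical and is not part of audiofile.

```python
def seconds_to_slice(offset, duration, sampling_rate=16_000):
    """Convert an offset and duration in seconds to (start, stop) sample indices.

    Hypothetical helper for illustration; not part of audiofile.
    """
    # One second of audio corresponds to `sampling_rate` samples.
    start = round(offset * sampling_rate)
    stop = start + round(duration * sampling_rate)
    return start, stop


# 1.5 s into the signal, 2 s long, at 16 kHz
print(seconds_to_slice(1.5, 2.0))  # (24000, 56000)
```

If the indices come out as expected but the slice still sounds wrong, one thing worth checking is whether the array really is at the sampling rate you assume (for example, still at 96 kHz rather than 16 kHz).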
The sampling rate is apparently fixed at 16_000 samples per second.
The number of channels can be 1 or 2, which looks worrying.
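The channel count matters because the indexing in file[0, start:stop] only works on a 2-D array. Below is a minimal sketch of normalizing the input first, assuming 2-D arrays are laid out as (channels, samples); the name as_channels_first is hypothetical.

```python
import numpy as np


def as_channels_first(signal):
    """Return signal as a 2-D array of shape (channels, samples).

    Hypothetical helper; assumes 2-D input is already (channels, samples).
    """
    signal = np.asarray(signal)
    if signal.ndim == 1:
        # Mono signal stored flat: add a leading channel axis.
        return signal[np.newaxis, :]
    if signal.ndim == 2:
        return signal
    raise ValueError(f"expected 1-D or 2-D audio, got {signal.ndim} dimensions")


mono = np.zeros(16_000)         # shape (16000,)
stereo = np.zeros((2, 16_000))  # shape (2, 16000)
print(as_channels_first(mono).shape)    # (1, 16000)
print(as_channels_first(stereo).shape)  # (2, 16000)
```

Note that file[0, start:stop] keeps only the first channel of a stereo array and raises an IndexError on a 1-D array, so it is worth deciding up front which layout the helper accepts.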
There is a great deal of optional behavior tied to the offset and duration parameters. Get rid of it.
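Focus on writing a simple helper that accepts an offset that is always a non-negative integer and a duration that is always a positive integer. Use assert or raise so that None or a negative number blows up with a fatal error. Next, focus on audio clips that always have the same number of channels. At that point it will not be hard to get things right.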
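As a rough illustration of that advice, here is one way such a strict helper might look. It is a sketch under stated assumptions, not a drop-in replacement for audiofile.read(): the name slice_samples is hypothetical, offset and duration are assumed to be in samples rather than seconds, and the signal is assumed to have shape (channels, samples).

```python
import numpy as np


def slice_samples(signal, offset, duration):
    """Return signal[:, offset:offset + duration], with strict argument checks.

    Hypothetical helper. Assumes signal has shape (channels, samples) and
    that offset and duration are given in samples, not seconds.
    """
    if signal.ndim != 2:
        raise ValueError("signal must have shape (channels, samples)")
    # None, floats, and negative values are rejected instead of reinterpreted.
    if not isinstance(offset, (int, np.integer)) or offset < 0:
        raise ValueError(f"offset must be a non-negative integer, got {offset!r}")
    if not isinstance(duration, (int, np.integer)) or duration <= 0:
        raise ValueError(f"duration must be a positive integer, got {duration!r}")
    stop = offset + duration
    if stop > signal.shape[1]:
        raise ValueError("requested interval runs past the end of the signal")
    return signal[:, offset:stop]


sampling_rate = 16_000
# Two seconds of mono audio whose sample values equal their own index.
audio = np.arange(2 * sampling_rate, dtype=np.float32).reshape(1, -1)
clip = slice_samples(audio, offset=sampling_rate // 2, duration=sampling_rate)
print(clip.shape)  # (1, 16000)
```

Raising when the interval runs past the end of the signal is a design choice made here for strictness; silently truncating, as plain numpy slicing would, is an equally reasonable alternative.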