How to extract a duration and offset from a numpy array representing audio?

I am currently running a script that takes an entire audio file and saves it using the `audiofile` library in Python (which in turn uses the `soundfile` library).

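I am trying to mimic the behavior of `audiofile.read()`, where I pass an offset and a duration (in seconds) and it returns only the numpy array for that particular interval of sound. The only difference is that I already have the entire audio file as a numpy array and need to extract the correct start and end of the interval from it, instead of passing in a `.wav` file as the library requires.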
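
A minimal sketch of the core idea, assuming the array is laid out as (channels, samples) or is 1-D, and that offset and duration are non-negative numbers of seconds: an interval in seconds maps to sample indices by multiplying by the sampling rate. The helper name `slice_seconds` is hypothetical, and rounding to the nearest sample is an assumption; `audiofile.read()` may treat boundary samples differently.

```python
import numpy as np


def slice_seconds(signal, sampling_rate, offset=0.0, duration=None):
    """Return the part of `signal` starting at `offset` s and lasting `duration` s.

    Hypothetical helper: assumes `signal` has shape (channels, samples) or
    (samples,), and that offset/duration are non-negative seconds.
    duration=None means "up to the end of the signal".
    """
    # Convert seconds to sample indices (nearest-sample rounding is an assumption)
    start = int(round(offset * sampling_rate))
    stop = None if duration is None else start + int(round(duration * sampling_rate))
    # Slice along the last axis, which holds the samples
    return signal[..., start:stop]


sr = 16000
x = np.arange(3 * sr, dtype=np.float32).reshape(1, -1)  # 3 s of mono "audio"
y = slice_seconds(x, sr, offset=0.5, duration=1.0)
print(y.shape)  # (1, 16000)
print(y[0, 0])  # 8000.0
```

Slicing with `...` keeps the same code working for both 1-D and (channels, samples) arrays.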

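I tried replicating the logic that computes the start and end, and then slicing the numpy array with `sound_file[start:end]`, but this does not seem to work. I am not very familiar with how signal processing deals with audio files, so I am a bit lost here. Any help would be appreciated!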
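
One likely culprit, assuming the arrays have shape (channels, samples) as the `file[0, start:stop]` indexing in the question's code suggests: `sound_file[start:end]` slices the first axis, i.e. the channels, not the samples. On a mono array of shape (1, N) this yields either the whole array (when `start` is 0) or an empty one, no matter how correct `start` and `end` are.

```python
import numpy as np

sr = 16000
# Mono, 2 s long, laid out as (channels, samples)
sound_file = np.zeros((1, 2 * sr), dtype=np.float32)
start, end = 4000, 12000

wrong = sound_file[start:end]     # slices axis 0 (channels): row 4000 does not exist
right = sound_file[:, start:end]  # slices axis 1 (samples)

print(wrong.shape)  # (0, 32000)
print(right.shape)  # (1, 8000)
```

`sound_file[..., start:end]` is an alternative that slices the last axis for both 1-D and 2-D arrays.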

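Here is my code. I want it to take a numpy array and return the same array, sliced so that it contains only the specified start time plus duration. All the files I load were originally sampled at 96 kHz, then resampled to 16 kHz and saved as numpy arrays: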


import numpy as np
import audmath
from audiofile.core.utils import duration_in_seconds


def read_from_np(
    file,
    duration,
    offset,
    sampling_rate=16000,
    always_2d=False,
):
    # `file` is a NumPy array of shape (channels, samples) sampled at
    # `sampling_rate`, not a path, so the signal length and channel count
    # are taken from the array instead of audiofile's file-based helpers.

    if duration is not None:
        duration = duration_in_seconds(duration, sampling_rate)
        if np.isnan(duration):
            duration = None
    if offset is not None and offset != 0:
        offset = duration_in_seconds(offset, sampling_rate)
        if np.isnan(offset):
            offset = None

    # Support for negative offset/duration values
    # by counting them from end of signal
    #
    if offset is not None and offset < 0 or duration is not None and duration < 0:
        # Signal duration in seconds, computed from the number of samples
        signal_duration = file.shape[-1] / sampling_rate
    # offset | duration
    # None   | < 0
    if offset is None and duration is not None and duration < 0:
        offset = max([0, signal_duration + duration])
        duration = None
    # None   | >= 0
    if offset is None and duration is not None and duration >= 0:
        if np.isinf(duration):
            duration = None
    # >= 0   | < 0
    elif offset is not None and offset >= 0 and duration is not None and duration < 0:
        if np.isinf(offset) and np.isinf(duration):
            offset = 0
            duration = None
        elif np.isinf(offset):
            duration = 0
        else:
            if np.isinf(duration):
                offset = min([offset, signal_duration])
                duration = np.sign(duration) * signal_duration
            orig_offset = offset
            offset = max([0, offset + duration])
            duration = min([-duration, orig_offset])
    # >= 0   | >= 0
    elif offset is not None and offset >= 0 and duration is not None and duration >= 0:
        if np.isinf(offset):
            duration = 0
        elif np.isinf(duration):
            duration = None
    # < 0    | None
    elif offset is not None and offset < 0 and duration is None:
        offset = max([0, signal_duration + offset])
    # >= 0   | None
    elif offset is not None and offset >= 0 and duration is None:
        if np.isinf(offset):
            duration = 0
    # < 0    | > 0
    elif offset is not None and offset < 0 and duration is not None and duration > 0:
        if np.isinf(offset) and np.isinf(duration):
            offset = 0
            duration = None
        elif np.isinf(offset):
            duration = 0
        elif np.isinf(duration):
            duration = None
        else:
            offset = signal_duration + offset
            if offset < 0:
                duration = max([0, duration + offset])
            else:
                duration = min([duration, signal_duration - offset])
            offset = max([0, offset])
    # < 0    | < 0
    elif offset is not None and offset < 0 and duration is not None and duration < 0:
        if np.isinf(offset):
            duration = 0
        elif np.isinf(duration):
            duration = -signal_duration
        else:
            orig_offset = offset
            offset = max([0, signal_duration + offset + duration])
            duration = min([-duration, signal_duration + orig_offset])
            duration = max([0, duration])

    # Convert to samples
    #
    # Handle duration first
    # and return immediately
    # if duration == 0
    if duration is not None and duration != 0:
        duration = audmath.samples(duration, sampling_rate)
    if duration == 0:
        # Channel count from the array shape
        channels = file.shape[0] if file.ndim > 1 else 1
        if channels > 1 or always_2d:
            signal = np.zeros((channels, 0))
        else:
            signal = np.zeros((0,))
        # Return only the array, like the slicing path below
        return signal
    if offset is not None and offset != 0:
        offset = audmath.samples(offset, sampling_rate)
    else:
        offset = 0

    start = offset
    # duration == 0 is handled further above with immediate return;
    # duration None means "up to the end of the signal"
    stop = None
    if duration is not None:
        stop = duration + start

    # Keep the first channel as a (1, samples) array
    return np.expand_dims(file[0, start:stop], 0)
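
Most of the `if`/`elif` chain above exists to support negative values counted from the end of the signal. As a rough way to check that arithmetic, here is a pure-NumPy sketch of only the `offset < 0, duration > 0` branch, working in samples instead of seconds and ignoring the infinite-value cases. `resolve_negative_offset` is a hypothetical name, not part of `audiofile`.

```python
import numpy as np


def resolve_negative_offset(n_samples, offset, duration):
    """Map a negative offset (counted from the end) and a positive duration,
    both in samples, to a (start, stop) pair. Mirrors the `< 0 | > 0` branch."""
    assert offset < 0 and duration > 0
    offset = n_samples + offset          # count from the end of the signal
    if offset < 0:                       # offset reaches before the first sample
        duration = max(0, duration + offset)
    else:                                # do not run past the last sample
        duration = min(duration, n_samples - offset)
    offset = max(0, offset)
    return offset, offset + duration


signal = np.arange(10)
start, stop = resolve_negative_offset(len(signal), offset=-4, duration=3)
print(start, stop, signal[start:stop])  # 6 9 [6 7 8]
start, stop = resolve_negative_offset(len(signal), offset=-12, duration=5)
print(start, stop, signal[start:stop])  # 0 3 [0 1 2]
```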

Tags: python, numpy, audio, signal-processing, soundfile
1 Answer

Your code boils down to

    return np.expand_dims(file[0, start:stop], 0)

which is correct.

So if you are not happy with the result, it is because the wrong `(start, stop)` pair is being computed, that is, the wrong `(offset, duration)` pair.

The sampling rate is apparently fixed at `16_000` samples per second. The number of channels can be `1` or `2`, which looks worrisome.
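
If the arrays really are stored at 16 kHz, every seconds-to-samples conversion has to use 16 000, not the 96 kHz rate of the original recordings. A small illustration with made-up numbers of what happens otherwise: the indices come out six times too large, so the slice ends up empty or truncated. `to_samples` is a hypothetical helper.

```python
import numpy as np

stored_sr = 16000    # rate of the saved NumPy array (after resampling)
original_sr = 96000  # rate of the original recording


def to_samples(seconds, sr):
    # Nearest-sample rounding is an assumption
    return int(round(seconds * sr))


signal = np.zeros((1, 10 * stored_sr), dtype=np.float32)  # 10 s at 16 kHz
offset_s, duration_s = 2.0, 1.5

# Correct: indices in units of the stored array's rate
ok = signal[:, to_samples(offset_s, stored_sr):to_samples(offset_s + duration_s, stored_sr)]
# Wrong: indices computed with the original rate point 6x further into the array
bad = signal[:, to_samples(offset_s, original_sr):to_samples(offset_s + duration_s, original_sr)]

print(ok.shape)   # (1, 24000)
print(bad.shape)  # (1, 0)
```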

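There is a lot of optional behavior tied to the `offset` and `duration` parameters. Get rid of it. Focus on writing a simple helper that accepts an offset that is always a non-negative integer and a duration that is always a positive integer. Use `assert` or `raise` so that `None` or negative numbers blow up with a fatal error.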
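
A minimal sketch of such a strict helper, assuming both values are counted in samples and the signal is a (channels, samples) array. The name `take_segment` is hypothetical, and rejecting segments that run past the end (rather than truncating them) is an added design choice.

```python
import numpy as np


def take_segment(signal, offset, duration):
    """Return `duration` samples of `signal`, starting at sample `offset`.

    Hypothetical strict helper: `signal` must have shape (channels, samples);
    `offset` must be a non-negative int and `duration` a positive int, both
    counted in samples. Anything else raises instead of being "handled".
    """
    if signal.ndim != 2:
        raise ValueError(f"expected shape (channels, samples), got {signal.shape}")
    if not isinstance(offset, (int, np.integer)) or offset < 0:
        raise ValueError(f"offset must be a non-negative integer, got {offset!r}")
    if not isinstance(duration, (int, np.integer)) or duration <= 0:
        raise ValueError(f"duration must be a positive integer, got {duration!r}")
    stop = offset + duration
    # Assumption: a segment running past the end is an error, not silently truncated
    if stop > signal.shape[1]:
        raise ValueError(f"segment [{offset}, {stop}) exceeds {signal.shape[1]} samples")
    return signal[:, offset:stop]


x = np.zeros((2, 16000))  # 1 s of stereo at 16 kHz
print(take_segment(x, 8000, 4000).shape)  # (2, 4000)
try:
    take_segment(x, None, 4000)
except ValueError as err:
    print(err)  # offset must be a non-negative integer, got None
```

`raise` is used rather than `assert` here because assertions are skipped when Python runs with `-O`.
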

Next, focus on audio segments that always have the same number of channels.
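
One way to guarantee a fixed channel count, assuming segments arrive either as 1-D mono arrays or as (channels, samples) arrays: reshape everything to (channels, samples), then duplicate or downmix to the target count. `to_channels` is hypothetical, and averaging channels is just one common way to downmix.

```python
import numpy as np


def to_channels(signal, channels=1):
    """Return `signal` as a (channels, samples) array with the requested channel count.

    Hypothetical helper: 1-D input is treated as mono; mono is duplicated to get
    more channels; multi-channel input is averaged down to mono.
    """
    signal = np.atleast_2d(signal)  # (samples,) -> (1, samples)
    if signal.shape[0] == channels:
        return signal
    if signal.shape[0] == 1:
        return np.repeat(signal, channels, axis=0)  # duplicate mono
    if channels == 1:
        return signal.mean(axis=0, keepdims=True)   # downmix by averaging
    raise ValueError(f"cannot map {signal.shape[0]} channels to {channels}")


mono = np.ones(5)
stereo = np.stack([np.zeros(5), np.full(5, 2.0)])
print(to_channels(mono).shape)     # (1, 5)
print(to_channels(stereo))         # [[1. 1. 1. 1. 1.]]
print(to_channels(mono, 2).shape)  # (2, 5)
```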

At that point, getting it right will not be hard.
