How do I extract a duration and offset from a numpy array representing audio?

Question

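I am currently running a script in which I take an entire audio file and save it using the audiofile library in Python (which, in turn, uses the soundfile library).

I am trying to mimic the behavior of audiofile.read(), where I give it an offset and a duration (in seconds) and get back only the numpy array for that particular interval of sound. The only difference is that I already have the entire audio file as a numpy array and need to extract the correct start and end of the interval from it, rather than passing in a .wav file as the library requires.

I tried copying the logic that computes the start and end and then slicing the numpy array with sound_file[start:end], but this does not seem to work. I am not very familiar with how signal processing handles audio files, so I am a bit lost here, and any help would be much appreciated!

I would like it to take in a numpy array and return the same numpy array, sliced to contain only the specified start time plus duration. All the files I load were originally 96 kHz, were resampled to 16 kHz, and were saved as numpy arrays.

Here is my code: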


import audmath
import numpy as np
from audiofile.core.utils import duration_in_seconds

def read_from_np(
    file,
    duration,
    offset,
    sampling_rate = 16000
):

    if duration is not None:
        duration = duration_in_seconds(duration, sampling_rate)
        if np.isnan(duration):
            duration = None
    if offset is not None and offset != 0:
        offset = duration_in_seconds(offset, sampling_rate)
        if np.isnan(offset):
            offset = None

    # Support for negative offset/duration values
    # by counting them from end of signal
    #
    if offset is not None and offset < 0 or duration is not None and duration < 0:
        # The input is already an array, not a file path, so derive the
        # signal duration from its length (samples are on the last axis)
        signal_duration = file.shape[-1] / sampling_rate
    # offset | duration
    # None   | < 0
    if offset is None and duration is not None and duration < 0:
        offset = max([0, signal_duration + duration])
        duration = None
    # None   | >= 0
    if offset is None and duration is not None and duration >= 0:
        if np.isinf(duration):
            duration = None
    # >= 0   | < 0
    elif offset is not None and offset >= 0 and duration is not None and duration < 0:
        if np.isinf(offset) and np.isinf(duration):
            offset = 0
            duration = None
        elif np.isinf(offset):
            duration = 0
        else:
            if np.isinf(duration):
                offset = min([offset, signal_duration])
                duration = np.sign(duration) * signal_duration
            orig_offset = offset
            offset = max([0, offset + duration])
            duration = min([-duration, orig_offset])
    # >= 0   | >= 0
    elif offset is not None and offset >= 0 and duration is not None and duration >= 0:
        if np.isinf(offset):
            duration = 0
        elif np.isinf(duration):
            duration = None
    # < 0    | None
    elif offset is not None and offset < 0 and duration is None:
        offset = max([0, signal_duration + offset])
    # >= 0    | None
    elif offset is not None and offset >= 0 and duration is None:
        if np.isinf(offset):
            duration = 0
    # < 0    | > 0
    elif offset is not None and offset < 0 and duration is not None and duration > 0:
        if np.isinf(offset) and np.isinf(duration):
            offset = 0
            duration = None
        elif np.isinf(offset):
            duration = 0
        elif np.isinf(duration):
            duration = None
        else:
            offset = signal_duration + offset
            if offset < 0:
                duration = max([0, duration + offset])
            else:
                duration = min([duration, signal_duration - offset])
            offset = max([0, offset])
    # < 0    | < 0
    elif offset is not None and offset < 0 and duration is not None and duration < 0:
        if np.isinf(offset):
            duration = 0
        elif np.isinf(duration):
            duration = -signal_duration
        else:
            orig_offset = offset
            offset = max([0, signal_duration + offset + duration])
            duration = min([-duration, signal_duration + orig_offset])
            duration = max([0, duration])

    # Convert to samples
    #
    # Handle duration first
    # and returned immediately
    # if duration == 0
    if duration is not None and duration != 0:
        duration = audmath.samples(duration, sampling_rate)
    if duration == 0:
        # The input is an array, not a file path, so no file lookup is needed.
        # Return an empty single-channel array, matching the shape
        # convention of the final return statement below.
        return np.zeros((1, 0), dtype=file.dtype)
    if offset is not None and offset != 0:
        offset = audmath.samples(offset, sampling_rate)
    else:
        offset = 0


    start = offset
    # duration == 0 is handled further above with immediate return;
    # stop = None makes the slice run to the end of the signal
    stop = None
    if duration is not None:
        stop = duration + start

    return np.expand_dims(file[0, start:stop], 0)

Answer 1

Your code boils down to

    return np.expand_dims(file[0, start:stop], 0)

and that is correct.

So if you are unhappy with the result, it is because the wrong (start, stop) pair was computed, which is to say the wrong (offset, duration) pair.
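As a quick sanity check of the slicing step on its own, here is a minimal sketch. The test signal is synthetic and hypothetical (the asker's real data is not shown); each sample value equals its own index, so it is easy to see which samples the slice returned.

```python
import numpy as np

sampling_rate = 16000  # Hz, as stated in the question

# Hypothetical test signal: 1 channel, 3 seconds,
# where each sample value equals its index
file = np.arange(3 * sampling_rate, dtype=np.float32)[np.newaxis, :]

start, stop = 16000, 24000  # 1.0 s to 1.5 s, expressed in samples
segment = np.expand_dims(file[0, start:stop], 0)

print(segment.shape)  # (1, 8000)
print(segment[0, 0])  # 16000.0
```

If the slice behaves like this on known data, any remaining problem lies in how start and stop are derived.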

The sampling rate is apparently fixed at 16_000 samples per second. The number of channels can apparently vary, which looks worrying.

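There is a large amount of optional behavior tied to the offset and duration parameters. Get rid of it. Focus on writing a simple helper that accepts an offset that is always a non-negative integer and a duration that is always a positive integer. Use assert or raise so that None or a negative number blows up with a fatal error.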
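A minimal sketch of such a helper might look like the following. The name slice_samples is hypothetical, and two things are assumptions rather than anything stated in the question: that the array has shape (channels, samples), and that a request running past the end of the signal should be an error.

```python
import numpy as np


def slice_samples(signal, offset, duration):
    """Return signal[:, offset:offset + duration] with strict checks.

    Assumes signal has shape (channels, samples). offset and duration
    are sample counts, not seconds.
    """
    if signal.ndim != 2:
        raise ValueError(f"expected shape (channels, samples), got {signal.shape}")
    for name, value in (("offset", offset), ("duration", duration)):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, np.integer)):
            raise TypeError(f"{name} must be an integer, got {value!r}")
    if offset < 0:
        raise ValueError(f"offset must be non-negative, got {offset}")
    if duration <= 0:
        raise ValueError(f"duration must be positive, got {duration}")
    # Assumption: reading past the end is an error rather than a short read
    if offset + duration > signal.shape[1]:
        raise ValueError("requested segment runs past the end of the signal")
    return signal[:, offset:offset + duration]
```

With a 16 kHz signal, slice_samples(signal, 16000, 8000) would return half a second of audio starting at the one-second mark.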

Next, focus on audio clips that always have the same number of channels.

At that point it will not be hard to get things right.
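Putting the pieces together, a sketch that takes seconds and enforces a fixed channel count could look like this. The name read_segment and the constant EXPECTED_CHANNELS are hypothetical. The mono assumption is inferred from the file[0, ...] indexing in the question, and plain rounding is used for the seconds-to-samples conversion, which may differ from what audmath.samples does.

```python
import numpy as np

EXPECTED_CHANNELS = 1  # assumption: every clip is mono, shape (1, samples)


def read_segment(signal, offset_s, duration_s, sampling_rate=16000):
    """Slice a (channels, samples) array using offset and duration in seconds."""
    if signal.ndim != 2 or signal.shape[0] != EXPECTED_CHANNELS:
        raise ValueError(
            f"expected shape ({EXPECTED_CHANNELS}, samples), got {signal.shape}"
        )
    # Comparing None with a number raises TypeError, so None also fails here
    if offset_s < 0 or duration_s <= 0:
        raise ValueError("offset must be >= 0 and duration must be > 0")
    # Convert seconds to sample indices by rounding to the nearest sample
    start = int(round(offset_s * sampling_rate))
    stop = start + int(round(duration_s * sampling_rate))
    if stop > signal.shape[1]:
        raise ValueError("requested segment runs past the end of the signal")
    return signal[:, start:stop]
```

For a two-second mono clip at 16 kHz, read_segment(signal, 0.5, 0.25) would return an array of shape (1, 4000).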
