How to use the YAMNet TensorFlow Lite model for sound classification with a given audio clip

Question

I am trying to develop an Android application that takes an audio clip and classifies it using the YAMNet model:

https://tfhub.dev/google/lite-model/yamnet/classification/tflite/1

While researching this, I found the following solution:

Add these dependencies:

// to run yamnet.tflite model
implementation 'org.tensorflow:tensorflow-lite-task-audio:0.2.0'
// prepare input file for model
implementation("com.google.guava:guava:31.0.1-android")
implementation 'com.arthenica:ffmpeg-kit-full:4.5'

Run this code to prepare and process the input .wav file:

import com.arthenica.ffmpegkit.FFmpegKit
import com.google.common.io.LittleEndianDataInputStream
import java.io.EOFException
import java.io.File
import java.io.FileInputStream
import org.tensorflow.lite.task.audio.classifier.AudioClassifier

val srcFile = File("src_file_path")
// Load the model and create an input tensor in the format it expects
val classifier = AudioClassifier.createFromFile(this, MODEL_FILE)
val audioTensor = classifier.createInputTensorAudio()
// Temp file for the converted audio (createTempFile already creates the file)
val tempFile = File.createTempFile(System.currentTimeMillis().toString(), ".wav")

// Convert the source file to 16 kHz mono, the input format the model requires
FFmpegKit.execute("-i $srcFile -ar 16000 -ac 1 -y ${tempFile.absolutePath}")

// Read the converted file as little-endian 16-bit values.
// Note: reading starts at byte 0, so the WAV header bytes are read as samples too.
val musicList = ArrayList<Short>()
LittleEndianDataInputStream(FileInputStream(tempFile)).use { dis ->
    while (true) {
        try {
            musicList.add(dis.readShort())
        } catch (e: EOFException) {
            break
        }
    }
}
// The input must be normalized to floats between -1 and 1.
// To normalize it, divide every value by 2**15, i.e. MAX_ABS_INT16 = 32768.
val floatsForInference = FloatArray(musicList.size)
for ((index, value) in musicList.withIndex()) {
    floatsForInference[index] = value / 32768F
}
audioTensor.load(floatsForInference)
val output = classifier.classify(audioTensor)

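I tried this solution. However, every time I get the same output (category: silence, at 80%), which means the model is not classifying or recognizing the given input audio.

For example, if I use this audio clip as input, the expected output (category) is cough, not silence: https://storage.googleapis.com/audioset/yamalyzer/audio/cough.wav

How can I fix this in code?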

Answer 1

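I haven't had time to test the complete code, but I think what is happening is that YAMNet requires a very specific configuration for the input audio (for example: bit rate, sample rate). Usually, if this is not done correctly, the model gives essentially random results.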
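One way to check whether the converted file really has the expected configuration is to read its WAV header before feeding it to the model. The following is only a minimal sketch, assuming a plain RIFF/WAVE file with a 16-byte-or-longer "fmt " chunk; `WavInfo`, `parseWavHeader`, and `isYamnetReady` are hypothetical helper names, not part of any library. It also reports where the "data" chunk starts, which helps avoid treating header bytes as audio samples.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

/** Format fields read from a RIFF/WAVE header, plus where the sample data starts. */
data class WavInfo(
    val sampleRate: Int,
    val channels: Int,
    val bitsPerSample: Int,
    val dataOffset: Int,
    val dataSize: Int
)

/** Walks the RIFF chunks and returns the "fmt " fields and the "data" chunk location. */
fun parseWavHeader(bytes: ByteArray): WavInfo {
    require(bytes.size >= 12) { "Too short to be a WAV file" }
    require(String(bytes, 0, 4, Charsets.US_ASCII) == "RIFF") { "Missing RIFF tag" }
    require(String(bytes, 8, 4, Charsets.US_ASCII) == "WAVE") { "Missing WAVE tag" }

    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    var sampleRate = 0
    var channels = 0
    var bits = 0
    var pos = 12
    while (pos + 8 <= bytes.size) {
        val id = String(bytes, pos, 4, Charsets.US_ASCII)
        val size = buf.getInt(pos + 4)
        val body = pos + 8
        if (id == "data") {
            // Some writers leave a placeholder size; fall back to "rest of the file"
            val available = bytes.size - body
            val dataSize = if (size in 0..available) size else available
            return WavInfo(sampleRate, channels, bits, body, dataSize)
        }
        require(size >= 0) { "Corrupt chunk size" }
        if (id == "fmt ") {
            require(body + 16 <= bytes.size) { "Truncated fmt chunk" }
            channels = buf.getShort(body + 2).toInt()
            sampleRate = buf.getInt(body + 4)
            bits = buf.getShort(body + 14).toInt()
        }
        pos = body + size + (size and 1) // chunks are padded to an even length
    }
    error("No data chunk found")
}

/** True when the file is 16 kHz, mono, 16-bit: the format the question's FFmpeg step targets. */
fun isYamnetReady(info: WavInfo): Boolean =
    info.sampleRate == 16000 && info.channels == 1 && info.bitsPerSample == 16
```

If `isYamnetReady` returns false, the conversion step is the first thing to look at. If it returns true, the samples should be read starting at `dataOffset` rather than at byte 0.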

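I suggest following this tutorial: https://www.tensorflow.org/lite/examples/audio_classification/overview

It uses the TFLite Task Library, which handles all the conversions correctly, so you don't need to convert the audio yourself.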
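One more thing that may be worth checking, although I have not verified it against the question's code: the YAMNet TFLite model consumes a fixed window of 15600 samples (0.975 s at 16 kHz) per inference. As far as I understand the Task Library's TensorAudio, loading a longer array keeps only the most recent samples, so a clip that ends quietly could plausibly come back as silence. A minimal sketch of splitting a clip into model-sized windows follows; `YAMNET_WINDOW`, `splitIntoWindows`, and `pcm16ToFloat` are hypothetical helper names, and the window size is an assumption taken from the model's documentation.

```kotlin
/** Number of samples YAMNet consumes per inference: 0.975 s at 16 kHz (assumed from the model docs). */
const val YAMNET_WINDOW = 15600

/**
 * Splits samples into consecutive, non-overlapping windows of windowSize,
 * zero-padding the final window when the clip length is not an exact multiple.
 */
fun splitIntoWindows(samples: FloatArray, windowSize: Int = YAMNET_WINDOW): List<FloatArray> {
    require(windowSize > 0) { "windowSize must be positive" }
    val windows = ArrayList<FloatArray>()
    var start = 0
    while (start < samples.size) {
        val end = minOf(start + windowSize, samples.size)
        // A fresh FloatArray is all zeros, so a short final window stays zero-padded
        val window = FloatArray(windowSize)
        samples.copyInto(window, 0, start, end)
        windows.add(window)
        start = end
    }
    return windows
}

/** Converts 16-bit PCM samples to floats in [-1, 1). */
fun pcm16ToFloat(pcm: ShortArray): FloatArray = FloatArray(pcm.size) { pcm[it] / 32768f }

// Possible use with the classifier from the question (not tested here):
// for (window in splitIntoWindows(floatsForInference)) {
//     audioTensor.load(window)
//     val results = classifier.classify(audioTensor)
//     // inspect results for each window, e.g. look for "Cough" in any of them
// }
```

Classifying each window separately and then looking at the top category per window shows where in the clip the model hears something other than silence.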

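Another cool resource for testing the model is this one: https://www.tensorflow.org/hub/tutorials/yamnet

I know it is in Python (not Kotlin), but it is simple, swapping in your own audio file is easy, and it gives you an idea of what the model sees.

I hope this helps.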
