Speech recognition from a wav file or a processed raw audio buffer

Problem description · Votes: 0 · Answers: 2

I am working on an Android project in which I need to convert speech to text, either from raw audio buffer data or from a stored wav file. Is this possible on Android? More specifically, I obtain the audio buffer like this:

record.read(audioBuffer, 0, audioBuffer.length);

I process the audio buffer and store it as a wav file. I need to convert the processed audio buffer to text, or, after saving the buffer as a wav file, use Google's offline speech-to-text option to transcribe the wav. Please let me know how I can do this. I have seen other threads here, but they are all quite old (4, 6, 7 years and so on).
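For reference, storing the processed buffer as a wav file only requires prepending a canonical 44-byte RIFF header to the raw PCM. A minimal sketch, assuming 16-bit mono samples (the helper and its names are illustrative, not from the question):

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Minimal sketch: prepend a canonical 44-byte RIFF/WAVE header to raw
// 16-bit little-endian mono PCM and write the result to disk.
static void writeWav(String path, byte[] pcm, int sampleRate) throws IOException {
    ByteBuffer h = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
    h.put("RIFF".getBytes());
    h.putInt(36 + pcm.length);      // size of the file after this field
    h.put("WAVE".getBytes());
    h.put("fmt ".getBytes());
    h.putInt(16);                   // fmt chunk size for linear PCM
    h.putShort((short) 1);          // audio format 1 = linear PCM
    h.putShort((short) 1);          // channel count = mono
    h.putInt(sampleRate);
    h.putInt(sampleRate * 2);       // byte rate = sampleRate * channels * 2
    h.putShort((short) 2);          // block align = channels * 2 bytes
    h.putShort((short) 16);         // bits per sample
    h.put("data".getBytes());
    h.putInt(pcm.length);
    try (FileOutputStream out = new FileOutputStream(path)) {
        out.write(h.array());
        out.write(pcm);
    }
}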

android speech-to-text
2 Answers

0 votes

I came across Google's Cloud Speech API, which can take a raw audio file as input and perform asynchronous speech recognition. I have limited app-development and Java experience. This link shows how to do it: https://cloud.google.com/speech/docs/async-recognize, and here is the full sample source code: https://github.com/GoogleCloudPlatform/java-docs-samples/blob/master/speech/cloud-client/src/main/java/com/example/speech/QuickstartSample.java. But the problem is that when I add the following import statements to the app code in MainActivity.java in Android Studio, some are grayed out and some are marked red.

import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.RecognizeResponse;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1.SpeechRecognitionResult;
import com.google.protobuf.ByteString;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
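In Android Studio, grayed-out imports usually just mean nothing references them yet, while red ones cannot be resolved, which typically points to the google-cloud-speech client library missing from the module's dependencies. For context, the linked QuickstartSample uses these imports roughly as follows; a condensed sketch of the synchronous variant (the file path and sample rate are placeholders, and it assumes a plain JVM with Cloud credentials configured rather than an Android device):

// Condensed from the linked QuickstartSample, using the imports above.
static void transcribeRawFile(String fileName) throws Exception {
    try (SpeechClient speechClient = SpeechClient.create()) {
        ByteString audioBytes = ByteString.copyFrom(Files.readAllBytes(Paths.get(fileName)));

        RecognitionConfig config = RecognitionConfig.newBuilder()
                .setEncoding(AudioEncoding.LINEAR16)    // raw 16-bit little-endian PCM
                .setSampleRateHertz(16000)
                .setLanguageCode("en-US")
                .build();
        RecognitionAudio audio = RecognitionAudio.newBuilder()
                .setContent(audioBytes)
                .build();

        RecognizeResponse response = speechClient.recognize(config, audio);
        for (SpeechRecognitionResult result : response.getResultsList()) {
            SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
            System.out.printf("Transcription: %s%n", alternative.getTranscript());
        }
    }
}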

0 votes

Since Android 13, SpeechRecognizer can accept a file or live PCM data as input. I managed to put together a project that achieves the above.

There is one catch: not every sampling rate of the audio source seems to work. For example, I recorded a PCM clip at 22050 Hz, but if I set EXTRA_AUDIO_SOURCE_SAMPLING_RATE to 22050, SpeechRecognizer fails. Changed to 16000 or 24000, the same clip is recognized.

Here is how my test project works. I have omitted the RECORD_AUDIO permission handling; just turn the permission on in the Android phone settings after the first crash:

Part 0. Create the raw PCM file: linear 16-bit little-endian; I used a 22050 Hz sampling rate.
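If you already have a wav recording, one way to produce this raw file (my addition, assuming a plain PCM wav with the canonical 44-byte header and no extra chunks) is to strip the header and keep the rest:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Sketch: a canonical PCM wav is a 44-byte header followed by the raw
// little-endian 16-bit samples, so dropping the header yields the PCM file.
static void wavToRawPcm(String wavPath, String pcmPath) throws IOException {
    byte[] wav = Files.readAllBytes(Paths.get(wavPath));
    Files.write(Paths.get(pcmPath), Arrays.copyOfRange(wav, 44, wav.length));
}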

Part 1. Create the Android Studio project. In the manifest, add the following at the end of the root tag:

   <manifest xmlns:...
       ...
       <uses-permission android:name="android.permission.INTERNET" />
       <uses-permission android:name="android.permission.RECORD_AUDIO" />
       <queries>
           <intent>
               <action android:name="android.speech.RecognitionService" />
           </intent>
       </queries> 
   </manifest> 

Part 2. Add all of the following code blocks to the MainActivity class.

i. Variables

   // toggle either function of this sample project
   // 1 for PCM file in res/raw
   // 2 for real time PCM data from AudioRecord
   static final int AUDIO_SOURCE_TYPE = 1; 
   android.speech.SpeechRecognizer speechRecognizer = null;
   ParcelFileDescriptor[] m_audioPipe;
   ParcelFileDescriptor mExtraAudioPFD;
   ParcelFileDescriptor.AutoCloseOutputStream mOutputStream;
   AudioRecord audioRec;
   Thread m_hAutoRecordThread;
   boolean m_bTerminateThread;

ii. Functions for the SpeechRecognizer lifecycle

@RequiresApi(api = Build.VERSION_CODES.TIRAMISU)
private final Intent createSpeechRecognizerIntent() {

    final Intent speechRecognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS, 3000);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS, 6000);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS, 2000);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");

    if (AUDIO_SOURCE_TYPE == 1) {
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE, mExtraAudioPFD);
    } else if (AUDIO_SOURCE_TYPE == 2) {
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE, m_audioPipe[0]);
    }
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_CHANNEL_COUNT, 1);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_ENCODING, AudioFormat.ENCODING_PCM_16BIT);
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_SAMPLING_RATE, 24000);

    return speechRecognizerIntent;
}

protected void initRecognizer() {

    speechRecognizer = android.speech.SpeechRecognizer.createSpeechRecognizer(this);
    speechRecognizer.setRecognitionListener(new RecognitionListener() {
        @Override public void onReadyForSpeech(Bundle bundle) { Log.i("recognizer", "onReadyForSpeech"); }
        @Override public void onBeginningOfSpeech() { Log.i("recognizer", "onBeginningOfSpeech"); }
        @Override public void onRmsChanged(float v) {
            Log.i("onRmsChanged", "v = " + v);
        }
        @Override public void onBufferReceived(byte[] bytes) { }
        @Override public void onEndOfSpeech() {
            Log.i("recognizer", "onEndOfSpeech");
            stopRecognizer();
        }
        @Override public void onError(int i) { Log.i("recognizer", "onError = " + i); }
        @Override public void onResults(Bundle bundle) {

            Log.i("recognizer", "onResults");
            final ArrayList<String> data = bundle.getStringArrayList(android.speech.SpeechRecognizer.RESULTS_RECOGNITION);

            if (data != null && data.size() > 0) {
                String resultData = data.get(0);
                Log.i("SpeechRecogn", "resultData = " + resultData + ", data.get(0) = " + data.get(0));
            }
        }
        @Override public void onPartialResults(Bundle bundle) {

            Log.i("recognizer", "onPartialResults");
            final ArrayList<String> data = bundle.getStringArrayList(android.speech.SpeechRecognizer.RESULTS_RECOGNITION);

            if (data != null && data.size() > 0) {
                String resultData = data.get(0);
                Log.i("SpeechRecogn", "resultData = " + resultData + ", data.get(0) = " + data.get(0));
            }
        }
        @Override public void onEvent(int i, Bundle bundle) { Log.i("recognizer", "onEvent"); }
    });
}

void stopRecognizer() {

    m_bTerminateThread = true;
    new Handler(Looper.getMainLooper()).post(new Runnable() {
        @Override
        public void run() {

            if (speechRecognizer != null) {
                speechRecognizer.stopListening();
                try {
                    if (mOutputStream != null) {
                        mOutputStream.close();
                        mOutputStream = null;
                    }
                } catch (IOException e) {
                    // ignore failures closing the stream; we are shutting down
                }
                speechRecognizer = null;
            }
        }
    });
}

iii. The AudioRecord thread, used when you choose live PCM data

private class RecordingRunnable implements Runnable {

    @Override
    public void run() {
        while (!m_bTerminateThread) {

            short[] readBuf = new short[1024];
            int readLength = audioRec.read(readBuf, 0, readBuf.length);

            byte[] readBytes = ShortArrayToByteArray(readBuf);
            try {
                // write only the samples actually read (2 bytes per 16-bit sample)
                if (mOutputStream != null && readLength > 0) {
                    mOutputStream.write(readBytes, 0, readLength * 2);
                    mOutputStream.flush();
                }
            } catch (IOException e) {
                // the read side may already be closed; keep looping until terminated
            }
        }
    }
}

iv. Utility functions

// pack the 16-bit samples as little-endian bytes, matching ENCODING_PCM_16BIT
protected byte[] ShortArrayToByteArray(short[] sa) {
    byte[] ret = new byte[sa.length * 2];

    ByteBuffer.wrap(ret).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(sa);
    return ret;
}

// function referenced from
// https://stackoverflow.com/questions/8664468/copying-raw-file-into-sdcard/46244121#46244121
private String copyFiletoStorage(int resourceId, String resourceName) {
    String filePath = getFilesDir().getPath() + "/" + resourceName;
    // copy the res/raw resource to internal storage so it can later be
    // opened through a ParcelFileDescriptor
    try (InputStream in = getResources().openRawResource(resourceId);
         FileOutputStream out = new FileOutputStream(filePath)) {
        byte[] buff = new byte[1024];
        int read;
        while ((read = in.read(buff)) > 0) {
            out.write(buff, 0, read);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return filePath;
}

v. The main logic in onStart()

@Override
protected void onStart() {

    super.onStart();

    if (AUDIO_SOURCE_TYPE == 1) {
        try {
            String testFilePath = copyFiletoStorage(R.raw.test, "test.pcm");
            mExtraAudioPFD = ParcelFileDescriptor.open(new File(testFilePath), ParcelFileDescriptor.MODE_READ_ONLY);
        } catch (FileNotFoundException e) {
            mExtraAudioPFD = null;
        }
    } else if (AUDIO_SOURCE_TYPE == 2) {

        try {
            m_audioPipe = ParcelFileDescriptor.createPipe();
        } catch (IOException e) {
            finishAndRemoveTask();
        }

        mOutputStream = new ParcelFileDescriptor.AutoCloseOutputStream(m_audioPipe[1]);
    }

    initRecognizer();

    if (AUDIO_SOURCE_TYPE == 2) {
        try {
            // permission check and request omitted; manually turn on the
            // audio recording permission in Settings to run this code
            audioRec = new AudioRecord(MediaRecorder.AudioSource.DEFAULT, 22050,
                    AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, 524288);
        } catch (IllegalArgumentException e) {
            Log.e("audioRec", "IllegalArgument");
        } catch (SecurityException e) {
            Log.e("audioRec", "SecurityException!");
        } catch (Exception e) {
            Log.e("audioRec", "any Exception");
        }

        m_bTerminateThread = false;

        audioRec.startRecording();
        m_hAutoRecordThread = new Thread(new RecordingRunnable(), "RecordingThread");
        m_hAutoRecordThread.start();
    }

    final Intent speechRecognizerIntent = createSpeechRecognizerIntent();
    speechRecognizer.startListening(speechRecognizerIntent);

    if (AUDIO_SOURCE_TYPE == 2) {
        new Timer().schedule(
                new TimerTask() {

                    @Override
                    public void run() {

                        stopRecognizer();
                    }
                }, 5000);
    }
}