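I am working on an Android project in which I need to convert speech to text, either from a raw audio buffer or from a stored WAV file. Is this possible on Android? More specifically, I get the audio buffer from here: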
record.read(audioBuffer, 0, audioBuffer.length);
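I process the audio buffer and store it as a WAV file. I need to convert the processed audio buffer to text, or, after saving the buffer as a WAV file, convert the WAV to text using Google's offline speech-to-text option. Please let me know how I can do this. I have seen other threads about this here, but they are all very old (4, 6, 7 years old...).
I came across Google's Cloud Speech API, which can take a raw audio file as input and perform asynchronous speech recognition. My app development and Java experience is limited. This link shows how to do it: https://cloud.google.com/speech/docs/async-recognize and here is some sample source code: https://github.com/GoogleCloudPlatform/java-docs-samples/blob/master/speech/cloud-client/src/main/java/com/example/speech/QuickstartSample.java
The problem is that when I add the following import statements to my app code in MainActivity.java in Android Studio, some of them are greyed out and others are marked in red.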
import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.RecognizeResponse;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1.SpeechRecognitionResult;
import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
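Since Android 13, SpeechRecognizer can accept either a file or real-time PCM data as input. I managed to write a project that does both.
There is one catch: the audio source sampling rate does not seem to work for every rate. For example, I recorded a PCM clip at 22050 Hz, but SpeechRecognizer fails if I set EXTRA_AUDIO_SOURCE_SAMPLING_RATE to 22050. After changing it to 16000 or 24000, the same clip is recognized.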
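Declaring a rate that differs from the recorded one means the recognizer interprets the clip at a slightly different speed. If that is a concern, one alternative is to resample the audio to a rate that was accepted, such as 16000 Hz. The following is only a minimal sketch, not part of the test project: the names `PcmResampler` and `resampleLinear` are hypothetical, it assumes mono 16-bit samples, and it uses plain linear interpolation with no low-pass filter, so quality is rough.

```java
public class PcmResampler {
    /** Resamples mono 16-bit PCM from srcRate to dstRate by linear interpolation. */
    public static short[] resampleLinear(short[] input, int srcRate, int dstRate) {
        if (input.length == 0 || srcRate == dstRate) {
            return input.clone();
        }
        int outLength = (int) ((long) input.length * dstRate / srcRate);
        short[] output = new short[outLength];
        for (int i = 0; i < outLength; i++) {
            // Position of output sample i on the input time axis.
            double srcPos = (double) i * srcRate / dstRate;
            int left = (int) srcPos;
            int right = Math.min(left + 1, input.length - 1);
            double frac = srcPos - left;
            // Weighted average of the two neighbouring input samples.
            output[i] = (short) Math.round(input[left] * (1.0 - frac) + input[right] * frac);
        }
        return output;
    }

    public static void main(String[] args) {
        short[] oneSecond = new short[22050]; // one second of silence at 22050 Hz
        short[] resampled = resampleLinear(oneSecond, 22050, 16000);
        System.out.println(resampled.length + " samples at 16000 Hz");
    }
}
```

With audio resampled this way, EXTRA_AUDIO_SOURCE_SAMPLING_RATE would be set to the destination rate rather than to a value that merely happens to be accepted.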
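Here is how my test project works. I left out the RECORD_AUDIO permission handling; just turn the permission on in the Android phone settings after the first crash:
Part 0. Create a raw PCM file, linear 16-bit little-endian. I used a 22050 Hz sampling rate.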
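If the starting point is a WAV file, as in the question, the raw PCM for Part 0 can be obtained by skipping the RIFF header. The sketch below walks the chunks to find `fmt ` and `data`; it assumes an uncompressed PCM WAV file that fits in memory, and the names `WavToPcm`, `Pcm`, and `extractPcm` are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WavToPcm {
    /** Raw samples plus the format fields read from the "fmt " chunk. */
    public static final class Pcm {
        public final int sampleRate;
        public final int channels;
        public final int bitsPerSample;
        public final byte[] data;

        Pcm(int sampleRate, int channels, int bitsPerSample, byte[] data) {
            this.sampleRate = sampleRate;
            this.channels = channels;
            this.bitsPerSample = bitsPerSample;
            this.data = data;
        }
    }

    public static Pcm extractPcm(byte[] wav) {
        if (wav.length < 12 || !tag(wav, 0).equals("RIFF") || !tag(wav, 8).equals("WAVE")) {
            throw new IllegalArgumentException("Not a RIFF/WAVE file");
        }
        ByteBuffer buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int sampleRate = 0, channels = 0, bits = 0;
        int pos = 12; // first chunk starts after "RIFF", size, "WAVE"
        while (pos + 8 <= wav.length) {
            String id = tag(wav, pos);
            int size = buf.getInt(pos + 4);
            int body = pos + 8;
            int available = wav.length - body;
            if (size < 0 || size > available) {
                size = available; // tolerate truncated files or placeholder sizes
            }
            if (id.equals("fmt ") && size >= 16) {
                channels = buf.getShort(body + 2);
                sampleRate = buf.getInt(body + 4);
                bits = buf.getShort(body + 14);
            } else if (id.equals("data")) {
                return new Pcm(sampleRate, channels, bits,
                        Arrays.copyOfRange(wav, body, body + size));
            }
            pos = body + size + (size & 1); // chunks are padded to an even length
        }
        throw new IllegalArgumentException("No data chunk found");
    }

    private static String tag(byte[] bytes, int offset) {
        return new String(bytes, offset, 4, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Build a tiny in-memory WAV: mono, 22050 Hz, 16-bit, two samples.
        ByteBuffer wav = ByteBuffer.allocate(48).order(ByteOrder.LITTLE_ENDIAN);
        wav.put("RIFF".getBytes(StandardCharsets.US_ASCII)).putInt(40);
        wav.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        wav.put("fmt ".getBytes(StandardCharsets.US_ASCII)).putInt(16);
        wav.putShort((short) 1).putShort((short) 1).putInt(22050).putInt(44100);
        wav.putShort((short) 2).putShort((short) 16);
        wav.put("data".getBytes(StandardCharsets.US_ASCII)).putInt(4);
        wav.putShort((short) 1).putShort((short) -1);

        Pcm pcm = extractPcm(wav.array());
        System.out.println(pcm.sampleRate + " Hz, " + pcm.channels + " channel(s), "
                + pcm.bitsPerSample + " bit, " + pcm.data.length + " PCM bytes");
    }
}
```

The returned `data` bytes are what would be written out as the raw PCM file, and `sampleRate` shows which rate the clip was actually recorded at.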
Part 1. Create an Android Studio project. In the manifest, add the following at the end of the root tag:
<manifest xmlns:...
...
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<queries>
<intent>
<action android:name="android.speech.RecognitionService" />
</intent>
</queries>
</manifest>
Part 2. Add all of the following code blocks to the MainActivity class.
i. Variables
// toggle either function of this sample project
// 1 for PCM file in res/raw
// 2 for real time PCM data from AudioRecord
static final int AUDIO_SOURCE_TYPE = 1;
android.speech.SpeechRecognizer speechRecognizer = null;
ParcelFileDescriptor[] m_audioPipe;
ParcelFileDescriptor mExtraAudioPFD;
ParcelFileDescriptor.AutoCloseOutputStream mOutputStream;
AudioRecord audioRec;
Thread m_hAutoRecordThread;
// volatile: written on the main thread, read by the recording thread
volatile boolean m_bTerminateThread;
ii. Functions for the SpeechRecognizer lifecycle
@RequiresApi(api = Build.VERSION_CODES.TIRAMISU)
private final Intent createSpeechRecognizerIntent() {
final Intent speechRecognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS, 3000);
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS, 6000);
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS, 2000);
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");
if (AUDIO_SOURCE_TYPE == 1) {
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE, mExtraAudioPFD);
} else if (AUDIO_SOURCE_TYPE == 2) {
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE, m_audioPipe[0]);
}
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_CHANNEL_COUNT, 1);
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_ENCODING, AudioFormat.ENCODING_PCM_16BIT);
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_SAMPLING_RATE, 24000);
return speechRecognizerIntent;
}
protected void initRecognizer() {
speechRecognizer = android.speech.SpeechRecognizer.createSpeechRecognizer(this);
speechRecognizer.setRecognitionListener(new RecognitionListener() {
@Override public void onReadyForSpeech(Bundle bundle) { Log.i("recognizer", "onReadyForSpeech"); }
@Override public void onBeginningOfSpeech() { Log.i("recognizer", "onBeginningOfSpeech"); }
@Override public void onRmsChanged(float v) {
Log.i("onRmsChanged", "v = " + v);
}
@Override public void onBufferReceived(byte[] bytes) { ; }
@Override public void onEndOfSpeech() {
Log.i("recognizer", "onEndOfSpeech");
stopRecognizer();
}
@Override public void onError(int i) { Log.i("recognizer", "onError = " + i); }
@Override public void onResults(Bundle bundle) {
Log.i("recognizer", "onResults");
final ArrayList<String> data = bundle.getStringArrayList(android.speech.SpeechRecognizer.RESULTS_RECOGNITION);
if (data != null && data.size() > 0) {
String resultData = data.get(0);
Log.i("SpeechRecogn", "resultData = " + resultData + ", data.get(0) = " + data.get(0));
}
}
@Override public void onPartialResults(Bundle bundle) {
Log.i("recognizer", "onPartialResults");
final ArrayList<String> data = bundle.getStringArrayList(android.speech.SpeechRecognizer.RESULTS_RECOGNITION);
if (data != null && data.size() > 0) {
String resultData = data.get(0);
Log.i("SpeechRecogn", "resultData = " + resultData + ", data.get(0) = " + data.get(0));
}
}
@Override public void onEvent(int i, Bundle bundle) { Log.i("recognizer", "onEvent"); }
});
}
void stopRecognizer() {
m_bTerminateThread = true;
new Handler(Looper.getMainLooper()).post(new Runnable() {
@Override
public void run() {
if (speechRecognizer != null) {
speechRecognizer.stopListening();
try {
if (mOutputStream != null) {
mOutputStream.close();
mOutputStream = null;
}
} catch (IOException e) {
;
}
speechRecognizer = null;
}
}
});
}
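iii. AudioRecord thread, used when you choose real-time PCM data
private class RecordingRunnable implements Runnable {
@Override
public void run() {
short[] readBuf = new short[1024];
while (!m_bTerminateThread) {
// read() returns the number of shorts read, or a negative error code
int readLength = audioRec.read(readBuf, 0, readBuf.length);
if (readLength < 0) {
break;
}
byte[] readBytes = ShortArrayToByteArray(readBuf);
try {
if (mOutputStream != null) {
// forward only the samples that were actually read (2 bytes each)
mOutputStream.write(readBytes, 0, readLength * 2);
mOutputStream.flush();
}
} catch (IOException e) {
// the pipe was closed, so stop feeding audio
break;
}
}
}
}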
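The recording thread passes each block of 16-bit samples through ShortArrayToByteArray before writing it to the pipe. As a standalone illustration of that step, here is a minimal sketch that converts only the samples actually returned by a read, in little-endian order. The names `PcmBytes` and `toLittleEndianBytes` are hypothetical and not part of the project.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmBytes {
    /** Converts the first `count` samples to little-endian bytes (2 bytes per sample). */
    public static byte[] toLittleEndianBytes(short[] samples, int count) {
        if (count < 0 || count > samples.length) {
            throw new IllegalArgumentException("count out of range: " + count);
        }
        byte[] out = new byte[count * 2];
        ByteBuffer.wrap(out)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .put(samples, 0, count);
        return out;
    }

    public static void main(String[] args) {
        short[] readBuf = new short[1024];
        readBuf[0] = 0x1234;
        int readLength = 1; // stands in for the value returned by a read call
        byte[] bytes = toLittleEndianBytes(readBuf, readLength);
        // The low byte comes first in little-endian order.
        System.out.printf("%d bytes: %02x %02x%n", bytes.length, bytes[0], bytes[1]);
    }
}
```

Sizing the output by the read count keeps stale samples from an earlier read out of the stream when a read returns fewer samples than the buffer holds.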
iv. Utility functions
protected byte[] ShortArrayToByteArray(short[] sa) {
byte[] ret = new byte[sa.length * 2];
ByteBuffer.wrap(ret).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(sa);
return ret;
}
// function referenced from
// https://stackoverflow.com/questions/8664468/copying-raw-file-into-sdcard/46244121#46244121
private String copyFiletoStorage(int resourceId, String resourceName){
String filePath = getFilesDir().getPath() + "/" + resourceName;
try{
InputStream in = getResources().openRawResource(resourceId);
FileOutputStream out = null;
out = new FileOutputStream(filePath);
byte[] buff = new byte[1024];
int read = 0;
try {
while ((read = in.read(buff)) > 0) {
out.write(buff, 0, read);
}
} finally {
in.close();
out.close();
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return filePath;
}
v. Main logic in onStart()
@Override
protected void onStart() {
super.onStart();
if (AUDIO_SOURCE_TYPE == 1) {
try {
String testFilePath = copyFiletoStorage(R.raw.test, "test.pcm");
mExtraAudioPFD = ParcelFileDescriptor.open(new File(testFilePath), ParcelFileDescriptor.MODE_READ_ONLY);
} catch (FileNotFoundException e) {
mExtraAudioPFD = null;
}
} else if (AUDIO_SOURCE_TYPE == 2) {
try {
m_audioPipe = ParcelFileDescriptor.createPipe();
} catch (IOException e) {
finishAndRemoveTask();
}
mOutputStream = new ParcelFileDescriptor.AutoCloseOutputStream(m_audioPipe[1]);
}
initRecognizer();
if (AUDIO_SOURCE_TYPE == 2) {
try {
// permission check and request omitted here;
// the RECORD_AUDIO permission must be turned on manually to run this code
audioRec = new AudioRecord(MediaRecorder.AudioSource.DEFAULT, 22050, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, 524288);
} catch (IllegalArgumentException e) {
Log.e("audioRec", "IllegalArgument");
} catch (SecurityException e) {
Log.e("audioRec", "SecurityException!");
} catch (Exception e) {
Log.e("audioRec", "any Exception");
}
m_bTerminateThread = false;
audioRec.startRecording();
m_hAutoRecordThread = new Thread(new RecordingRunnable(), "RecordingThread");
m_hAutoRecordThread.start();
}
final Intent speechRecognizerIntent = createSpeechRecognizerIntent();
speechRecognizer.startListening(speechRecognizerIntent);
if (AUDIO_SOURCE_TYPE == 2) {
new Timer().schedule(
new TimerTask() {
@Override
public void run() {
stopRecognizer();
}
}, 5000);
}
}