Azure speech diarization does not label speakers correctly until a speaker has produced about 7 seconds of speech

Problem description

In the private preview of Azure speech diarization, the service assigned an "Unknown" speaker label until it had heard roughly 7 seconds of speech from a speaker. The public-preview API instead assigns guest-n labels immediately, which causes accuracy problems: even when guest-1 has already been detected, a short utterance from guest-1 can be labeled guest-2 until guest-2 speaks a long sentence, and so on.

Is there a way to restore the private-preview behavior?


According to the documentation, shorter phrases are still supposed to be labeled as Unknown:

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-stt-diarization?tabs=windows&pivots=programming-language-csharp

SDK version used (Gradle dependency): implementation group: 'com.microsoft.cognitiveservices.speech', name: 'client-sdk', version: '1.34.0'

speech-to-text azure-speech diarization speaker-diarization
1 Answer

Diarization is described as the process of segmenting audio containing multiple speakers into discrete speech segments according to the identity of the speaker in each segment.

  • It is essential for understanding "who spoke when" in a speech recognition pipeline.

Note: real-time diarization is currently in public preview.

  • This underscores the importance of diarization in a variety of scenarios, including podcast sessions, call-center calls, doctor-patient interactions, and team meetings.
  • It notes that diarization is essential for providing context to downstream NLP systems, since it models the structure of the conversation.
  • The code below is taken from the real-time diarization sample on GitHub (the imports and class wrapper have been added here so it compiles as-is; the class name is arbitrary):

    import com.microsoft.cognitiveservices.speech.*;
    import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
    import com.microsoft.cognitiveservices.speech.transcription.ConversationTranscriber;

    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Semaphore;

    public class ConversationTranscriptionSample {
        // Replace with your own Speech resource key and region.
        private static String speechKey = "SPEECH_KEY";
        private static String speechRegion = "SPEECH_REGION";

        public static void main(String[] args) throws InterruptedException, ExecutionException {

            SpeechConfig speechConfig = SpeechConfig.fromSubscription(speechKey, speechRegion);
            speechConfig.setSpeechRecognitionLanguage("en-US");
            AudioConfig audioInput = AudioConfig.fromWavFileInput("katiesteve.wav");

            Semaphore stopRecognitionSemaphore = new Semaphore(0);

            ConversationTranscriber conversationTranscriber = new ConversationTranscriber(speechConfig, audioInput);
            {
                // Subscribes to events.
                conversationTranscriber.transcribing.addEventListener((s, e) -> {
                    System.out.println("TRANSCRIBING: Text=" + e.getResult().getText());
                });

                conversationTranscriber.transcribed.addEventListener((s, e) -> {
                    if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
                        System.out.println("TRANSCRIBED: Text=" + e.getResult().getText() + " Speaker ID=" + e.getResult().getSpeakerId());
                    }
                    else if (e.getResult().getReason() == ResultReason.NoMatch) {
                        System.out.println("NOMATCH: Speech could not be transcribed.");
                    }
                });

                conversationTranscriber.canceled.addEventListener((s, e) -> {
                    System.out.println("CANCELED: Reason=" + e.getReason());

                    if (e.getReason() == CancellationReason.Error) {
                        System.out.println("CANCELED: ErrorCode=" + e.getErrorCode());
                        System.out.println("CANCELED: ErrorDetails=" + e.getErrorDetails());
                        System.out.println("CANCELED: Did you update the subscription info?");
                    }

                    stopRecognitionSemaphore.release();
                });

                conversationTranscriber.sessionStarted.addEventListener((s, e) -> {
                    System.out.println("\n    Session started event.");
                });

                conversationTranscriber.sessionStopped.addEventListener((s, e) -> {
                    System.out.println("\n    Session stopped event.");
                });

                conversationTranscriber.startTranscribingAsync().get();

                // Waits for completion (released by the canceled event).
                stopRecognitionSemaphore.acquire();

                conversationTranscriber.stopTranscribingAsync().get();
            }

            speechConfig.close();
            audioInput.close();
            conversationTranscriber.close();

            System.exit(0);
        }
    }

Output:

[Screenshot: console output of the transcription, showing TRANSCRIBED lines with the recognized text and Speaker IDs.]
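As a possible client-side mitigation (an assumption, not an official SDK feature), you could treat speaker labels on short utterances as tentative and only trust a guest-n label once that speaker has accumulated enough audio, mimicking the private-preview behavior of labeling short segments "Unknown". A minimal sketch of such a relabeling pass over final results; the `Segment` record, the 7-second threshold, and the duration field are all assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Post-processing sketch: downgrade labels to "Unknown" for speakers
// who never accumulate ~7 s of speech (the threshold from the question).
public class TentativeLabels {
    static final long CONFIRM_MS = 7000;

    // Hypothetical result shape: recognized text, speaker label, duration.
    record Segment(String text, String speakerId, long durationMs) {}

    static List<Segment> relabel(List<Segment> segments) {
        // Total speech duration accumulated per speaker label.
        Map<String, Long> total = new HashMap<>();
        for (Segment s : segments) {
            total.merge(s.speakerId(), s.durationMs(), Long::sum);
        }
        // Keep labels only for speakers with enough accumulated audio.
        List<Segment> out = new ArrayList<>();
        for (Segment s : segments) {
            boolean confirmed = total.getOrDefault(s.speakerId(), 0L) >= CONFIRM_MS;
            out.add(confirmed ? s : new Segment(s.text(), "Unknown", s.durationMs()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Segment> results = List.of(
            new Segment("Good morning everyone.", "Guest-1", 8000),
            new Segment("Hi.", "Guest-2", 900));
        for (Segment s : relabel(results)) {
            System.out.println(s.speakerId() + ": " + s.text());
        }
        // Guest-1 has 8 s of audio so its label is kept;
        // Guest-2 only has 0.9 s, so its segment is relabeled "Unknown".
    }
}
```

This does not restore the private-preview behavior inside the service, but it lets you decide at the application layer how much audio a guest-n label must be backed by before you rely on it.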
