pyttsx3: printing the word currently being spoken

Question

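I basically want the TTS engine to print what it is saying while it speaks. I copied the example from the pyttsx3 documentation almost verbatim, but it doesn't work.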

import pyttsx3
def onStart(name):
   print ('starting', name)
def onWord(name, location, length):
   print ('word', name, location, length)
def onEnd(name, completed):
   print ('finishing', name, completed)
engine = pyttsx3.init()
engine.connect('started-utterance', onStart)
engine.connect('started-word', onWord)
engine.connect('finished-utterance', onEnd)
engine.say('The quick brown fox jumped over the lazy dog.')
engine.runAndWait()

This is the result. The word event only fires after the speech has finished, and no actual words are printed.

starting None
word None 1 0
finishing None True

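I've been working on this for days. I've tried other libraries, such as win32com.client.Dispatch('SAPI.Spvoice') and gtts, but none of them seems able to do what I want. SAPI.Spvoice appears to have an event that does what I want, but I can't get that to work either, and I'm not sure I'm using it correctly. https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms723593(v=vs.85)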

from win32com.client import Dispatch
import win32com.client

class ContextEvents():
    def onWord():
        print("the word event occured")
        
        # Work with Result
        
s = Dispatch('SAPI.Spvoice')
e = win32com.client.WithEvents(s, ContextEvents)
s.Speak('The quick brown fox jumped over the lazy dog.')

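As I understand it, there needs to be an event class, and the handlers in that class must be named in the form On(event), or something like that. I also tried installing espeak, without success. Please keep in mind that I'm new to Python, so a thorough explanation would be great.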

Answer 1

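I'm not familiar with this library, but most likely the audio stream is generated and played before the events can be delivered to the wrapper library. I can say that if you are willing to use AWS Polly, it will output word-level timing information. You need two calls: one to get the audio stream and another to get the SSML metadata.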
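A minimal sketch of the Polly route, assuming the word-level metadata comes back as newline-delimited JSON objects with `time`, `type`, and `value` fields (my understanding of Polly's speech marks output). The sample payload is made up for illustration. `fetch_word_marks` is a hypothetical helper that needs boto3 and configured AWS credentials, the voice name is an assumption, and it is not called here.

```python
import json


def parse_speech_marks(payload):
    """Parse newline-delimited JSON speech marks and keep only the word entries."""
    marks = []
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue
        mark = json.loads(line)
        if mark.get("type") == "word":
            marks.append(mark)
    return marks


def fetch_word_marks(text, voice_id="Joanna"):
    """Hypothetical helper: request word timings from Polly (second of the two calls)."""
    import boto3  # imported lazily so the parser above works without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,          # assumed voice, change as needed
        OutputFormat="json",       # metadata instead of audio
        SpeechMarkTypes=["word"],
    )
    return parse_speech_marks(response["AudioStream"].read().decode("utf-8"))


# Illustrative payload only, not real Polly output.
SAMPLE = "\n".join([
    '{"time": 0, "type": "sentence", "start": 0, "end": 11, "value": "hello world"}',
    '{"time": 6, "type": "word", "start": 0, "end": 5, "value": "hello"}',
    '{"time": 370, "type": "word", "start": 6, "end": 11, "value": "world"}',
])

for mark in parse_speech_marks(SAMPLE):
    print(mark["time"], mark["value"])
```

The audio itself would come from a separate request with an audio output format; the `time` values (milliseconds into the audio) are what you would use to print each word at the right moment.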

The Windows .NET System.Speech.Synthesis library does have progress events you can listen to, but I don't know whether there is a Python library that wraps it.

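However, if you are willing to run PowerShell commands from Python, you can try this gist I wrote, which wraps the Windows synthesis functionality and outputs word timings. Here is an example that should do what you need: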

$text = "hello world! this is a long sentence with many words";
$sampleRate = 24000;

# generate tts and save bytes to memory (powershell variable)
# events holds event timings
# NOTE: assumes out-ssml-winrt.ps1 is in current directory, change as needed...
$events = .\out-ssml-winrt.ps1 $text -Variable 'soundstream' -SampleRate $sampleRate -Channels 1 -SpeechMarkTypes 'words';

# estimate duration based on samplerate (rough)
$estimatedDurationMilliseconds = $global:soundstream.Length / $sampleRate * 1000;

$global:e = $events;

# add a final event at the end of the loop to wait for audio to complete
$events += @([pscustomobject]@{ type = 'end'; time = $estimatedDurationMilliseconds; value = '' });
# create background player
$memstream = [System.IO.MemoryStream]::new($global:soundstream);
$player = [System.Media.SoundPlayer]::new($memstream)
$player.Play();

# loop through word events
$now = 0;
$events | % {
    $word = $_;
    # milliseconds into wav file event happens
    $when = $word.time;
    # distance from last timestamp to this event
    $delta = $when - $now;
    # wait until right time to display
    if ($delta -gt 0) {
        Start-sleep -Milliseconds $delta;
    }
    $now = $when;
    # output word
    Write-Output $word.value;
}
# just to let you know - audio should be finished
Write-Output "Playback Complete";
$player.Stop(); $player.Dispose(); $memstream.Dispose();
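For readers who would rather keep the display loop in Python, here is a rough port of the timing logic above. It assumes you already have the word events as (milliseconds, word) pairs, for example parsed from the script's output; `play_word_events` and its `sleep`/`emit` parameters are names invented for this sketch, and audio playback is not handled.

```python
import time


def play_word_events(events, sleep=time.sleep, emit=print):
    """Emit each word once its timestamp (ms into the audio) has been reached."""
    now = 0
    for when_ms, value in events:
        # distance from the last timestamp to this event
        delta = when_ms - now
        # wait until the right time to display
        if delta > 0:
            sleep(delta / 1000.0)
        now = when_ms
        emit(value)


if __name__ == "__main__":
    # Made-up timings, kept short so the demo finishes quickly.
    play_word_events([(0, "hello"), (50, "world"), (100, "")])
```

Passing `sleep` and `emit` as parameters is only a convenience: it lets you swap in a fake clock for testing or route the words to a GUI label instead of stdout.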

Answer 2

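The problem here is that the name variable is the name of the utterance, not the text itself, and name defaults to None. You can name the utterance by passing a second argument to engine.say(). If you set name to the same value as the text, you can use the location and length variables to slice each word out of the string.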

import pyttsx3
def onStart(name):
    print('starting', name)
def onWord(name, location, length):
    # name is the utterance name given to engine.say(); slice out the current word
    print('word', name[location:location + length])
def onEnd(name, completed):
    print('finishing', name, completed)
engine = pyttsx3.init()
engine.connect('started-utterance', onStart)
engine.connect('started-word', onWord)
engine.connect('finished-utterance', onEnd)
text = 'The quick brown fox jumped over the lazy dog.'
# pass the text as the utterance name too, so the callbacks receive it
engine.say(text, text)
engine.runAndWait()

Output:

starting The quick brown fox jumped over the lazy dog.
word The
word quick
word brown
word fox
word jumped
word over
word the
word lazy
word dog
word .
finishing The quick brown fox jumped over the lazy dog. True
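To see the slicing on its own, here is a small sketch that needs no speech engine. The (location, length) pairs are worked out by hand to mimic what the started-word callback receives; what a given driver actually reports may differ (the question's output shows `word None 1 0`). `word_at` and `make_on_word` are names made up for this illustration.

```python
def word_at(text, location, length):
    """Return the substring that a (location, length) pair points at."""
    return text[location:location + length]


def make_on_word(sink):
    """Build a started-word style callback that collects words into sink."""
    def on_word(name, location, length):
        if name is None:
            # The utterance was not named, so there is nothing to slice.
            return
        sink.append(word_at(name, location, length))
    return on_word


text = 'The quick brown fox'
spoken = []
on_word = make_on_word(spoken)

# Hand-computed offsets standing in for the engine's word events.
for location, length in [(0, 3), (4, 5), (10, 5), (16, 3)]:
    on_word(text, location, length)

print(spoken)
```

The `None` check covers the case from the question, where `engine.say()` was called without a name and the callback had no string to slice.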
