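I'm trying to figure out which technique to use to highlight text in sync with audio, the way
https://speechify.com/
does.
This assumes I can already run a TTS algorithm and convert text to speech. I've tried several sources but couldn't pin down the exact technique or approach for highlighting text as the audio speaks it.
Any help would be much appreciated. I've already wasted two days on the internet trying to figure this out, with no luck :(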
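A simple approach is to listen for the boundary event that SpeechSynthesisUtterance emits and highlight words with plain JS. The event gives us the character index, so there's no need to go crazy with regular expressions or super-AI stuff :)
First, make sure the API is available: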
const synth = window.speechSynthesis
if (!synth) {
  console.error('no tts for you!')
  // `return` assumes this snippet runs inside a function (e.g. a click handler)
  return
}
The TTS utterance emits 'boundary' events, which we can use to highlight the text.
let text = document.getElementById('text')
let originalText = text.innerText
let utterance = new SpeechSynthesisUtterance(originalText)
utterance.addEventListener('boundary', event => {
  // charIndex is where the current word starts; charLength is how long it is
  const { charIndex, charLength } = event
  text.innerHTML = highlight(originalText, charIndex, charIndex + charLength)
})
synth.speak(utterance)
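The listener above relies on charLength being a usable number. As a rough sketch, assuming some speech engines may report it as undefined or 0 (worth checking against your target browsers), a small helper can fall back to "the word runs until the next whitespace". The name wordRange is made up for this illustration and is not part of any API.

```javascript
// Hypothetical helper: returns [from, to) for the word starting at charIndex.
// Uses charLength when it is a positive number; otherwise assumes the word
// runs from charIndex up to the next whitespace character.
function wordRange(text, charIndex, charLength) {
  if (Number.isFinite(charLength) && charLength > 0) {
    return [charIndex, Math.min(charIndex + charLength, text.length)];
  }
  const match = text.slice(charIndex).match(/^\S+/);
  const length = match ? match[0].length : 0;
  return [charIndex, charIndex + length];
}

// Usage inside the boundary listener (browser only):
// const [from, to] = wordRange(originalText, event.charIndex, event.charLength)
// text.innerHTML = highlight(originalText, from, to)
```

The whitespace fallback will include trailing punctuation in the highlighted word, which is usually acceptable for a visual cue.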
Full example:
const btn = document.getElementById("btn")

// Wrap the slice [from, to) of the text in a highlighted span
const highlight = (text, from, to) => {
  let replacement = highlightBackground(text.slice(from, to))
  return text.substring(0, from) + replacement + text.substring(to)
}
const highlightBackground = sample => `<span style="background-color:yellow;">${sample}</span>`

btn && btn.addEventListener('click', () => {
  const synth = window.speechSynthesis
  if (!synth) {
    console.error('no tts')
    return
  }
  let text = document.getElementById('text')
  let originalText = text.innerText
  let utterance = new SpeechSynthesisUtterance(originalText)
  utterance.addEventListener('boundary', event => {
    const { charIndex, charLength } = event
    text.innerHTML = highlight(originalText, charIndex, charIndex + charLength)
  })
  synth.speak(utterance)
})
This is very basic, and you can (and should) improve on it.
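One possible improvement, sketched here rather than offered as the definitive fix: the example reads plain text via innerText and writes it back through innerHTML, so any literal "<" or "&" in the text would be parsed as markup. Escaping each piece before concatenating avoids that. The names escapeHtml and highlightEscaped are made up for this sketch.

```javascript
// Map of characters that must be escaped before being written via innerHTML
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Hypothetical helper: escape a plain-text string for safe use as HTML
const escapeHtml = (s) => s.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);

// Same idea as highlight(), but every text piece is escaped first,
// so only the <span> wrapper is interpreted as markup.
const highlightEscaped = (text, from, to) =>
  escapeHtml(text.slice(0, from)) +
  `<span style="background-color:yellow;">${escapeHtml(text.slice(from, to))}</span>` +
  escapeHtml(text.slice(to));
```

The character indices still refer to the unescaped original text, which is why the slicing happens before escaping.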
Oops, I forgot this was tagged ReactJs. Here is the same example in React (the codesandbox link is in the comments):
import React from "react";

const ORIGINAL_TEXT =
  "Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.";

// Split the text into the parts before, inside, and after the highlight
const splitText = (text, from, to) => [
  text.slice(0, from),
  text.slice(from, to),
  text.slice(to)
];

const HighlightedText = ({ text, from, to }) => {
  const [start, highlight, finish] = splitText(text, from, to);
  return (
    <p>
      {start}
      <span style={{ backgroundColor: "yellow" }}>{highlight}</span>
      {finish}
    </p>
  );
};

export default function App() {
  const [highlightSection, setHighlightSection] = React.useState({
    from: 0,
    to: 0
  });
  const handleClick = () => {
    const synth = window.speechSynthesis;
    if (!synth) {
      console.error("no tts");
      return;
    }
    let utterance = new SpeechSynthesisUtterance(ORIGINAL_TEXT);
    utterance.addEventListener("boundary", (event) => {
      const { charIndex, charLength } = event;
      setHighlightSection({ from: charIndex, to: charIndex + charLength });
    });
    synth.speak(utterance);
  };
  return (
    <div className="App">
      <HighlightedText text={ORIGINAL_TEXT} {...highlightSection} />
      <button onClick={handleClick}>klik me</button>
    </div>
  );
}
tts-react provides a hook, useTts, that accepts a markTextAsSpoken argument, which highlights the words as they are spoken.
Here is an example:
import { useTts } from 'tts-react'

const TTS = ({ children }) => {
  const { ttsChildren, play } = useTts({ children, markTextAsSpoken: true })
  return (
    <div>
      <button onClick={play}>
        Click to hear the text spoken
      </button>
      {ttsChildren}
    </div>
  )
}

const App = () => {
  return <TTS>Some text to be spoken.</TTS>
}
You can also load it from a CDN:
<!DOCTYPE html>
<html lang="en-US">
  <head>
    <title>tts-react UMD example</title>
    <script src="https://unpkg.com/react@18/umd/react.development.js"></script>
    <script src="https://unpkg.com/react-dom@18/umd/react-dom.development.js"></script>
    <script src="https://unpkg.com/@babel/standalone/babel.min.js"></script>
    <!-- The pinned version was lost from this URL; replace VERSION with a real tts-react release. -->
    <script src="https://unpkg.com/tts-react@VERSION/dist/umd/tts-react.min.js"></script>
  </head>
  <body>
    <div id="root"></div>
    <script type="text/babel">
      const root = ReactDOM.createRoot(document.getElementById('root'))
      const { TextToSpeech, useTts } = TTSReact
      const CustomTTS = ({ children }) => {
        const { play, ttsChildren } = useTts({ children, markTextAsSpoken: true })
        return (
          <>
            <button onClick={() => play()}>Play</button>
            <div>{ttsChildren}</div>
          </>
        )
      }
      root.render(
        <>
          <CustomTTS>
            <p>Highlight words as they are spoken.</p>
          </CustomTTS>
          <TextToSpeech markTextAsSpoken>
            <p>Highlight words as they are spoken.</p>
          </TextToSpeech>
        </>
      )
    </script>
  </body>
</html>
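Recently I wanted to implement text-to-speech on the web, so I did some research:
The browser's built-in Web Speech Synthesis API is free, but it comes with various problems: robotic-sounding voices, mispronunciations, and so on. See the full list of problems.
We can get a good, human-sounding voice by using audio files instead. But to do TTS with audio files, we need transcript timestamps like this: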
[
  {
    text: "hello world",
    start: 0,
    end: 1.2
  }
]
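Given timestamps in that shape, highlighting during playback comes down to finding the segment that contains the current playback time. The following is a minimal sketch of that lookup, not the method any particular package uses. It assumes the segments are sorted by start time and do not overlap; findActiveSegment, the second sample segment, and the audio / render names in the comments are all hypothetical.

```javascript
// Returns the index of the segment whose [start, end) interval contains
// `time`, or -1 if no segment is being spoken at that moment.
// Assumes segments are sorted by start time and do not overlap.
function findActiveSegment(segments, time) {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const seg = segments[mid];
    if (time < seg.start) hi = mid - 1;
    else if (time >= seg.end) lo = mid + 1;
    else return mid;
  }
  return -1;
}

// Illustrative data only; the second entry is invented for the example.
const segments = [
  { text: "hello world", start: 0, end: 1.2 },
  { text: "how are you", start: 1.5, end: 2.8 },
];

// Browser wiring (hypothetical <audio> element and render callback):
// audio.addEventListener("timeupdate", () => {
//   render(findActiveSegment(segments, audio.currentTime));
// });
```

The timeupdate event fires only a few times per second, so for word-level timestamps you may prefer polling audio.currentTime inside requestAnimationFrame instead.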
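Generating transcript timestamps, in theory, requires machine learning.
So I decided to build the React / Vanilla Speech Highlight npm package, which combines the advantages of TTS via the Web Speech Synthesis API with those of TTS via audio files.
You can produce the audio files with various speech synthesis API providers such as ElevenLabs, Google Cloud TTS, Amazon Polly, and OpenAI.
What about the transcript timestamps needed for TTS with audio files?
I built a transcript timestamp detection engine, so my package can read the audio and generate the transcript timestamps.
You can try the feature on the demo website.
File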
App.css
.highlight-spoken {
  color: black !important;
  background-color: #ff6f00 !important;
  border-radius: 5px;
}
.highlight-sentence {
  color: #000000 !important;
  background-color: #ffe082 !important;
  border-radius: 5px;
}
File
App.js
import "./App.css";
import { useMemo, useRef } from "react";
import { markTheWords, useTextToSpeech } from "react-speech-highlight";
// PanelControlTTS is defined in PanelControlTTS.js (shown further down)
import PanelControlTTS from "./PanelControlTTS";

export default function App() {
  const text = "Some Input String";
  const textEl = useRef();
  const lang = "en-US";
  const { controlHL, statusHL, prepareHL, spokenHL } = useTextToSpeech({
    disableSentenceHL: false,
    disableWordHL: false,
    autoScroll: false,
    lang: lang,
  });
  // Mark up the words once per text change
  const textHL = useMemo(() => markTheWords(text), [text]);
  return (
    <>
      <div ref={textEl}>
        <div
          dangerouslySetInnerHTML={{
            __html: textHL,
          }}
        ></div>
      </div>
      <PanelControlTTS
        isPlay={statusHL === "play" || statusHL === "calibration"}
        play={() => {
          if (statusHL === "pause") {
            controlHL.resume();
          } else {
            controlHL.play(textEl.current);
          }
        }}
        pause={controlHL.pause}
        stop={controlHL.stop}
      />
    </>
  );
}
File
PanelControlTTS.js
export default function PanelControlTTS({ isPlay, play, pause, stop }) {
  return (
    <>
      <button
        onClick={() => {
          if (isPlay) {
            pause();
          } else {
            play();
          }
        }}
      >
        {isPlay ? "pause" : "play"}
      </button>
      {isPlay && <button onClick={stop}>stop</button>}
    </>
  );
}