How to highlight text in real time, in sync with the audio, while it is narrated on a website

Question

I am trying to figure out which technology to use to highlight text in sync with audio, the way https://speechify.com/ does.

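This assumes I can already run a TTS algorithm and convert text to speech. I have tried multiple sources but could not pin down the exact technology or approach for highlighting text while the audio is speaking.

Any help would be much appreciated. I have already wasted two days on the internet trying to figure this out, with no luck :(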

Answer 1

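A simple approach is to use the event listener provided by the SpeechSynthesisUtterance boundary event to highlight words with vanilla JS. The emitted event gives us character indices, so there is no need to go crazy with regexes or super AI stuff :)

Before anything else, make sure the API is available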

const synth = window.speechSynthesis
if (!synth) {
  // A bare `return` is only valid inside a function, so throw at the top level instead
  throw new Error('no tts for you!')
}

The TTS utterance emits 'boundary' events, which we can use to highlight the text.

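const text = document.getElementById('text')
// Keep the plain text so character indices always refer to the unmodified string
const originalText = text.innerText
const utterance = new SpeechSynthesisUtterance(originalText)
// 'boundary' fires as the speech engine reaches each word or sentence boundary
utterance.addEventListener('boundary', event => {
  const { charIndex, charLength } = event
  // highlight() is defined in the full example
  text.innerHTML = highlight(originalText, charIndex, charIndex + charLength)
})
synth.speak(utterance)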
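The listener above relies on both charIndex and charLength. As a hedged sketch, assuming that some browsers or voices may report charLength as undefined (it is a later addition to the event than charIndex), the range can be derived from charIndex alone by scanning to the next whitespace. The helper name wordRange is hypothetical and not part of the Web Speech API.

```javascript
// Hypothetical helper (not part of the Web Speech API): compute the range to
// highlight for a boundary event. Uses charLength when it is a positive
// integer, otherwise falls back to scanning for the end of the current word.
function wordRange(text, charIndex, charLength) {
  if (Number.isInteger(charLength) && charLength > 0) {
    return { from: charIndex, to: charIndex + charLength }
  }
  // Fallback: the word ends at the next whitespace character, or at the end of the text
  const rest = text.slice(charIndex)
  const nextSpace = rest.search(/\s/)
  const length = nextSpace === -1 ? rest.length : nextSpace
  return { from: charIndex, to: charIndex + length }
}
```

Inside the boundary listener this could be used as `const { from, to } = wordRange(originalText, event.charIndex, event.charLength)` before calling highlight.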

Full example:

const btn = document.getElementById("btn")

const highlight = (text, from, to) => {
  let replacement = highlightBackground(text.slice(from, to))
  return text.substring(0, from) + replacement + text.substring(to)
}
const highlightBackground = sample => `<span style="background-color:yellow;">${sample}</span>`

btn && btn.addEventListener('click', () => {
  const synth = window.speechSynthesis
  if (!synth) {
    console.error('no tts')
    return
  }
  let text = document.getElementById('text')
  let originalText = text.innerText
  let utterance = new SpeechSynthesisUtterance(originalText)
  utterance.addEventListener('boundary', event => {
    const { charIndex, charLength } = event
    text.innerHTML = highlight(originalText, charIndex, charIndex + charLength)
  })
  synth.speak(utterance)
})

CodeSandbox link

This is very basic, and you can (and should) improve on it.
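One possible improvement, sketched here rather than taken from the answer: the full example reads the element's innerText and writes it back through innerHTML, so any `<` or `&` in the page text would be parsed as markup. The names escapeHtml and safeHighlight are hypothetical; the idea is to escape each slice before wrapping the highlighted part in the same yellow span.

```javascript
// Escape characters that are special in HTML so page text is never parsed as markup
const escapeHtml = (s) =>
  s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')

// Same contract as highlight(text, from, to) in the full example,
// but every slice is escaped before it is concatenated into HTML
const safeHighlight = (text, from, to) =>
  escapeHtml(text.slice(0, from)) +
  `<span style="background-color:yellow;">${escapeHtml(text.slice(from, to))}</span>` +
  escapeHtml(text.slice(to))
```

It can replace highlight in the boundary listener without other changes, because the indices still refer to the original unescaped text.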

Edit

Oops, I forgot this was tagged ReactJs. Here is the same example in React (the codesandbox link is in the comments):

import React from "react";

const ORIGINAL_TEXT =
  "Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.";

const splitText = (text, from, to) => [
  text.slice(0, from),
  text.slice(from, to),
  text.slice(to)
];

const HighlightedText = ({ text, from, to }) => {
  const [start, highlight, finish] = splitText(text, from, to);
  return (
    <p>
      {start}
      <span style={{ backgroundColor: "yellow" }}>{highlight}</span>
      {finish}
    </p>
  );
};

export default function App() {
  const [highlightSection, setHighlightSection] = React.useState({
    from: 0,
    to: 0
  });
  const handleClick = () => {
    const synth = window.speechSynthesis;
    if (!synth) {
      console.error("no tts");
      return;
    }

    let utterance = new SpeechSynthesisUtterance(ORIGINAL_TEXT);
    utterance.addEventListener("boundary", (event) => {
      const { charIndex, charLength } = event;
      setHighlightSection({ from: charIndex, to: charIndex + charLength });
    });
    synth.speak(utterance);
  };

  return (
    <div className="App">
      <HighlightedText text={ORIGINAL_TEXT} {...highlightSection} />
      <button onClick={handleClick}>click me</button>
    </div>
  );
}

Answer 2

tts-react provides a hook, useTts, that accepts a markTextAsSpoken argument, which highlights the words as they are being spoken.

Here is an example:

import { useTts } from 'tts-react'

const TTS = ({ children }) => {
  const { ttsChildren, play } = useTts({ children, markTextAsSpoken: true })

  return (
    <div>
      <button onClick={play}>
        Click to hear the text spoken
      </button>
      {ttsChildren}
    </div>
  )
}

const App = () => {
  return <TTS>Some text to be spoken.</TTS>
}

You can also load it from a CDN:

<!DOCTYPE html>
<html lang="en-US">
  <head>
    <title>tts-react UMD example</title>
    <script src="https://unpkg.com/react@18/umd/react.development.js"></script>
    <script src="https://unpkg.com/react-dom@18/umd/react-dom.development.js"></script>
    <script src="https://unpkg.com/@babel/standalone/babel.min.js"></script>
    <script src="https://unpkg.com/[email protected]/dist/umd/tts-react.min.js"></script>
  </head>
  <body>
    <div id="root"></div>
    <script type="text/babel">
      const root = ReactDOM.createRoot(document.getElementById('root'))
      const { TextToSpeech, useTts } = TTSReact
      const CustomTTS = ({ children }) => {
        const { play, ttsChildren } = useTts({ children, markTextAsSpoken: true })

        return (
          <>
            <button onClick={() => play()}>Play</button>
            <div>{ttsChildren}</div>
          </>
        )
      }

      root.render(
        <>
          <CustomTTS>
            <p>Highlight words as they are spoken.</p>
          </CustomTTS>
          <TextToSpeech markTextAsSpoken>
            <p>Highlight words as they are spoken.</p>
          </TextToSpeech>
        </>
      )
    </script>
  </body>
</html>

Answer 3

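Recently I wanted to implement text-to-speech on the web, so I did some research:

See my full research on this

Here is what I found:

Using Web Speech Synthesis

The browser's built-in Web Speech Synthesis API is free to use, but it comes with various problems: robotic-sounding voices, spelling errors, and so on. See all the problems

Using audio files

We can get a good, human-sounding voice by using audio files. However, to do TTS with audio files we need transcript timestamps like this: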

[
  {
    text: "hello world",
    start: 0,
    end: 1.2
  }
]

Generating transcript timestamps theoretically requires machine learning.
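Once such timestamps exist, the highlighting itself is a lookup: given the audio's current playback time, find the segment whose start and end enclose it. The following is a minimal sketch, not the package's implementation. It assumes start and end are in seconds and that segments are sorted and non-overlapping; the helper name findActiveSegment and the second sample segment are made up for illustration.

```javascript
// Hypothetical helper: return the index of the segment being spoken at `time`
// (in seconds), or -1 if no segment covers that time.
// Assumes segments are sorted by start time and do not overlap.
function findActiveSegment(segments, time) {
  let lo = 0
  let hi = segments.length - 1
  while (lo <= hi) {
    const mid = (lo + hi) >> 1
    const seg = segments[mid]
    if (time < seg.start) hi = mid - 1
    else if (time >= seg.end) lo = mid + 1
    else return mid
  }
  return -1
}

// Sample data in the shape shown above (the second entry is invented)
const segments = [
  { text: 'hello world', start: 0, end: 1.2 },
  { text: 'how are you', start: 1.5, end: 2.8 },
]

// In a browser this could be driven by an <audio> element:
// audio.addEventListener('timeupdate', () => {
//   const i = findActiveSegment(segments, audio.currentTime)
//   // highlight segments[i].text when i !== -1
// })
```

Binary search keeps the lookup cheap even for long transcripts, which matters because timeupdate fires several times per second during playback.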

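Solution

So I decided to build the React / Vanilla Speech Highlight npm package, which combines the advantages of TTS with the Web Speech Synthesis API and TTS with audio files.

You can generate the audio files with various speech synthesis API providers such as ElevenLabs, Google Cloud TTS, Amazon Polly, and OpenAI.

What about the transcript timestamps needed for TTS with audio files?

I built a transcript timestamp detection engine, so my package can read the audio and generate the transcript timestamps.

You can try this feature on the demo website.

See my full research on this

Example code

Setting the highlighter style

File App.css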

.highlight-spoken {
  color: black !important;
  background-color: #ff6f00 !important;
  border-radius: 5px;
}

.highlight-sentence {
  color: #000000 !important;
  background-color: #ffe082 !important;
  border-radius: 5px;
}

Code example

File App.js

import "./App.css";
import { useMemo, useRef } from "react";
import { markTheWords, useTextToSpeech } from "react-speech-highlight";
// The control panel component is defined in PanelControlTTS.js
import PanelControlTTS from "./PanelControlTTS";

export default function App() {
  const text = "Some Input String";
  const textEl = useRef();
  const lang = "en-US";

  const { controlHL, statusHL, prepareHL, spokenHL } = useTextToSpeech({
    disableSentenceHL: false,
    disableWordHL: false,
    autoScroll: false,
    lang: lang,
  });

  const textHL = useMemo(() => markTheWords(text), [text]);

  return (
    <>
      <div ref={textEl}>
        <div
          dangerouslySetInnerHTML={{
            __html: textHL,
          }}
        ></div>
      </div>

      <PanelControlTTS
        isPlay={statusHL === "play" || statusHL === "calibration"}
        play={() => {
          // Resume if paused, otherwise start reading the marked-up element
          if (statusHL === "pause") {
            controlHL.resume();
          } else {
            controlHL.play(textEl.current);
          }
        }}
        pause={controlHL.pause}
        stop={controlHL.stop}
      />
    </>
  );
}

TTS control example

File PanelControlTTS.js

export default function PanelControlTTS({ isPlay, play, pause, stop }) {
  return (
    <>
      <button
        onClick={() => {
          if (isPlay) {
            pause();
          } else {
            play();
          }
        }}
      >
        {isPlay ? "pause" : "play"}
      </button>

      {isPlay && <button onClick={stop}>stop</button>}
    </>
  );
}
