Text normalization in streaming TTS is badly overlooked: it affects the speech quality you hear, yet almost nobody is discussing it

I can't believe text normalization is so underdiscussed in streaming text-to-speech [D]

Kinda surprises me how little discussion there is about mistakes in streaming TTS models. People look for natural readers, high voice quality, expressive speech. Most models don't look dumb or fail there. They fail when you give them basic stuff like prices, dates, URLs, promo codes, phone numbers. So I was looking for some info and found a benchmark that compares commercial real-time streaming TTS models in terms of how they pronounce dates, URLs, acronyms, etc. They are checking 1000