XTTSv2 is only slightly behind StyleTTS 2 near the top of the TTS Arena leaderboard, though both are far behind ElevenLabs: https://huggingface.co/spaces/TTS-AGI/TTS-Arena
Personally, I prefer StyleTTS 2, and it has a better license. But XTTSv2 has a low-latency streaming mode, which is nice. I did run into hallucination issues, though: it fairly often hallucinates nonsense words or inserts extra syllables into words.
As others have mentioned, Coqui shut down, so there won't be any updates to XTTS.
They just shared the XTTS paper, which was accepted to Interspeech; that might be why this is being posted now: https://arxiv.org/abs/2406.04904
Interesting. I got quite good results for my long-form Substack by combining XTTSv2 with NVIDIA's NeMo.
Does anyone have a sense of how these compare to OpenAI's TTS?
Somewhat unrelated, but given that anyone can vote anonymously, how is TTS Arena protecting itself against bots, or even rings of humans, gaming the system?
Low stakes, I guess.
The problem is that low stakes divided by the low cost of bots still yields an acceptable return.