BTW I was really impressed by the results of F5-TTS. The thing I liked best was the "Tagged" TTS, where you can specify a tag to use different tones of your own voice, like
{Angry}What have you done?
{Surprised}Me, I did nothing?
{Shouting}Who else do you think I'm talking to?
{Sad}Why are you always shouting at me?
I wonder if this would also work for "Character" tags, like {Susan}How was your day?
{Peter}I had a great day.
That would open up great new ways of having audiobooks read by cloned voices, switching between characters with the same voice, as real narrators often do.
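To make the idea concrete, here is a minimal sketch of how such a tagged script could be split into (tag, text) segments before synthesis. The "{Tag}text" line syntax is taken from the examples above; the function name `parse_tagged_script` and the fallback tag "main" are hypothetical, and this is not F5-TTS's own parser. Each segment would then be synthesized with whatever reference audio is registered for its tag, which is left out here.

```python
import re

# Assumed syntax (from the examples above): one "{Tag}text" entry per line.
TAG_LINE = re.compile(r"^\{(?P<tag>[^{}]+)\}(?P<text>.*)$")


def parse_tagged_script(script, default_tag="main"):
    """Split a script into (tag, text) pairs; untagged lines get default_tag."""
    segments = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = TAG_LINE.match(line)
        if match:
            segments.append((match.group("tag").strip(), match.group("text").strip()))
        else:
            segments.append((default_tag, line))
    return segments


script = "{Susan}How was your day?\n{Peter}I had a great day."
for tag, text in parse_tagged_script(script):
    print(f"{tag}: {text}")
```

Since the parser only sees a label, emotion tags like {Angry} and character tags like {Susan} look identical to it; whether the result sounds like a different character depends on the reference audio behind each tag.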
This feature also greatly interests me, although I'm looking for a system that would allow me to slightly alter the pronunciation of individual words. Is anyone aware of such a system?
Especially with TTS in a language other than English (but also with English), the pronunciation of certain words is sometimes jarringly wrong. Until TTS systems can compensate for this themselves, it would be great if humans could use such tags to hint the system toward a better pronunciation. Even if you couldn't specify the exact correction and the TTS merely generated a 'different' sound, that could help.
Are you not looking for SSML with the IPA phoneme tag? I think you might be. It’s part of all your standard OS TTS, including espeak-ng on Linux. It's also in Google Cloud, Azure, Watson, and Amazon Polly voices.
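For anyone who hasn't seen it, a rough sketch of what that looks like: SSML's `<phoneme>` element takes an `alphabet="ipa"` attribute and a `ph` attribute with the transcription. The helper names `phoneme` and `speak` below are made up for illustration, and the IPA strings are just example transcriptions. How much of SSML a given engine actually honors varies, so check its documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(word, ipa):
    """Wrap a word in an SSML <phoneme> element with an IPA transcription."""
    # quoteattr() adds the surrounding quotes and escapes the attribute value.
    return f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>{escape(word)}</phoneme>"


def speak(*parts):
    """Join already-escaped fragments inside a <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    escape("You say "),
    phoneme("tomato", "təˈmeɪtoʊ"),
    escape(", I say "),
    phoneme("tomato", "təˈmɑːtoʊ"),
    escape("."),
)
print(ssml)
```

The resulting string is what you would pass to an SSML-capable engine or cloud API in place of plain text.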
I didn't know it existed... Thank you very much
Features like artificial breathing, slightly different pronunciation and other "features" are only available in commercial systems... unfortunately I don't remember the name or the video I saw about these, because I'm not interested in non-FOSS stuff for my personal projects.
IMHO this should work (in English or Chinese). Here I show how it sounds with different tags (in this case emotions and not characters): https://youtu.be/ASFoTNpkM8o?t=27
Here's how it's done: https://youtu.be/ASFoTNpkM8o?t=992
Hey, thorsten-voice himself. Thank you for your contribution to the community. I'm a happy follower of your content.
Can't wait for F5-TTS to support the German language. Do you know whether this is planned in the near future?
You're very welcome. There is an active discussion on the F5-TTS GitHub repo (I'm involved too) about supporting other languages, including German: https://github.com/SWivid/F5-TTS/issues/87#issuecomment-2418...