Running fine-tuned XTTS models in the demo server #335

wpiman · 2025-03-08T16:07:44Z

wpiman
Mar 8, 2025

I use Coqui in my home announcement setup (e.g. cars coming up the driveway and the like).

I get models from Hugging Face, serve them with the Coqui TTS server, and then call the /api/tts endpoint to read my announcements throughout the house.

I have downloaded several models. I have a David Attenborough voice working now, as well as this guy's Trump. Works great.

https://huggingface.co/enlyth/baj-tts/tree/main/models

I am using the CPU Docker version right now. Works fine and my announcements are not very long so time is not an issue.

To use my current voice, David Attenborough, I downloaded the model and set my Docker entry point to:

python TTS/server/server.py --model_path /models/david.pth --config_path /models/config.json

Lots of other pretrained voices are on Hugging Face and should be compatible, but I cannot seem to get the server to use them.

I downloaded this voice into my models directory (under /morgan) and have tried nearly every combination of arguments to the server command, for example:

python TTS/server/server.py --model_path /models/morgan --config_path /models/morgan/config.json

This is the directory.

https://huggingface.co/drewThomasson/fineTunedTTSModels/tree/main/xtts-v2/eng/MorganFreeman

This guy developed ebook2audiobook, and I can use that voice in his Docker image. I believe he uses Coqui, so this should work; I imagine the error is on my end.

eginhard · 2025-03-10T15:57:59Z

eginhard
Mar 10, 2025
Maintainer

You'd need to share the error message...

0 replies

wpiman · 2025-03-10T19:54:37Z

wpiman
Mar 10, 2025
Author

I try to start the server with the voice and get the following error about checkpoint_dir vs. checkpoint_path, but neither of those is an option of server.py:


root@4e3827439b7f:~# python TTS/server/server.py --model_path /models/morgan/model.pth --config_path /models/morgan/config.json
Using model: xtts
Traceback (most recent call last):
  File "/root/TTS/server/server.py", line 95, in <module>
    api = TTS(
  File "/root/TTS/api.py", line 104, in __init__
    self.load_tts_model_by_path(model_path, config_path, gpu=gpu)
  File "/root/TTS/api.py", line 250, in load_tts_model_by_path
    self.synthesizer = Synthesizer(
  File "/root/TTS/utils/synthesizer.py", line 99, in __init__
    self._load_tts(self.tts_checkpoint, self.tts_config_path, use_cuda)
  File "/root/TTS/utils/synthesizer.py", line 215, in _load_tts
    self.tts_model.load_checkpoint(self.tts_config, tts_checkpoint, eval=True)
  File "/root/TTS/tts/models/xtts.py", line 746, in load_checkpoint
    raise ValueError(msg)
ValueError: You passed a file to `checkpoint_dir=`. Use `checkpoint_path=/models/morgan/model.pth` instead.
root@4e3827439b7f:~# python TTS/server/server.py --help
usage: server.py [-h] [--list_models] [--model_name MODEL_NAME] [--vocoder_name VOCODER_NAME] [--config_path CONFIG_PATH] [--model_path MODEL_PATH]
                 [--vocoder_path VOCODER_PATH] [--vocoder_config_path VOCODER_CONFIG_PATH] [--speakers_file_path SPEAKERS_FILE_PATH] [--port PORT] [--device DEVICE]
                 [--use_cuda | --no-use_cuda] [--debug | --no-debug] [--show_details | --no-show_details]

options:
  -h, --help            show this help message and exit
  --list_models         list available pre-trained tts and vocoder models.
  --model_name MODEL_NAME
                        Name of one of the pre-trained tts models in format <language>/<dataset>/<model_name>
  --vocoder_name VOCODER_NAME
                        name of one of the released vocoder models.
  --config_path CONFIG_PATH
                        Path to model config file.
  --model_path MODEL_PATH
                        Path to model file.
  --vocoder_path VOCODER_PATH
                        Path to vocoder model file. If it is not defined, model uses GL as vocoder. Please make sure that you installed vocoder library before
                        (WaveRNN).
  --vocoder_config_path VOCODER_CONFIG_PATH
                        Path to vocoder model config file.
  --speakers_file_path SPEAKERS_FILE_PATH
                        JSON file for multi-speaker model.
  --port PORT           port to listen on.
  --device DEVICE       Device to run model on.
  --use_cuda, --no-use_cuda
                        true to use CUDA. (default: False)
  --debug, --no-debug   true to enable Flask debug mode. (default: False)
  --show_details, --no-show_details
                        Generate model detail page. (default: False)
root@4e3827439b7f:~#

1 reply

eginhard Mar 11, 2025
Maintainer

This should work: python TTS/server/server.py --model_path /models/morgan --config_path /models/morgan/config.json

Note that it assumes that /models/morgan contains a file named model.pth. For models other than XTTS --model_path needs to point to the model file directly instead.
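
The path rule above can be sketched as a small helper that builds the server command. This is only an illustration: build_server_command and its is_xtts flag are hypothetical names, not part of coqui-tts. Only the --model_path and --config_path flags and the model.pth / config.json file names come from this thread.

```python
from pathlib import Path


def build_server_command(model_dir, is_xtts=True, checkpoint_name="model.pth"):
    """Return the argv list for TTS/server/server.py (hypothetical helper).

    For XTTS, --model_path is the directory that contains model.pth.
    For other models, --model_path is the checkpoint file itself.
    """
    model_dir = Path(model_dir)
    checkpoint = model_dir / checkpoint_name
    config = model_dir / "config.json"

    # Fail early with a readable message instead of a traceback from the server.
    for required in (checkpoint, config):
        if not required.is_file():
            raise FileNotFoundError(f"missing {required}")

    model_path = model_dir if is_xtts else checkpoint
    return [
        "python", "TTS/server/server.py",
        "--model_path", str(model_path),
        "--config_path", str(config),
    ]
```

With the files in place, build_server_command("/models/morgan") yields the command shown above, and passing is_xtts=False switches --model_path to the checkpoint file.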

wpiman · 2025-03-11T13:36:37Z

wpiman
Mar 11, 2025
Author

When I do that, the server shows a bunch of voices to select from, and they all work great, but none of them is the one I expected. When I launch the server by itself, I get no voice selection.

I then tried to specify the text in the URL


[2025-03-11 13:34:02,834] ERROR in app: Exception on /api/tts [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1511, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 919, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 917, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 902, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/root/TTS/server/server.py", line 178, in tts
    wavs = api.tts(text, speaker=speaker_idx, language=language_idx, style_wav=style_wav)
  File "/root/TTS/api.py", line 323, in tts
    wav = self.synthesizer.tts(
  File "/root/TTS/utils/synthesizer.py", line 358, in tts
    raise ValueError(
ValueError:  [!] Looks like you are using a multi-speaker model. You need to define either a `speaker_idx` or a `speaker_wav` to use a multi-speaker model.
::ffff:192.168.0.6 - - [11/Mar/2025 13:34:02] "GET /api/tts?text=Test&speaker_id=&style_wav=&language_id= HTTP/1.1" 500 -

1 reply

eginhard Mar 11, 2025
Maintainer

You need to pass some reference audio from your target speaker. The demo server has only supported this since coqui-tts version 0.26.0, which I released yesterday, so update to that version and you should see a field in the form to specify it.

wpiman · 2025-03-11T17:56:31Z

wpiman
Mar 11, 2025
Author

Hot diggity dog. That worked. That was a MASSIVE docker pull.

root@42721a53565b:~# ls /models/morgan/
config.json  model.pth  reference.wav  speakers_xtts.pth  vocab.json
root@42721a53565b:~# python TTS/server/server.py --model_path /models/morgan --config_path /models/morgan/config.json

I then put /models/morgan/reference.wav in the box and the server reported...

/api/tts?text=There%20is%20a%20car%20coming%20up%20the%20driveway.&speaker_id=&style_wav=&speaker_wav=/models/morgan/reference.wav&language_id=en

and it spoke. The voice was sort of a sped-up version of Morgan Freeman. Not sure how to alter it, but I am off to the races.
Thanks!

I tried a C3PO one that wasn't very good either. I'll try some more.
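
The request above can also be built from a script. A minimal sketch, assuming the server is reachable at http://localhost:5002 (an assumption; substitute your own host and port). The query parameter names are the ones in the URL above, and build_tts_url is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode


def build_tts_url(text, speaker_wav, language_id="en",
                  base_url="http://localhost:5002"):  # base_url is an assumption
    """Build a GET URL for the demo server's /api/tts endpoint."""
    params = {
        "text": text,
        "speaker_id": "",
        "style_wav": "",
        "speaker_wav": speaker_wav,  # path as seen from inside the server container
        "language_id": language_id,
    }
    # quote (not quote_plus) encodes spaces as %20; safe="/" leaves the path readable.
    query = urlencode(params, quote_via=quote, safe="/")
    return f"{base_url}/api/tts?{query}"
```

Calling build_tts_url("There is a car coming up the driveway.", "/models/morgan/reference.wav") reproduces the query string shown above; the resulting URL can then be fetched with any HTTP client.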

0 replies

wpiman · 2025-03-11T18:41:58Z

wpiman
Mar 11, 2025
Author

The voice was sort of a sped-up version of Morgan Freeman. I am using ebook2audiobook with the same voice and it sounds much more realistic, but it also takes 30 seconds to process. Thanks!


0 replies
