fix: upgrade tts_server.py to use kokoro 1.0, fix audio output device on linux #432

ErikBjare · 2025-02-03T12:10:23Z

Important

Upgrade TTS server to use Kokoro 1.0, prefer virtual audio devices on Linux, and refactor for new Kokoro pipeline.

Behavior:
- get_output_device() in tts.py now prefers virtual audio devices (PulseAudio/PipeWire) on Linux.
- text_to_speech() in tts_server.py uses KPipeline for audio generation, concatenates audio segments, and strips silence.
- Adds language and voice selection via command-line options in tts_server.py.
Dependencies:
- Updates tts_server.py to use kokoro>=0.3.4.
Misc:
- Removes torch and phonemizer dependencies from tts_server.py.
- Refactors init_model() to initialize KPipeline instead of loading models directly.
- Adds LANGUAGE_CODES dictionary for supported languages in tts_server.py.

^{This description was created by}^{for 38584de. It will automatically update as commits are pushed.}

… on linux

ellipsis-dev

❌ Changes requested. Reviewed everything up to 2a56ccc in 1 minute and 41 seconds

More details

Looked at 366 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 4 drafted comments based on config settings.

1. gptme/tools/tts.py:286

Draft comment:
Using exact matches ('pulse', 'pipewire') could be too strict. Consider a substring check to allow variants like 'PulseAudio'.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment raises a valid point - PulseAudio devices might show up with variant names. However, without seeing actual device names from real systems, we can't be certain this is a real problem. The current approach is conservative and explicit. Making it more permissive with substring matching could potentially match unintended devices. The code already handles fallback cases if no exact match is found.
I don't have data about real PulseAudio/PipeWire device names in the wild. The current exact matching might be too strict, but substring matching might be too permissive.
While the concern is valid, the current conservative approach with fallbacks is safer than potentially matching wrong devices. If this becomes a real issue, it can be fixed with actual device name data.
Delete the comment. While it raises a valid concern, we lack evidence that this is a real problem, and the current conservative approach with fallbacks is safer.

2. scripts/tts_server.py:227

Draft comment:
Consider adding validation for the 'lang' and 'voice' parameters in main() to avoid unexpected behavior when non-supported values are provided.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The code already has validation in place - invalid voices will be caught during initialization through _list_voices() and init_model(). For language codes, we don't have enough context about what KPipeline accepts to know if additional validation is needed. The validation suggestion seems redundant with existing checks.
I could be wrong about KPipeline's behavior - maybe it silently accepts invalid language codes and we should validate them. However, without knowing KPipeline's implementation details, I can't be certain.
Even with uncertainty about KPipeline, adding validation in main() would be redundant with the validation that happens during initialization. Any issues will be caught and handled appropriately when the pipeline starts up.
The comment should be deleted since the code already has appropriate validation in place during initialization, and we don't have enough context about KPipeline to justify additional validation.

3. gptme/tools/tts.py:286

Draft comment:
Consider using substring matching (e.g. if 'pulse' in device_name or 'pipewire' in device_name) instead of strict equality. Device names like "pulseaudio" or "pipewire-jack" may be missed.
Reason this comment was not posted:
Marked as duplicate.

4. scripts/tts_server.py:155

Draft comment:
Avoid using assert pipeline for production logic; replace it with an explicit check and raise an HTTPException if the pipeline is not initialized.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_eQMY3g0I8iAVMaFl

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

scripts/tts_server.py

codecov-commenter · 2025-02-03T12:12:16Z

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 67.66%. Comparing base (964b528) to head (38584de).

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
gptme/tools/tts.py	0.00%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #432      +/-   ##
==========================================
+ Coverage   67.47%   67.66%   +0.18%     
==========================================
  Files          70       70              
  Lines        6143     6145       +2     
==========================================
+ Hits         4145     4158      +13     
+ Misses       1998     1987      -11

Flag	Coverage Δ
anthropic/claude-3-haiku-20240307	`66.18% <0.00%> (-0.03%)`	⬇️
openai/gpt-4o-mini	`65.82% <0.00%> (+0.19%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ErikBjare · 2025-02-03T12:54:46Z

gptme/tools/tts.py

+    # Second try: platform-specific defaults
+    # - On macOS: prefer CoreAudio (hostapi == 2)
+    # - On Linux: prefer PulseAudio/PipeWire virtual devices


Suggested change

# Second try: platform-specific defaults

# - On macOS: prefer CoreAudio (hostapi == 2)

# - On Linux: prefer PulseAudio/PipeWire virtual devices

# Second try: on macOS, prefer CoreAudio (hostapi == 2)

ErikBjare · 2025-02-03T12:55:02Z

gptme/tools/tts.py

+            if d["max_output_channels"] > 0
+            and (
+                # macOS: CoreAudio devices
+                (d["hostapi"] == 2)
+                or
+                # Linux: PulseAudio/PipeWire virtual devices
+                d["name"].lower() in ["pulse", "pipewire"]
+            )


Suggested change

if d["max_output_channels"] > 0

and (

# macOS: CoreAudio devices

(d["hostapi"] == 2)

or

# Linux: PulseAudio/PipeWire virtual devices

d["name"].lower() in ["pulse", "pipewire"]

)

if d["max_output_channels"] > 0 and d["hostapi"] == 2

ellipsis-dev

👍 Looks good to me! Incremental review on 38584de in 48 seconds

More details

Looked at 166 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 5 drafted comments based on config settings.

1. gptme/tools/tts.py:299

Draft comment:
Linux virtual device fallback removed; now only filtering for macOS devices. Confirm that relying on the generic fallback is acceptable for non-macOS platforms.
Reason this comment was not posted:
Comment did not seem useful.

2. scripts/tts_server.py:226

Draft comment:
Enhanced health endpoint now returns detailed voice and language info. Consider adding an explicit server version field if needed for client validation.
Reason this comment was not posted:
Confidence changes required: 50%
None

3. scripts/tts_server.py:50

Draft comment:
New LANGUAGE_CODES mapping and extended voice lists improve clarity. Ensure these remain in-sync with the official Kokoro voices documentation.
Reason this comment was not posted:
Comment did not seem useful.

4. gptme/tools/tts.py:299

Draft comment:
Removed Linux-specific condition in device selection (previously checking device name for 'pulse' or 'pipewire') now only filters by hostapi==2 (macOS). Ensure Linux fallback behavior is as intended and add a clarifying comment if necessary.
Reason this comment was not posted:
Marked as duplicate.

5. scripts/tts_server.py:50

Draft comment:
Expanded LANGUAGE_CODES and _list_voices now support multiple languages with extended voice lists. Verify that these voice definitions align with Kokoro 1.0 requirements and update related documentation if needed.
Reason this comment was not posted:
Confidence changes required: 50%
None

Workflow ID: wflow_9JXVz6BpvBQ9sOu7

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

fix: upgrade tts_server.py to use kokoro 1.0, fix audio output device…

2a56ccc

… on linux

ellipsis-dev bot reviewed Feb 3, 2025

View reviewed changes

scripts/tts_server.py Show resolved Hide resolved

ErikBjare commented Feb 3, 2025

View reviewed changes

fix: final fixes for kokoro 1.0

38584de

ellipsis-dev bot reviewed Feb 3, 2025

View reviewed changes

ErikBjare merged commit 007a075 into master Feb 3, 2025
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: upgrade tts_server.py to use kokoro 1.0, fix audio output device on linux #432

fix: upgrade tts_server.py to use kokoro 1.0, fix audio output device on linux #432

ErikBjare commented Feb 3, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

codecov-commenter commented Feb 3, 2025 •

edited

Loading

ErikBjare Feb 3, 2025

ErikBjare Feb 3, 2025

ellipsis-dev bot left a comment

fix: upgrade tts_server.py to use kokoro 1.0, fix audio output device on linux #432

fix: upgrade tts_server.py to use kokoro 1.0, fix audio output device on linux #432

Conversation

ErikBjare commented Feb 3, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

codecov-commenter commented Feb 3, 2025 • edited Loading

Codecov Report

ErikBjare Feb 3, 2025

Choose a reason for hiding this comment

ErikBjare Feb 3, 2025

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ErikBjare commented Feb 3, 2025 •

edited by ellipsis-dev bot

Loading

codecov-commenter commented Feb 3, 2025 •

edited

Loading