-
-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: upgrade tts_server.py to use kokoro 1.0, fix audio output device on linux #432
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to 2a56ccc in 1 minute and 41 seconds
More details
- Looked at
366
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
4
drafted comments based on config settings.
1. gptme/tools/tts.py:286
- Draft comment:
Using exact matches ('pulse', 'pipewire') could be too strict. Consider a substring check to allow variants like 'PulseAudio'. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment raises a valid point - PulseAudio devices might show up with variant names. However, without seeing actual device names from real systems, we can't be certain this is a real problem. The current approach is conservative and explicit. Making it more permissive with substring matching could potentially match unintended devices. The code already handles fallback cases if no exact match is found.
I don't have data about real PulseAudio/PipeWire device names in the wild. The current exact matching might be too strict, but substring matching might be too permissive.
While the concern is valid, the current conservative approach with fallbacks is safer than potentially matching wrong devices. If this becomes a real issue, it can be fixed with actual device name data.
Delete the comment. While it raises a valid concern, we lack evidence that this is a real problem, and the current conservative approach with fallbacks is safer.
2. scripts/tts_server.py:227
- Draft comment:
Consider adding validation for the 'lang' and 'voice' parameters in main() to avoid unexpected behavior when non-supported values are provided. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The code already has validation in place - invalid voices will be caught during initialization through _list_voices() and init_model(). For language codes, we don't have enough context about what KPipeline accepts to know if additional validation is needed. The validation suggestion seems redundant with existing checks.
I could be wrong about KPipeline's behavior - maybe it silently accepts invalid language codes and we should validate them. However, without knowing KPipeline's implementation details, I can't be certain.
Even with uncertainty about KPipeline, adding validation in main() would be redundant with the validation that happens during initialization. Any issues will be caught and handled appropriately when the pipeline starts up.
The comment should be deleted since the code already has appropriate validation in place during initialization, and we don't have enough context about KPipeline to justify additional validation.
3. gptme/tools/tts.py:286
- Draft comment:
Consider using substring matching (e.g.if 'pulse' in device_name or 'pipewire' in device_name
) instead of strict equality. Device names like "pulseaudio" or "pipewire-jack" may be missed. - Reason this comment was not posted:
Marked as duplicate.
4. scripts/tts_server.py:155
- Draft comment:
Avoid usingassert pipeline
for production logic; replace it with an explicit check and raise an HTTPException if the pipeline is not initialized. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_eQMY3g0I8iAVMaFl
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## master #432 +/- ##
==========================================
+ Coverage 67.47% 67.66% +0.18%
==========================================
Files 70 70
Lines 6143 6145 +2
==========================================
+ Hits 4145 4158 +13
+ Misses 1998 1987 -11
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
gptme/tools/tts.py
Outdated
# Second try: platform-specific defaults | ||
# - On macOS: prefer CoreAudio (hostapi == 2) | ||
# - On Linux: prefer PulseAudio/PipeWire virtual devices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Second try: platform-specific defaults | |
# - On macOS: prefer CoreAudio (hostapi == 2) | |
# - On Linux: prefer PulseAudio/PipeWire virtual devices | |
# Second try: on macOS, prefer CoreAudio (hostapi == 2) |
gptme/tools/tts.py
Outdated
if d["max_output_channels"] > 0 | ||
and ( | ||
# macOS: CoreAudio devices | ||
(d["hostapi"] == 2) | ||
or | ||
# Linux: PulseAudio/PipeWire virtual devices | ||
d["name"].lower() in ["pulse", "pipewire"] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if d["max_output_channels"] > 0 | |
and ( | |
# macOS: CoreAudio devices | |
(d["hostapi"] == 2) | |
or | |
# Linux: PulseAudio/PipeWire virtual devices | |
d["name"].lower() in ["pulse", "pipewire"] | |
) | |
if d["max_output_channels"] > 0 and d["hostapi"] == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 38584de in 48 seconds
More details
- Looked at
166
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
5
drafted comments based on config settings.
1. gptme/tools/tts.py:299
- Draft comment:
Linux virtual device fallback removed; now only filtering for macOS devices. Confirm that relying on the generic fallback is acceptable for non-macOS platforms. - Reason this comment was not posted:
Comment did not seem useful.
2. scripts/tts_server.py:226
- Draft comment:
Enhanced health endpoint now returns detailed voice and language info. Consider adding an explicit server version field if needed for client validation. - Reason this comment was not posted:
Confidence changes required:50%
None
3. scripts/tts_server.py:50
- Draft comment:
New LANGUAGE_CODES mapping and extended voice lists improve clarity. Ensure these remain in-sync with the official Kokoro voices documentation. - Reason this comment was not posted:
Comment did not seem useful.
4. gptme/tools/tts.py:299
- Draft comment:
Removed Linux-specific condition in device selection (previously checking device name for 'pulse' or 'pipewire') now only filters by hostapi==2 (macOS). Ensure Linux fallback behavior is as intended and add a clarifying comment if necessary. - Reason this comment was not posted:
Marked as duplicate.
5. scripts/tts_server.py:50
- Draft comment:
Expanded LANGUAGE_CODES and _list_voices now support multiple languages with extended voice lists. Verify that these voice definitions align with Kokoro 1.0 requirements and update related documentation if needed. - Reason this comment was not posted:
Confidence changes required:50%
None
Workflow ID: wflow_9JXVz6BpvBQ9sOu7
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Important
Upgrade TTS server to use Kokoro 1.0, prefer virtual audio devices on Linux, and refactor for new Kokoro pipeline.
get_output_device()
intts.py
now prefers virtual audio devices (PulseAudio/PipeWire) on Linux.text_to_speech()
intts_server.py
usesKPipeline
for audio generation, concatenates audio segments, and strips silence.tts_server.py
.tts_server.py
to usekokoro>=0.3.4
.torch
andphonemizer
dependencies fromtts_server.py
.init_model()
to initializeKPipeline
instead of loading models directly.LANGUAGE_CODES
dictionary for supported languages intts_server.py
.This description was created by
for 38584de. It will automatically update as commits are pushed.