FEAT: support deepseek-r1-distill-qwen #2781
Conversation
Both DeepSeek-R1-Distill-Qwen-14B-GGUF and deepseek-r1-distill-qwen-14b-awq fail to load. |
I'm encountering a weird problem with deepseek-r1-distill-qwen 32b awq. I loaded the model with the vLLM backend. With each request, the model seems to stop generating after outputting 1000+ tokens. There are no warnings or errors from inference or vLLM. |
What's the stop reason? |
There isn't an apparent stop reason other than "finished request xxx". |
I think I found the problem. It seems inference is not passing max_tokens through to vLLM's sampling parameters, so vLLM falls back to a default limit of 1024 tokens. |