Add support for Baichuan 13b model #512
Conversation
A single GPU? With 24 GB like a 4090, it is not possible to load the model on a single GPU in fp16 (13B parameters at 2 bytes each is roughly 26 GB of weights alone).
A100s do have 40 GB.
/usr/bin/python3: Error while finding module specification for 'vllm.entrypoints.openai.api_server' (ImportError: cannot import name 'activation_ops' from partially initialized module 'vllm' (most likely due to a circular import) (.local/lib/python3.8/site-packages/vllm/__init__.py))
I solved this problem; you only need to run pip install vllm again.
python3 benchmarks/benchmark_throughput.py --dataset benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json --backend hf --hf-max-batch-size 4 --model .//baichuan-inc--Baichuan-13B-Chat --trust-remote-code
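For comparison, the same script can benchmark the vLLM backend (a sketch; --backend vllm is the script's other supported backend, and the model path here just mirrors the one above):

python3 benchmarks/benchmark_throughput.py --dataset benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json --backend vllm --model .//baichuan-inc--Baichuan-13B-Chat --trust-remote-code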
Closing this PR since #643 is a simpler solution. Please check out the latest main branch to test Baichuan-13B! Again, thanks @ericzhou571 for the great work!
Have you solved it? I also encountered this problem.
This pull request introduces a server that can be initiated using the following command:
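The exact flags may vary; a plausible invocation, assuming the OpenAI-compatible entrypoint mentioned elsewhere in this thread and the Hugging Face model ID for Baichuan-13B-Chat, would be:

python3 -m vllm.entrypoints.openai.api_server --model baichuan-inc/Baichuan-13B-Chat --trust-remote-code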
We've conducted tests of baichuan-13b-chat inference on a single GPU. With the temperature set to 0 and identical prompts, the outputs were consistent with those of the same model deployed using a standard FastChat worker.
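A minimal sketch of such a determinism check using vLLM's Python API (the model ID and prompt here are illustrative placeholders, not taken from the PR):

```python
from vllm import LLM, SamplingParams

# Greedy decoding: temperature=0 removes sampling randomness.
params = SamplingParams(temperature=0, max_tokens=128)

# trust_remote_code is needed because Baichuan ships custom modeling code.
llm = LLM(model="baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True)

prompts = ["Briefly introduce the city of Beijing."]  # placeholder prompt
first = [out.outputs[0].text for out in llm.generate(prompts, params)]
second = [out.outputs[0].text for out in llm.generate(prompts, params)]

# Identical prompts at temperature 0 should yield identical outputs.
assert first == second
```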
However, please be aware of the following limitations:
- Our code is currently only compatible with non-distributed deployments, i.e., setups with a single GPU serving a single model.
- While the code does run under distributed deployment with tensor parallelism (see the sketch below this list), the results it produces are not yet accurate. We are actively looking for community help to rectify this issue.

Any contributions to improving this implementation would be greatly appreciated.
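For reference, a tensor-parallel launch that exhibits the accuracy issue above would look something like this (a sketch using vLLM's standard --tensor-parallel-size flag; two GPUs are assumed purely for illustration):

python3 -m vllm.entrypoints.openai.api_server --model baichuan-inc/Baichuan-13B-Chat --trust-remote-code --tensor-parallel-size 2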