
Add support for Baichuan 13b model #512

Closed
wants to merge 5 commits

Conversation

ericzhou571

@ericzhou571 ericzhou571 commented Jul 19, 2023

This pull request introduces a server that can be initiated using the following command:

python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 --model /root/github_repos/baichuan-13b-chat/ --gpu-memory-utilization 0.8 --dtype half --trust-remote-code

We've tested baichuan-13b-chat inference on a single GPU. By setting the temperature to 0 and using identical prompts, we obtained outputs consistent with those of the same baichuan-13b-chat model deployed via a standard FastChat worker.
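A minimal client sketch for the server launched above. It assumes the OpenAI-compatible `/v1/completions` endpoint exposed by `vllm.entrypoints.openai.api_server`, reachable at `http://0.0.0.0:8000`; the prompt text and `max_tokens` value are illustrative.

```python
import json

def build_completion_request(prompt: str) -> dict:
    """Build a deterministic completion request: temperature 0 means
    greedy decoding, which is what made the output comparison above
    reproducible across deployments."""
    return {
        # Model path matches the --model flag used to start the server.
        "model": "/root/github_repos/baichuan-13b-chat/",
        "prompt": prompt,
        "temperature": 0,
        "max_tokens": 128,
    }

# Serialize the request body for an HTTP POST.
body = json.dumps(build_completion_request("Hello"))
```

POST `body` to `http://0.0.0.0:8000/v1/completions` with a `Content-Type: application/json` header (e.g. via `curl` or `requests`) to query the running server.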

However, please be aware of the following limitations:

  1. Our code is currently only compatible with non-distributed deployments, i.e., setups involving a single GPU and single model.

  2. While our code is operational with distributed deployment using tensor parallelism, the results it produces are not yet accurate. We are actively looking for community help to rectify this issue.

Any contributions to improving this implementation would be greatly appreciated.

@luohao123

A single GPU? For a 24GB card like the 4090, it is not possible to load this model on a single GPU in fp16.

@ericzhou571
Author

A single GPU? For a 24GB card like the 4090, it is not possible to load this model on a single GPU in fp16.

The A100 does have 40GB.

@Laych7

Laych7 commented Jul 27, 2023

/usr/bin/python3: Error while finding module specification for 'vllm.entrypoints.openai.api_server' (ImportError: cannot import name 'activation_ops' from partially initialized module 'vllm' (most likely due to a circular import) (.local/lib/python3.8/site-packages/vllm/__init__.py))
Is this my problem?

@Storm0921

Storm0921 commented Jul 31, 2023

/usr/bin/python3: Error while finding module specification for 'vllm.entrypoints.openai.api_server' (ImportError: cannot import name 'activation_ops' from partially initialized module 'vllm' (most likely due to a circular import) (.local/lib/python3.8/site-packages/vllm/__init__.py)) Is this my problem?

I solved this problem; you only need to run `pip install vllm` again.

@a1164714

a1164714 commented Aug 2, 2023

python3 benchmarks/benchmark_throughput.py --dataset benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json --backend hf --hf-max-batch-size 4 --model .//baichuan-inc--Baichuan-13B-Chat --trust-remote-code

ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` 
`(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': 
'[PAD]'})`.
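A sketch of the workaround named in the error message above: give the tokenizer a pad token before the HF backend pads a batch. In practice the object comes from `transformers.AutoTokenizer.from_pretrained(...)`; the tiny stand-in class below is hypothetical and only exists to keep the snippet self-contained.

```python
class _StubTokenizer:
    """Hypothetical stand-in for the loaded Baichuan tokenizer."""
    eos_token = "</s>"
    pad_token = None  # the reported ValueError means this is unset

tokenizer = _StubTokenizer()

# Reuse the EOS token for padding, as the error message itself suggests.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```

Alternatively, `tokenizer.add_special_tokens({'pad_token': '[PAD]'})` adds a dedicated pad token, but then the model's embedding matrix must be resized to match the new vocabulary size.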

@zhuohan123
Member

Close this PR since #643 is a simpler solution. Please check out the latest main branch to test baichuan-13b! Again, thanks @ericzhou571 for the great work!

@zhuohan123 zhuohan123 closed this Aug 2, 2023
@luhairong11

python3 benchmarks/benchmark_throughput.py --dataset benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json --backend hf --hf-max-batch-size 4 --model .//baichuan-inc--Baichuan-13B-Chat --trust-remote-code

ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` 
`(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': 
'[PAD]'})`.

Have you solved it? I also encountered this problem.
