ModuleNotFoundError: No module named 'transformers_modules' with API serving using baichuan-7b #572

Closed
McCarrtney opened this issue Jul 25, 2023 · 14 comments

@McCarrtney

I tried to deploy an API server using baichuan-7b, but there is an error:

NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=6,7 python -m vllm.entrypoints.openai.api_server --model /root/data/zyy/baichuan-7B --host 0.0.0.0 --port 11114 --tensor-parallel-size 2 --trust-remote-code
(RayWorker pid=53626) No module named 'transformers_modules'
(RayWorker pid=53626) Traceback (most recent call last):
(RayWorker pid=53626)   File "/root/miniconda3/envs/vllm/lib/python3.9/site-packages/ray/_private/serialization.py", line 387, in deserialize_objects
(RayWorker pid=53626)     obj = self._deserialize_object(data, metadata, object_ref)
(RayWorker pid=53626)   File "/root/miniconda3/envs/vllm/lib/python3.9/site-packages/ray/_private/serialization.py", line 268, in _deserialize_object
(RayWorker pid=53626)     return self._deserialize_msgpack_data(data, metadata_fields)
(RayWorker pid=53626)   File "/root/miniconda3/envs/vllm/lib/python3.9/site-packages/ray/_private/serialization.py", line 223, in _deserialize_msgpack_data
(RayWorker pid=53626)     python_objects = self._deserialize_pickle5_data(pickle5_data)
(RayWorker pid=53626)   File "/root/miniconda3/envs/vllm/lib/python3.9/site-packages/ray/_private/serialization.py", line 213, in _deserialize_pickle5_data
(RayWorker pid=53626)     obj = pickle.loads(in_band)
(RayWorker pid=53626) ModuleNotFoundError: No module named 'transformers_modules'
@imhuay

imhuay commented Jul 25, 2023

It seems that baichuan is not supported yet. You can refer to this repository: https://github.com/gameofdimension/vllm-cn

Or refer to this document: https://vllm.readthedocs.io/en/latest/models/adding_model.html

@McCarrtney
Author

Thank you! That's quite helpful

@Sanster
Contributor

Sanster commented Jul 27, 2023

Adding this environment variable (replace the path with your modules directory) makes it work, but the results generated by the model are completely incorrect.

PYTHONPATH=/root/.cache/huggingface/modules
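
For reference, here is a rough sketch of the same idea applied from Python rather than the shell. The modules path is the default Hugging Face cache location and the model path is copied from the report above; both are assumptions that may differ in your setup, and the variable has to be set before vLLM spawns its Ray workers:

import os

# Workaround sketch: prepend the dynamic-modules directory to PYTHONPATH so that
# freshly spawned Ray workers can import 'transformers_modules' when unpickling
# the model config. The path below is the default HF cache location (assumption).
os.environ["PYTHONPATH"] = (
    os.path.expanduser("~/.cache/huggingface/modules")
    + os.pathsep
    + os.environ.get("PYTHONPATH", "")
)

from vllm import LLM, SamplingParams  # import only after setting the variable

llm = LLM(
    model="/root/data/zyy/baichuan-7B",  # model path from the original report
    tensor_parallel_size=2,
    trust_remote_code=True,
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16)))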

@mklf
Contributor

mklf commented Jul 27, 2023

This is caused by this call:

self._run_workers("init_worker",
                  get_all_outputs=True,
                  worker_init_fn=lambda: Worker(
                      self.model_config,
                      self.parallel_config,
                      ...
                  ))

This lambda function captures the self object, which holds a reference to the config class loaded from the remote Hugging Face modules ('transformers_modules'), so it cannot be unpickled on the Ray workers.

You can try this to avoid capturing the self object:

import copy
model_config = copy.deepcopy(self.model_config)
parallel_config = copy.deepcopy(self.parallel_config)
scheduler_config = copy.deepcopy(self.scheduler_config)
self._run_workers("init_worker",
                      get_all_outputs=True,
                      worker_init_fn=lambda: Worker(
                          model_config,
                          parallel_config,
                          scheduler_config,
                          None,
                          None,
                      ))
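
As a side note, here is a small self-contained sketch (hypothetical names, unrelated to vLLM's actual classes) showing why a lambda that closes over self drags the whole object, and therefore its config, into the payload that Ray ships to the workers:

import cloudpickle  # the serializer Ray uses for remote functions

class Engine:
    def __init__(self):
        # stands in for a heavyweight config whose class lives in a module
        # (e.g. 'transformers_modules') that only exists on the driver
        self.config = list(range(100_000))

    def bad_factory(self):
        # closes over self: serializing the lambda serializes the whole Engine
        return lambda: type(self.config)

    def good_factory(self):
        config_type = type(self.config)   # copy out only what the worker needs
        return lambda: config_type        # no reference to self is captured

engine = Engine()
print(len(cloudpickle.dumps(engine.bad_factory())))   # large: the Engine and its config ride along
print(len(cloudpickle.dumps(engine.good_factory())))  # small: only the captured value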

For baichuan, please note that baichuan-7b uses rotary embeddings while baichuan-13b uses ALiBi; you can refer to #512.

@Sanster
Contributor

Sanster commented Jul 27, 2023

@mklf Can you get normal output using baichuan-7b in distributed mode? In my environment MPT-7B works fine in distributed mode, but baichuan-7b/13b always returns garbage output.

[screenshot: baichuan-7b normal output using 1 GPU]

[screenshot: baichuan-7b garbage output using tensor_parallel_size=2]

@mklf
Contributor

mklf commented Jul 27, 2023

No, in #512 they mentioned:


Our code is currently only compatible with non-distributed deployments, i.e., setups involving a single GPU and single model.

While our code is operational with distributed deployment using tensor parallelism, the results it produces are not yet accurate. We are actively looking for community help to rectify this issue.

@mklf
Contributor

mklf commented Jul 27, 2023

@Sanster I fixed the bug. Here is the updated code: you can replace the existing load_weights method with the following (tested on baichuan-13b):

  def load_weights(self,
                   model_name_or_path: str,
                   cache_dir: Optional[str] = None,
                   use_np_cache: bool = False):
      tensor_model_parallel_rank = get_tensor_model_parallel_rank()
      state_dict = self.state_dict()

      for name, loaded_weight in hf_model_weights_iterator(
              model_name_or_path, cache_dir, use_np_cache):
          if "rotary_emb.inv_freq" in name:
              continue

          is_gate_up_weight = False
          for stride_id, weight_name in enumerate(["gate_proj", "up_proj"]):
              if weight_name not in name:
                  continue
              param = state_dict[name.replace(weight_name, "gate_up_proj")]
              shard_size = param.shape[0] // 2
              loaded_weight = loaded_weight[
                  shard_size * tensor_model_parallel_rank:shard_size *
                  (tensor_model_parallel_rank + 1)]
              param_slice = param.data[shard_size * stride_id:shard_size *
                                       (stride_id + 1)]
              assert param_slice.shape == loaded_weight.shape
              param_slice.copy_(loaded_weight)
              is_gate_up_weight = True
              break
          if is_gate_up_weight:
              continue

          param = state_dict[name]

          # Newly added: W_pack is the fused QKV projection. The checkpoint stores
          # it as [Q; K; V] stacked along dim 0; reorder it per attention head so
          # that row-wise sharding gives each tensor-parallel rank complete heads,
          # then restore the [Q; K; V] layout inside this rank's shard.
          if "W_pack.weight" in name:
              head_size = self.config.hidden_size // self.config.num_attention_heads
              loaded_weight = (
                  loaded_weight.contiguous()
                  .view(
                      3,
                      self.config.num_attention_heads,
                      head_size,
                      self.config.hidden_size,
                  )
                  .transpose(0, 1)
                  .contiguous()
                  .view(-1, self.config.hidden_size)
              )

              shard_size = param.shape[0]
              start = shard_size * tensor_model_parallel_rank
              end = shard_size * (tensor_model_parallel_rank + 1)
              loaded_weight = loaded_weight[start:end]
              loaded_weight = loaded_weight.view(
                  -1, 3, head_size, self.config.hidden_size
              )
              loaded_weight = loaded_weight.transpose(0, 1)
              loaded_weight = loaded_weight.reshape(-1, self.config.hidden_size)

          load_tensor_parallel_weights(param, loaded_weight, name,
                                       self._column_parallel_weights,
                                       self._row_parallel_weights,
                                       tensor_model_parallel_rank)
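
To make the reordering easier to follow, here is a standalone toy sketch (made-up sizes and variable names, independent of the code above) of what the W_pack transformation does for one tensor-parallel rank:

import torch

# toy dimensions, chosen only for illustration
hidden_size = 8
num_heads = 4
head_size = hidden_size // num_heads   # 2
tp_size = 2                            # tensor-parallel world size
rank = 0                               # this worker's rank

# the checkpoint stores the fused QKV weight as [Q; K; V] stacked on dim 0
w_pack = torch.arange(3 * hidden_size * hidden_size, dtype=torch.float32)
w_pack = w_pack.view(3 * hidden_size, hidden_size)

# 1) interleave per head: (3, heads, head_size, hidden) -> (heads, 3, head_size, hidden),
#    so contiguous row blocks now correspond to whole attention heads
interleaved = (
    w_pack.view(3, num_heads, head_size, hidden_size)
    .transpose(0, 1)
    .contiguous()
    .view(-1, hidden_size)
)

# 2) take this rank's rows (row-wise sharding)
shard_rows = interleaved.shape[0] // tp_size
local = interleaved[rank * shard_rows:(rank + 1) * shard_rows]

# 3) restore the [Q; K; V] ordering inside the shard, which is the layout
#    the attention layer expects
local = local.view(-1, 3, head_size, hidden_size).transpose(0, 1).reshape(-1, hidden_size)
print(local.shape)  # torch.Size([12, 8]) == (3 * hidden_size / tp_size, hidden_size)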

@McCarrtney
Author

We have tried baichuan-7b API serving on a single GPU; it works and the generation is good. But it is about 4x slower than LLaMA 7B. Is there anything that slows down baichuan inference?

@Sanster
Contributor

Sanster commented Jul 27, 2023

@mklf Thank you for your reminder. I have also found the issue in "load_weights". The method I used to fix it is similar to yours. Here is my implementation: #598

@zhuohan123
Member

Fixed by #599

@Jingru
Contributor

Jingru commented Aug 17, 2023

I have the same issue when serving a local internlm-7b model with tensor_parallel=4. Any ideas?

@Jingru
Contributor

Jingru commented Aug 25, 2023

I have the same issue when serving a local internlm-7b model with tensor_parallel=4. Any ideas?

This pull request #871 will solve the problem.

@lihkinVerma

lihkinVerma commented Dec 20, 2023

It's a long-standing issue in the transformers code. To fix it, never use a model name containing periods (.) when using the trust_remote_code feature; change the name to use an underscore (_) or some other symbol instead.

@nightflight-dk

The workaround for me was to switch to multiprocessing (disable Ray) and remove '.' from the model name (path), if present, before instantiating the engine, e.g. "weights/Phi3.5mini-instruct" -> "weights/Phi35mini-instruct".
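
A rough sketch of that workaround (the paths come from the example above; copying vs. renaming in place is a matter of preference):

import os
import shutil

from vllm import LLM

src = "weights/Phi3.5mini-instruct"   # local weights directory whose name contains '.'
dst = src.replace(".", "")            # "weights/Phi35mini-instruct"
if not os.path.exists(dst):
    shutil.copytree(src, dst)         # or os.rename(src, dst) to move instead of copy

# In recent vLLM versions the Ray backend can also be avoided by selecting the
# multiprocessing executor; the exact option name depends on the version you run.
llm = LLM(model=dst, trust_remote_code=True)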
