Current environment
- 8 * H800
- CUDA 11.8
- vllm 0.5.3post1
- python 3.9
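
The versions listed above can be double-checked from inside the same interpreter that runs vLLM. The following is only a minimal sketch: `collect_versions` is a hypothetical helper name (not part of vLLM), and it assumes the packages were installed under the distribution names `vllm` and `torch`. The CUDA value it reports is the CUDA version PyTorch was built against, which may differ from the system toolkit.

```python
import platform
from importlib import metadata


def collect_versions(packages=("vllm", "torch")):
    """Return installed versions of the given packages, plus Python itself.

    Hypothetical helper for illustration; package names are assumptions.
    """
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this interpreter.
            info[name] = None
    try:
        import torch  # optional; only used to read the CUDA build version

        info["cuda"] = torch.version.cuda
    except ImportError:
        info["cuda"] = None
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Pasting this output alongside the list above makes it easier to rule out a mismatch between the reported and the actually loaded versions.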
I am deploying llama3 405B-instruct-FP8 with vLLM, but deployment fails with the following error:
INFO 07-24 22:52:39 multiproc_worker_utils.py:136] Terminating local vLLM worker processes
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.), Traceback (most recent call last):
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.), Traceback (most recent call last):
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.), Traceback (most recent call last):
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.), Traceback (most recent call last):
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.), Traceback (most recent call last):
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.), Traceback (most recent call last):
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.), Traceback (most recent call last):
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 896, in profile_run
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 896, in profile_run
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 896, in profile_run
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 896, in profile_run
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 896, in profile_run
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 896, in profile_run
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1314, in execute_model
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1314, in execute_model
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.model_runner.profile_run()
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1314, in execute_model
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1314, in execute_model
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1314, in execute_model
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1314, in execute_model
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 896, in profile_run
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 422, in forward
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] model_output = self.model(input_ids, positions, kv_caches,
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 422, in forward
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return func(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 422, in forward
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 422, in forward
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 422, in forward
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] model_output = self.model(input_ids, positions, kv_caches,
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 422, in forward
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1314, in execute_model
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] model_output = self.model(input_ids, positions, kv_caches,
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] model_output = self.model(input_ids, positions, kv_caches,
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] model_output = self.model(input_ids, positions, kv_caches,
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] model_output = self.model(input_ids, positions, kv_caches,
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 322, in forward
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states, residual = layer(
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 322, in forward
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 322, in forward
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 322, in forward
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 322, in forward
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states, residual = layer(
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 322, in forward
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 422, in forward
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states, residual = layer(
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states, residual = layer(
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states, residual = layer(
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states, residual = layer(
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] model_output = self.model(input_ids, positions, kv_caches,
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 322, in forward
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states, residual = layer(
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return fn(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return fn(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return fn(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return fn(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return fn(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return fn(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return fn(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28011) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
(VllmWorkerProcess pid=28015) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
(VllmWorkerProcess pid=28016) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
(VllmWorkerProcess pid=28010) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
(VllmWorkerProcess pid=28012) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
(VllmWorkerProcess pid=28014) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
(VllmWorkerProcess pid=28013) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/usr/local/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
My config.json is:
{
"model":"origin_model",
"disable_log_requests": "true",
"gpu_memory_utilization": 0.9,
"tensor_parallel_size": 8,
"trust_remote_code": true,
"enable_chunked_prefill": false,
"enable_prefix_caching": false,
"max_model_len": 4096,
"quantization": "fbgemm_fp8",
"dtype": "bfloat16"
}
What should I do?
3 answers
fumotvh3 #1
Does the H800 not support fbgemm_fp8?
tyky79it #2
Please read #6689 and raise this there if it has not been discussed yet.
mum43rcc #3
The CUDA version should be 12.x, not 11.8.
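To check which CUDA version your environment actually uses before reinstalling, something like the sketch below may help. It only compares a version string against the 12.x suggested in this answer. The helper names (`cuda_major`, `meets_suggested_cuda`, `installed_torch_cuda`) are made up for illustration and are not part of vLLM. `torch.version.cuda` reports the CUDA version the installed PyTorch wheel was built with, which is assumed here to be a reasonable proxy for the vLLM build as well.

```python
from typing import Optional


def cuda_major(version: Optional[str]) -> Optional[int]:
    """Return the major part of a CUDA version string such as '11.8'."""
    if not version:
        return None
    return int(version.split(".")[0])


def meets_suggested_cuda(version: Optional[str], required_major: int = 12) -> bool:
    """True if the version is at least the 12.x suggested in this thread."""
    major = cuda_major(version)
    return major is not None and major >= required_major


def installed_torch_cuda() -> Optional[str]:
    """CUDA version the installed PyTorch was built with, or None if unavailable."""
    try:
        import torch  # imported lazily so the helpers above work without PyTorch
    except ImportError:
        return None
    return torch.version.cuda  # None on CPU-only builds


if __name__ == "__main__":
    found = installed_torch_cuda()
    print(f"PyTorch CUDA build: {found}, meets 12.x: {meets_suggested_cuda(found)}")
```

In the environment from the question (CUDA 11.8), `meets_suggested_cuda("11.8")` returns `False`, which matches the suggestion to move to a CUDA 12.x build of PyTorch and vLLM.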