python offline_inference_simple_pp.py
config.json: 100% 691/691 [00:00<00:00, 3.35MB/s]
INFO 05-01 20:38:02 config.py:531] Disabled the custom all-reduce kernel because it is not supported with pipeline parallelism.
2024-05-01 20:38:07,484 WARNING utils.py:580 -- Detecting docker specified CPUs. In previous versions of Ray, CPU detection in containers was incorrect. Please ensure that Ray has enough CPUs allocated. As a temporary workaround to revert to the prior behavior, set `RAY_USE_MULTIPROCESSING_CPU_COUNT=1` as an env var before starting Ray. Set the env var: `RAY_DISABLE_DOCKER_CPU_WARNING=1` to mute this warning.
2024-05-01 20:38:07,485 WARNING utils.py:592 -- Ray currently does not support initializing Ray with fractional cpus. Your num_cpus will be truncated from 20.4 to 20.
2024-05-01 20:38:07,666 INFO worker.py:1749 -- Started a local Ray instance.
INFO 05-01 20:38:10 llm_engine.py:98] Initializing an LLM engine (v0.4.1) with config: model='facebook/opt-2.7b', speculative_config=None, tokenizer='facebook/opt-2.7b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.DUMMY, tensor_parallel_size=1, pipeline_parallel_size=2, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0)
tokenizer_config.json: 100% 685/685 [00:00<00:00, 2.83MB/s]
vocab.json: 100% 899k/899k [00:00<00:00, 7.73MB/s]
merges.txt: 100% 456k/456k [00:00<00:00, 1.84MB/s]
special_tokens_map.json: 100% 441/441 [00:00<00:00, 1.89MB/s]
generation_config.json: 100% 137/137 [00:00<00:00, 589kB/s]
INFO 05-01 20:38:37 utils.py:613] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
(RayWorkerWrapper pid=18418) INFO 05-01 20:38:37 utils.py:613] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
(RayWorkerWrapper pid=18418) INFO 05-01 20:38:38 selector.py:77] Cannot use FlashAttention-2 backend because the flash_attn package is not found. Please install it for better performance.
(RayWorkerWrapper pid=18418) INFO 05-01 20:38:38 selector.py:33] Using XFormers backend.
INFO 05-01 20:38:38 selector.py:77] Cannot use FlashAttention-2 backend because the flash_attn package is not found. Please install it for better performance.
INFO 05-01 20:38:38 selector.py:33] Using XFormers backend.
INFO 05-01 20:38:44 pynccl_utils.py:43] vLLM is using nccl==2.18.1
(RayWorkerWrapper pid=18418) INFO 05-01 20:38:44 pynccl_utils.py:43] vLLM is using nccl==2.18.1
INFO 05-01 20:38:46 model_runner.py:181] Loading model weights took 4.9551 GB
(RayWorkerWrapper pid=18418) INFO 05-01 20:38:46 model_runner.py:181] Loading model weights took 4.9551 GB
ERROR 05-01 20:38:47 worker_base.py:147] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
ERROR 05-01 20:38:47 worker_base.py:147] Traceback (most recent call last):
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/worker/worker_base.py", line 139, in execute_method
ERROR 05-01 20:38:47 worker_base.py:147]     return executor(*args, **kwargs)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-01 20:38:47 worker_base.py:147]     return func(*args, **kwargs)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/worker/worker.py", line 140, in determine_num_available_blocks
ERROR 05-01 20:38:47 worker_base.py:147]     self.model_runner.profile_run()
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-01 20:38:47 worker_base.py:147]     return func(*args, **kwargs)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/worker/model_runner.py", line 844, in profile_run
ERROR 05-01 20:38:47 worker_base.py:147]     self.execute_model(seqs, kv_caches)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-01 20:38:47 worker_base.py:147]     return func(*args, **kwargs)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/worker/model_runner.py", line 763, in execute_model
ERROR 05-01 20:38:47 worker_base.py:147]     hidden_states = model_executable(**execute_model_kwargs)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
ERROR 05-01 20:38:47 worker_base.py:147]     return self._call_impl(*args, **kwargs)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
ERROR 05-01 20:38:47 worker_base.py:147]     return forward_call(*args, **kwargs)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/model_executor/models/opt.py", line 300, in forward
ERROR 05-01 20:38:47 worker_base.py:147]     hidden_states = self.model(input_ids, positions, kv_caches,
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
ERROR 05-01 20:38:47 worker_base.py:147]     return self._call_impl(*args, **kwargs)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
ERROR 05-01 20:38:47 worker_base.py:147]     return forward_call(*args, **kwargs)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/model_executor/models/opt.py", line 275, in forward
ERROR 05-01 20:38:47 worker_base.py:147]     return self.decoder(input_ids, positions, kv_caches, attn_metadata)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
ERROR 05-01 20:38:47 worker_base.py:147]     return self._call_impl(*args, **kwargs)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
ERROR 05-01 20:38:47 worker_base.py:147]     return forward_call(*args, **kwargs)
ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/model_executor/models/opt.py", line 249, in forward
ERROR 05-01 20:38:47 worker_base.py:147]     hidden_states = layer(hidden_states, kv_caches[i], attn_metadata)
ERROR 05-01 20:38:47 worker_base.py:147] IndexError: list index out of range
Traceback (most recent call last):
  File "/workspace/vllm/examples/offline_inference_simple_pp.py", line 14, in <module>
    llm = LLM(model="facebook/opt-2.7b", pipeline_parallel_size=2, load_format="dummy")
  File "/workspace/vllm/vllm/entrypoints/llm.py", line 118, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/workspace/vllm/vllm/engine/llm_engine.py", line 294, in from_engine_args
    engine = cls(
  File "/workspace/vllm/vllm/engine/llm_engine.py", line 171, in __init__
    self._initialize_kv_caches()
  File "/workspace/vllm/vllm/engine/llm_engine.py", line 251, in _initialize_kv_caches
    self.model_executor.determine_num_available_blocks())
  File "/workspace/vllm/vllm/executor/distributed_gpu_executor.py", line 28, in determine_num_available_blocks
    num_blocks = self._run_workers("determine_num_available_blocks", )
  File "/workspace/vllm/vllm/executor/ray_gpu_executor.py", line 261, in _run_workers
    driver_worker_output = self.driver_worker.execute_method(
  File "/workspace/vllm/vllm/worker/worker_base.py", line 148, in execute_method
    raise e
  File "/workspace/vllm/vllm/worker/worker_base.py", line 139, in execute_method
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147] Traceback (most recent call last):
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/worker/worker_base.py", line 139, in execute_method
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     return executor(*args, **kwargs)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     return func(*args, **kwargs)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/worker/worker.py", line 140, in determine_num_available_blocks
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     self.model_runner.profile_run()
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     return func(*args, **kwargs)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/worker/model_runner.py", line 844, in profile_run
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     self.execute_model(seqs, kv_caches)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     return func(*args, **kwargs)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/worker/model_runner.py", line 763, in execute_model
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     hidden_states = model_executable(**execute_model_kwargs)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/model_executor/models/opt.py", line 300, in forward
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     hidden_states = self.model(input_ids, positions, kv_caches,
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/model_executor/models/opt.py", line 275, in forward
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     return self.decoder(input_ids, positions, kv_caches, attn_metadata)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]   File "/workspace/vllm/vllm/model_executor/models/opt.py", line 249, in forward
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147]     hidden_states = layer(hidden_states, kv_caches[i], attn_metadata)
(RayWorkerWrapper pid=18418) ERROR 05-01 20:38:47 worker_base.py:147] IndexError: list index out of range
    return executor(*args, **kwargs)
  File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/vllm/vllm/worker/worker.py", line 140, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/vllm/vllm/worker/model_runner.py", line 844, in profile_run
    self.execute_model(seqs, kv_caches)
  File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/vllm/vllm/worker/model_runner.py", line 763, in execute_model
    hidden_states = model_executable(**execute_model_kwargs)
  File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/vllm/vllm/model_executor/models/opt.py", line 300, in forward
    hidden_states = self.model(input_ids, positions, kv_caches,
  File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/vllm/vllm/model_executor/models/opt.py", line 275, in forward
    return self.decoder(input_ids, positions, kv_caches, attn_metadata)
  File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/vllm-pp-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/vllm/vllm/model_executor/models/opt.py", line 249, in forward
    hidden_states = layer(hidden_states, kv_caches[i], attn_metadata)
IndexError: list index out of range
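The `IndexError` at `opt.py` line 249 is consistent with a KV-cache/layer-count mismatch: with `pipeline_parallel_size=2` each worker appears to allocate KV-cache entries only for its own pipeline stage, while the OPT decoder loop still indexes one cache per *total* layer. This is a minimal, self-contained sketch of that failure mode, not vLLM code; the layer count (32 for facebook/opt-2.7b) is the only value taken from outside the log:

```python
# Illustration of the suspected failure mode (hypothetical, not vLLM code):
# each pipeline stage holds num_layers // pp_size KV caches, but a
# non-pipeline-aware decoder iterates over all layers.
num_layers = 32   # facebook/opt-2.7b has 32 decoder layers
pp_size = 2       # pipeline_parallel_size from the log above

# KV caches allocated for this stage only -> 16 entries, not 32
kv_caches = [f"cache_{i}" for i in range(num_layers // pp_size)]

try:
    for i in range(num_layers):   # decoder loop visits all 32 layers
        _ = kv_caches[i]          # runs off the end once i reaches 16
except IndexError as err:
    print(err)                    # -> list index out of range
```

Under this reading, either the model's forward pass must index caches relative to the first layer of its stage, or each stage must receive a full-length (partially unused) cache list.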