Pull the image directly from Docker Hub (the run commands below use the `vllm/vllm-openai` image, so pull that one):

```bash
docker pull vllm/vllm-openai:latest
```
Run it in the foreground:

```bash
docker run --runtime nvidia --gpus all \
    -v /data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4:/data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4 \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model /data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4
```
Run it detached in the background (here pinned to GPU 7 instead of all GPUs):

```bash
docker run -d --runtime nvidia --gpus device=7 \
    -v /data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4:/data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4 \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model /data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4
```
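Once the container is up, vLLM exposes an OpenAI-compatible API on port 8000, with the model served under the same name passed to `--model`. A minimal Python client sketch, assuming the server runs on localhost and using only the standard library (the endpoint path and request fields follow the OpenAI chat-completions schema that vLLM implements):

```python
import json
from typing import Optional
from urllib import error, request

BASE_URL = "http://localhost:8000/v1"
# vLLM serves the model under the same name passed to --model
MODEL = "/data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4"


def build_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completion request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str) -> Optional[str]:
    """POST to the server; returns None if it is not reachable yet."""
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with request.urlopen(req, timeout=60) as resp:
            body = json.load(resp)
        return body["choices"][0]["message"]["content"]
    except (error.URLError, OSError):
        # Container may still be loading the 72B weights
        return None


# Example usage: print(chat("Introduce yourself"))
```

The `chat` helper returns `None` instead of raising while the container is still loading the weights, which for a 72B model can take a few minutes after `docker run`.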