Quickly Deploying Qwen2.5 with the Official vLLM Docker Image

Run in the foreground, with all GPUs visible to the container:

bash
docker run --runtime nvidia --gpus all \
    -v /data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4:/data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4 \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model /data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4

Run in the background (detached), restricted to GPU 7:

bash
docker run -d --runtime nvidia --gpus device=7 \
    -v /data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4:/data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4 \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model /data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4
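
With `-d` the container returns immediately while the model weights are still loading, so a client may want to wait until the server answers before sending requests. The sketch below polls an endpoint until it returns HTTP 200. It assumes the server is reachable at `http://localhost:8000` and that vLLM's `/health` route is available in the image version you pulled; the helper names `http_probe` and `wait_until_ready`, the timeout, and the polling interval are illustrative choices, not part of vLLM.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not ready yet".
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `probe` repeatedly until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A call such as `wait_until_ready(lambda: http_probe("http://localhost:8000/health"))` would then block until the server responds or ten minutes pass. Passing the probe as a callable keeps the waiting logic separate from the HTTP call, so a different endpoint (for example `/v1/models`) can be swapped in.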

Query the server:

python
import time

import requests

# Request URL
url = "http://101.136.8.66:8000/v1/chat/completions"

# Request body
payload = {
    "model": "/data/xiedong/Qwen2.5-72B-Instruct-GPTQ-Int4",  # Replace with the actual model name (here, the path passed to --model)
    "messages": [
        {
            'role': 'user',
            'content': [
                {
                    'type': 'text',
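                    'text': '''# Role: Prompt expander for general image generation
## Profile
- Author: LangGPT
- Version: 1.0
- Language: English
- Description: You are a specialized prompt expander. You turn the short prompt supplied by the user into a detailed English description that meets the specified style and content requirements, for use in image generation.

## Skills
1. Expand short prompts into detailed, vivid descriptions.
2. Add sensory detail and context to enrich the prompt.
3. Keep the prompt clear and coherent while adding depth to the original idea.
4. Follow any constraints or themes the user specifies.
5. The "style formula reference string" is a style prompt string you can draw on; in general, find a way to merge the "style formula reference string" with the "user prompt".

## Rules
1. The expanded prompt must be in English.
2. Keep the expansion faithful to the core idea of the user's original prompt.
3. Avoid introducing disallowed content or drifting from the original theme.
4. The expanded prompt should be concise, no more than 100 words.
5. Do not include personal opinions or external references unrelated to the prompt.

## Workflow
1. Read the user's prompt and the "style formula reference string" carefully.
2. Identify the key elements and theme of the prompt.
3. Expand the prompt by adding relevant detail, description, and context.
4. Combine the expanded prompt with the "style formula reference string".
5. Check the final prompt to make sure it meets all requirements and stays coherent.
6. Output the final expanded prompt for image generation.

## Example
"User prompt": little fox, marker style
Marker style "style formula reference string": Marker Drawing, {prompt}, bold marker lines, visible paper texture, marker drawing
Your output:
Marker Drawing, a small fox illustrated in bold marker lines, with a charmingly mischievous expression. The fox has soft, fluffy fur and is captured mid-pose, showcasing its lively, curious nature. The scene is detailed with visible paper texture, emphasizing the authentic marker drawing effect, and the vibrant colors bring a whimsical and friendly atmosphere to the illustration.

## Start
"User prompt": giant panda, graffiti art.
Graffiti art "style formula reference string": 'Graffiti Art Style, {prompt}, dynamic, dramatic, vibrant colors, graffiti art style'.
Please output the final expanded prompt.''',
                }
            ],
        },
        {
            'role': 'assistant',
            'content': [
                {
                    'type': 'text',
                    'text': """Graffiti Art Style, a large panda depicted in a vibrant and playful graffiti style. The panda has bold, dramatic black and white fur, with exaggerated, expressive eyes and a gentle smile. The background bursts with lively splashes of green bamboo leaves and dynamic shapes, adding a sense of movement and urban flair. This illustration is detailed with vibrant colors and strong line work, capturing the energetic essence of graffiti art while emphasizing the panda’s gentle, iconic look.""",
                }
            ],
        },
        {
            'role': 'user',
            'content': [
                {
                    'type': 'text',
                    'text': '''"User prompt": puppy.
Graffiti art "style formula reference string": 'Graffiti art'.
Please output the final expanded prompt.'''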
                }
            ],
        },
    ],

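    # "do_sample": True,  # Hugging Face-style switch, not an OpenAI API field; if False, decoding is greedy and the output is more deterministic.
    "temperature": 0.99,  # Sampling temperature (the OpenAI API allows 0 to 2); higher values give more random output
    "top_p": 0.99,  # Nucleus sampling; lower values concentrate the output on higher-probability tokens
    # "n": 3,  # Number of responses to return. 1 returns a single response; higher values return several candidates to choose from.
    "max_tokens": 2048,
    "stream": False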
}

# Request headers
headers = {
    "Content-Type": "application/json"
}

num_requests = 10
total_time = 0.0

for i in range(num_requests):
    start_time = time.perf_counter()  # Start timing
    response = requests.post(url, json=payload, headers=headers, timeout=120)  # Send the POST request
    elapsed_time = time.perf_counter() - start_time  # Elapsed time for this request
    total_time += elapsed_time  # Accumulate the elapsed time

    if response.status_code == 200:
        result = response.json()
        content = result["choices"][0]["message"]["content"]
        print(f"Request {i + 1} succeeded, content: {content}")
    else:
        print(f"Request {i + 1} failed, status code: {response.status_code}, response: {response.text}")

average_time = total_time / num_requests  # Average time per request
print("Average time: {:.2f} s".format(average_time))

Output:

Average time: 3.17 s
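
A single average hides the variation between requests. As a sketch, the per-request timings from the loop above could be collected in a list and summarized with the median and worst case as well. If the response carries the OpenAI-style `usage.completion_tokens` field, dividing total tokens by total time gives a rough generation rate. The helper name `summarize_latencies` and its result keys are hypothetical, introduced only for this illustration.

```python
import statistics
from typing import Dict, List, Optional


def summarize_latencies(
    latencies_s: List[float],
    completion_tokens: Optional[List[int]] = None,
) -> Dict[str, float]:
    """Summarize request latencies; optionally add an overall tokens/second rate."""
    if not latencies_s:
        raise ValueError("latencies_s must not be empty")

    summary = {
        "count": float(len(latencies_s)),
        "mean_s": statistics.mean(latencies_s),
        "median_s": statistics.median(latencies_s),
        "max_s": max(latencies_s),
    }

    if completion_tokens is not None:
        if len(completion_tokens) != len(latencies_s):
            raise ValueError("completion_tokens must have one entry per latency")
        # Overall rate: all generated tokens divided by all time spent waiting.
        summary["tokens_per_s"] = sum(completion_tokens) / sum(latencies_s)

    return summary
```

Inside the benchmark loop, one would append `elapsed_time` to a list, and (assuming the field is present) `result["usage"]["completion_tokens"]` to a second list, then pass both to `summarize_latencies` after the loop finishes.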