你提供的数据格式:
json展开代码{
"image": "000443568.jpg",
"width": 700,
"height": 700,
"conversations": [
{
"from": "human",
"value": "Write a detailed description of this image..."
},
{
"from": "gpt",
"value": "The image is of a pair of silver earrings..."
}
]
}
问题分析:
<image>
标记:在 conversations
中没有 <image>
占位符image
字段是单个字符串,但需要与对话中的 <image>
标记对应width
和 height
字段在 LLaMA-Factory 中不会被使用根据代码分析,正确的格式应该是:
json展开代码{
"conversations": [
{
"from": "human",
"value": "<image>Write a detailed description of this image, do not forget about the texts on it if they exist. Also, do not forget to mention the type / style of the image. No bullet points. When writing descriptions, prioritize clarity and direct observation over embellishment or interpretation.\nDon't forget these rules:\n1. **Be Direct and Concise**: Provide straightforward descriptions without adding interpretative or speculative elements.\n2. **Use Segmented Details**: Break down details about different elements of an image into distinct sentences, focusing on one aspect at a time.\n3. **Maintain a Descriptive Focus**: Prioritize purely visible elements of the image, avoiding conclusions or inferences.\n4. **Follow a Logical Structure**: Begin with the central figure or subject and expand outward, detailing its appearance before addressing the surrounding setting.\n5. **Avoid Juxtaposition**: Do not use comparison or contrast language; keep the description purely factual.\n6. **Incorporate Specificity**: Mention age, gender, race, and specific brands or notable features when present, and clearly identify the medium if it's discernible."
},
{
"from": "gpt",
"value": "The image is of a pair of silver earrings. Each earring features a rectangular silver frame with an oval purple gemstone in the center. The frame has an ornate design with a sunburst pattern at the top and a spike-like protrusion at the bottom. The earrings are made of silver and have a hook-style backing."
}
],
"images": [
"000443568.jpg"
]
}
json展开代码{
"messages": [
{
"role": "user",
"content": "<image>Write a detailed description of this image..."
},
{
"role": "assistant",
"content": "The image is of a pair of silver earrings..."
}
],
"images": [
"000443568.jpg"
]
}
json展开代码{
"my_vlm_dataset": {
"file_name": "data.json",
"formatting": "sharegpt",
"columns": {
"messages": "conversations",
"images": "images"
}
}
}
json展开代码{
"my_vlm_dataset": {
"file_name": "data.json",
"formatting": "sharegpt",
"columns": {
"messages": "messages",
"images": "images"
},
"tags": {
"role_tag": "role",
"content_tag": "content",
"user_tag": "user",
"assistant_tag": "assistant"
}
}
}
<image>
标记:在对话文本中必须包含 <image>
占位符<image>
标记的数量必须与 images
数组中的图片数量严格一致<image>
可以出现在文本的任何位置media_dir
的路径s3://
、gs://
、gcs://
格式从 src/llamafactory/data/converter.py
的 SharegptDatasetConverter
类可以看出:
python展开代码def __call__(self, example: dict[str, Any]) -> dict[str, Any]:
# 处理对话消息
messages = example[self.dataset_attr.messages]
# 处理图片路径
"_images": self._find_medias(example[self.dataset_attr.images]) if self.dataset_attr.images else None,
json展开代码[
{
"conversations": [
{
"from": "human",
"value": "<image>Describe this image in detail."
},
{
"from": "gpt",
"value": "The image shows a beautiful landscape..."
}
],
"images": [
"/path/to/image1.jpg"
]
},
{
"conversations": [
{
"from": "human",
"value": "<image><image>Compare these two images."
},
{
"from": "gpt",
"value": "Both images show different aspects..."
}
],
"images": [
"/path/to/image2.jpg",
"/path/to/image3.jpg"
]
}
]
bash展开代码python src/train_bash.py \ --stage sft \ --model_name_or_path /path/to/model \ --dataset my_vlm_dataset \ --dataset_dir /path/to/data \ --media_dir /path/to/images \ --template qwen2_vl \ --output_dir /path/to/output \ --per_device_train_batch_size 1 \ --gradient_accumulation_steps 4 \ --lr_scheduler_type cosine \ --logging_steps 10 \ --save_steps 1000 \ --learning_rate 5e-5 \ --num_train_epochs 3 \ --max_samples 1000 \ --cutoff_len 2048 \ --dataloader_num_workers 8
你的数据格式需要做以下修改:
<image>
标记到对话文本中image
字段改为 images
数组width
和 height
字段<image>
标记数量一致这样修改后,数据就符合 LLaMA-Factory 的 ShareGPT + VLM 训练格式要求了。
本文作者:Dong
本文链接:
版权声明:本博客所有文章除特别声明外,均采用 CC BY-NC。本作品采用《知识共享署名-非商业性使用 4.0 国际许可协议》进行许可。您可以在非商业用途下自由转载和修改,但必须注明出处并提供原作者链接。 许可协议。转载请注明出处!