2025-04-19
Deep Learning

LLaMA-Factory Source Code Notes

# Register the InternVL2.5-MPO and InternVL3 "-hf" checkpoints (Hugging Face and ModelScope repos)
# as multimodal models that use the intern_vl chat template.
register_model_group(
    models={
        "InternVL2.5-1B-MPO": {
            DownloadSource.DEFAULT: "kingsley01/InternVL2_5-1B-MPO-hf",
            DownloadSource.MODELSCOPE: "llamafactory/InternVL2_5-1B-MPO-hf",
        },
        "InternVL2.5-2B-MPO": {
            DownloadSource.DEFAULT: "kingsley01/InternVL2_5-2B-MPO-hf",
            DownloadSource.MODELSCOPE: "llamafactory/InternVL2_5-2B-MPO-hf",
        },
        "InternVL2.5-4B-MPO": {
            DownloadSource.DEFAULT: "kingsley01/InternVL2_5-4B-MPO-hf",
            DownloadSource.MODELSCOPE: "llamafactory/InternVL2_5-4B-MPO-hf",
        },
        "InternVL2.5-8B-MPO": {
            DownloadSource.DEFAULT: "kingsley01/InternVL2_5-8B-MPO-hf",
            DownloadSource.MODELSCOPE: "llamafactory/InternVL2_5-8B-MPO-hf",
        },
        "InternVL3-1B-hf": {
            DownloadSource.DEFAULT: "kingsley01/InternVL3-1B-hf",
            DownloadSource.MODELSCOPE: "llamafactory/InternVL3-1B-hf",
        },
        "InternVL3-2B-hf": {
            DownloadSource.DEFAULT: "kingsley01/InternVL3-2B-hf",
            DownloadSource.MODELSCOPE: "llamafactory/InternVL3-2B-hf",
        },
        "InternVL3-8B-hf": {
            DownloadSource.DEFAULT: "kingsley01/InternVL3-8B-hf",
            DownloadSource.MODELSCOPE: "llamafactory/InternVL3-8B-hf",
        },
    },
    template="intern_vl",
    multimodal=True,
)
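To make the excerpt above easier to read, here is a simplified, self-contained sketch of what a registry helper like register_model_group might do with its arguments. It is not LLaMA-Factory's code: the DownloadSource string values, the registry dictionaries (SUPPORTED_MODELS, DEFAULT_TEMPLATE, MULTIMODAL_SUPPORTED_MODELS) and the resolve_repo_id helper are illustrative assumptions.

```python
from enum import Enum
from typing import Dict, Optional, Set


class DownloadSource(str, Enum):
    # Hypothetical values; only the member names come from the excerpt.
    DEFAULT = "hf"
    MODELSCOPE = "ms"


# Hypothetical module-level registries filled in by register_model_group.
SUPPORTED_MODELS: Dict[str, Dict[DownloadSource, str]] = {}
DEFAULT_TEMPLATE: Dict[str, str] = {}
MULTIMODAL_SUPPORTED_MODELS: Set[str] = set()


def register_model_group(
    models: Dict[str, Dict[DownloadSource, str]],
    template: Optional[str] = None,
    multimodal: bool = False,
) -> None:
    """Record each model's per-hub repo IDs, its default chat template and its modality."""
    for name, paths in models.items():
        SUPPORTED_MODELS[name] = paths
        if template is not None:
            DEFAULT_TEMPLATE[name] = template
        if multimodal:
            MULTIMODAL_SUPPORTED_MODELS.add(name)


def resolve_repo_id(name: str, source: DownloadSource = DownloadSource.DEFAULT) -> str:
    """Hypothetical lookup: use the requested hub, falling back to the default hub."""
    paths = SUPPORTED_MODELS[name]
    return paths.get(source, paths[DownloadSource.DEFAULT])
```

For example, after registering the InternVL3-1B-hf entry from the excerpt, resolve_repo_id("InternVL3-1B-hf", DownloadSource.MODELSCOPE) would return "llamafactory/InternVL3-1B-hf" in this sketch, and the model would be listed as multimodal with the intern_vl template.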
2025-04-18
Deep Learning

In-Depth Technical Analysis of InternVL 1

1. Introduction

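InternVL is an open-source multimodal large model project developed by Shanghai AI Laboratory (OpenGVLab). InternVL 1 is the project's first major release; through a new approach to fusing vision and language, it achieves strong image understanding and multimodal dialogue capabilities. This article analyzes InternVL 1's technical architecture, key features, and innovations in depth to give a comprehensive picture of the model.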

2025-04-18
Deep Learning

This post is password-protected; its full text can be viewed only after entering the password.

2025-04-18
Deep Learning

Spatial Layout Projector (SLP)

  1. LayTextLLM uses a method called the "Spatial Layout Projector (SLP)", which converts the four-dimensional spatial coordinates [x1,y1,x2,y2] of a bounding box into a single token embedding:

    "A key innovation in LayTextLLM is the Spatial Layout Projector (SLP), which transforms a spatial layout into a singular bounding box token. This enhancement enables the model to process both spatial layouts and textual inputs simultaneously. To be specifically, each OCR-derived spatial layout is represented by a bounding box defined by four-dimensional coordinates [x1,y1,x2,y2]..."

  2. This approach does represent each bounding box with a single token, unlike the earlier "coordinate-as-tokens" scheme, which turns the coordinates into multiple tokens:

    "Compared to the coordinate-as-tokens scheme, the SLP represents each bounding box with a single token. This approach significantly reduces the number of input tokens..."

  3. The single-token representation is computed by linearly projecting the coordinates into a high-dimensional embedding space:

    "The process can be computed as z=W⋅c+b, where c∈ℝ^4 is the vector of the bounding box coordinates. W∈ℝ^(d×4) is a weight matrix with d represents the dimension of the embedding, b∈ℝ^(d×1) is a bias vector, z is the resulting bounding box token represented as an d-dimensional embedding."

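The two points in the quoted passages, one token per box and z = W·c + b as the way that token is computed, can be illustrated in plain Python. The projection is an ordinary affine layer from ℝ^4 to ℝ^d. Assumptions that do not come from the quoted text: random initialization stands in for learned weights, normalize_box (scaling pixel coordinates by page width and height) is a hypothetical preprocessing step, and the coordinate-as-tokens count uses a hypothetical tokenizer that emits one token per digit, comma and bracket, so those counts are illustrative only.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def init_slp(d: int, seed: int = 0) -> Tuple[Matrix, Vector]:
    """Create W (d x 4) and b (length d); random values stand in for learned weights."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.02) for _ in range(4)] for _ in range(d)]
    b = [0.0] * d
    return W, b


def normalize_box(box: Sequence[float], width: float, height: float) -> Vector:
    """Hypothetical preprocessing: scale pixel coordinates into [0, 1]."""
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, x2 / width, y2 / height]


def slp_project(box: Sequence[float], W: Matrix, b: Vector) -> Vector:
    """z = W·c + b: map one [x1, y1, x2, y2] box to a single d-dimensional token embedding."""
    if len(box) != 4:
        raise ValueError("expected [x1, y1, x2, y2]")
    return [sum(w * c for w, c in zip(row, box)) + bias for row, bias in zip(W, b)]


def layout_token_count(boxes: Sequence[Sequence[int]], scheme: str) -> int:
    """Tokens spent on layout alone (OCR text tokens are needed by both schemes)."""
    if scheme == "coordinate-as-tokens":
        # Hypothetical tokenizer: every character of "[x1,y1,x2,y2]" is one token.
        return sum(len("[" + ",".join(str(v) for v in box) + "]") for box in boxes)
    if scheme == "slp":
        return len(boxes)  # exactly one bounding-box token per box
    raise ValueError(f"unknown scheme: {scheme!r}")


page = [[12, 345, 67, 890], [100, 200, 300, 400]]
W, b = init_slp(d=8)
tokens = [slp_project(normalize_box(box, 1000, 1000), W, b) for box in page]
print(len(tokens), len(tokens[0]))  # prints: 2 8
print(layout_token_count(page, "coordinate-as-tokens"), layout_token_count(page, "slp"))  # prints: 32 2
```

In an actual model the projector would be a trainable layer, for example PyTorch's torch.nn.Linear(4, d), whose weight tensor has shape (d, 4) and so matches W ∈ ℝ^(d×4); its output vector then serves as that box's token embedding in the LLM input.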
2025-04-17
Microcontrollers

Here is a good example: https://thingsboard.io/use-cases/fleet-tracking/

[Figure: ThingsBoard architecture]

This post aims to accomplish the following:

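  1. Create a GPS device in ThingsBoard.
  2. On the client side, use Python to simulate the GPS device and send GPS data (latitude and longitude) to ThingsBoard.
  3. Show the device's GPS location track on a ThingsBoard dashboard.
  4. Use geofencing in ThingsBoard to define virtual boundaries that mark out zones, and trigger an action when a device enters or leaves a geofence, such as sending an SMS warning, raising an alarm, or starting a workflow.
  5. Learn how to retrieve device data from ThingsBoard.
  6. Learn how to configure ThingsBoard data forwarding so that the data is stored in your own database.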
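The second task in the list above (a Python-simulated GPS device) can be sketched with ThingsBoard's HTTP device API, which accepts JSON telemetry via POST to /api/v1/<ACCESS_TOKEN>/telemetry. This is a minimal sketch using only the standard library: the server address, the access token, and the circular demo route around an arbitrary center are placeholders, and latitude/longitude are used as telemetry keys on the assumption that the dashboard's map widget reads those keys (check the widget's key settings if no track appears).

```python
import json
import math
import time
import urllib.request
from typing import Dict, Iterator, Optional, Tuple

# Placeholders: point these at your own ThingsBoard server and the GPS device's access token.
TB_HOST = "http://localhost:8080"
ACCESS_TOKEN = "YOUR_DEVICE_ACCESS_TOKEN"


def telemetry_url(host: str, token: str) -> str:
    """Device telemetry endpoint of the ThingsBoard HTTP device API."""
    return f"{host.rstrip('/')}/api/v1/{token}/telemetry"


def circular_track(center_lat: float, center_lon: float, radius_deg: float,
                   steps: int) -> Iterator[Tuple[float, float]]:
    """Yield (lat, lon) points on a small circle; a stand-in for a real vehicle route."""
    for i in range(steps):
        angle = 2 * math.pi * i / steps
        yield (center_lat + radius_deg * math.sin(angle),
               center_lon + radius_deg * math.cos(angle))


def build_payload(lat: float, lon: float, ts_ms: Optional[int] = None) -> Dict:
    """Telemetry JSON; with ts_ms the point is stored at that client-side timestamp."""
    values = {"latitude": lat, "longitude": lon}
    if ts_ms is None:
        return values
    return {"ts": ts_ms, "values": values}


def send_telemetry(host: str, token: str, payload: Dict, timeout: float = 5.0) -> int:
    """POST one telemetry message and return the HTTP status (urlopen raises on errors)."""
    request = urllib.request.Request(
        telemetry_url(host, token),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run(steps: int = 60, interval_s: float = 1.0) -> None:
    """Send one GPS point per interval around an arbitrary demo center."""
    for lat, lon in circular_track(31.2304, 121.4737, 0.01, steps):
        payload = build_payload(lat, lon, ts_ms=int(time.time() * 1000))
        status = send_telemetry(TB_HOST, ACCESS_TOKEN, payload)
        print(f"sent lat={lat:.6f} lon={lon:.6f} -> HTTP {status}")
        time.sleep(interval_s)
```

With the placeholders filled in, calling run() sends 60 points, one per second, which a map widget on the device's dashboard can then draw as a track.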
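The geofencing task in the list is configured on the ThingsBoard server, so the following is not how ThingsBoard implements it; it is only an illustration of the enter/leave decision behind such a rule. It assumes a hypothetical circular zone given by a center point and a radius in meters, and uses the haversine great-circle distance to decide whether each GPS point is inside.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def inside_circle(lat: float, lon: float, center: Tuple[float, float], radius_m: float) -> bool:
    """True if the point lies within radius_m of the zone center."""
    return haversine_m(lat, lon, center[0], center[1]) <= radius_m


def geofence_events(track: Sequence[Tuple[float, float]], center: Tuple[float, float],
                    radius_m: float) -> List[Tuple[int, str]]:
    """Return (index, "ENTERED" or "LEFT") wherever the inside/outside state flips."""
    events: List[Tuple[int, str]] = []
    was_inside = None  # the first point only sets the initial state
    for i, (lat, lon) in enumerate(track):
        is_inside = inside_circle(lat, lon, center, radius_m)
        if was_inside is not None and is_inside != was_inside:
            events.append((i, "ENTERED" if is_inside else "LEFT"))
        was_inside = is_inside
    return events


track = [(0.0, 0.05), (0.0, 0.0), (0.0, 0.001), (0.0, 0.05)]
print(geofence_events(track, center=(0.0, 0.0), radius_m=1000.0))  # prints: [(1, 'ENTERED'), (3, 'LEFT')]
```

In ThingsBoard each such event would be the trigger for an action like an SMS warning or an alarm; here the events are simply returned so the logic can be checked in isolation.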