视觉模型

2024 年 2 月 2 日

Vision Models

新的 LLaVA 模型

LLaVA(大型语言和视觉助手)模型集合已更新至 1.6 版本,支持以下功能:

  • 更高图像分辨率:支持高达 4 倍像素,使模型能够理解更多细节。
  • 改进的文本识别和推理能力:在额外的文档、图表和图表数据集上进行了训练。
  • 更宽松的许可证:通过 Apache 2.0 许可证或 LLaMA 2 社区许可证分发。

这些模型提供三种参数大小:7B、13B 和新的 34B 模型。

  • ollama run llava:7b
  • ollama run llava:13b
  • ollama run llava:34b

用法

CLI

Art

要使用 ollama run 运行视觉模型,请使用文件路径引用 .jpg.png 文件。

% ollama run llava "describe this image: ./art.jpg"
The image shows a colorful poster featuring an illustration of a cartoon character with spiky hair. Above the character's head is a crown, suggesting royalty or high status. On the right side of the poster, there is text in French that reads "Jean-Michel Basquiat,"

Python

import ollama

res = ollama.chat(
	model="llava",
	messages=[
		{
			'role': 'user',
			'content': 'Describe this image:',
			'images': ['./art.jpg']
		}
	]
)

print(res['message']['content'])

JavaScript

import ollama from 'ollama'

const res = await ollama.chat({
	model: 'llava',
	messages: [{
		role: 'user',
		content: 'Describe this image:'
		images: ['./art.jpg']
	}]
})

console.log(res.message.content)

注意:在 Ollama 的 PythonJavaScript 库以及 REST API 中,可以将 base64 编码的文件提供给 images 参数。有关向视觉模型提供图像的更多示例,请参阅完整的 API 文档

示例

目标检测

Object detection

% ollama run llava "tell me what do you see in this picture? ./pic.jpg"
In the image, a man wearing blue and white is holding video game controllers in his hands while smiling. His arms are raised above his head as he plays with the virtual reality gaming devices. The man appears to be enjoying himself and immersed in his gaming experience.

文本识别

Word Art

% ollama run llava "what does the text say? ./wordart.png"
This is an image with a capitalized, bolded text of "ollama." The background is black and the letters are yellow, which create a contrasting appearance. Although the words "ollama" make up the majority of the image, it could also be seen as part of a sign or advertisement due to its brightness and prominent font style.

感谢 LLaVA 团队

LLaVA 1.6 模型系列由一个了不起的 团队 发布。有关模型如何训练的信息可在 LLaVA 网站 上找到,以及将 LLaVA 1.6 与领先的开源和专有模型进行比较的基准测试结果。

Until next time