llama3.2-vision

The Llama 3.2-Vision collection of multimodal large language models (LLMs) consists of instruction-tuned image reasoning generative models in 11B and 90B sizes (text + images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks.

Supported Languages: For text-only tasks, English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Note that for image+text applications, English is the only supported language.

## Usage

First, pull the model:

```bash
ollama pull llama3.2-vision
```

### Python Library

To use Llama 3.2 Vision with the Ollama [Python library](https://github.com/ollama/ollama-python):

```python
import ollama

response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['image.jpg']
    }]
)

print(response)
```
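
The call above prints the whole response object. If only the model's answer is needed, it can be read from the `message` field. The following is a minimal sketch, assuming the response follows the chat response shape (`message` containing `role` and `content`); the helper name `reply_text` and the sample reply text are hypothetical and only there for illustration.

```python
def reply_text(response):
    """Return only the assistant's text from a chat response."""
    return response['message']['content']


# Hypothetical stand-in for what ollama.chat(...) returns;
# the reply text below is made up for illustration.
sample = {
    'model': 'llama3.2-vision',
    'message': {'role': 'assistant', 'content': 'A cat on a sofa.'},
}

print(reply_text(sample))
```

With a running server, `reply_text(response)` can be applied to the `response` from the example above in the same way.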

### JavaScript Library

To use Llama 3.2 Vision with the Ollama [JavaScript library](https://github.com/ollama/ollama-js):

```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.2-vision',
  messages: [{
    role: 'user',
    content: 'What is in this image?',
    images: ['image.jpg']
  }]
})

console.log(response)
```

### cURL

```shell
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "llama3.2-vision",
  "messages": [
    {
      "role": "user",
      "content": "what is in this image?",
      "images": ["<base64-encoded image data>"]
    }
  ]
}'
```
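
The REST request expects the image as base64-encoded data rather than a file path. As a rough sketch of how that request body could be produced, the snippet below encodes raw image bytes and posts them using only the Python standard library. The helper names `build_chat_payload` and `send_chat` are hypothetical, and the address assumes Ollama is listening locally on port 11434 over plain HTTP. Setting `"stream": false` asks the server for a single JSON object instead of a stream of chunks.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is listening on its default local address over plain HTTP.
OLLAMA_CHAT_URL = "http://127.0.0.1:11434/api/chat"


def build_chat_payload(image_bytes, prompt="what is in this image?"):
    """Build the JSON body from the cURL example out of raw image bytes."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "llama3.2-vision",
        "stream": False,  # one JSON object instead of streamed chunks
        "messages": [
            {"role": "user", "content": prompt, "images": [encoded]}
        ],
    }


def send_chat(payload, url=OLLAMA_CHAT_URL):
    """POST the payload and return the decoded JSON reply (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

To try it, read a file with `open('image.jpg', 'rb').read()`, pass the bytes to `build_chat_payload`, and hand the result to `send_chat`. Only `send_chat` touches the network, so the payload can be inspected without a running server.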

## References

[GitHub](https://github.com/meta-llama/llama-models)

[HuggingFace](https://hugging-face.cn/collections/meta-llama/llama-32-66f448ffc8c32f949b04c8cf)
