minicpm-v

> Note: this model requires [Ollama 0.3.10](https://github.com/ollama/ollama/releases/tag/v0.3.10) or later.
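
A minimal sketch of checking the version requirement above before pulling the model. It assumes a local Ollama server on its default address (`http://localhost:11434`) and reads the version from the `GET /api/version` endpoint. The helper names `parse_version`, `meets_minimum`, and `server_version` are hypothetical and exist only for this illustration.

```python
import json
import re
import urllib.request

# Minimum version taken from the note above.
MIN_VERSION = (0, 3, 10)


def parse_version(text):
    """Return the leading MAJOR.MINOR.PATCH of a version string as a tuple of ints."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_text, minimum=MIN_VERSION):
    """True if version_text is at least the minimum (numeric, not string, comparison)."""
    return parse_version(version_text) >= minimum


def server_version(base_url="http://localhost:11434"):
    """Ask a running Ollama server for its version.

    Assumption: the server is reachable at base_url (the default local address).
    """
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        return json.load(resp)["version"]


if __name__ == "__main__":
    version = server_version()
    status = "ok" if meets_minimum(version) else "too old for this model"
    print(f"Ollama {version}: {status}")
```

The versions are compared as integer tuples because a plain string comparison would rank `0.3.9` above `0.3.10`.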

MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding. Notable features of MiniCPM-V 2.6 include:

* **🔥 Leading Performance**: MiniCPM-V 2.6 achieves an average score of 65.2 on the latest version of OpenCompass, a comprehensive evaluation over 8 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models such as GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet on single-image understanding.

* **🖼️ Multi-Image Understanding and In-Context Learning**: MiniCPM-V 2.6 can also converse and reason over multiple images. It achieves state-of-the-art performance on popular multi-image benchmarks such as Mantis-Eval, BLINK, Mathverse mv, and Sciverse mv, and also shows promising in-context learning capability.

* **💪 Strong OCR Capability**: MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves state-of-the-art performance on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V, and Gemini 1.5 Pro. Based on the latest RLAIF-V and VisCPM techniques, it features trustworthy behavior, with significantly lower hallucination rates than GPT-4o and GPT-4V on Object HalBench, and supports multiple languages, including English, Chinese, German, French, Italian, and Korean.

* **🚀 Superior Efficiency**: In addition to its friendly size, MiniCPM-V 2.6 also shows state-of-the-art token density (i.e., number of pixels encoded into each visual token). It produces only 640 tokens when processing a 1.8M pixel image, which is 75% fewer than most models. This directly improves the inference speed, first-token latency, memory usage, and power consumption.
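
The token density figure in the last bullet can be checked with a little arithmetic. The sketch below uses only the numbers quoted above (a 1344x1344 image, 640 visual tokens, "75% fewer"); the function names are hypothetical, and the baseline token count is merely what the 75% claim implies, not a figure stated by the authors.

```python
def token_density(width, height, visual_tokens):
    """Pixels encoded into each visual token."""
    return (width * height) / visual_tokens


def implied_baseline_tokens(visual_tokens, fraction_fewer):
    """Token count a typical model would need if ours uses `fraction_fewer` less."""
    return visual_tokens / (1.0 - fraction_fewer)


if __name__ == "__main__":
    pixels = 1344 * 1344  # about 1.8 million pixels
    density = token_density(1344, 1344, 640)
    baseline = implied_baseline_tokens(640, 0.75)
    print(f"pixels: {pixels}")
    print(f"token density: {density:.1f} pixels per token")
    print(f"implied baseline: {baseline:.0f} tokens for the same image")
```

Under these numbers, each visual token covers roughly 2,800 pixels, and a model needing four times as many tokens would spend about 2,560 tokens on the same image.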

## References

[GitHub](https://github.com/OpenBMB/MiniCPM-V)

[Hugging Face](https://hugging-face.cn/openbmb/MiniCPM-V-2_6)
