qwq:32b-preview-q8_0 - Ollama

qwq

QwQ is an experimental research model focused on advancing AI reasoning capabilities.

tools 32b

153.9K downloads · Updated 2 months ago

9c62a2e770b7 · 35 GB

{ "stop": [ "<|im_start|>", "<|im_end|>" ] }
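The stop sequences above can also be passed explicitly when calling the model. The following is a minimal sketch, assuming a local Ollama server at its default address `http://localhost:11434` and the `/api/chat` endpoint; the helper names `build_chat_request` and `chat` are illustrative and do not come from this page.

```python
import json
import urllib.request

MODEL = "qwq:32b-preview-q8_0"
# Stop sequences copied from the params shown on this page.
STOP = ["<|im_start|>", "<|im_end|>"]


def build_chat_request(prompt, host="http://localhost:11434"):
    """Build (but do not send) a non-streaming chat request for Ollama.

    The host is an assumption: Ollama's default local address.
    """
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"stop": STOP},
    }
    return urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Calling `chat("How many r's are in 'strawberry'?")` requires the model to have been pulled locally first. The model already ships with these stop sequences, so setting them in `options` is only needed when overriding other defaults.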

You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.

{{- if or .System .Tools }}<|im_start|>system {{- if .System }} {{ .System }} {{- end }} {{- if .Too

Apache License Version 2.0, January 2004

Readme

QwQ is a 32B parameter experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities.

![image.png](/assets/mchiang0610/mikey3.1/e6d2ac3a-0d55-4e8b-9f53-ee5e269ed521)

![image.png](/assets/mchiang0610/mikey3.1/b56aaf87-c5bf-4249-be99-28930845e48e)

QwQ demonstrates remarkable performance across these benchmarks:

- **65.2% on GPQA**, showcasing its graduate-level scientific reasoning capabilities 
- **50.0% on AIME**, highlighting its strong mathematical problem-solving skills
- **90.6% on MATH-500**, demonstrating exceptional mathematical comprehension across diverse topics
- **50.0% on LiveCodeBench**, validating its robust programming abilities in real-world scenarios

These results underscore QwQ’s significant advancement in analytical and problem-solving capabilities, particularly in technical domains requiring deep reasoning.

As a preview release, it demonstrates promising analytical abilities while having several important limitations:

1. **Language Mixing and Code-Switching:** The model may mix languages or switch between them unexpectedly, affecting response clarity.

2. **Recursive Reasoning Loops:** The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.

3. **Safety and Ethical Considerations:** The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.

4. **Performance and Benchmark Limitations:** The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.
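One way a caller might guard against the recursive reasoning loops described above is to check whether a response ends in the same block of lines repeated several times. The sketch below is a simple heuristic, not part of QwQ or Ollama; the function name `ends_in_loop` and its thresholds are assumptions chosen for illustration.

```python
def ends_in_loop(text, max_block=5, min_repeats=3):
    """Return True if the text ends with one block of lines repeated back to back.

    Blocks of 1..max_block non-empty lines are tried; a loop is reported when
    the final `min_repeats` blocks of the same size are identical.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for size in range(1, max_block + 1):
        needed = size * min_repeats
        if len(lines) < needed:
            break  # larger blocks need even more lines, so stop here
        tail = lines[-needed:]
        block = tail[:size]
        if all(tail[i:i + size] == block for i in range(0, needed, size)):
            return True
    return False


looping = "Step 1\n" + "Wait, let me check.\nSo x = 2.\n" * 3
print(ends_in_loop(looping))        # True
print(ends_in_loop("a\nb\nc\nd"))   # False
```

A check like this only catches exact repetition. Capping the response length, for example with Ollama's `num_predict` option, is a complementary safeguard for loops that vary their wording.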
