Minions: where local and cloud LLMs meet
February 25, 2025
Avanika Narayan, Dan Biderman, and Sabri Eyuboglu from Christopher Ré's Hazy Research lab at Stanford, together with Avner May, Scott Linderman, and James Zou, developed a way to shift a substantial portion of LLM workloads onto consumer devices by having a small on-device model (e.g. Llama 3.2 running with Ollama) collaborate with a larger model in the cloud (e.g. GPT-4o).
The new paper, along with companion open-source code, aims to reduce cloud costs with minimal or no degradation in quality through two protocol configurations:
- Minion: the cloud model chats freely with a single local model that has access to the data, until the two arrive at a solution (a rough sketch of this loop follows the list)
  - 30.4x reduction in remote costs while maintaining 87% of the cloud model's performance
- MinionS: the cloud model decomposes the task into bite-sized subtasks to be executed over chunks of the context; the small local model solves these in parallel
  - 5.7x reduction in remote costs while maintaining 97.9% of the cloud model's performance
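As a rough mental model, Minion can be pictured as a short message loop in which only the local model ever sees the full context, so the cloud model is billed only for the small back-and-forth conversation. The sketch below is illustrative pseudocode of that loop, not the minions package's implementation; local_llm and cloud_llm are hypothetical functions mapping a prompt string to a reply string.

# Illustrative pseudocode for the Minion protocol -- NOT the minions API.
# local_llm and cloud_llm are hypothetical prompt -> reply functions.
def minion_sketch(task, context, local_llm, cloud_llm, max_rounds=2):
    transcript = [f"Task: {task}"]
    for _ in range(max_rounds):
        # The cloud model sees only the transcript, never the raw context,
        # which is what keeps remote token costs low.
        message = cloud_llm("\n".join(transcript))
        if message.startswith("FINAL ANSWER:"):
            return message
        # The local model answers with the full on-device context in view.
        reply = local_llm(context + "\n" + "\n".join(transcript + [message]))
        transcript += [message, reply]
    # Out of rounds: ask the cloud model to commit to an answer.
    return cloud_llm("\n".join(transcript) + "\nGive your final answer.")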
Get started
Clone the repository:
git clone https://github.com/HazyResearch/minions.git
cd minions
(Optional) Create a virtual environment with your preferred package manager (e.g. conda, venv, uv):
python3 -m venv .venv
source .venv/bin/activate
Next, install the Python package and its dependencies:
pip install -e .
If you have not already, install Ollama and pull Meta's Llama 3.2 model:
ollama pull llama3.2
Finally, create an OpenAI API key for the cloud model.
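For example, on macOS or Linux the key can be made available to the examples below through an environment variable. This assumes the OpenAI client picks up OPENAI_API_KEY, which the official OpenAI SDK does by default:

export OPENAI_API_KEY=<your-api-key>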
Run the demo application
The provided Streamlit app runs an interactive demo of the Minion and MinionS protocols. To launch it, run:
streamlit run app.py
A browser window will open with instructions for entering your OpenAI API key, selecting a local model, and running Minion or MinionS.
Example code
To run Minion or MinionS programmatically from Python, use the minions package.
Minion
Begin by creating a file named example.py and adding the following:
from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minion import Minion
local_client = OllamaClient(
    model_name="llama3.2",
)

remote_client = OpenAIClient(
    model_name="gpt-4o",
)
# Instantiate the Minion object with both clients
minion = Minion(local_client, remote_client)
context = """
Patient John Doe is a 60-year-old male with a history of hypertension. In his latest checkup, his blood pressure was recorded at 160/100 mmHg, and he reported occasional chest discomfort during physical activity.
Recent laboratory results show that his LDL cholesterol level is elevated at 170 mg/dL, while his HDL remains within the normal range at 45 mg/dL. Other metabolic indicators, including fasting glucose and renal function, are unremarkable.
"""
task = "Based on the patient's blood pressure and LDL cholesterol readings in the context, evaluate whether these factors together suggest an increased risk for cardiovascular complications."
# Execute the minion protocol for up to two communication rounds
output = minion(
    task=task,
    context=[context],
    max_rounds=2
)
print(output["final_answer"])
Then run the example:
python example.py
MinionS
With a few small modifications, the same code can be used to run the MinionS protocol:
from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minions import Minions
from pydantic import BaseModel
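# Schema for each local worker's structured output: a free-form explanation
# plus an optional supporting citation and extracted answer (semantics
# inferred from the field names).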
class StructuredLocalOutput(BaseModel):
    explanation: str
    citation: str | None
    answer: str | None
local_client = OllamaClient(
    model_name="llama3.2",
    temperature=0.0,
    structured_output_schema=StructuredLocalOutput
)

remote_client = OpenAIClient(
    model_name="gpt-4o",
)
# Instantiate the Minions object with both clients
minions = Minions(local_client, remote_client)
context = """
Patient John Doe is a 60-year-old male with a history of hypertension. In his latest checkup, his blood pressure was recorded at 160/100 mmHg, and he reported occasional chest discomfort during physical activity.
Recent laboratory results show that his LDL cholesterol level is elevated at 170 mg/dL, while his HDL remains within the normal range at 45 mg/dL. Other metabolic indicators, including fasting glucose and renal function, are unremarkable.
"""
task = "Based on the patient's blood pressure and LDL cholesterol readings in the context, evaluate whether these factors together suggest an increased risk for cardiovascular complications."
# Execute the MinionS protocol for up to two communication rounds
output = minions(
    task=task,
    doc_metadata="Medical Report",
    context=[context],
    max_rounds=2
)
print(output["final_answer"])
With the modifications in place, rerun the example:
python example.py
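To make the decomposition concrete, the following is a minimal, hypothetical sketch of the MinionS pattern described above: the cloud model proposes subtasks, a local model answers each (subtask, chunk) pair in parallel, and the cloud model aggregates the results. The chunking scheme, prompts, and function names here are illustrative assumptions, not the package's internals.

# Hypothetical sketch of the MinionS decomposition -- not the internals of
# the minions package. local_llm and cloud_llm are stand-in prompt -> reply
# functions, as in the Minion sketch above.
from concurrent.futures import ThreadPoolExecutor

def chunk(text, size=2000):
    # Naive fixed-size chunking; the real protocol's chunking may differ.
    return [text[i:i + size] for i in range(0, len(text), size)]

def minions_sketch(task, context, local_llm, cloud_llm):
    # 1. The cloud model decomposes the task into bite-sized subtasks.
    plan = cloud_llm(f"List simple subtasks, one per line, for: {task}")
    subtasks = [line for line in plan.splitlines() if line.strip()]
    # 2. The small local model solves every (subtask, chunk) pair in parallel.
    jobs = [(s, c) for s in subtasks for c in chunk(context)]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda job: local_llm(f"{job[0]}\n\nContext:\n{job[1]}"), jobs))
    # 3. The cloud model aggregates the local answers into a final answer.
    return cloud_llm(f"Task: {task}\nSubtask results:\n" + "\n".join(results))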