Minions: combining local and cloud LLMs

February 25, 2025

Minions

Avanika Narayan, Dan Biderman, and Sabri Eyuboglu from Christopher Ré's Hazy Research lab at Stanford, together with Avner May, Scott Linderman, and James Zou, have developed a way to shift a substantial portion of LLM workloads onto consumer devices by having a small on-device model (e.g., Llama 3.2 with Ollama) collaborate with a larger model in the cloud (e.g., GPT-4o).

The new paper and companion open-source code aim to reduce cloud costs with minimal or no reduction in quality, via two protocol configurations:

  • Minion: the cloud model chats freely with a single local model that has access to the data, until the two converge on a solution
    • 30.4× reduction in remote costs while retaining 87% of the cloud model's performance
  • MinionS: the cloud model decomposes the task into bite-sized subtasks to be executed over chunks of the context. Small LLMs solve these tasks in parallel
    • 5.7× reduction in remote costs while retaining 97.9% of the cloud model's performance
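To make the MinionS decomposition concrete, here is a minimal sketch of the chunking step: a long context is split into pieces small enough for on-device models to handle in parallel. The chunk size and the `chunk_context` helper are illustrative assumptions, not the library's actual implementation.

```python
# Illustrative sketch of the MinionS idea: split a long context into
# chunks so small local models can each work on one piece in parallel.
# Chunking on paragraph boundaries and the 120-char budget are
# assumptions for illustration, not the minions library's internals.

def chunk_context(context: str, chunk_chars: int = 500) -> list[str]:
    """Split context into roughly chunk_chars-sized pieces on paragraph boundaries."""
    paragraphs = [p.strip() for p in context.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) > chunk_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks

doc = ("First paragraph about blood pressure.\n\n" * 3
       + "Second paragraph about cholesterol.\n\n" * 3)

# Each chunk would be handed to a local worker model as its own subtask.
for i, chunk in enumerate(chunk_context(doc, chunk_chars=120)):
    print(i, len(chunk))
```

In the real protocol, the cloud model also writes the subtask instructions for each chunk and aggregates the workers' answers; only that aggregation step consumes cloud tokens, which is where the cost savings come from.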

Get started

Clone the repository:

git clone https://github.com/HazyResearch/minions.git 
cd minions

(Optional) Create a virtual environment with your preferred package manager (e.g., conda, venv, uv, etc.).

python3 -m venv .venv
source .venv/bin/activate

Next, install the Python package and its dependencies:

pip install -e .

If you haven't already, install Ollama, then pull Meta's Llama 3.2 model:

ollama pull llama3.2

Finally, create an OpenAI API key for the cloud model.
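One common way to supply the key is via an environment variable; this assumes the OpenAI client follows the OpenAI SDK's convention of reading `OPENAI_API_KEY` from the environment when no key is passed explicitly:

```shell
export OPENAI_API_KEY=<your-api-key>
```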

Run the demo app

The bundled Streamlit app runs an interactive demo of the Minion and MinionS protocols. To launch it, run:

streamlit run app.py

A browser window will open with instructions for entering your OpenAI API key, selecting the local model, and running Minion or MinionS.

Minions Screenshot

Example code

To run Minion or MinionS programmatically from Python, use the minions package.

Minion

First, create a file named example.py with the following contents:

from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minion import Minion

local_client = OllamaClient(
    model_name="llama3.2",
)
    
remote_client = OpenAIClient(
    model_name="gpt-4o",
)

# Instantiate the Minion object with both clients
minion = Minion(local_client, remote_client)

context = """
Patient John Doe is a 60-year-old male with a history of hypertension. In his latest checkup, his blood pressure was recorded at 160/100 mmHg, and he reported occasional chest discomfort during physical activity.
Recent laboratory results show that his LDL cholesterol level is elevated at 170 mg/dL, while his HDL remains within the normal range at 45 mg/dL. Other metabolic indicators, including fasting glucose and renal function, are unremarkable.
"""

task = "Based on the patient's blood pressure and LDL cholesterol readings in the context, evaluate whether these factors together suggest an increased risk for cardiovascular complications."

# Execute the minion protocol for up to two communication rounds
output = minion(
    task=task,
    context=[context],
    max_rounds=2
)

print(output["final_answer"])

Then run the example:

python example.py

MinionS

With a few small changes, the same code runs the MinionS protocol:

from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minions import Minions
from pydantic import BaseModel

class StructuredLocalOutput(BaseModel):
    explanation: str
    citation: str | None
    answer: str | None

local_client = OllamaClient(
    model_name="llama3.2",
    temperature=0.0,
    structured_output_schema=StructuredLocalOutput
)

remote_client = OpenAIClient(
    model_name="gpt-4o",
)


# Instantiate the Minions object with both clients
minions = Minions(local_client, remote_client)

context = """
Patient John Doe is a 60-year-old male with a history of hypertension. In his latest checkup, his blood pressure was recorded at 160/100 mmHg, and he reported occasional chest discomfort during physical activity.
Recent laboratory results show that his LDL cholesterol level is elevated at 170 mg/dL, while his HDL remains within the normal range at 45 mg/dL. Other metabolic indicators, including fasting glucose and renal function, are unremarkable.
"""

task = "Based on the patient's blood pressure and LDL cholesterol readings in the context, evaluate whether these factors together suggest an increased risk for cardiovascular complications."

# Execute the MinionS protocol for up to two communication rounds
output = minions(
    task=task,
    doc_metadata="Medical Report",
    context=[context],
    max_rounds=2
)

print(output["final_answer"])

After making the changes, rerun the example:

python example.py

Read more