
YiLabsAI/HouYiAgent


HouYi Logo

Lightweight · Extensible · Production-Grade · Context-Engineered Multi-Agent Framework

with neuro-symbolic verification

License: MIT


Overview

HouYi is a lightweight, extensible, production-grade multi-agent framework that ships with SOTA built-in agents (Deep Research, Chatbox, Memory Inbox). One Agent class, one SDK: define, orchestrate, evaluate, and ship agents from prototype to production without changing your API surface.

Why HouYi

  • Full-lifecycle harness – Not just execution: definition → orchestration → context engineering → evaluation → observability → governance. Every layer is pluggable, and every extension point is documented for community and enterprise customization.
  • Context engineering as first-class – Token budgeting, persistent memory with emphasis-aware recall, RAG, context compression, and Reminders injection at the Transformer attention sweet spot, built into the SDK rather than added as afterthoughts.
  • Neuro-symbolic verification – The Z3 SMT solver validates LLM outputs against business constraints, separating probabilistic reasoning from deterministic correctness for production reliability.
  • Ships with SOTA agents – Deep Research (plan → multi-round search → conflict resolution → citation-verified report with RACE/FACT scoring), Chatbox (multi-turn with tool calling and memory), Memory Inbox (LLM-powered extraction with review workflow). Use them directly or study their source as reference implementations.
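The neuro-symbolic split can be illustrated without the solver: the LLM proposes a structured answer, and a deterministic checker accepts or rejects it against business rules. A minimal sketch of the idea (the `Quote` and `verify_quote` names are illustrative, not HouYi APIs; the framework itself encodes such constraints for Z3):

```python
from dataclasses import dataclass

@dataclass
class Quote:
    """Structured output the LLM is asked to produce."""
    subtotal: float
    discount_pct: float
    total: float

def verify_quote(q: Quote) -> list[str]:
    """Deterministic business constraints; returns violations (empty = valid)."""
    violations = []
    if not 0 <= q.discount_pct <= 30:
        violations.append("discount must be between 0% and 30%")
    expected = round(q.subtotal * (1 - q.discount_pct / 100), 2)
    if abs(q.total - expected) > 0.01:
        violations.append(f"total {q.total} != expected {expected}")
    return violations

# An LLM "hallucinating" a 50% discount is caught symbolically:
bad = Quote(subtotal=100.0, discount_pct=50.0, total=50.0)
print(verify_quote(bad))  # ['discount must be between 0% and 30%']
```

The point is the separation of concerns: the model may be wrong with some probability, but the checker is wrong with probability zero, so only verified outputs reach production.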

πŸ— Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                     HouYi Studio (Ideas Foundry)                        │
│   Graph Orchestration · Chatbox · Agent Hub · Deep Research             │
├─────────────────────────────────────────────────────────────────────────┤
│                   Studio Server (FastAPI + SSE)                         │
│   Chat API · Research API · Memory API · Knowledge API                  │
├─────────────────────────────────────────────────────────────────────────┤
│                         HouYi SDK (Core)                                │
│                                                                         │
│  ┌──────────┐  ┌──────────────┐  ┌──────────┐  ┌────────────────────┐   │
│  │  Agent   │  │  AgentTeam   │  │  Team    │  │  DAG               │   │
│  │  Runner  │  │  Manager     │  │  Task    │  │  Engine            │   │
│  └─────┬────┘  └──────┬───────┘  └────┬─────┘  └────────┬───────────┘   │
│        └──────────┬───┴───────────────┴─────────────────┘               │
│            ┌──────┴───────┐                                             │
│            │ Orchestrator │  Delegate · Autonomous · DAG                │
│            └──────┬───────┘                                             │
│  ┌────────────────┼────────────────────────────────────────────────┐    │
│  │         Context Engineering Layer                 ★ Pluggable   │    │
│  │  Token Budget · Tools · Memory · RAG · State Checkpoints        │    │
│  ├─────────────────────────────────────────────────────────────────┤    │
│  │         Capabilities Layer                        ★ Pluggable   │    │
│  │  SimpleSkill · Web Search · Shell Exec · A2A · Self-Evolver     │    │
│  ├─────────────────────────────────────────────────────────────────┤    │
│  │         Quality & Governance Layer                ★ Pluggable   │    │
│  │  Evaluators · Z3 Verification · Sandbox · Cost Control          │    │
│  │  OTEL Tracing · Error Policy · Conflict Resolution              │    │
│  ├─────────────────────────────────────────────────────────────────┤    │
│  │         Adapters Layer                            ★ Pluggable   │    │
│  │  OpenAI · Anthropic · Gemini · more...                          │    │
│  │  Memory Store · Embedding Provider · Persistence Backend        │    │
│  └─────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────┘

Extension Points

HouYi is designed for community contribution and enterprise customization. Every ★ Pluggable layer exposes well-defined extension points:

Extension Point                  Protocol / Base Class          Implementations
─────────────────────────────────────────────────────────────────────────────
LLM Adapter                      LLMAdapter                     OpenAI, Anthropic, Gemini, Ollama, vLLM
Memory Backend                   MemoryStore                    SQLite, Redis, QMD
Embedding Provider               EmbeddingProvider              FastEmbed, OpenAI, HuggingFace
Search Provider                  WebSearchService               Bocha, DuckDuckGo, Tavily, Serper
Skill / Tool                     @tool / SkillSpec              Any Python function → auto-schema
Context Source                   ContextSource                  RAG, Memory, MCP server, custom retriever
Evaluator                        Evaluator                      19+ built-in evaluators, extensible via strategy pattern
Observability Exporter           OTEL SpanExporter              Jaeger, Zipkin, Datadog, Prometheus
Message Bus Backend              AgentMessageBus                In-process queue, NATS, Kafka, RocketMQ
Orchestration Mode               AgentOrchestrator              Delegate, Autonomous, DAG, custom
Error / Conflict Policy          ErrorPolicy / ConflictResolver Retry, fallback, source voting, LLM arbiter
Verification Backend             Z3 Solver                      SMT constraints, custom verifier
State / Persistence              StateStore                     SQLite, filesystem, Redis
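Adding a backend means implementing the corresponding protocol and passing your object in. A minimal sketch using Python's structural typing (the `complete` signature here is an assumption for illustration, not HouYi's actual LLMAdapter interface):

```python
from typing import Protocol

class LLMAdapter(Protocol):
    """Assumed shape of the adapter protocol; the real signature may differ."""
    def complete(self, prompt: str) -> str: ...

class EchoAdapter:
    """A trivial adapter, handy in tests: returns the prompt unchanged."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def run_with(adapter: LLMAdapter, prompt: str) -> str:
    # Any object satisfying the protocol is accepted -- no registration step.
    return adapter.complete(prompt)

print(run_with(EchoAdapter(), "hello"))  # echo: hello
```

The same pattern (protocol in, implementation out) applies to memory stores, search providers, message buses, and the other rows above.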

✨ Key Features

Orchestration

  • Lightweight Pydantic Core – Declarative agents, tasks, and workflows as Python classes with automatic validation and serialization: "code as configuration"
  • Unified Multi-Agent Engine – Same Agent class, same SDK: tool-loop, mode="delegate" (supervisor dispatches sub-agents), mode="autonomous" (shared state + message bus). Scale from a single chatbot to a multi-agent research team without API fragmentation
  • Async DAG Execution – Built on asyncio with DAG-based task orchestration: parallel execution, dynamic graph evolution, and non-blocking I/O for high-concurrency scenarios

Context

  • Context Engineering Pipeline – Dynamic token budgeting, RAG integration, persistent Memory with hybrid retrieval (full-text + embedding), emphasis-aware recall that prioritizes user-stressed instructions, and context compression with Reminders injection at the Transformer attention sweet spot
  • SimpleSkill Specification – Cross-platform skill model with built-in governance, evaluation hooks, and host-portable capability negotiation. Any Python function becomes a governed, evaluable capability unit

Quality

  • Neuro-Symbolic Verification – The Z3 SMT solver formally verifies LLM outputs against business constraints, separating probabilistic reasoning from deterministic correctness for production reliability
  • Extensible Evaluator Framework – 19+ evaluators across 4 categories: Quality (accuracy, completeness, relevance, coherence, factuality), Safety (toxicity, bias, hallucination), RAG (groundedness, faithfulness, context precision/recall), and Performance (cost, latency). Add custom evaluators via the Evaluator base class
  • Cost-Aware Governance – Token budget control with dynamic model routing enables automatic cost optimization, with intelligent provider fallback to maintain quality

Infrastructure

  • A2A Pub/Sub Protocol – Native Agent-to-Agent messaging (P2P, Pub/Sub, Broadcast) aligned with the A2A Pub/Sub draft. Pluggable transport: in-process queues for dev, NATS/Kafka/RocketMQ for distributed production
  • Zero-Config Observability – OpenTelemetry auto-instruments every agent execution with distributed tracing across LLM calls, tool invocations, and state transitions: <3% overhead, no manual setup
  • Persistent State & Workflows – Automatic execution snapshots support pause/resume, external event handling, and human-in-the-loop workflows: agents wait for async callbacks and resume exactly where they left off
  • Secure Sandbox Execution – Isolated execution environment with permission controls prevents LLM-generated code from accessing unauthorized resources, ensuring enterprise-grade security
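The token-budgeting idea behind the Context pipeline can be sketched in a few lines: keep the system prompt and as many of the newest turns as fit, dropping the oldest history first. This is a rough sketch, not HouYi's budgeter; the 4-characters-per-token estimate is a common approximation, not the SDK's tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fit_budget(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system message plus as many recent turns as fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept: list[dict] = []
    used = sum(estimate_tokens(m["content"]) for m in system)
    for m in reversed(rest):  # walk newest-first
        cost = estimate_tokens(m["content"])
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))

history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user", "content": f"question {i} " * 20} for i in range(10)
]
print(len(fit_budget(history, budget=200)))  # prints 4
```

A production budgeter layers compression and retrieval on top of this, but the invariant is the same: the assembled context never exceeds the model's token budget.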

📦 Installation

git clone https://github.com/YiLabsAI/HouYiAgent.git
cd HouYiAgent
uv sync --extra dev

Launch HouYi Studio

HouYi Studio is a full-featured web IDE with Chatbox, Agent Hub, Deep Research, and Memory Inbox. Start it locally with one command:

cp .env.example .env   # configure your LLM and search API keys
./scripts/dev.sh        # launches backend (FastAPI) + frontend (Vite) via tmux

Open http://localhost:3000 to access the Studio.

🚀 Quick Start

Simple Agent

from houyi import Agent, tool
from houyi.llm import OpenAIAdapter

@tool
def search(query: str) -> list[str]:
    """Search the web for information."""
    return [f"Result for {query}"]

agent = Agent(
    role="Researcher",
    skills=[search],
    llm=OpenAIAdapter(model="gpt-4o-mini"),
)

result = agent.run("What is HouYi?")

Multi-Agent Team

from houyi import Agent, Task, Team

researcher = Agent(role="Researcher", skills=[search], llm=llm)
analyst = Agent(role="Analyst", skills=[analyze], llm=llm)

team = Team(
    agents=[researcher, analyst],
    tasks=[
        Task("Research AI trends", agent=researcher),
        Task("Analyze findings", agent=analyst, context=[0]),
    ],
)
result = team.run()
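The context=[0] argument above feeds task 0's output into task 1. The wiring can be sketched as a plain sequential runner (illustrative only; HouYi's Team resolves these dependencies through its DAG engine, in parallel where possible):

```python
def run_tasks(tasks: list[dict]) -> list[str]:
    """Each task is {'name': ..., 'fn': ..., 'context': [indices of prior tasks]}."""
    outputs: list[str] = []
    for task in tasks:
        # Gather the outputs of the tasks this one depends on.
        ctx = [outputs[i] for i in task.get("context", [])]
        outputs.append(task["fn"](ctx))
    return outputs

tasks = [
    {"name": "research", "fn": lambda ctx: "trend: agents"},
    {"name": "analyze", "fn": lambda ctx: f"analysis of [{ctx[0]}]", "context": [0]},
]
print(run_tasks(tasks)[1])  # analysis of [trend: agents]
```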

Sub-Agent Delegation (Supervisor Pattern)

from houyi import Agent, AgentTeamConfig

supervisor = Agent(
    role="Research Supervisor",
    llm=llm,
    tools=[web_search],
    sub_agents=[
        AgentTeamConfig(role="Searcher", skills=["web_search"]),
        AgentTeamConfig(role="Analyst", skills=["code_execute"]),
    ],
    mode="delegate",
)

result = supervisor.run("Deep research on AI agent architectures")
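In mode="delegate", the supervisor's LLM chooses which sub-agent handles each step. Stripped of the LLM, the dispatch loop looks roughly like this (a hypothetical shape for intuition, not HouYi's internals; route() stands in for the supervisor model's choice):

```python
def delegate(request: str, sub_agents: dict, route) -> str:
    """Pick a sub-agent by role and hand it the request."""
    role = route(request)
    handler = sub_agents[role]
    return handler(request)

sub_agents = {
    "Searcher": lambda q: f"search results for: {q}",
    "Analyst": lambda q: f"analysis of: {q}",
}
# A keyword router stands in for the LLM's routing decision.
route = lambda q: "Searcher" if "research" in q.lower() else "Analyst"
print(delegate("Deep research on AI agents", sub_agents, route))
```

The real supervisor iterates: it routes, inspects the sub-agent's result, and decides whether to dispatch again or finish.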

Memory β€” Persistent Context Across Sessions

from houyi.adapters.memory.engine import MemoryEngine
from houyi.adapters.memory.store import MemoryStore

store = MemoryStore(data_dir="./memory_data")
engine = MemoryEngine(store)

await engine.add("User prefers Python over JavaScript", tags=["preference"])
memories = await engine.recall("programming language preference?", top_k=5)
context = await engine.build_context("coding question", max_tokens=500)
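The hybrid retrieval behind recall() blends a full-text signal with a vector signal. A self-contained sketch, with bag-of-words cosine standing in for real embeddings (the scoring blend is an assumption for illustration, not the MemoryEngine's actual formula):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_recall(query: str, memories: list[str], alpha: float = 0.5) -> list[str]:
    """Blend exact-term overlap (full-text) with cosine similarity (embedding stand-in)."""
    q_terms = Counter(query.lower().split())
    def score(m: str) -> float:
        m_terms = Counter(m.lower().split())
        overlap = len(set(q_terms) & set(m_terms)) / max(1, len(q_terms))
        return alpha * overlap + (1 - alpha) * cosine(q_terms, m_terms)
    return sorted(memories, key=score, reverse=True)

mems = ["User prefers Python over JavaScript", "User lives in Berlin"]
print(hybrid_recall("preferred programming language python", mems)[0])
```

Emphasis-aware recall adds one more term to this score, boosting memories the user explicitly stressed.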

Context Engineering β€” Reminders Injection

from houyi.application.context.reminders import ReminderInjector, CITATION_REMINDER

injector = ReminderInjector([CITATION_REMINDER])
messages = injector.inject(conversation_messages)
# Critical instructions injected at the context tail, the Transformer attention sweet spot
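What inject does can be approximated in a few lines: append each reminder as a final system message so it sits at the context tail, where attention to recent tokens is strongest. This is a sketch of the idea, not the SDK's ReminderInjector implementation:

```python
CITATION_REMINDER = "Cite a source for every factual claim."

def inject(messages: list[dict], reminders: list[str]) -> list[dict]:
    """Place critical instructions last so they are freshest in the context."""
    tail = [{"role": "system", "content": r} for r in reminders]
    return messages + tail

convo = [
    {"role": "system", "content": "You are a research assistant."},
    {"role": "user", "content": "Summarize recent agent papers."},
]
out = inject(convo, [CITATION_REMINDER])
print(out[-1]["content"])  # Cite a source for every factual claim.
```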

Evaluation

from houyi import evaluate

results = evaluate(
    agent=agent,
    test_cases=[{"input": "What is AI?", "expected_output": "..."}],
    evaluators=["accuracy", "completeness", "relevance"],
)
print(results.summary())
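Custom evaluators plug into the same framework by subclassing the evaluator base class. The shape below is assumed for illustration (the `Evaluator`/`score` names may not match the SDK's exact API):

```python
from abc import ABC, abstractmethod

class Evaluator(ABC):
    """Assumed base-class shape for a custom evaluator."""
    name: str

    @abstractmethod
    def score(self, output: str, expected: str) -> float:
        """Return a score in [0, 1]."""

class KeywordCoverage(Evaluator):
    """Scores the fraction of expected keywords that appear in the output."""
    name = "keyword_coverage"

    def score(self, output: str, expected: str) -> float:
        keywords = expected.lower().split()
        hits = sum(1 for k in keywords if k in output.lower())
        return hits / len(keywords) if keywords else 1.0

ev = KeywordCoverage()
print(ev.score("AI is machine intelligence", "machine intelligence"))  # 1.0
```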

🤖 Built-in Agents

HouYi ships with production-ready agent applications built on top of the SDK:

  • Deep Research – Automated research: plan decomposition → multi-round web search → source aggregation → intermediate analysis → conflict resolution → citation-verified report with RACE/FACT quality scoring
  • Chatbox – Multi-turn conversational AI with streaming, tool calling, memory integration, and the full context engineering pipeline
  • Memory Inbox – LLM-powered memory extraction from conversations with a human-in-the-loop review/approve/reject workflow

Each is a production-grade application that exercises every layer of the SDK. Study their source as reference implementations for building your own agents.

📚 Documentation

  • Getting Started – Installation, quick start, core concepts
  • API Reference – Complete API documentation
  • Advanced Features – Observability, multi-LLM, DAG execution, context engineering
  • Evaluation – Evaluator framework and all built-in evaluators
  • Development Guide – Coding standards and engineering practices
  • Examples – Runnable code examples

🤝 Contributing

We welcome contributions! See our Contributing Guide.

πŸ› Standards & Acknowledgments

HouYi is built on and contributes to open standards:

  • OpenTelemetry – Zero-config distributed tracing across LLM calls, tools, and agent state transitions
  • SimpleSkill – HouYi's native skill specification: cross-platform, governable, evaluable capability units (originated from this project)
  • MCP – Model Context Protocol integration for external context sources
  • A2A – Agent-to-Agent protocol with native Pub/Sub messaging for distributed multi-agent communication
