Integrations Overview

DeepEval integrates with the frameworks, model providers, and data stores teams already use to build LLM applications. Use these pages to connect tracing, evaluation, synthetic data, and model configuration to your existing stack.

Frameworks

Framework integrations let DeepEval evaluate entire execution traces without manually orchestrating every intermediate step. Use these when you want traces, spans, and component-level evals to line up with the framework your agents, chains, tools, and workflows already run on.

LangChain

Trace and evaluate LangChain chains, tools, and agents.

Pydantic AI

Trace Pydantic AI agents and evaluate their outputs.

OpenAI Agents

Evaluate workflows built with the OpenAI Agents SDK.

LangGraph

Trace and evaluate graph-based agent workflows.

AgentCore

Instrument AWS AgentCore agents with OpenTelemetry traces.

Strands

Instrument Strands Agents SDK apps with OpenTelemetry traces.

Google ADK

Trace Google ADK agents through OpenTelemetry and OpenInference.

LlamaIndex

Instrument LlamaIndex retrieval and agent pipelines.

CrewAI

Trace CrewAI crews, agents, tasks, and tool calls.

OpenAI

Trace OpenAI SDK calls and evaluate OpenAI-powered apps.

Anthropic

Trace Anthropic model calls inside DeepEval workflows.
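To make the trace and span vocabulary above concrete, here is a minimal, standard-library-only sketch of how nested function calls can be recorded as a tree of spans. It is a conceptual illustration, not DeepEval's API: `Span`, `traced`, and `TRACES` are hypothetical names, and each framework page documents the actual setup.

```python
import functools
import time
from contextvars import ContextVar
from dataclasses import dataclass, field


@dataclass
class Span:
    """One recorded step (hypothetical structure, for illustration only)."""
    name: str
    children: list = field(default_factory=list)
    output: object = None
    duration_s: float = 0.0


# The span currently executing, so nested calls can attach to their parent.
_current: ContextVar = ContextVar("current_span", default=None)
TRACES: list = []  # finished root spans, one per top-level call


def traced(fn):
    """Record each call to `fn` as a span nested under the active span."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = Span(name=fn.__name__)
        parent = _current.get()
        token = _current.set(span)
        start = time.perf_counter()
        try:
            span.output = fn(*args, **kwargs)
            return span.output
        finally:
            span.duration_s = time.perf_counter() - start
            _current.reset(token)
            # Root spans become traces; nested spans attach to their parent.
            (TRACES if parent is None else parent.children).append(span)
    return wrapper


@traced
def retrieve(query):
    return ["Paris is the capital of France."]


@traced
def generate(query, context):
    return f"Answer based on {len(context)} chunk(s)."


@traced
def agent(query):
    return generate(query, retrieve(query))


agent("What is the capital of France?")
root = TRACES[0]
print(root.name, [child.name for child in root.children])
# prints: agent ['retrieve', 'generate']
```

Because every span keeps its own output, an evaluation can target a single component (for example, only `retrieve`) or the whole trace.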

Evaluation Models

Evaluation model integrations configure the LLM provider DeepEval uses for LLM-as-a-judge metrics, synthetic data generation, conversation simulation, and prompt optimization. Pick the provider that matches your infrastructure, latency, privacy, and cost needs.

OpenAI

Azure OpenAI

Ollama

OpenRouter

Anthropic

Amazon Bedrock

Gemini

DeepSeek

Vertex AI

Grok

Moonshot

Portkey

vLLM

LM Studio

LiteLLM
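As a rough illustration of why the judge model is a swappable setting, the sketch below separates a metric from the model that scores it. Everything here is hypothetical and standard-library only (`JudgeModel`, `StubJudge`, and `ToyRelevancyMetric` are not DeepEval classes); the provider pages above describe the real configuration.

```python
from typing import Protocol


class JudgeModel(Protocol):
    """Anything that turns a prompt into text can act as the judge."""
    def generate(self, prompt: str) -> str: ...


class StubJudge:
    """Offline stand-in for a provider-backed model; returns a fixed score."""
    def __init__(self, canned_score: float):
        self.canned_score = canned_score

    def generate(self, prompt: str) -> str:
        return str(self.canned_score)


class ToyRelevancyMetric:
    """Asks the judge for a 0-1 score and compares it to a threshold."""
    def __init__(self, model: JudgeModel, threshold: float = 0.5):
        self.model = model
        self.threshold = threshold
        self.score = None

    def measure(self, question: str, answer: str) -> bool:
        prompt = (
            "Rate from 0 to 1 how relevant the answer is to the question.\n"
            f"Question: {question}\nAnswer: {answer}\nScore:"
        )
        self.score = float(self.model.generate(prompt).strip())
        return self.score >= self.threshold


# Swapping providers means swapping the object passed as `model`.
lenient = ToyRelevancyMetric(model=StubJudge(0.8))
strict = ToyRelevancyMetric(model=StubJudge(0.8), threshold=0.9)
print(lenient.measure("What is 2 + 2?", "4"))  # prints: True
print(strict.measure("What is 2 + 2?", "4"))   # prints: False
```

The metric depends only on the `generate` method, so a hosted API, a gateway, or a local server can fill the same role without changes to the metric itself.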

Vector DBs

Vector database integrations show how to connect retrieval systems to DeepEval so RAG metrics can evaluate the context your application actually retrieves. Use these examples to benchmark retrieval quality and end-to-end RAG behavior.
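The key idea is that the evaluation record should carry the chunks the retriever actually returned at run time. The sketch below uses a toy keyword retriever in place of a real vector database; `DOCS`, `retrieve`, `answer`, and `build_record` are hypothetical helpers, and the record keys are chosen to resemble the `input`, `actual_output`, and `retrieval_context` fields of a DeepEval test case.

```python
import re

# Stand-in corpus; a real setup would query a vector database instead.
DOCS = [
    "DeepEval evaluates LLM applications.",
    "Vector databases store embeddings for similarity search.",
    "RAG pipelines retrieve context before generating an answer.",
]


def tokens(text):
    """Lowercase word set used for the toy overlap score."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, k=2):
    """Return the k documents sharing the most words with the query."""
    query_tokens = tokens(query)
    ranked = sorted(DOCS, key=lambda d: len(query_tokens & tokens(d)), reverse=True)
    return ranked[:k]


def answer(query, context):
    """Placeholder generator: echoes the top-ranked chunk."""
    return context[0]


def build_record(query):
    """Capture exactly what was retrieved alongside the final output."""
    context = retrieve(query)
    return {
        "input": query,
        "actual_output": answer(query, context),
        "retrieval_context": context,
    }


record = build_record("How do vector databases store embeddings?")
print(record["retrieval_context"][0])
# prints: Vector databases store embeddings for similarity search.
```

Recording the retrieved chunks per query, rather than the whole corpus, is what lets retrieval-focused metrics judge the retriever separately from the generator.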