Problem
Anthropic's prompt-caching API needs `cache_control: {"type": "ephemeral"}` markers on `system` blocks, `tools` entries, and `messages` to hit the 5-minute cache and get a ~90% discount on cached input tokens (plus better TTFT). Today `selectools.providers.anthropic_provider.AnthropicProvider` sends:
```python
# anthropic_provider.py:166-174 (sync complete) and :503-509 (acomplete)
request_args = {
    "model": model_name,
    "system": system_prompt,  # plain string; no cache_control possible
    "messages": payload,
    ...
}
if tools:
    request_args["tools"] = [self._map_tool_to_anthropic(t) for t in tools]
```
```python
# anthropic_provider.py:474-481
def _map_tool_to_anthropic(self, tool: Tool) -> Dict[str, Any]:
    schema = ...  # tool-to-dict conversion elided in this excerpt
    return {
        "name": schema["name"],
        "description": schema["description"],
        "input_schema": schema["parameters"],
        # no cache_control hook
    }
```
The installed `anthropic` SDK is 0.95.0, new enough to support caching; the gap is purely in the wrapper.
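For reference, a direct SDK call already accepts the target request shape; this is a minimal sketch (model id and prompt text are illustrative placeholders, not from the wrapper):

```python
# Minimal direct anthropic-SDK call showing the shape the wrapper would emit.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
resp = client.messages.create(
    model="claude-haiku-4-5",  # illustrative model id
    max_tokens=256,
    system=[{
        "type": "text",
        "text": "...static instructions long enough to clear the cache minimum...",
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "hi"}],
)
# Non-zero cache_creation on the first call, cache_read on repeat calls:
print(resp.usage.cache_creation_input_tokens, resp.usage.cache_read_input_tokens)
```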
Impact for consumers
A real-world profile from Sheriff (WhatsApp finance agent on Haiku 4.5):
- Static system instructions: ~500 tokens
- Category list: ~100-500 tokens (stable within session)
- Tool schemas (40 tools × ~75 tokens): ~3000 tokens
- Per-turn history/user message: ~250-1050 tokens
That's ~3500-4000 cacheable tokens per call. Any user sending a second WhatsApp message within 5 minutes would hit the cache and pay roughly 10% of the base input rate on those tokens, which is material at scale on Haiku; the TTFT win is independent of cost.
Proposal
Add caching opt-ins to `AnthropicProvider.__init__`:
```python
AnthropicProvider(
    api_key=...,
    default_model=...,
    cache_system: bool = False,  # NEW
    cache_tools: bool = False,   # NEW
)
```
When `cache_system=True`: wrap the system string in block form with a single `cache_control: ephemeral` marker. When `cache_tools=True`: add `cache_control: ephemeral` to the last tool in the outgoing schema list (Anthropic applies the marker to everything up to it).
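A minimal sketch of the wiring, assuming the `request_args` construction shown under Problem (flag names as proposed; everything else is illustrative):

```python
# Sketch only: assumes self.cache_system / self.cache_tools hold the new flags.
if self.cache_system:
    # Block form: system becomes a list of text blocks; the marker caches
    # the full prefix up to and including this block.
    request_args["system"] = [{
        "type": "text",
        "text": system_prompt,
        "cache_control": {"type": "ephemeral"},
    }]
else:
    request_args["system"] = system_prompt  # current behavior, unchanged

if tools:
    mapped = [self._map_tool_to_anthropic(t) for t in tools]
    if self.cache_tools:
        # Marking only the last entry caches the whole tools array, since
        # tools precede system in Anthropic's cache prefix order.
        mapped[-1]["cache_control"] = {"type": "ephemeral"}
    request_args["tools"] = mapped
```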
Surface `response.usage.cache_creation_input_tokens` and `cache_read_input_tokens` on `UsageStats` so consumers can observe hit rate.
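One possible shape, assuming `UsageStats` is a dataclass with at least input/output token counts (the existing fields here are guesses; only the two new ones are the proposal):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UsageStats:
    input_tokens: int                                  # existing (assumed)
    output_tokens: int                                 # existing (assumed)
    cache_creation_input_tokens: Optional[int] = None  # NEW
    cache_read_input_tokens: Optional[int] = None      # NEW

# Populating from an Anthropic response; getattr keeps older SDKs safe.
# (Hypothetical helper name, for illustration only.)
def usage_from_response(response) -> UsageStats:
    u = response.usage
    return UsageStats(
        input_tokens=u.input_tokens,
        output_tokens=u.output_tokens,
        cache_creation_input_tokens=getattr(u, "cache_creation_input_tokens", None),
        cache_read_input_tokens=getattr(u, "cache_read_input_tokens", None),
    )
```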
Why flags instead of automatic
Caching has minimum-token requirements (1024 for Sonnet/Opus, historically 2048 for Haiku; check current docs before implementing) and a write surcharge on the first turn. Forcing it on would regress short-prompt users. Opt-in keeps the default behavior unchanged.
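For illustration, the opt-in from a consumer like Sheriff would look something like this (model id and env-var handling are assumptions, not part of the proposal):

```python
import os

# Hypothetical consumer-side opt-in. Both flags together cover the
# ~3.5k-token stable prefix (tools + system), well past the cache minimum.
provider = AnthropicProvider(
    api_key=os.environ["ANTHROPIC_API_KEY"],
    default_model="claude-haiku-4-5",  # illustrative model id
    cache_system=True,
    cache_tools=True,
)
```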
Scope
- `complete`, `acomplete`, and `stream` paths.
- `_format_messages` stays unchanged; this is purely about the top-level `system` and `tools` fields.
- Response parsing stays unchanged except for two new optional fields on `UsageStats`.
Sheriff context
Filed from the Sheriff project (`~/projects/sheriff`) as a follow-up to the 2026-04-16 agentic-layer investigation. Full writeup: vault note `prompt-caching-investigation.md` in the Clovis vault. Happy to open a PR if the API shape looks right.