AnthropicProvider: expose cache_control for prompt caching on system/tools #57

@johnnichev

Description

Problem

Anthropic's prompt-caching API requires cache_control: {"type": "ephemeral"} markers on system blocks, tool entries, and messages to hit the 5-minute cache, which gives a ~90% discount on cached input tokens (plus better TTFT). Today, selectools.providers.anthropic_provider.AnthropicProvider sends:

# anthropic_provider.py:166-174 (sync complete) and :503-509 (acomplete)
request_args = {
    "model": model_name,
    "system": system_prompt,  # plain string — no cache_control possible
    "messages": payload,
    # ... other args elided ...
}
if tools:
    request_args["tools"] = [self._map_tool_to_anthropic(t) for t in tools]

# anthropic_provider.py:474-481
def _map_tool_to_anthropic(self, tool: Tool) -> Dict[str, Any]:
    schema = ...  # tool-schema extraction elided in this excerpt
    return {
        "name": schema["name"],
        "description": schema["description"],
        "input_schema": schema["parameters"],
        # no cache_control hook
    }

The installed anthropic SDK is 0.95.0, which is new enough to support caching; the gap is purely in the wrapper.

Impact for consumers

A real-world profile from Sheriff (a WhatsApp finance agent on Haiku 4.5):

  • Static system instructions: ~500 tokens
  • Category list: ~100-500 tokens (stable within session)
  • Tool schemas (40 tools × ~75 tokens): ~3000 tokens
  • Per-turn history/user message: ~250-1050 tokens

That's ~3,500-4,000 cacheable tokens per call. Any user sending a second WhatsApp message within 5 minutes would hit the cache and pay 10% of the normal input price on those tokens; that is material at scale on Haiku, and the TTFT win is independent of cost.
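For a rough sense of the arithmetic, here is a back-of-the-envelope comparison. The per-token price below is a deliberate placeholder, not Anthropic's actual Haiku rate; the ~10% cache-read rate is the figure cited above:

# Back-of-the-envelope cost for one cache-hit turn vs. an uncached turn.
PRICE_PER_MTOK = 1.00   # PLACEHOLDER $/1M input tokens; substitute the real Haiku rate
CACHEABLE = 3_500       # stable system prompt + category list + tool schemas
DYNAMIC = 650           # midpoint of the per-turn history/user range above

uncached = (CACHEABLE + DYNAMIC) / 1e6 * PRICE_PER_MTOK
cached = (0.10 * CACHEABLE + DYNAMIC) / 1e6 * PRICE_PER_MTOK  # cache reads bill at ~10%
print(f"uncached ${uncached:.6f} vs cached ${cached:.6f} "
      f"({1 - cached / uncached:.0%} cheaper on input)")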

Proposal

Add caching opt-ins to AnthropicProvider.__init__:

class AnthropicProvider:
    def __init__(
        self,
        api_key: str,
        default_model: str,
        # ... existing params unchanged ...
        cache_system: bool = False,  # NEW
        cache_tools: bool = False,   # NEW
    ): ...
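Call-site usage would then be a one-line opt-in (the model id here is illustrative):

import os

provider = AnthropicProvider(
    api_key=os.environ["ANTHROPIC_API_KEY"],
    default_model="claude-haiku-4-5",  # illustrative model id
    cache_system=True,
    cache_tools=True,
)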

When cache_system=True, wrap the system string in block form with a single cache_control: ephemeral marker. When cache_tools=True, add cache_control: ephemeral to the last tool in the outgoing schema list (Anthropic applies the marker to everything up to and including that entry).
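A minimal sketch of the request construction under those flags (the helper name _build_request_args is hypothetical; the block shape for system follows Anthropic's documented content-block format):

from typing import Any, Dict, List, Optional

CACHE_MARKER = {"type": "ephemeral"}

# inside AnthropicProvider:
def _build_request_args(self, model_name: str, system_prompt: str,
                        payload: List[Dict[str, Any]],
                        tools: Optional[List["Tool"]]) -> Dict[str, Any]:
    request_args: Dict[str, Any] = {"model": model_name, "messages": payload}
    if self.cache_system:
        # Block form is required to attach cache_control to the system prompt.
        request_args["system"] = [
            {"type": "text", "text": system_prompt, "cache_control": CACHE_MARKER}
        ]
    else:
        request_args["system"] = system_prompt  # current plain-string behavior
    if tools:
        mapped = [self._map_tool_to_anthropic(t) for t in tools]
        if self.cache_tools:
            # One marker on the last entry caches the entire tools list,
            # since Anthropic caches everything up to and including it.
            mapped[-1]["cache_control"] = CACHE_MARKER
        request_args["tools"] = mapped
    return request_args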

Surface response.usage.cache_creation_input_tokens and cache_read_input_tokens on UsageStats so consumers can observe hit rate.
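On the response side, the change could be as small as two optional fields; the dataclass shape below is an assumption about selectools' internals, and the field names mirror the SDK's usage object:

from dataclasses import dataclass
from typing import Optional

@dataclass
class UsageStats:
    input_tokens: int
    output_tokens: int
    # NEW: copied from response.usage when the SDK reports them,
    # e.g. usage.cache_read_input_tokens on a cache hit.
    cache_creation_input_tokens: Optional[int] = None
    cache_read_input_tokens: Optional[int] = None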

Why flags instead of automatic

Caching has minimum-token requirements (1024 for Sonnet/Opus, historically 2048 for Haiku; check the current docs before implementing) and a write surcharge on the first turn. Forcing it on would regress short-prompt users. Opt-in keeps the default behavior unchanged.

Scope

  • complete, acomplete, and stream paths.
  • _format_messages stays unchanged; this is purely about the top-level system and tools fields.
  • Response parsing stays unchanged except for two new optional fields on UsageStats.

Sheriff context

Filed from the Sheriff project (~/projects/sheriff) as a follow-up to the 2026-04-16 agentic-layer investigation. Full writeup: vault note prompt-caching-investigation.md in the Clovis vault. Happy to open a PR if the API shape looks right.
