Adversarial review engine · live
The adversary that ships your code.
ipcha mistabra — Aramaic, from the Talmudic gemara: ‘the opposite is plausible.’ A structured three-agent protocol that treats adversarial challenge as an architectural constraint, not a prompt-engineering trick. Python sidecar on port 8100, DeBERTa-NLI microservice on 8200, cross-provider fan-out, authority-grounded findings. Apache 2.0.
Hallucination floor: 0.15 %
Agents in pipeline: 3
NLI score separation: 0.923
Adversarial lenses: 3
Provider fan-out: 2–5
Licence: Apache 2.0
ipcha mistabra · the opposite is plausible
The gemara’s rhetorical move — ipcha mistabra, “perhaps the opposite” — forces the strongest counter-reading before a ruling stands. Here it becomes a pipeline: proponent writes the claim, ipcha agent writes the strongest objection, auditor arbitrates. No step is optional. No role may be skipped. The prompt itself forbids contrarian noise without substance — every finding must cite authority or formal logic.
Proponent
Reads the artifact, extracts atomic claims, produces the optimistic reading. The model that would normally ship the answer unchallenged.
Ipcha
Assumes the claim is false until proven otherwise. Grounds every objection in an authority document or formal argument. "Contrarian noise without substance" is explicitly forbidden by the prompt.
Auditor
Merges proponent and ipcha outputs. Scores which findings survive pressure. Emits a structured verdict, { surviving, needs_hardening, rejected, overall_score }, not a narrative.
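The structured verdict can be pictured as a small typed record. This is a minimal sketch, not the sidecar's actual schema: only the four field names come from the text above. The `Finding` type, its fields, and the assumption that `overall_score` is a float are hypothetical.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Finding:
    # Hypothetical shape; the real finding schema is not specified on this page.
    claim: str
    severity: str  # assumed labels such as "low", "medium", "high", "critical"
    citation: str  # authority document or formal argument backing the finding


@dataclass
class Verdict:
    # The four field names are taken from the text; their types are assumptions.
    surviving: list[Finding] = field(default_factory=list)
    needs_hardening: list[Finding] = field(default_factory=list)
    rejected: list[Finding] = field(default_factory=list)
    overall_score: float = 0.0

    def to_json(self) -> str:
        # asdict() recurses into the nested Finding dataclasses.
        return json.dumps(asdict(self), indent=2)


verdict = Verdict(
    surviving=[Finding("Auth layer is rate-limited per user.", "medium", "middleware spec")],
    needs_hardening=[Finding("Admin bypass path skips middleware.", "high", "threat model")],
    overall_score=0.62,  # illustrative value only
)
print(verdict.to_json())
```

A record like this can be diffed, stored, and gated on. A narrative cannot.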
Lens diversity — not just provider diversity
Provider-diversity alone converges; a hostile prompt is a hostile prompt, whoever runs it. Ipcha routes each fan-out branch through a distinct adversarial lens so the disagreements are structural, not stylistic.
Security & Attack Surface
Threat model, auth boundaries, secret handling, privilege escalation paths, supply-chain hygiene.
Scalability & Operational
Hot paths, allocation, unbounded queues, coordination points, failure-mode propagation under load.
Organizational & Process
Bus factor, implicit hand-offs, review scarcity, silent dependency on a single maintainer's mental model.
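The lens-by-provider routing can be sketched as a cross product of branches. This is an illustration under stated assumptions: `contradict` is a hypothetical stand-in for one adversarial call, and the `flaky` provider exists only to show a failing branch. The 2-to-5 provider bound comes from this page.

```python
import asyncio
from itertools import product

LENSES = ("security", "scalability", "organizational")


async def contradict(provider: str, lens: str) -> str:
    # Hypothetical stand-in; a real branch would call a provider API with a lens prompt.
    if provider == "flaky":
        raise RuntimeError(f"{provider} unavailable")
    return f"[{provider}/{lens}] objection"


async def fan_out(providers: list[str]) -> tuple[list[str], list[BaseException]]:
    if not 2 <= len(providers) <= 5:
        raise ValueError("provider fan-out must be between 2 and 5")
    branches = [contradict(p, lens) for p, lens in product(providers, LENSES)]
    # return_exceptions=True returns failures as values instead of raising the
    # first one, the closest asyncio analogue of Promise.allSettled.
    results = await asyncio.gather(*branches, return_exceptions=True)
    settled = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return settled, failed


settled, failed = asyncio.run(fan_out(["anthropic", "openai", "flaky"]))
print(len(settled), "objections,", len(failed), "failed branches")
```

One provider going down costs its three branches and nothing else.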
5 % × 15 % × 20 %
Every vendor that quotes a hallucination rate owes you the derivation. Here is ours, in one table. It is a model, not a measurement — the point is that you can argue with each row. Metis set this flag: the 0.15 % figure is load-bearing, so it is derived, not asserted.
| Step | Figure | Why that number |
|---|---|---|
| Base LLM error | 5 % | Representative uncorrected hallucination rate on factoid-grounded generation (public benchmark range). |
| × Survives ipcha-contradict | 15 % | Adversarial pass catches ~85 % when grounded in authority documents (Fagan-style inspection carries over to LLM review). |
| × Survives ensemble vote | 20 % | Cross-provider fan-out (anthropic ⊕ openai ⊕ …) catches ~80 % of what remains. Diverse pre-training, diverse blind spots. |
| = Residual hallucination floor | 0.15 % | 0.05 × 0.15 × 0.20. A floor, not a ceiling. On out-of-distribution or authority-poor claims it will be higher. |
0.05 × 0.15 × 0.20 = 0.0015 = 0.15 %. Multiplying the rows assumes the filters are conditionally independent, and that holds only if the authority corpus is non-degenerate and the providers are genuinely diverse. Violate either and the floor lifts. The paper reports an IS_ce score separation of 0.923 with a Cohen’s d of −0.80; that is the evidence for the middle term, not the term itself.
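The derivation is small enough to argue with in code. This sketch multiplies the three table figures, then shows what happens when one filter degenerates. The what-if survival rate of 1.0 is an illustrative assumption, not a measurement, and `residual_floor` is a hypothetical helper name.

```python
import math


def residual_floor(base_error: float, *survival_rates: float) -> float:
    """Multiply the base error by each filter's survival rate (independence assumed)."""
    return base_error * math.prod(survival_rates)


# The three figures from the table.
floor = residual_floor(0.05, 0.15, 0.20)
print(f"table floor: {floor:.2%}")

# Illustrative what-if: a degenerate filter catches nothing, so its survival
# rate becomes 1.0 and the floor lifts.
no_ensemble = residual_floor(0.05, 0.15, 1.0)
no_ipcha = residual_floor(0.05, 1.0, 0.20)
print(f"ensemble filter degenerate: {no_ensemble:.2%}")
print(f"ipcha filter degenerate:    {no_ipcha:.2%}")
```

Swap in your own row values. The floor is only as good as the weakest assumption.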
Before the merge, not after the incident
Ipcha is a verification framework, not a platform. It does not require nyxCore — the sidecar and NLI service ship as two Docker containers and speak REST. It does, however, compose with the rest of the stack.
Wire the sidecar into CI. Every architecture decision, every LLM-drafted PR description, every release note gets contradicted before humans sign.
The Refactor Plan and Code X-Ray modes fan out through Ipcha before they fan in. Convergence across adversarial personas is the signal — unanimous alarm wins over any single model's confidence.
/dashboard/ipcha auto-generates a five-step workflow: Prepare → Adversarial (provider fan-out, min 2, max 5) → Synthesis → Arbitration (Cael) → Results. Insights are written back with insightScope: ethic.
The ipcha-mistabra repo contains a backcheck/ directory. The protocol is run against its own paper. The gate rule must block on any unrefuted high-severity finding — and has.
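In CI the gate rule reduces to an exit code. This is a minimal sketch assuming each finding arrives as a dict with `severity`, `status`, and `claim` keys. That shape and the `gate` helper are hypothetical; the rule itself (block on any unrefuted high-severity finding) is the one stated above.

```python
import sys

BLOCKING = {"high", "critical"}  # assumed severity labels for "high or above"


def gate(findings: list[dict]) -> int:
    """Return a process exit code: 1 blocks the merge, 0 lets it through."""
    blockers = [
        f for f in findings
        if f["severity"] in BLOCKING and f["status"] != "refuted"
    ]
    for f in blockers:
        print(f"BLOCK {f['severity']}: {f['claim']}", file=sys.stderr)
    return 1 if blockers else 0


# Hypothetical findings for illustration.
findings = [
    {"claim": "Admin bypass path skips middleware.", "severity": "high", "status": "open"},
    {"claim": "Token bucket is per user.", "severity": "high", "status": "refuted"},
    {"claim": "Naming is inconsistent.", "severity": "low", "status": "open"},
]
exit_code = gate(findings)
# A CI step would finish with sys.exit(exit_code).
```

A refuted high-severity finding passes. An open one does not.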
Opposite, plausibly
Two calls hit the sidecar: /score for IS over a proponent–ipcha pair, /arbitrate for the Cael synthesis. The orchestration is whatever runs your workflows. Nothing below is framework-specific.
    # 1. Bring the sidecar up (port 8100) + the NLI service (port 8200)
    docker run -p 8200:8200 ipcha-nli
    uvicorn ipcha.api:app --host 0.0.0.0 --port 8100

    # 2. The inversion loop: pseudocode, stays under 40 lines on purpose.
    #    Helpers (extract_claims, contradict, cael_arbitrate, ...) are not defined here.
    import asyncio

    async def ipcha(artifact, authority_docs, providers):
        claims = extract_claims(artifact)

        # Proponent
        proponent = proponent_review(artifact, claims)

        # Fan-out across providers × adversarial lenses (min 2, max 5)
        ipcha_outputs = await asyncio.gather(*[
            contradict(artifact, claims, lens=L, model=M, authority=authority_docs)
            for M in providers
            for L in ("security", "scalability", "organizational")
        ], return_exceptions=True)  # Promise.allSettled semantics

        findings = merge_and_resolve(proponent, ipcha_outputs)
        verdict = cael_arbitrate(findings)
        # { surviving, needs_hardening, rejected, overall_score }

        # Gate rule: block iff any unrefuted finding has severity >= high
        if any(f.severity in {"critical", "high"} and f.status != "refuted"
               for f in findings):
            return BLOCK, verdict
        return PASS, verdict

    # 3. Score a single proponent–ipcha pair directly against the sidecar
    curl -X POST http://localhost:8100/score \
      -H "content-type: application/json" \
      -d '{"claim": "The new auth layer is rate-limited.",
           "evidence": [
             {"text": "Middleware applies token bucket per user.", "type": "SUPPORTING"},
             {"text": "Admin bypass path skips middleware.", "type": "CONTRADICTING"}
           ]}'
    # => { "score": 0.71, "score_tfidf": 0.34, "scorer": "nli", ... }
The NLI scorer (IS_ce) falls back to TF-IDF (IS_w) on service failure. The paper reports a score separation of 0.923 for IS_ce versus 0.224 for IS_w. Both numbers are published; you pick the trust level.
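The fallback can be sketched as a try/except around the NLI call. This is an assumption-heavy illustration: `score_with_fallback` is a hypothetical helper, the scorers are injected as plain callables, the `"tfidf"` scorer label is made up, and `token_overlap` is a crude stand-in rather than real TF-IDF. Only the `score` and `scorer` response keys appear on this page.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def score_with_fallback(claim: str, evidence: str, nli: Scorer, fallback: Scorer) -> dict:
    """Prefer the NLI scorer; record which scorer produced the number."""
    try:
        return {"score": nli(claim, evidence), "scorer": "nli"}
    except (ConnectionError, TimeoutError) as exc:
        # The NLI service is unreachable: degrade rather than fail the review.
        return {"score": fallback(claim, evidence), "scorer": "tfidf", "reason": str(exc)}


def nli_down(claim: str, evidence: str) -> float:
    # Simulates the service on port 8200 being unavailable.
    raise ConnectionError("NLI service unreachable")


def token_overlap(claim: str, evidence: str) -> float:
    # Crude lexical stand-in for the TF-IDF scorer (Jaccard overlap of tokens).
    a, b = set(claim.lower().split()), set(evidence.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


result = score_with_fallback(
    "The new auth layer is rate-limited.",
    "Admin bypass path skips the auth layer.",
    nli=nli_down,
    fallback=token_overlap,
)
print(result["scorer"], round(result["score"], 3))
```

Carrying the `scorer` field through to the verdict lets a reader discount any score that came from the weaker scorer.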
Ipcha-on-Ipcha · the backcheck
Every other nyxCore landing page uses this voice in its ‘what we don’t do’ section. Consistency demands that Ipcha apply the same voice to itself, without softening. The backcheck/ directory in the repo is public; the gate has blocked on this page too.
Ipcha on Ipcha
Ipcha shifts the burden of proof — it does not carry it. A claim that survives three rounds is harder, not correct. The paper is explicit: IS is a score, not a proof.
A hostile counter-argument can be wrong and still useful; it can be right and still be rejected at merge. The auditor's job is to arbitrate, not average. The gate blocks; it does not decide.
A model trained to object will find objections. We mitigate with prompt-lens diversity (Security / Scalability / Organizational) plus provider diversity under Promise.allSettled semantics, but we do not claim neutrality.
Nemesis holds the red pen; Metis holds the honesty flag. The 0.15 % floor is what survived both.
Quickstart
Python 3.11+, Docker, Redis (for the sycophancy monitor and DoW budget). Apache 2.0. Integrate via REST — any language, any orchestrator.
    # NLI microservice (DeBERTa, ONNX), port 8200
    cd nli-service && docker build -t ipcha-nli .
    docker run -p 8200:8200 ipcha-nli

    # Sidecar (protocol + scoring + routing), port 8100
    cd sidecar && pip install -r requirements.txt
    uvicorn api:app --host 0.0.0.0 --port 8100

    # Run the paper evaluation; reproduces IS_ce = 0.923
    cd evaluation && python run_all.py

    # Run the backcheck: the protocol against its own paper
    python -m ipcha.backcheck ./paper/ipcha-paper.tex
Before you trust the 0.15 %
The floor holds only when the authority corpus covers the domain, the providers are genuinely diverse, and the adversarial lenses match the actual failure surface. Run the backcheck on a representative artifact first. Calibrate the severity thresholds against findings you already know are real. Only then enable the gate in CI — otherwise you will block on noise, lose trust in the signal, and route around the protocol. The protocol is mandatory; your calibration is what makes it worth keeping.
Metis: first week is calibration. Nemesis: first month is adversarial-lens tuning.
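Calibration week can start with a table like the one this sketch prints. For each candidate severity threshold it counts how many findings would block and how many of those are already known to be real. Everything here is hypothetical: the `calibrate` helper, the `known_real` hand label, the sample findings, and the "low" and "medium" severity names.

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}  # assumed ordering


def calibrate(findings: list[dict]) -> dict:
    """For each candidate threshold, count real vs. noisy findings that would block."""
    report = {}
    for threshold, rank in SEVERITY_RANK.items():
        blocked = [f for f in findings if SEVERITY_RANK[f["severity"]] >= rank]
        real = sum(1 for f in blocked if f["known_real"])
        report[threshold] = {
            "blocked": len(blocked),
            "real": real,
            # Share of blocking findings that were real; None if nothing blocks.
            "precision": real / len(blocked) if blocked else None,
        }
    return report


# Hypothetical hand-labelled findings from a backcheck run.
labelled = [
    {"severity": "critical", "known_real": True},
    {"severity": "high", "known_real": True},
    {"severity": "high", "known_real": False},
    {"severity": "medium", "known_real": False},
    {"severity": "low", "known_real": False},
]
report = calibrate(labelled)
for threshold, row in report.items():
    print(threshold, row)
```

Pick the lowest threshold whose precision you can live with, then enable the gate.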