Symptom
After runWithCatch (or evalWithCatch) catches a panic that originated inside a fastcall path, Context::fastCallDepth is incremented but never decremented — the -- after the inner eval() is skipped by the throw/longjmp. Repeated caught-and-swallowed panics from the fastcall path grow fastCallDepth monotonically and eventually trigger a false "stack overflow, fast call depth limit exceeded" error from the bounds check at simulate.h:599 / simulate_nodes.h:1023.
Root cause
Both fastcall eval paths use a bare ++ / eval / -- sequence with no try/finally and no RAII guard:
Context::callOrFastcall (simulate.h):
fastCallDepth ++;
auto aa = abiArg;
abiArg = args;
result = fn->code->eval(*this); // throw/longjmp here skips the decrement
stopFlags = 0;
abiArg = aa;
fastCallDepth --;
SimNode_FastCall<argCount>::eval (simulate_nodes.h, plus the typed EVAL_NODE macro variant): same shape.
The catch sites only restore some of the per-call state:
runWithCatch and evalWithCatch snapshot/restore abiArg / abiCMRES / abiThisBlockArg and pop the stack watermark on the catch path — but do not snapshot/restore fastCallDepth. The same omission exists in the four in-VM try_recover paths.
Context::restart() does zero fastCallDepth, but it asserts insideContext == 0, so it can only be used from outside an active eval — not from a callback / hook / surface-and-forget binding that's already locked.
Why this is more visible after #2526
The new Context::clearException() helper added in #2526 makes the leak more visible because surface-and-forget bindings (e.g. dasAudio mixer callback) now keep the same context alive across many caught panics instead of restart()-ing it. Pre-#2526 the typical path was either crash-out or restart(), both of which masked the leak. clearException() itself does NOT touch fastCallDepth (early review proposed it, but that would underflow uint32_t if called inside an active fastcall).
Two fix shapes
A. Catch-site fix (~5 sites): snapshot fastCallDepth on entry to runWithCatch / evalWithCatch / each in-VM try_recover path, restore in the catch / longjmp branch. Mirrors the existing aa / acm / atba snapshot pattern in those functions.
B. Increment-site fix (~3 sites): wrap each ++ / eval / -- in an RAII guard so the decrement runs in the dtor regardless of how control leaves. Touches hot eval paths, but bulletproof against any catch site (current or future) and removes the requirement that every catcher remember to do the snapshot.
B is structurally cleaner; A more closely mirrors the existing pattern. Either one fixes the symptom.
References
Symptom
After
runWithCatch(orevalWithCatch) catches a panic that originated inside a fastcall path,Context::fastCallDepthis incremented but never decremented — the--after the innereval()is skipped by the throw/longjmp. Repeated caught-and-swallowed panics from the fastcall path growfastCallDepthmonotonically and eventually trigger a false "stack overflow, fast call depth limit exceeded" error from the bounds check at simulate.h:599 / simulate_nodes.h:1023.Root cause
Both fastcall eval paths use a bare
++ / eval / --sequence with no try/finally and no RAII guard:Context::callOrFastcall(simulate.h):SimNode_FastCall<argCount>::eval(simulate_nodes.h, plus the typedEVAL_NODEmacro variant): same shape.The catch sites only restore some of the per-call state:
runWithCatchandevalWithCatchsnapshot/restoreabiArg/abiCMRES/abiThisBlockArgand pop the stack watermark on the catch path — but do not snapshot/restorefastCallDepth. The same omission exists in the four in-VMtry_recoverpaths.Context::restart()does zerofastCallDepth, but it assertsinsideContext == 0, so it can only be used from outside an active eval — not from a callback / hook / surface-and-forget binding that's already locked.Why this is more visible after #2526
The new
Context::clearException()helper added in #2526 makes the leak more visible because surface-and-forget bindings (e.g. dasAudio mixer callback) now keep the same context alive across many caught panics instead ofrestart()-ing it. Pre-#2526 the typical path was either crash-out orrestart(), both of which masked the leak.clearException()itself does NOT touchfastCallDepth(early review proposed it, but that would underflowuint32_tif called inside an active fastcall).Two fix shapes
A. Catch-site fix (~5 sites): snapshot
fastCallDepthon entry torunWithCatch/evalWithCatch/ each in-VMtry_recoverpath, restore in the catch / longjmp branch. Mirrors the existingaa/acm/atbasnapshot pattern in those functions.B. Increment-site fix (~3 sites): wrap each
++ / eval / --in an RAII guard so the decrement runs in the dtor regardless of how control leaves. Touches hot eval paths, but bulletproof against any catch site (current or future) and removes the requirement that every catcher remember to do the snapshot.B is structurally cleaner; A more closely mirrors the existing pattern. Either one fixes the symptom.
References
Context::clearException()andrunWithCatchAndClear()— they handle the exception-fields half of post-panic cleanup, but punt onfastCallDepth.