Skip to content

fix(sandbox): rewrite credential placeholders in websocket text frames#1286

Draft
ericksoa wants to merge 12 commits intoNVIDIA:mainfrom
ericksoa:fix/872-websocket-credential-rewrite
Draft

fix(sandbox): rewrite credential placeholders in websocket text frames#1286
ericksoa wants to merge 12 commits intoNVIDIA:mainfrom
ericksoa:fix/872-websocket-credential-rewrite

Conversation

@ericksoa
Copy link
Copy Markdown

@ericksoa ericksoa commented May 9, 2026

Summary

Fixes #872 by adding an opt-in WebSocket credential placeholder rewrite path for REST/L7 endpoints after an allowed HTTP 101 Switching Protocols upgrade.

The implementation preserves the placeholder/no-secret-in-sandbox model: provider secrets remain out of sandbox env, argv, config, logs, and sandbox-generated payloads until OpenShell reaches the relay boundary. This complements, but does not replace, #984 selective credential passthrough.

This is scoped to client-to-server WebSocket text-frame credential rewrite. It is not generic post-upgrade L7 inspection.

Related Issue

Fixes #872

Changes

  • Add websocket_credential_rewrite as an opt-in endpoint policy field, defaulting to false.
  • Wire the field through proto, YAML serde, provider profiles, merge behavior, OPA JSON conversion, L7 config parsing, and policy docs.
  • Add a credential-aware WebSocket relay for opted-in REST endpoints after an allowed upgrade.
  • Validate WebSocket upgrade responses before forwarding 101 headers, including status, Upgrade, Connection, Sec-WebSocket-Accept, and safe extension negotiation.
  • Rewrite only UTF-8 client-to-server text messages containing openshell:resolve:env:* placeholders.
  • Leave server-to-client bytes as raw passthrough.
  • Support only safe permessage-deflate negotiation with no-context-takeover constraints; unsupported compression parameters and mismatched upstream responses fail closed.
  • Fail closed on RSV/compression bits without negotiated support, malformed frames, unmasked client frames, invalid UTF-8 text, unresolved placeholders, oversized buffered text messages, invalid close frames, and invalid fragmentation.
  • Preserve raw opt-out WebSocket upgrade behavior, including forward-proxy upgrade requests.
  • Add WebSocket, secret rewrite, policy round-trip, OPA conversion, upgrade-path, route-selected L7, and forward-proxy coverage.

Security

  • Opt-in per endpoint and only honored for protocol: rest endpoints.
  • Client-to-server text frames only; no binary rewriting and no server-to-client inspection.
  • Unsafe compression variants are not supported; upstream extension negotiation must exactly match the sanitized safe offer.
  • Ambiguous protocol states fail closed before forwarding upgrade headers.
  • OCSF rewrite events report safe metadata and replacement counts without payloads, placeholders, secret values, or secret lengths.

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (not applicable; covered with focused WebSocket relay/integration tests)

Additional focused validation run locally:

  • cargo fmt
  • cargo test -p openshell-sandbox websocket
  • cargo test -p openshell-sandbox secrets
  • cargo test -p openshell-policy
  • cargo clippy -p openshell-sandbox -p openshell-policy --all-targets -- -D warnings
  • mise run docs
  • mise run ci

Autobahn was not run because the repo does not include an Autobahn harness; the PR adds focused RFC 6455 and RFC 7692 regression tests instead.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 9, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

ericksoa added 4 commits May 8, 2026 19:58
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@ericksoa
Copy link
Copy Markdown
Author

ericksoa commented May 9, 2026

Comparative implementation audit: WebSocket credential rewrite

This is a behavior-level comparison for PR memory. I compared the branch against permissively licensed, widely used WebSocket implementations and did not copy code from any of them.

Comparison set

All comparison targets are permissively licensed enough for behavioral review and architectural comparison:

Implementation License Why it is a useful comparison
websockets/ws MIT Mature Node client/server, Autobahn client and server reports, tunable permessage-deflate, optional native masking and UTF-8 acceleration.
snapview/tungstenite-rs MIT / Apache-2.0 Rust RFC 6455 baseline, Autobahn-tested, close to this repo's language/runtime ecosystem. It is mostly a protocol implementation baseline, not a compression benchmark.
coder/websocket ISC Modern Go implementation with Autobahn coverage, close handshake, ping/pong API, concurrent writes, and RFC 7692 support.
gorilla/websocket BSD-2-Clause Long-lived Go baseline with stable API, Autobahn server tests, origin defaults, and limited no-context-takeover compression.
python-websockets/websockets BSD-3-Clause Correctness-focused Python implementation with Sans-I/O layers, heavy RFC 6455 compliance testing, and full RFC 7692 negotiation factories.
Autobahn Testsuite BSD-style / open source The common external conformance benchmark. It covers handshakes, framing, ping/pong, reserved bits, fragmentation, UTF-8, limits, close behavior, and permessage-deflate.

Scope difference

OpenShell is not a general WebSocket endpoint library. The PR implements a security relay for a narrow problem: keep provider secrets out of sandbox-owned environment, argv, config, logs, and generated payloads, then resolve placeholders only at the relay boundary for opted-in client-to-server text frames.

That means several mature-library features are intentionally not goals here: application APIs, broadcast helpers, endpoint origin policy, generic server/client ownership, and full bidirectional message inspection. The relevant comparison is protocol safety plus operational confidence around the relay path.

Where OpenShell is stronger or more specialized

  • Credential boundary: The compared libraries parse and expose messages to applications. They do not solve OpenShell's no-secret-in-sandbox placeholder model. This PR is stronger for this specific security invariant because it rewrites only at the relay boundary and emits OCSF metadata without payloads, placeholders, secret values, or secret lengths.
  • Fail-closed posture: The branch rejects invalid upgrade responses before forwarding 101 headers, rejects unsolicited extension negotiation, rejects unsafe compression variants, and rejects malformed client frames. General libraries often expose protocol errors to an application; this relay closes the connection before an unsafe state can become transparent pass-through.
  • Narrow mutation surface: Binary frames and server-to-client bytes remain raw passthrough. Text frames without placeholders are re-emitted unchanged unless fragmentation or compression forces reframing. This is less capable than a full WebSocket stack, but it is safer for a security proxy.
  • Safe compression subset: The PR only accepts permessage-deflate when the offer can be reduced to a no-client-context-takeover-safe form and the upstream response exactly matches that safe offer. This is narrower than ws, coder/websocket, and python-websockets, but closer to Gorilla's conservative no-context-takeover stance and better aligned with not retaining compression dictionaries that may contain secret material.
  • Proxy-path parity: The branch covers CONNECT L7, route-selected L7, and forward-proxy upgrade behavior. Endpoint libraries do not usually exercise this proxy-selection surface.

Where mature implementations are ahead

  • Autobahn conformance: ws, tungstenite, coder/websocket, and Gorilla all advertise Autobahn coverage. OpenShell currently has focused regression tests but no Autobahn harness for the relay. This is the largest confidence gap.
  • Header and extension parsing depth: Mature stacks parse extension offers/responses as structured protocol data. OpenShell's current extension parser is intentionally small and handles the safe subset, but it should be hardened for quoted values, duplicate parameters, duplicate extension offers, weird OWS, and parser differentials.
  • Subprotocol validation: The PR validates Upgrade, Connection, Sec-WebSocket-Accept, and extensions, but it does not yet enforce that Sec-WebSocket-Protocol in the response is absent or selected from the client's offer. Endpoint libraries generally handle subprotocol negotiation explicitly.
  • Close-handshake behavior: Endpoint libraries expose or perform a real close handshake. OpenShell currently forwards close frames and rejects frames after client close, but protocol failures usually terminate the transport without sending a WebSocket protocol-error close frame. That may be acceptable for a fail-closed proxy, but it is less endpoint-complete.
  • Ping/pong and liveness APIs: Libraries expose ping/pong handlers or keepalive patterns. OpenShell forwards control frames but does not add heartbeat or liveness policy. That is probably correct for a transparent relay, but it should remain an explicit non-goal.
  • Configurable limits and backpressure knobs: Mature libraries expose message limits, compression thresholds, buffers, and sometimes concurrency controls. OpenShell has fixed defensive limits. That is simple and safe, but less tunable for high-volume providers.
  • Performance and fuzz evidence: Mature projects have years of test history, Autobahn reports, and sometimes optimized masking/UTF-8 paths. OpenShell has focused tests and workspace CI, but no parser fuzz target or benchmark for the rewrite path yet.

Plan to close the differences

  1. Add a compliance note and test matrix. Add architecture/ or docs/ material that states the relay's intentional scope, the comparison set above, and which RFC 6455 / RFC 7692 behaviors are covered by unit, integration, Autobahn, or non-goal status.
  2. Add subprotocol response validation. Track the original Sec-WebSocket-Protocol offer and reject a 101 response that selects an unoffered subprotocol or returns multiple protocol headers. Add tests for absent, valid, invalid, and duplicate subprotocol responses.
  3. Harden extension parsing. Replace the current string-split parser with a structured parser for RFC token/parameter syntax, including quoted-string rejection or normalization, duplicate parameters, multiple extension offers, duplicate response extensions, bad OWS, and case normalization. Keep the accepted surface narrow.
  4. Build an Autobahn relay harness. Use the Dockerized Autobahn Testsuite and a small local harness that drives traffic through OpenShell's relay to an echo endpoint. Start with the case groups that map to our responsibilities: handshake, framing, masking, reserved bits, opcodes, fragmentation, UTF-8, limits, close, and permessage-deflate. Document any cases that do not apply because OpenShell is not the endpoint.
  5. Add fuzz/property coverage for the frame parser. Add a cargo fuzz or proptest target for read_frame_header, fragmentation state, close payload validation, UTF-8 handling, and extension parsing. The objective is parser differential resistance, not only happy-path coverage.
  6. Decide protocol-error close semantics. For malformed frames after a successful upgrade, decide whether the relay should send a best-effort close frame with 1002, 1007, or 1009 before closing. This is more endpoint-like, but it changes wire behavior and should be a maintainer decision.
  7. Make limits explicit. Document the fixed text/compressed/raw-frame limits and consider policy or config hooks only if real providers need them. Keep defaults conservative.
  8. Benchmark the rewrite path. Add criterion-style tests for masked text passthrough, placeholder rewrite, fragmented text, compressed text, and binary/raw passthrough. Compare before/after within OpenShell rather than attempting cross-language benchmarks.
  9. Keep unsafe compression variants out of scope. Do not add context takeover, arbitrary window-bit negotiation, or binary rewriting unless a separate design explicitly justifies the secret-retention and memory risks.
  10. Re-evaluate parser reuse only after tests exist. A permissive Rust crate such as tungstenite is a good correctness reference, but the relay's c2s-only mutation, remasking, and fail-closed secret model may still require custom glue. Decide from the Autobahn and fuzz results rather than replacing code prematurely.

Current conclusion

The branch is defensible for the issue-specific threat model and stronger than generic libraries on the no-secret relay invariant. It is weaker than mature WebSocket stacks on external conformance proof, structured negotiation parsing, subprotocol validation, close-handshake polish, and performance/fuzz history. The most valuable next step is not broadening features; it is adding external conformance and parser-hardening evidence while keeping the accepted protocol surface intentionally small.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@ericksoa
Copy link
Copy Markdown
Author

ericksoa commented May 9, 2026

Follow-up hardening pass

This pass turns the portable lessons from mature WebSocket stacks into OpenShell-specific behavior, without copying implementation code: parse negotiation as structured protocol tokens, validate server-selected options against the original client offer, canonicalize equivalent safe extension forms, and fail closed when the rewrite path sees parameters it does not explicitly support.

Concretely, fix(sandbox): harden websocket negotiation parsing adds RFC 6455 subprotocol tracking and response validation, replaces ad hoc Sec-WebSocket-Extensions splitting with token-aware parsing, canonicalizes only the safe permessage-deflate no-context-takeover subset, rejects duplicate/value-bearing/quoted unsafe extension parameters in the rewrite path, and keeps WebSocketExtensionMode::Preserve raw so opt-out behavior does not change.

This matters for credential rewriting because the proxy becomes a protocol participant after it mutates an upgraded flow. Loose negotiation can let the upstream select compression state or subprotocol behavior the relay did not actually offer or understand, which creates room for state confusion, fail-open compression handling, or secret-bearing payloads moving through a path that no longer has the invariants the rewriter depends on. The new behavior keeps the supported surface narrow and deterministic: either the relay can validate and safely process the negotiated shape, or the handshake fails before forwarding the 101 to the client.

Remaining conformance work is still useful, especially an Autobahn-style relay harness and broader fuzz/property coverage for handshake parsing and frame sequencing. Those should come after this focused spec pass so future parser or engine swaps have a stable regression matrix to preserve.

@ericksoa
Copy link
Copy Markdown
Author

ericksoa commented May 9, 2026

I have read the DCO document and I hereby sign the DCO.

@ericksoa
Copy link
Copy Markdown
Author

ericksoa commented May 9, 2026

I have read the Contributor Agreement including DCO and I hereby sign the Contributor Agreement and DCO

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@ericksoa
Copy link
Copy Markdown
Author

ericksoa commented May 9, 2026

Focused conformance matrix added

I added test(sandbox): add websocket conformance relay matrix as the practical next step before a full Autobahn lane. The new in-crate harness drives the real HTTP 101 relay path and then sends post-upgrade frames through three representative modes:

  • raw preserve mode, proving opt-out behavior still relays frames without mask validation or rewriting;
  • rewrite mode without negotiated compression, proving a masked text credential is rewritten only after a validated upgrade;
  • rewrite mode with the safe permessage-deflate subset, proving canonical extension negotiation carries through to compressed text rewriting with RSV1 preserved.

This is not a replacement for Autobahn. It is the closest low-cost regression layer for this PR because it covers the specific OpenShell behavior boundary: once the proxy becomes an active WebSocket participant for credential rewriting, the handshake assumptions, compression negotiation, and post-upgrade frame relay need to be tested together rather than only as isolated parser/frame unit tests. Full Autobahn remains useful as a follow-up CI or manual conformance job, but it would add Docker/runtime/tooling weight beyond this focused PR pass.

ericksoa added 2 commits May 8, 2026 22:04
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@ericksoa
Copy link
Copy Markdown
Author

ericksoa commented May 9, 2026

Progress update: policy configuration now treats WebSocket as a first-class inspected protocol in the incremental update flow.

What changed:

  • openshell policy update --add-endpoint host:port:read-write:websocket:enforce is now accepted, so users do not have to hand-edit YAML just to create a WebSocket L7 endpoint.
  • --add-allow and --add-deny can target existing protocol: websocket endpoints using the existing method/path schema. For WebSocket, GET means the RFC 6455 upgrade and WEBSOCKET_TEXT means client text messages on the upgraded request path; this deliberately does not add payload-content matching.
  • Policy merge access expansion is now protocol-aware: WebSocket read-only expands to GET, WebSocket read-write expands to GET plus WEBSOCKET_TEXT, and WebSocket presets no longer expand into REST write methods.
  • CLI help and policy/security docs now document the WebSocket workflow, the WEBSOCKET_TEXT method, and the current rule-level constraints.

Why it matters:
This makes the WebSocket credential-rewrite capability operationally usable through the same policy-update workflow as HTTP. A maintainer can start with a minimal endpoint, audit denials, then add specific WebSocket handshake/text-frame rules without replacing the full policy. It also keeps the schema natural: transport selection stays on protocol, permissions stay in method/path rules, and WebSocket text-frame policy remains explicitly separate from message payload inspection.

Validation:

  • cargo fmt
  • cargo test -p openshell-cli policy_update
  • cargo test -p openshell-policy merge
  • cargo test -p openshell-sandbox websocket
  • cargo test -p openshell-sandbox secrets
  • cargo test -p openshell-policy
  • cargo clippy -p openshell-cli -p openshell-sandbox -p openshell-policy --all-targets -- -D warnings
  • mise run pre-commit

ericksoa added 2 commits May 8, 2026 22:23
…dential-rewrite

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@ericksoa
Copy link
Copy Markdown
Author

ericksoa commented May 9, 2026

Progress update: WebSocket credential replacement is now configurable from the incremental policy workflow and covered through the route-selected relay path.

What changed:

  • Added openshell policy update --websocket-credential-rewrite for --add-endpoint operations. This lets users create or merge a protocol: websocket endpoint and enable text-frame credential placeholder replacement without hand-editing YAML.
  • The flag is intentionally constrained: it is accepted only for added protocol: websocket endpoints or protocol: rest compatibility endpoints that may upgrade to WebSocket. It is rejected for plain L4 endpoints and protocol: sql.
  • Merge behavior now preserves/merges websocket_credential_rewrite: true when an incoming endpoint is merged into an existing endpoint.
  • Gateway/server-side merge summaries now surface websocket_credential_rewrite=true so audit/progress output records when this sensitive behavior is enabled, without logging secret material.
  • Added a route-selected WebSocket regression proving the main value path: sandbox sends an openshell:resolve:env:* placeholder in a masked client text frame, OpenShell resolves it after a valid 101 upgrade, upstream receives the real credential, and the forwarded client-to-server frame remains masked.
  • Updated the policy docs and CLI examples to show the WebSocket credential rewrite workflow.

Why it matters:
The point of this capability is not only to parse WebSockets; it is to let agents call real WebSocket services while keeping service credentials out of sandbox-authored code and logs. This makes that operational path first-class: policy can allow the upgrade and client text messages, then explicitly opt in to resolving OpenShell placeholders at the relay boundary. The agent still sends placeholders, OpenShell injects the credential only at egress time, binary frames remain untouched, and no payload-content policy matching was added.

Validation:

  • cargo fmt
  • cargo test -p openshell-cli policy_update
  • cargo test -p openshell-policy merge
  • cargo test -p openshell-server summarize_cli_policy_merge_op_formats_websocket_credential_rewrite
  • cargo test -p openshell-sandbox route_selected_websocket_rewrites_text_credentials_after_upgrade
  • cargo test -p openshell-sandbox websocket
  • cargo test -p openshell-sandbox secrets
  • cargo test -p openshell-policy
  • cargo clippy -p openshell-cli -p openshell-sandbox -p openshell-policy -p openshell-server --all-targets -- -D warnings
  • mise run pre-commit

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@ericksoa
Copy link
Copy Markdown
Author

ericksoa commented May 9, 2026

Follow-up update after the policy CLI shape review:

  • Kept the WebSocket conformance workflow manual-only. The workflow remains workflow_dispatch without a schedule.
  • Changed incremental policy update syntax so WebSocket credential rewrite follows the existing endpoint-local pattern. The new CLI form is:
openshell policy update demo \
  --add-endpoint realtime.example.com:443:read-write:websocket:enforce:websocket-credential-rewrite \
  --binary /usr/bin/node

Rationale: --binary is command-wide because binaries live on the network rule and already apply to each endpoint added in the same command. protocol, access, enforcement, and websocket_credential_rewrite are endpoint fields, so rewrite should be expressed on the --add-endpoint spec instead of through a broad flag that accidentally applies to every endpoint in the batch. This keeps the WebSocket extension aligned with the existing REST/WebSocket endpoint schema and makes mixed batches safer to review.

Message-content policy would buy a different class of control than this PR currently implements: it would let OpenShell authorize WebSocket messages by application intent inside the text payload, such as event type, channel, action, model/tool name, or service-specific JSON shape, instead of only by host/port, upgrade path, and WEBSOCKET_TEXT. That would be valuable for production SaaS/realtime APIs where the same WebSocket path carries both safe reads and state-changing operations. I am leaving that out of this PR because it is a larger product/schema decision: WebSocket payloads are not standardized, parsers must be protocol-specific and opt-in, and any denial/logging path must preserve the current no-payload/no-secret/no-length-leak guarantees.

Validation after this update:

  • cargo fmt
  • cargo test -p openshell-cli policy_update
  • cargo test -p openshell-sandbox websocket
  • cargo test -p openshell-sandbox secrets
  • cargo test -p openshell-policy
  • cargo clippy -p openshell-cli -p openshell-sandbox -p openshell-policy --all-targets -- -D warnings
  • mise run pre-commit

All passed locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Secrets rewriting is not applied to websocket traffic.

1 participant