fix(sandbox): rewrite credential placeholders in websocket text frames by ericksoa · Pull Request #1286 · NVIDIA/OpenShell · GitHub

ericksoa · 2026-05-09T02:43:06Z

Summary

Fixes #872 by adding an opt-in WebSocket credential placeholder rewrite path for REST/L7 endpoints after an allowed HTTP 101 Switching Protocols upgrade.

The implementation preserves the placeholder/no-secret-in-sandbox model: provider secrets remain out of sandbox env, argv, config, logs, and sandbox-generated payloads until OpenShell reaches the relay boundary. This complements, but does not replace, #984 selective credential passthrough.

This is scoped to client-to-server WebSocket text-frame credential rewrite. It is not generic post-upgrade L7 inspection.

Related Issue

Fixes #872

Changes

Add websocket_credential_rewrite as an opt-in endpoint policy field, defaulting to false.
Wire the field through proto, YAML serde, provider profiles, merge behavior, OPA JSON conversion, L7 config parsing, and policy docs.
Add a credential-aware WebSocket relay for opted-in REST endpoints after an allowed upgrade.
Validate WebSocket upgrade responses before forwarding 101 headers, including status, Upgrade, Connection, Sec-WebSocket-Accept, and safe extension negotiation.
Rewrite only UTF-8 client-to-server text messages containing openshell:resolve:env:* placeholders.
Leave server-to-client bytes as raw passthrough.
Support only safe permessage-deflate negotiation with no-context-takeover constraints; unsupported compression parameters and mismatched upstream responses fail closed.
Fail closed on RSV/compression bits without negotiated support, malformed frames, unmasked client frames, invalid UTF-8 text, unresolved placeholders, oversized buffered text messages, invalid close frames, and invalid fragmentation.
Preserve raw opt-out WebSocket upgrade behavior, including forward-proxy upgrade requests.
Add WebSocket, secret rewrite, policy round-trip, OPA conversion, upgrade-path, route-selected L7, and forward-proxy coverage.

Security

Opt-in per endpoint and only honored for protocol: rest endpoints.
Client-to-server text frames only; no binary rewriting and no server-to-client inspection.
Unsafe compression variants are not supported; upstream extension negotiation must exactly match the sanitized safe offer.
Ambiguous protocol states fail closed before forwarding upgrade headers.
OCSF rewrite events report safe metadata and replacement counts without payloads, placeholders, secret values, or secret lengths.

Testing

mise run pre-commit passes
Unit tests added/updated
E2E tests added/updated (not applicable; covered with focused WebSocket relay/integration tests)

Additional focused validation run locally:

cargo fmt
cargo test -p openshell-sandbox websocket
cargo test -p openshell-sandbox secrets
cargo test -p openshell-policy
cargo clippy -p openshell-sandbox -p openshell-policy --all-targets -- -D warnings
mise run docs
mise run ci

Autobahn was not run because the repo does not include an Autobahn harness; the PR adds focused RFC 6455 and RFC 7692 regression tests instead.

Checklist

Follows Conventional Commits
Commits are signed off (DCO)
Architecture docs updated (if applicable)

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

copy-pr-bot · 2026-05-09T02:43:09Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

github-actions · 2026-05-09T02:43:22Z

All contributors have signed the DCO ✍️ ✅
_{Posted by the DCO Assistant Lite bot.}

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

ericksoa · 2026-05-09T04:08:10Z

Comparative implementation audit: WebSocket credential rewrite

This is a behavior-level comparison for PR memory. I compared the branch against permissively licensed, widely used WebSocket implementations and did not copy code from any of them.

Comparison set

All comparison targets are permissively licensed enough for behavioral review and architectural comparison:

Implementation	License	Why it is a useful comparison
`websockets/ws`	MIT	Mature Node client/server, Autobahn client and server reports, tunable `permessage-deflate`, optional native masking and UTF-8 acceleration.
`snapview/tungstenite-rs`	MIT / Apache-2.0	Rust RFC 6455 baseline, Autobahn-tested, close to this repo's language/runtime ecosystem. It is mostly a protocol implementation baseline, not a compression benchmark.
`coder/websocket`	ISC	Modern Go implementation with Autobahn coverage, close handshake, ping/pong API, concurrent writes, and RFC 7692 support.
`gorilla/websocket`	BSD-2-Clause	Long-lived Go baseline with stable API, Autobahn server tests, origin defaults, and limited no-context-takeover compression.
`python-websockets/websockets`	BSD-3-Clause	Correctness-focused Python implementation with Sans-I/O layers, heavy RFC 6455 compliance testing, and full RFC 7692 negotiation factories.
`Autobahn Testsuite`	BSD-style / open source	The common external conformance benchmark. It covers handshakes, framing, ping/pong, reserved bits, fragmentation, UTF-8, limits, close behavior, and permessage-deflate.

Scope difference

OpenShell is not a general WebSocket endpoint library. The PR implements a security relay for a narrow problem: keep provider secrets out of sandbox-owned environment, argv, config, logs, and generated payloads, then resolve placeholders only at the relay boundary for opted-in client-to-server text frames.

That means several mature-library features are intentionally not goals here: application APIs, broadcast helpers, endpoint origin policy, generic server/client ownership, and full bidirectional message inspection. The relevant comparison is protocol safety plus operational confidence around the relay path.

Where OpenShell is stronger or more specialized

Credential boundary: The compared libraries parse and expose messages to applications. They do not solve OpenShell's no-secret-in-sandbox placeholder model. This PR is stronger for this specific security invariant because it rewrites only at the relay boundary and emits OCSF metadata without payloads, placeholders, secret values, or secret lengths.
Fail-closed posture: The branch rejects invalid upgrade responses before forwarding 101 headers, rejects unsolicited extension negotiation, rejects unsafe compression variants, and rejects malformed client frames. General libraries often expose protocol errors to an application; this relay closes the connection before an unsafe state can become transparent pass-through.
Narrow mutation surface: Binary frames and server-to-client bytes remain raw passthrough. Text frames without placeholders are re-emitted unchanged unless fragmentation or compression forces reframing. This is less capable than a full WebSocket stack, but it is safer for a security proxy.
Safe compression subset: The PR only accepts permessage-deflate when the offer can be reduced to a no-client-context-takeover-safe form and the upstream response exactly matches that safe offer. This is narrower than ws, coder/websocket, and python-websockets, but closer to Gorilla's conservative no-context-takeover stance and better aligned with not retaining compression dictionaries that may contain secret material.
Proxy-path parity: The branch covers CONNECT L7, route-selected L7, and forward-proxy upgrade behavior. Endpoint libraries do not usually exercise this proxy-selection surface.

Where mature implementations are ahead

Autobahn conformance: ws, tungstenite, coder/websocket, and Gorilla all advertise Autobahn coverage. OpenShell currently has focused regression tests but no Autobahn harness for the relay. This is the largest confidence gap.
Header and extension parsing depth: Mature stacks parse extension offers/responses as structured protocol data. OpenShell's current extension parser is intentionally small and handles the safe subset, but it should be hardened for quoted values, duplicate parameters, duplicate extension offers, weird OWS, and parser differentials.
Subprotocol validation: The PR validates Upgrade, Connection, Sec-WebSocket-Accept, and extensions, but it does not yet enforce that Sec-WebSocket-Protocol in the response is absent or selected from the client's offer. Endpoint libraries generally handle subprotocol negotiation explicitly.
Close-handshake behavior: Endpoint libraries expose or perform a real close handshake. OpenShell currently forwards close frames and rejects frames after client close, but protocol failures usually terminate the transport without sending a WebSocket protocol-error close frame. That may be acceptable for a fail-closed proxy, but it is less endpoint-complete.
Ping/pong and liveness APIs: Libraries expose ping/pong handlers or keepalive patterns. OpenShell forwards control frames but does not add heartbeat or liveness policy. That is probably correct for a transparent relay, but it should remain an explicit non-goal.
Configurable limits and backpressure knobs: Mature libraries expose message limits, compression thresholds, buffers, and sometimes concurrency controls. OpenShell has fixed defensive limits. That is simple and safe, but less tunable for high-volume providers.
Performance and fuzz evidence: Mature projects have years of test history, Autobahn reports, and sometimes optimized masking/UTF-8 paths. OpenShell has focused tests and workspace CI, but no parser fuzz target or benchmark for the rewrite path yet.

Plan to close the differences

Add a compliance note and test matrix. Add architecture/ or docs/ material that states the relay's intentional scope, the comparison set above, and which RFC 6455 / RFC 7692 behaviors are covered by unit, integration, Autobahn, or non-goal status.
Add subprotocol response validation. Track the original Sec-WebSocket-Protocol offer and reject a 101 response that selects an unoffered subprotocol or returns multiple protocol headers. Add tests for absent, valid, invalid, and duplicate subprotocol responses.
Harden extension parsing. Replace the current string-split parser with a structured parser for RFC token/parameter syntax, including quoted-string rejection or normalization, duplicate parameters, multiple extension offers, duplicate response extensions, bad OWS, and case normalization. Keep the accepted surface narrow.
Build an Autobahn relay harness. Use the Dockerized Autobahn Testsuite and a small local harness that drives traffic through OpenShell's relay to an echo endpoint. Start with the case groups that map to our responsibilities: handshake, framing, masking, reserved bits, opcodes, fragmentation, UTF-8, limits, close, and permessage-deflate. Document any cases that do not apply because OpenShell is not the endpoint.
Add fuzz/property coverage for the frame parser. Add a cargo fuzz or proptest target for read_frame_header, fragmentation state, close payload validation, UTF-8 handling, and extension parsing. The objective is parser differential resistance, not only happy-path coverage.
Decide protocol-error close semantics. For malformed frames after a successful upgrade, decide whether the relay should send a best-effort close frame with 1002, 1007, or 1009 before closing. This is more endpoint-like, but it changes wire behavior and should be a maintainer decision.
Make limits explicit. Document the fixed text/compressed/raw-frame limits and consider policy or config hooks only if real providers need them. Keep defaults conservative.
Benchmark the rewrite path. Add criterion-style tests for masked text passthrough, placeholder rewrite, fragmented text, compressed text, and binary/raw passthrough. Compare before/after within OpenShell rather than attempting cross-language benchmarks.
Keep unsafe compression variants out of scope. Do not add context takeover, arbitrary window-bit negotiation, or binary rewriting unless a separate design explicitly justifies the secret-retention and memory risks.
Re-evaluate parser reuse only after tests exist. A permissive Rust crate such as tungstenite is a good correctness reference, but the relay's c2s-only mutation, remasking, and fail-closed secret model may still require custom glue. Decide from the Autobahn and fuzz results rather than replacing code prematurely.

Current conclusion

The branch is defensible for the issue-specific threat model and stronger than generic libraries on the no-secret relay invariant. It is weaker than mature WebSocket stacks on external conformance proof, structured negotiation parsing, subprotocol validation, close-handshake polish, and performance/fuzz history. The most valuable next step is not broadening features; it is adding external conformance and parser-hardening evidence while keeping the accepted protocol surface intentionally small.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

ericksoa · 2026-05-09T04:20:01Z

Follow-up hardening pass

This pass turns the portable lessons from mature WebSocket stacks into OpenShell-specific behavior, without copying implementation code: parse negotiation as structured protocol tokens, validate server-selected options against the original client offer, canonicalize equivalent safe extension forms, and fail closed when the rewrite path sees parameters it does not explicitly support.

Concretely, fix(sandbox): harden websocket negotiation parsing adds RFC 6455 subprotocol tracking and response validation, replaces ad hoc Sec-WebSocket-Extensions splitting with token-aware parsing, canonicalizes only the safe permessage-deflate no-context-takeover subset, rejects duplicate/value-bearing/quoted unsafe extension parameters in the rewrite path, and keeps WebSocketExtensionMode::Preserve raw so opt-out behavior does not change.

This matters for credential rewriting because the proxy becomes a protocol participant after it mutates an upgraded flow. Loose negotiation can let the upstream select compression state or subprotocol behavior the relay did not actually offer or understand, which creates room for state confusion, fail-open compression handling, or secret-bearing payloads moving through a path that no longer has the invariants the rewriter depends on. The new behavior keeps the supported surface narrow and deterministic: either the relay can validate and safely process the negotiated shape, or the handshake fails before forwarding the 101 to the client.

Remaining conformance work is still useful, especially an Autobahn-style relay harness and broader fuzz/property coverage for handshake parsing and frame sequencing. Those should come after this focused spec pass so future parser or engine swaps have a stable regression matrix to preserve.

ericksoa · 2026-05-09T04:22:36Z

I have read the DCO document and I hereby sign the DCO.

ericksoa · 2026-05-09T04:23:20Z

I have read the Contributor Agreement including DCO and I hereby sign the Contributor Agreement and DCO

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

ericksoa · 2026-05-09T04:32:30Z

Focused conformance matrix added

I added test(sandbox): add websocket conformance relay matrix as the practical next step before a full Autobahn lane. The new in-crate harness drives the real HTTP 101 relay path and then sends post-upgrade frames through three representative modes:

raw preserve mode, proving opt-out behavior still relays frames without mask validation or rewriting;
rewrite mode without negotiated compression, proving a masked text credential is rewritten only after a validated upgrade;
rewrite mode with the safe permessage-deflate subset, proving canonical extension negotiation carries through to compressed text rewriting with RSV1 preserved.

This is not a replacement for Autobahn. It is the closest low-cost regression layer for this PR because it covers the specific OpenShell behavior boundary: once the proxy becomes an active WebSocket participant for credential rewriting, the handshake assumptions, compression negotiation, and post-upgrade frame relay need to be tested together rather than only as isolated parser/frame unit tests. Full Autobahn remains useful as a follow-up CI or manual conformance job, but it would add Docker/runtime/tooling weight beyond this focused PR pass.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

ericksoa · 2026-05-09T05:20:53Z

Progress update: policy configuration now treats WebSocket as a first-class inspected protocol in the incremental update flow.

What changed:

openshell policy update --add-endpoint host:port:read-write:websocket:enforce is now accepted, so users do not have to hand-edit YAML just to create a WebSocket L7 endpoint.
--add-allow and --add-deny can target existing protocol: websocket endpoints using the existing method/path schema. For WebSocket, GET means the RFC 6455 upgrade and WEBSOCKET_TEXT means client text messages on the upgraded request path; this deliberately does not add payload-content matching.
Policy merge access expansion is now protocol-aware: WebSocket read-only expands to GET, WebSocket read-write expands to GET plus WEBSOCKET_TEXT, and WebSocket presets no longer expand into REST write methods.
CLI help and policy/security docs now document the WebSocket workflow, the WEBSOCKET_TEXT method, and the current rule-level constraints.

Why it matters:
This makes the WebSocket credential-rewrite capability operationally usable through the same policy-update workflow as HTTP. A maintainer can start with a minimal endpoint, audit denials, then add specific WebSocket handshake/text-frame rules without replacing the full policy. It also keeps the schema natural: transport selection stays on protocol, permissions stay in method/path rules, and WebSocket text-frame policy remains explicitly separate from message payload inspection.

Validation:

cargo fmt
cargo test -p openshell-cli policy_update
cargo test -p openshell-policy merge
cargo test -p openshell-sandbox websocket
cargo test -p openshell-sandbox secrets
cargo test -p openshell-policy
cargo clippy -p openshell-cli -p openshell-sandbox -p openshell-policy --all-targets -- -D warnings
mise run pre-commit

…dential-rewrite Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

ericksoa · 2026-05-09T05:34:13Z

Progress update: WebSocket credential replacement is now configurable from the incremental policy workflow and covered through the route-selected relay path.

What changed:

Added openshell policy update --websocket-credential-rewrite for --add-endpoint operations. This lets users create or merge a protocol: websocket endpoint and enable text-frame credential placeholder replacement without hand-editing YAML.
The flag is intentionally constrained: it is accepted only for added protocol: websocket endpoints or protocol: rest compatibility endpoints that may upgrade to WebSocket. It is rejected for plain L4 endpoints and protocol: sql.
Merge behavior now preserves/merges websocket_credential_rewrite: true when an incoming endpoint is merged into an existing endpoint.
Gateway/server-side merge summaries now surface websocket_credential_rewrite=true so audit/progress output records when this sensitive behavior is enabled, without logging secret material.
Added a route-selected WebSocket regression proving the main value path: sandbox sends an openshell:resolve:env:* placeholder in a masked client text frame, OpenShell resolves it after a valid 101 upgrade, upstream receives the real credential, and the forwarded client-to-server frame remains masked.
Updated the policy docs and CLI examples to show the WebSocket credential rewrite workflow.

Why it matters:
The point of this capability is not only to parse WebSockets; it is to let agents call real WebSocket services while keeping service credentials out of sandbox-authored code and logs. This makes that operational path first-class: policy can allow the upgrade and client text messages, then explicitly opt in to resolving OpenShell placeholders at the relay boundary. The agent still sends placeholders, OpenShell injects the credential only at egress time, binary frames remain untouched, and no payload-content policy matching was added.

Validation:

cargo fmt
cargo test -p openshell-cli policy_update
cargo test -p openshell-policy merge
cargo test -p openshell-server summarize_cli_policy_merge_op_formats_websocket_credential_rewrite
cargo test -p openshell-sandbox route_selected_websocket_rewrites_text_credentials_after_upgrade
cargo test -p openshell-sandbox websocket
cargo test -p openshell-sandbox secrets
cargo test -p openshell-policy
cargo clippy -p openshell-cli -p openshell-sandbox -p openshell-policy -p openshell-server --all-targets -- -D warnings
mise run pre-commit

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

ericksoa · 2026-05-09T05:57:38Z

Follow-up update after the policy CLI shape review:

Kept the WebSocket conformance workflow manual-only. The workflow remains workflow_dispatch without a schedule.
Changed incremental policy update syntax so WebSocket credential rewrite follows the existing endpoint-local pattern. The new CLI form is:

openshell policy update demo \
  --add-endpoint realtime.example.com:443:read-write:websocket:enforce:websocket-credential-rewrite \
  --binary /usr/bin/node

Rationale: --binary is command-wide because binaries live on the network rule and already apply to each endpoint added in the same command. protocol, access, enforcement, and websocket_credential_rewrite are endpoint fields, so rewrite should be expressed on the --add-endpoint spec instead of through a broad flag that accidentally applies to every endpoint in the batch. This keeps the WebSocket extension aligned with the existing REST/WebSocket endpoint schema and makes mixed batches safer to review.

Message-content policy would buy a different class of control than this PR currently implements: it would let OpenShell authorize WebSocket messages by application intent inside the text payload, such as event type, channel, action, model/tool name, or service-specific JSON shape, instead of only by host/port, upgrade path, and WEBSOCKET_TEXT. That would be valuable for production SaaS/realtime APIs where the same WebSocket path carries both safe reads and state-changing operations. I am leaving that out of this PR because it is a larger product/schema decision: WebSocket payloads are not standardized, parsers must be protocol-specific and opt-in, and any denial/logging path must preserve the current no-payload/no-secret/no-length-leak guarantees.

Validation after this update:

cargo fmt
cargo test -p openshell-cli policy_update
cargo test -p openshell-sandbox websocket
cargo test -p openshell-sandbox secrets
cargo test -p openshell-policy
cargo clippy -p openshell-cli -p openshell-sandbox -p openshell-policy --all-targets -- -D warnings
mise run pre-commit

All passed locally.

fix(sandbox): rewrite credential placeholders in websocket text frames

71bd516

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

ericksoa added 4 commits May 8, 2026 19:58

fix(sandbox): harden websocket credential rewrite

726efa3

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

feat(sandbox): add websocket l7 inspection and compression

6fd6d4e

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

fix(sandbox): harden websocket upgrade validation

bac23a1

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

test(sandbox): cover route-selected websocket upgrades

9a6f696

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

ericksoa mentioned this pull request May 9, 2026

bug: Secrets rewriting is not applied to websocket traffic. #872

Open

3 tasks

fix(sandbox): harden websocket negotiation parsing

143237f

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

test(sandbox): add websocket conformance relay matrix

7bff1c4

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

ericksoa added 2 commits May 8, 2026 22:04

test(e2e): add websocket conformance lane

2f97ff3

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

fix(policy): support websocket incremental rules

a670178

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

ericksoa added 2 commits May 8, 2026 22:23

Merge remote-tracking branch 'origin/main' into fix/872-websocket-cre…

e55f1ee

…dential-rewrite Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

feat(policy): enable websocket credential rewrite updates

4009a4a

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

fix(cli): make websocket rewrite endpoint-local

1562b70

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sandbox): rewrite credential placeholders in websocket text frames#1286

fix(sandbox): rewrite credential placeholders in websocket text frames#1286
ericksoa wants to merge 12 commits intoNVIDIA:mainfrom
ericksoa:fix/872-websocket-credential-rewrite

ericksoa commented May 9, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026 •

edited

Loading

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ericksoa commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Security

Testing

Checklist

Uh oh!

copy-pr-bot Bot commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ericksoa commented May 9, 2026

Comparative implementation audit: WebSocket credential rewrite

Comparison set

Scope difference

Where OpenShell is stronger or more specialized

Where mature implementations are ahead

Plan to close the differences

Current conclusion

Uh oh!

ericksoa commented May 9, 2026

Follow-up hardening pass

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

ericksoa commented May 9, 2026

Focused conformance matrix added

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

ericksoa commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ericksoa commented May 9, 2026 •

edited

Loading

github-actions Bot commented May 9, 2026 •

edited

Loading