Skip to content

detect secrets embedded inside larger tokens#2582

Merged
dgageot merged 1 commit intodocker:mainfrom
dgageot:board/improving-redact-secrets-for-non-word-bo-98e8d72c
Apr 28, 2026
Merged

detect secrets embedded inside larger tokens#2582
dgageot merged 1 commit intodocker:mainfrom
dgageot:board/improving-redact-secrets-for-non-word-bo-98e8d72c

Conversation

@dgageot
Copy link
Copy Markdown
Member

@dgageot dgageot commented Apr 28, 2026

secretsscan.Redact / ContainsSecrets used to require a word boundary
around every secret: detection only fired when the recognisable token
sat next to whitespace, punctuation, or the start/end of the input.
That meant values pasted into a larger token leaked through, even when
the prefix and length were a perfect match:

input shape before after
KEY= ghp_… (with space)
KEY=ghp_… (no space)
or ghp_…
BEFOREghp_…AFTER
…AKIA…EXAMPLEAFTER

The fix drops the leading [^0-9a-zA-Z]|^ anchor (withoutWordPrefix
/ startWord) and the trailing [.,]?(\s+|$) anchor (endSecret)
from the rule expressions, plus the equivalent inline anchors on the
alibaba-access-key-id rule. Each rule's payload is already tightly
constrained (fixed-length character classes, explicit token shapes),
so removing the boundary check doesn't broaden the regex enough to
trigger false positives in practice.

Performance is unchanged in shape: the keyword pre-filter still skips
the regex hot path for typical inputs, and Go's RE2-based engine
keeps detection at O(len(text) · len(rules)). The clean-input and
with-secret benchmarks show the same allocation profile (1 / 4
allocs per op) as before.

Two new tests pin the behaviour:

  • TestRedactDetectsSecretsAcrossWordBoundaries covers GitHub PAT,
    AWS access key, and Docker PAT in 12 boundary shapes (alone,
    leading alphanumerics, trailing alphanumerics, fully embedded,
    mid-KEY=…, …).
  • TestRedactScalesLinearly is a guard-rail that fails if a future
    change reintroduces super-linear behaviour: doubling the input ~16×
    must not balloon wall time by more than 128×, well below the ~256×
    a true O(n²) regression would produce.

@dgageot dgageot requested a review from a team as a code owner April 28, 2026 17:09
@dgageot dgageot merged commit 291c33b into docker:main Apr 28, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants