Skip to content

feat(secretsscan): add 20 more secret patterns#2692

Merged
dgageot merged 1 commit intodocker:mainfrom
dgageot:board/c03d4209b53073ea
May 7, 2026
Merged

feat(secretsscan): add 20 more secret patterns#2692
dgageot merged 1 commit intodocker:mainfrom
dgageot:board/c03d4209b53073ea

Conversation

@dgageot
Copy link
Copy Markdown
Member

@dgageot dgageot commented May 7, 2026

Adds 20 new detection rules to pkg/secretsscan, on top of the existing upstream-derived catalogue:

# Provider Shape
1 Discord bot token [MNO]<24>.<6>.<27+> (3-part dotted)
2 Discord webhook URL https://discord(?:app)?.com/api/webhooks/<id>/<token>
3 Telegram bot token <8-10 digits>:AA<33 base64url>
4 Fly.io macaroon FlyV1 fm2_…
5 Groq API key gsk_<52>
6 Perplexity API key pplx-<48-56>
7 xAI / Grok API key xai-<80>
8 Cohere API key co_<40>
9 Buildkite agent token bkua_<40>
10 CircleCI project token CCIPRJ_<org>_<token>
11 Cloudinary URL cloudinary://<key>:<secret>@<cloud>
12 MongoDB connection string mongodb(?:+srv)?://user:<pwd>@… (only the password is redacted)
13 PostgreSQL connection string postgres(?:ql)?://user:<pwd>@… (only the password is redacted)
14 Azure storage connection string DefaultEndpointsProtocol=…;AccountKey=<base64> (only the key is redacted)
15 Mapbox secret key sk.<60>.<22>
16 Vault batch token hvb.<90-200>
17 Vault recovery token hvr.<90-200>
18 Netlify PAT nfp_<40>
19 Asana PAT 1/<≥14 digits>:<32 hex>
20 Cloudflare Origin CA key v1.0-<32 hex>-<146 base64>

Each rule has a documented vendor prefix or framing structure unique enough to keep the keyword pre-filter useful and the false-positive rate low; the in-source comments cite the format they target.

Supporting changes

  • pkg/secretsscan/aho.go: kwMask grown from [2]uint64 (128-bit) to [4]uint64 (256-bit). The catalogue went from 121 → 149 unique keywords, exceeding the previous 128 cap. empty, overlaps, scan, buildAhoCorasick's panic limit, and the BFS merge are updated accordingly.
  • pkg/secretsscan/aho_test.go: TestAhoCorasickPanicOnTooManyPatterns now exercises the 257-pattern boundary; TestKwMaskOperations covers bits in words 3 and 4.
  • pkg/secretsscan/secrets_test.go: 21 new detection cases in TestContainsSecretsRecognisesKnownTokens (Discord gets two, one per first-segment letter class).
  • pkg/secretsscan/edge_cases_test.go (new): three suites pinning the new behaviour:
    • TestNoisyKeywordsFalsePositives — common noise (12:00 AA, 1/2 cup, v1.0-beta, disk., …) does not trip detection.
    • TestDiscordTokenPrefixes — every documented Discord token first-segment prefix (MT/Mz/ND/NT/Nz/OD) is detected; this caught a regex/keyword skew during review and now guards against future drift.
    • TestConnectionStringContextPreservation — the new MongoDB / Postgres / Azure-storage rules redact only the (?P<secret>…) span and keep scheme + username + host visible.

Validation

  • go test ./pkg/secretsscan/... ./pkg/hooks/... — all pass
  • golangci-lint run ./... — 0 issues
  • Idempotency, redaction-marker safety, slack-rotating-token bounding, and linear-scaling tests still pass
  • Benchmarks on Apple M4 Max:
    • BenchmarkRedactCleanInput: 19.1 µs/op (was 18.6 µs)
    • BenchmarkRedactWithSecret: 4.3 µs/op (unchanged)

covers discord (bot tokens + webhooks), telegram, fly.io, groq,
perplexity, xai, cohere, buildkite, circleci, cloudinary, mongodb /
postgres / azure-storage connection strings, mapbox secret keys, vault
batch + recovery tokens, netlify, asana, and cloudflare origin ca keys.
also bumps kwMask from [2]uint64 to [4]uint64 to make room.
@dgageot dgageot requested a review from a team as a code owner May 7, 2026 10:39
Copy link
Copy Markdown
Member

@rumpl rumpl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Question, doesn't a secret scanning library already exist?

@dgageot dgageot merged commit fe314d5 into docker:main May 7, 2026
8 checks passed
Copy link
Copy Markdown

@docker-agent docker-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

20 new secret detection rules reviewed. The kwMask expansion from [2]uint64 to [4]uint64 is arithmetically correct, all keyword/regex pairs are properly aligned (mixed-case keywords are correctly normalized via strings.ToLower in compiledRuleSet), operator precedence in overlaps() is correct per the Go spec (| has higher precedence than !=), and the named (?P<secret>...) groups in the MongoDB/Postgres/Azure rules redact only the password/key span.

One low-severity concern noted inline.

Comment thread pkg/secretsscan/rules.go
// / `AccountName` framing is only metadata. The base64 value
// is typically 88 chars (44-byte key) but we accept anything
// from 20 chars upwards to cover shorter SAS-signing keys.
expression: `DefaultEndpointsProtocol=https?;AccountName=[^;]+;AccountKey=(?P<secret>[A-Za-z0-9+/=]{20,})`,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Azure AccountKey regex may over-redact on connection strings missing a semicolon separator

The (?P<secret>[A-Za-z0-9+/=]{20,}) capture group is unbounded and greedily consumes any [A-Za-z0-9+/=] characters. For a well-formed Azure connection string the semicolons between fields act as natural stop-bytes — but if a consumer assembles a string without the trailing semicolon the regex swallows text beyond the key value.

Reproducible example:

"DefaultEndpointsProtocol=https;AccountName=foo;AccountKey=AAAA==BlobEndpoint=http://127.0.0.1"

Captured secret: "AAAA==BlobEndpoint=http" (the field name BlobEndpoint is entirely in [A-Za-z]).

Impact: only over-redaction (the secret is still detected), not a missed detection. Well-formed strings produced by Azure SDKs always include semicolons so this is unlikely to be triggered in practice. A safe fix is to add an upper bound, e.g. {20,200}, or to change the character class to exclude = at the end of the value by anchoring on (?:;|$), e.g.:

`DefaultEndpointsProtocol=https?;AccountName=[^;]+;AccountKey=(?P<secret>[A-Za-z0-9+/=]{20,200})(?:;|$)`

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants