Summary
The Rust native engine is missing extractors for 11 languages that the JS engine supports via WASM. Files in those languages are silently dropped by the Rust orchestrator (no nodes, no edges, no analysis), and the JS-side WASM backfill (#967, #1068) papers over the gap.
PR #1070 makes the orchestrator behave correctly when this asymmetry exists, but the asymmetry itself is the underlying issue — every WASM-only language is a future opportunity for the same class of regression.
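To make the failure mode concrete, here is a minimal sketch of how an extension allowlist ends up splitting a file set into "handled natively" and "left for the WASM backfill". The `NATIVE_EXTENSIONS` list and the `partition_by_native_support` helper are hypothetical stand-ins for illustration only; the real list is `SUPPORTED_EXTENSIONS` in `file_collector.rs`, and the real orchestrator may be structured quite differently.

```rust
use std::path::Path;

// Hypothetical subset of the native engine's extension list. The real
// SUPPORTED_EXTENSIONS lives in file_collector.rs and will differ.
const NATIVE_EXTENSIONS: &[&str] = &["rs", "ts", "py"];

/// Splits paths into (handled by a native extractor, needs WASM backfill).
/// A file whose extension is not in the native list never reaches a Rust
/// extractor, which is how it ends up with no nodes and no edges unless
/// something else picks it up.
fn partition_by_native_support<'a>(paths: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    paths.iter().copied().partition(|p| {
        Path::new(p)
            .extension()
            .and_then(|ext| ext.to_str())
            .map_or(false, |ext| NATIVE_EXTENSIONS.contains(&ext))
    })
}
```

Under these assumptions, calling the helper on `["a.rs", "b.gleam"]` puts `b.gleam` in the second list, which is the set the JS-side backfill has to cover today.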
Languages with JS extractor + WASM grammar but no Rust extractor
| Language | Extensions | JS grammar |
| --- | --- | --- |
| F# | .fs, .fsx, .fsi | tree-sitter-fsharp |
| Gleam | .gleam | tree-sitter-gleam |
| Clojure | .clj, .cljs, .cljc | tree-sitter-clojure |
| Julia | .jl | tree-sitter-julia |
| R | .r, .R | tree-sitter-r |
| Erlang | .erl, .hrl | tree-sitter-erlang |
| Solidity | .sol | tree-sitter-solidity |
| Objective-C | .m | (verify) |
| CUDA | .cu, .cuh | (verify) |
| Groovy | .groovy, .gvy | (verify) |
| Verilog | .v, .sv | tree-sitter-verilog |
Source-of-truth registries:
JS: LANGUAGE_REGISTRY in src/domain/parser.ts — 35 entries
Rust: crates/codegraph-core/src/extractors/mod.rs — 21 modules, plus SUPPORTED_EXTENSIONS in crates/codegraph-core/src/file_collector.rs
What "add a Rust extractor" requires per language
Verify a Rust tree-sitter grammar crate exists on crates.io (or vendor one).
Add a LanguageKind variant in crates/codegraph-core/src/parser_registry.rs and wire from_extension.
Add the extension(s) to SUPPORTED_EXTENSIONS in file_collector.rs.
Add pub mod <lang>; to extractors/mod.rs and a struct implementing SymbolExtractor.
Add the dispatch arm in extract_symbols_with_opts.
Add a fixture under tests/benchmarks/resolution/fixtures/<lang>/ with expected-edges.json (or extend the existing one).
Add a per-language extraction test under tests/parsers/<lang>.test.ts (JS-side, exercises the WASM extractor) and a Rust-side unit test.
Verify build-parity tests pass (WASM and native produce identical output for the language).
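The Rust-side wiring in the steps above (the `LanguageKind` variant, `from_extension`, the extension list, the extractor struct, and the dispatch arm) can be sketched roughly as follows, using Gleam as the example. This is a simplified stand-in, not the project's code: the `Symbol` type, the `extract(&self, source: &str)` signature, and `extract_symbols_sketch` are assumptions made so the example is self-contained. The real `SymbolExtractor` trait presumably works on a parsed tree-sitter tree, and the real dispatch lives in `extract_symbols_with_opts`, whose signature is not shown in this issue.

```rust
/// Hypothetical stand-in for the enum in parser_registry.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LanguageKind {
    Rust,
    Gleam, // new variant for the language being ported
}

impl LanguageKind {
    /// Maps a file extension (without the leading dot) to a language.
    pub fn from_extension(ext: &str) -> Option<Self> {
        match ext {
            "rs" => Some(Self::Rust),
            "gleam" => Some(Self::Gleam), // new arm
            _ => None,
        }
    }
}

/// The file collector needs the extension too (shortened list for the sketch).
pub const SUPPORTED_EXTENSIONS: &[&str] = &["rs", "gleam"];

/// Assumed, simplified symbol type.
#[derive(Debug, PartialEq, Eq)]
pub struct Symbol {
    pub name: String,
    pub kind: &'static str,
}

/// Assumed, simplified trait; the real one likely takes a tree-sitter tree.
pub trait SymbolExtractor {
    fn extract(&self, source: &str) -> Vec<Symbol>;
}

/// Would live in extractors/gleam.rs behind `pub mod gleam;`.
pub struct GleamExtractor;

impl SymbolExtractor for GleamExtractor {
    fn extract(&self, source: &str) -> Vec<Symbol> {
        // Toy line scan standing in for a real tree-sitter walk.
        source
            .lines()
            .filter_map(|line| {
                let line = line.trim_start();
                let rest = line
                    .strip_prefix("pub fn ")
                    .or_else(|| line.strip_prefix("fn "))?;
                let name: String = rest
                    .chars()
                    .take_while(|c| c.is_alphanumeric() || *c == '_')
                    .collect();
                (!name.is_empty()).then(|| Symbol { name, kind: "function" })
            })
            .collect()
    }
}

/// Hypothetical dispatch, standing in for the arm in extract_symbols_with_opts.
pub fn extract_symbols_sketch(kind: LanguageKind, source: &str) -> Vec<Symbol> {
    match kind {
        LanguageKind::Gleam => GleamExtractor.extract(source), // new arm
        LanguageKind::Rust => Vec::new(), // existing extractors elided here
    }
}
```

The point of the sketch is that a language is only reachable natively when all of these places agree; missing any one of them (for example the extension list) would leave the extractor unreachable.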
Why this matters
The asymmetry is invisible to users: they enable the language, builds work, but they're paying a hidden perf cost on every incremental rebuild on the native engine.
New WASM-only languages added in the future will hit the same trap unless the Rust port keeps pace.
Acceptance criteria
All 11 languages above have Rust extractors that pass per-language extraction tests.
The build-parity tests pass for each language (WASM and native produce the same node/edge counts on the language fixture).
LANGUAGE_REGISTRY.length === extractors/mod.rs declared modules + 2 (typescript and tsx share javascript.rs; ocaml-interface shares ocaml.rs).
A CI gate prevents future drift: a test that fails when a JS LANGUAGE_REGISTRY entry has no corresponding Rust extractor (or an explicit allowlist of intentionally WASM-only languages).
This is a substantial body of work and probably wants to be split per language; opening this as the umbrella so the effort is tracked.
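The drift gate from the acceptance criteria boils down to a set difference with an allowlist. Below is a minimal sketch of that check, assuming the JS registry ids and the Rust module names can each be exported as a plain list of strings; how those lists are actually obtained is out of scope here. The function name, the `SHARED_MODULES` table, and the id spellings are hypothetical, with the shared-module pairs taken from the note above about `javascript.rs` and `ocaml.rs`.

```rust
use std::collections::BTreeSet;

/// Registry ids that reuse another language's Rust module, per the issue text.
/// The exact id spellings are assumptions.
const SHARED_MODULES: &[(&str, &str)] = &[
    ("typescript", "javascript"),
    ("tsx", "javascript"),
    ("ocaml-interface", "ocaml"),
];

/// Resolves a JS registry id to the Rust module expected to handle it.
fn module_for(id: &str) -> &str {
    SHARED_MODULES
        .iter()
        .find(|(alias, _)| *alias == id)
        .map_or(id, |(_, module)| *module)
}

/// Returns the JS registry ids that have no Rust module and are not
/// explicitly allowlisted as WASM-only. A CI test would fail when this
/// list is non-empty.
fn missing_native_extractors(
    js_registry: &[&str],
    rust_modules: &[&str],
    wasm_only_allowlist: &[&str],
) -> Vec<String> {
    let native: BTreeSet<&str> = rust_modules.iter().copied().collect();
    let allowed: BTreeSet<&str> = wasm_only_allowlist.iter().copied().collect();
    js_registry
        .iter()
        .copied()
        .filter(|id| !allowed.contains(*id) && !native.contains(module_for(id)))
        .map(String::from)
        .collect()
}
```

Reporting the offending ids rather than a bare count keeps the failure message actionable: it names the language that needs either a Rust extractor or an allowlist entry.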