138M ChatML training stack for Apple Silicon using MLX.
Canonical remote branch: main. Historical legacy-default state is preserved at tag archive/origin-master-2026-03-20.
The repo now treats one path as first-class:
clean Quality2K continuation -> explicitly approved pinned checkpoint -> v20 align/full/repair SFT recovery curriculum -> broad raw chat gate
The v19 repair passed a narrow gate but fails broad chat. Keep v19 checkpoints, loaders, tokenizer, model topology, ChatML rendering, and assistant-only loss masking compatible; do not overwrite or repoint v19 artifacts while v20 is being trained. The latest v19 canonical run state is incomplete/stale for coherent multi-turn chat decisions, so v20 is now the active recovery path.
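For orientation, ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` markers, and assistant-only loss masking zeroes the loss on every token outside assistant turns. A minimal sketch of both, with `encode` as a hypothetical stand-in for the repo's tokenizer; the actual renderer and masking live in the repo's own modules:

```python
# Minimal ChatML rendering and assistant-only loss masking sketch.
# `encode` is a hypothetical callable mapping text -> list of token ids.

def render_chatml(messages):
    """Render [{'role': ..., 'content': ...}, ...] as ChatML text."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

def assistant_only_mask(messages, encode):
    """Return (token_ids, loss_mask); loss_mask is 1 only on assistant turns."""
    ids, mask = [], []
    for m in messages:
        turn_ids = encode(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
        ids.extend(turn_ids)
        mask.extend([1 if m["role"] == "assistant" else 0] * len(turn_ids))
    return ids, mask
```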
Active entrypoints:
- `scripts/build_pretrain_quality2k.py`
- `scripts/run_pretrain_quality2k_terminal.sh`
- `scripts/audit_dense_mainline.py`
- `scripts/review_plain_generation.py`
- `scripts/select_quality2k_checkpoint.py`
- `scripts/pin_quality2k_checkpoint.py`
- `scripts/build_sft_release.py`
- `scripts/run_sft_release.py`
- `scripts/run_sft_release_v20.py`
- `scripts/run_multiturn_coherence_eval.py` (scored raw/guarded broad chat suite; see the SFT Runbook)
Research branch entrypoints:
- `scripts/extend_tokenizer_with_vm_tokens.py`
- `scripts/build_vm_pilot_dataset.py`
- `scripts/init_vm_from_dense.py`
- `scripts/extend_tokenizer_with_wasm_tokens.py`
- `scripts/normalize_local_docs.py`
- `scripts/build_wasm_subset_corpus.py`
- `scripts/build_wasm80m_pretrain_corpus.py`
- `scripts/build_wasm80m_sft_corpora.py`
- `scripts/run_wasm80m_pretrain.py`
- `scripts/run_wasm80m_sft.py`
- `scripts/eval_wasm80m.py`
Historical probe-era and experimental material is retained only as archived reference. See Archive Notes.
Historical dense shims:
- `scripts/build_sft_v19_release.py`
- `scripts/run_sft_release_v19.py`
- `scripts/build_sft_v18_release.py`
- `scripts/run_sft_release_v18.py`
- `scripts/run_sft_release_v18_terminal.sh`

These remain compatibility shims only and are non-authoritative for release decisions.
The WASM80m scripts listed under “Research branch entrypoints” are a parallel tokenizer/model line (docs/wasm80m_runbook.md); they are not part of finishing dense 138M v20 chat.
The only architecture on the release path is the dense 138M line. Experimental dense_vm and dense_wasm80m work are isolated to separate branch/config families and do not share checkpoint compatibility with the dense mainline.
- Preserved raw pretrain base: `checkpoints/pretrain_mlx_138m_chatml/mlx_step_130000.pkl`
- Active continuation config: `configs/pretrain_mlx_138m_quality2k.yaml`
- Active continuation outputs: `checkpoints/pretrain_mlx_138m_quality2k`
- Canonical SFT handoff: `checkpoints/pretrain_mlx_138m_quality2k/selected_for_sft.pkl`
- Active v20 SFT configs: `configs/sft_release_v20_align.yaml`, `configs/sft_release_v20_full.yaml`, `configs/sft_release_v20_repair.yaml`
- Active v20 SFT corpora: `data/sft_chatml_v20_align.jsonl`, `data/sft_chatml_v20_release.jsonl`, `data/sft_chatml_v20_eval.jsonl`, `data/sft_chatml_v20_repair.jsonl`
- Active v20 shard directories: `data/sft_chatml_shards_v20_align`, `data/sft_chatml_shards_v20_release`, `data/sft_chatml_eval_shards_v20_release`, `data/sft_chatml_shards_v20_repair`
- v20 run outputs: `checkpoints/sft_release_v20_*` and `reports/sft_release_v20_runs/*`
- Best observed v20 repair probe, not promotable: `checkpoints/sft_release_v20_repair_gatebridge/sft_step_1000.pkl`
  - broad raw: 106/120 scored checks, 68/80 scenarios, rewrite_rate=0.0
  - broad guarded: 107/120 scored checks, 69/80 scenarios, rewrite_rate=0.1333
  - blockers: raw arithmetic follow-up misses plus practical/factual lexical misses; the guarded rewrite rate is above the 0.10 cap
- v19 compatibility pin: `checkpoints/sft_release_v19_repair/selected_for_future_work.pkl` remains loadable evidence only until a v20 checkpoint passes broad raw/guarded eval, lineage checks, manifest hashes, and manual smoke prompts.
- Eval commands, gate CLI, release bundle, and optional MLX smoke tests: docs/eval.md. Pin promotion, `raw_reply` vs `reply`, and `gate_report.json` retention: docs/sft_runbook.md (sections after Candidate Eval).
- Mainline pin metadata for approved selections includes lineage fields: `run_id`, `source_checkpoint`, `selected_step`, `gate_report_path`, `manifest_hash`, and `mainline_valid` (illustrated below).
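Purely as an illustration of those lineage fields, a pin metadata record might look like the following; every value here is a placeholder, and the actual file layout is whatever `scripts/pin_quality2k_checkpoint.py` writes:

```python
# Hypothetical pin metadata record; values are placeholders, not real run data.
pin_metadata = {
    "run_id": "<run id>",
    "source_checkpoint": "checkpoints/pretrain_mlx_138m_quality2k/mlx_step_11000.pkl",
    "selected_step": 11000,
    "gate_report_path": "reports/sft_release_v20_runs/<run_id>/gate_report.json",
    "manifest_hash": "<hash of the build manifest>",
    "mainline_valid": True,
}
```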
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
PYTHONPATH=src python scripts/setup_verification.py
```

Build the curated continuation corpus:
```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/build_pretrain_quality2k.py
```

The active 138M continuation runtime contract is:

- context: 2048 tokens
- dropout: 0.0
- compile: true
- compile_granularity: microbatch
- precision: bfloat16
- micro_batch_size: 1
- grad_accum_steps: 16
- gradient_checkpointing: false
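A minimal guard against config drift, assuming these keys appear at the top level of the YAML (the real config may nest them differently; PyYAML assumed available):

```python
# Sketch: assert configs/pretrain_mlx_138m_quality2k.yaml still matches the
# runtime contract above. Assumes flat top-level keys.
import yaml

CONTRACT = {
    "context": 2048,
    "dropout": 0.0,
    "compile": True,
    "compile_granularity": "microbatch",
    "precision": "bfloat16",
    "micro_batch_size": 1,
    "grad_accum_steps": 16,
    "gradient_checkpointing": False,
}

with open("configs/pretrain_mlx_138m_quality2k.yaml") as f:
    cfg = yaml.safe_load(f)

for key, expected in CONTRACT.items():
    actual = cfg.get(key)
    assert actual == expected, f"{key}: expected {expected!r}, got {actual!r}"
```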
Run the continuation from Terminal:
```bash
cd /Users/admin/Downloads/VSCode/AnarchoBot
./scripts/run_pretrain_quality2k_terminal.sh
```

Start a fresh continuation explicitly:
```bash
cd /Users/admin/Downloads/VSCode/AnarchoBot
./scripts/run_pretrain_quality2k_terminal.sh --clean-run
```

Monitor the run:
```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/metrics_window.py \
  --log-dir checkpoints/pretrain_mlx_138m_quality2k/logs \
  --config configs/pretrain_mlx_138m_quality2k.yaml
```

Validate the staged continuation checkpoints before extending the run:
```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/validate_mainline_training.py grad-coverage \
  --config configs/pretrain_mlx_138m_quality2k.yaml \
  --checkpoint checkpoints/pretrain_mlx_138m_chatml/mlx_step_130000.pkl
PYTHONPATH=src python scripts/validate_mainline_training.py checkpoint-diff \
  --config configs/pretrain_mlx_138m_quality2k.yaml \
  --start-checkpoint checkpoints/pretrain_mlx_138m_chatml/mlx_step_130000.pkl \
  --end-checkpoint checkpoints/pretrain_mlx_138m_quality2k/mlx_step_11000.pkl
```

For the completed 12000-step continuation run, the preserved candidate pool is steps 8000, 9000, 10000, 11000, and 12000; earlier checkpoints rotated out under `ckpt_keep: 5`.
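The rotation rule itself is just keep-newest-N; a sketch under the `mlx_step_<N>.pkl` naming shown above:

```python
# Sketch of ckpt_keep-style rotation: keep the newest N step checkpoints.
from pathlib import Path

def surviving_checkpoints(ckpt_dir, keep=5):
    """Return the step checkpoints that survive rotation, newest first."""
    ckpts = sorted(
        Path(ckpt_dir).glob("mlx_step_*.pkl"),
        key=lambda p: int(p.stem.split("_")[-1]),
        reverse=True,
    )
    return ckpts[:keep]

# e.g. a 12000-step run saving every 1000 steps keeps 8000..12000 under keep=5
```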
Select the checkpoint with the deterministic continuation handoff rule:
```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/select_quality2k_checkpoint.py \
  --manifest examples/quality2k_selection_manifest.json \
  --print-pin-command
```

The selector uses held-out perplexity with an earliest-step tie-break, and blocks candidates only for checkpoint-diff failure, non-finite or missing perplexity, or catastrophic plain-generation regression versus the base review.
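A sketch of that selection rule over hypothetical candidate records (the real selector reads the manifest and review artifacts):

```python
# Sketch of the deterministic handoff rule: held-out perplexity wins,
# earliest step breaks ties, and only the three listed failures block.
import math

def select_candidate(candidates):
    """Each candidate is a hypothetical dict:
    {"step": int, "heldout_ppl": float,
     "checkpoint_diff_ok": bool, "catastrophic_regression": bool}"""
    eligible = [
        c for c in candidates
        if c["checkpoint_diff_ok"]
        and not c["catastrophic_regression"]
        and c.get("heldout_ppl") is not None
        and math.isfinite(c["heldout_ppl"])
    ]
    if not eligible:
        raise RuntimeError("no eligible candidates")
    # min() returns the first minimum, so sorting by step first yields the
    # earliest-step tie-break on equal perplexity
    return min(sorted(eligible, key=lambda c: c["step"]),
               key=lambda c: c["heldout_ppl"])
```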
Pin the chosen continuation checkpoint only after the clean rerun validations pass:
```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/pin_quality2k_checkpoint.py \
  --checkpoint checkpoints/pretrain_mlx_138m_quality2k/mlx_step_11000.pkl \
  --mainline-valid \
  --artifact-role mainline_candidate \
  --validation-basis "base grad coverage + compile parity passed; checkpoint diff passed; held-out perplexity won preserved 8000-12000 pool; no catastrophic plain-generation regression vs base"
```

Export a Hugging Face token at runtime before rebuilding the canonical natural-chat slice:
```bash
export HF_TOKEN=...
```

Build the v20 SFT corpora:

```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/build_sft_release.py --version v20 --clean-output
```

The standalone builder writes `reports/sft_v20_release_build/build_summary.json`. The shared runner writes per-run build reports under `reports/sft_v20_release_builds/<run_id>/build_summary.json`.
The current v20 build reports these manifest counts:

- align: 6000 examples
- release: 35000 examples
- eval: 2000 examples
- repair: 4000 examples
The shared runner now validates `manifest_examples` against these bands:

- align: 5000-8000
- release: 30000-45000
- eval: >=1800
- repair: 3500-5000
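A sketch of that band check, assuming `build_summary.json` maps each split to a `manifest_examples` count; the actual report layout may differ:

```python
# Sketch: validate per-split manifest_examples counts against the bands above.
import json

BANDS = {
    "align": (5000, 8000),
    "release": (30000, 45000),
    "eval": (1800, None),  # lower bound only
    "repair": (3500, 5000),
}

with open("reports/sft_v20_release_build/build_summary.json") as f:
    summary = json.load(f)

for split, (lo, hi) in BANDS.items():
    n = summary[split]["manifest_examples"]
    assert n >= lo and (hi is None or n <= hi), f"{split}: {n} outside band"
```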
Run the v20 curriculum:
```bash
cd /Users/admin/Downloads/VSCode/AnarchoBot
PYTHONPATH=src .venv/bin/python scripts/run_sft_release_v20.py
```

Default v20 release controls include:
- v20 corpus rebuild and manifest validation before training
- align 4000, full 16000, repair 1000
- repair shards use numeric-token loss weighting (6.0) for digit-containing assistant tokens, to give arithmetic/structured utility errors enough gradient without changing checkpoint format (see the sketch after this list)
- exact broad arithmetic gate prompts are filtered from all train/eval/repair splits; repair may use near-holdout arithmetic and gate-bridge chat examples for failure classes, not exact broad-gate prompts
- dual-track raw/guarded gating, with broad raw multi-turn chat required for promotion
- rewrite-rate cap (<=0.10 by default); policy rewrites are a secondary safety net, not release proof
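A sketch of the numeric-token weighting named above, with `decode_token` as a hypothetical per-token decoder; the real trainer applies weights inside the loss computation without changing checkpoint format:

```python
# Sketch: upweight digit-containing assistant tokens in the SFT loss.
NUMERIC_WEIGHT = 6.0

def token_loss_weights(token_ids, assistant_mask, decode_token):
    """Per-token weights: 0 off-assistant, 6.0 for digit tokens, else 1.0."""
    weights = []
    for tok, is_assistant in zip(token_ids, assistant_mask):
        if not is_assistant:
            weights.append(0.0)  # assistant-only loss masking
        elif any(ch.isdigit() for ch in decode_token(tok)):
            weights.append(NUMERIC_WEIGHT)  # arithmetic/structured utility tokens
        else:
            weights.append(1.0)
    return weights
```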
Current gate evidence says to rebuild the repair mix around the remaining raw failures rather than extend the same repair run blindly. The gatebridge probe improved over the failed v19 repair baseline, but it is still not a selected or release-ready checkpoint.
The later `sft_release_v20_repair_gatebridge_chatfix` restart regressed to 102/120 raw scored checks and 65/80 raw scenarios, so it is also diagnostic-only.
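For orientation only, the dual-track promotion decision combines those numbers roughly as below; the scored-check and scenario thresholds are illustrative assumptions, since only the 0.10 rewrite cap is pinned in this README:

```python
# Sketch of the dual-track gate: broad raw chat must clear its thresholds and
# the guarded rewrite rate must stay under the cap. min_checks/min_scenarios
# are illustrative assumptions, not the repo's actual pass marks.
REWRITE_CAP = 0.10

def promotable(raw, guarded, min_checks=118, min_scenarios=78):
    """raw/guarded: {"scored_checks": int, "scenarios": int, "rewrite_rate": float}"""
    return (
        raw["scored_checks"] >= min_checks
        and raw["scenarios"] >= min_scenarios
        and guarded["rewrite_rate"] <= REWRITE_CAP
    )

# The gatebridge probe (raw 106/120, guarded rewrite_rate=0.1333) fails on both counts.
```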
Before the full run, use `scripts/benchmark_sft_throughput.py` for short checkpoint-compatible probes of micro-batch, compile, and prefetch settings.
`selected_for_sft.pkl` is now blocked from the canonical SFT path unless its sibling metadata file marks it `mainline_valid: true`.
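A sketch of that guard, assuming the sibling metadata is JSON next to the checkpoint; the actual file name is whatever the pin script writes:

```python
# Sketch: refuse the canonical SFT handoff unless sibling metadata marks it
# mainline_valid: true. The metadata file name here is an assumption.
import json
from pathlib import Path

def assert_mainline_valid(ckpt_path):
    meta_path = Path(ckpt_path).with_suffix(".json")  # hypothetical sibling name
    if not meta_path.exists():
        raise RuntimeError(f"missing pin metadata for {ckpt_path}")
    meta = json.loads(meta_path.read_text())
    if not meta.get("mainline_valid", False):
        raise RuntimeError(f"{ckpt_path} is not marked mainline_valid")

assert_mainline_valid("checkpoints/pretrain_mlx_138m_quality2k/selected_for_sft.pkl")
```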
Run the static dense-mainline audit at any time without touching training:
```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/audit_dense_mainline.py \
  --json-output reports/pretrain_quality2k_review/static_dense_audit.json
```

Run the tests:

```bash
source .venv/bin/activate
pip install pytest
PYTHONPATH=src pytest
```

Optional MLX checkpoint smoke tests (loads weights on GPU; set `ANARCHOBOT_CANONICAL_CKPT` to a v20 candidate once one exists; otherwise the legacy v19 repair pin remains the compatibility smoke default):

```bash
ANARCHOBOT_RUN_MLX_TESTS=1 PYTHONPATH=src pytest -m mlx_checkpoint tests/test_canonical_checkpoint.py
```

Repo-tracked content is source, prompts, configs, tests, docs, and curated evidence.
Runtime artifacts are intentionally untracked:
- continuation checkpoints
- generated shard directories
- runtime reports
- transient build JSONL/message dumps
Preserved historical evidence lives under `legacy_evidence/`.