AnarchoBot (MLX, Apple Silicon)

A 138M-parameter ChatML training stack for Apple Silicon, built on MLX.

Canonical remote branch: main. Historical legacy-default state is preserved at tag archive/origin-master-2026-03-20.

Active Workflow

The repo now treats one path as first-class:

clean Quality2K continuation -> explicitly approved pinned checkpoint -> v20 align/full/repair SFT recovery curriculum -> broad raw chat gate

The v19 repair passed a narrow gate but fails broad chat. Keep v19 checkpoints, loaders, tokenizer, model topology, ChatML rendering, and assistant-only loss masking compatible; do not overwrite or repoint v19 artifacts while v20 trains. The latest v19 canonical run state is too incomplete and stale to support coherent multi-turn chat decisions, so v20 is now the active recovery path.
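Assistant-only loss masking means only tokens inside assistant turns contribute to the loss; role headers, user/system turns, and turn delimiters are masked out. A minimal character-level sketch of the idea (hypothetical helper name; the repo's real implementation operates on tokenizer output):

```python
# Sketch: render messages as ChatML and flag which positions carry loss.
# Only assistant-turn body content is flagged; headers, user/system turns,
# and end-of-turn markers are masked out.

def render_chatml(messages):
    """Render messages into ChatML text plus per-character loss flags."""
    text_parts, flags = [], []
    for msg in messages:
        header = f"<|im_start|>{msg['role']}\n"
        footer = "<|im_end|>\n"
        body = msg["content"]
        text_parts.append(header + body + footer)
        # Loss applies only to the assistant body.
        flags.append([False] * len(header)
                     + [msg["role"] == "assistant"] * len(body)
                     + [False] * len(footer))
    text = "".join(text_parts)
    mask = [f for turn in flags for f in turn]
    return text, mask

messages = [
    {"role": "user", "content": "2+2?"},
    {"role": "assistant", "content": "4"},
]
text, mask = render_chatml(messages)
assert sum(mask) == 1                     # only the assistant body "4"
assert text[mask.index(True)] == "4"
```

In practice the same flags are computed per token after tokenization and multiplied into the cross-entropy loss, which is what keeps v19 and v20 loss behavior compatible.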

Active entrypoints:

  • scripts/build_pretrain_quality2k.py
  • scripts/run_pretrain_quality2k_terminal.sh
  • scripts/audit_dense_mainline.py
  • scripts/review_plain_generation.py
  • scripts/select_quality2k_checkpoint.py
  • scripts/pin_quality2k_checkpoint.py
  • scripts/build_sft_release.py
  • scripts/run_sft_release.py
  • scripts/run_sft_release_v20.py
  • scripts/run_multiturn_coherence_eval.py (scored raw/guarded broad chat suite; see SFT Runbook)

Research branch entrypoints:

  • scripts/extend_tokenizer_with_vm_tokens.py
  • scripts/build_vm_pilot_dataset.py
  • scripts/init_vm_from_dense.py
  • scripts/extend_tokenizer_with_wasm_tokens.py
  • scripts/normalize_local_docs.py
  • scripts/build_wasm_subset_corpus.py
  • scripts/build_wasm80m_pretrain_corpus.py
  • scripts/build_wasm80m_sft_corpora.py
  • scripts/run_wasm80m_pretrain.py
  • scripts/run_wasm80m_sft.py
  • scripts/eval_wasm80m.py

Historical probe-era and experimental material is retained only as archived reference. See Archive Notes.

Historical dense shims:

  • scripts/build_sft_v19_release.py
  • scripts/run_sft_release_v19.py
  • scripts/build_sft_v18_release.py
  • scripts/run_sft_release_v18.py
  • scripts/run_sft_release_v18_terminal.sh

These remain compatibility shims only and are non-authoritative for release decisions.

The WASM80m scripts listed under “Research branch entrypoints” are a parallel tokenizer/model line (docs/wasm80m_runbook.md); they are not part of finishing dense 138M v20 chat.

The only architecture on the release path is the dense 138M line. Experimental dense_vm and dense_wasm80m work are isolated to separate branch/config families and do not share checkpoint compatibility with the dense mainline.

Current Artifacts

  • Preserved raw pretrain base: checkpoints/pretrain_mlx_138m_chatml/mlx_step_130000.pkl
  • Active continuation config: configs/pretrain_mlx_138m_quality2k.yaml
  • Active continuation outputs: checkpoints/pretrain_mlx_138m_quality2k
  • Canonical SFT handoff: checkpoints/pretrain_mlx_138m_quality2k/selected_for_sft.pkl
  • Active v20 SFT configs:
    • configs/sft_release_v20_align.yaml
    • configs/sft_release_v20_full.yaml
    • configs/sft_release_v20_repair.yaml
  • Active v20 SFT corpora:
    • data/sft_chatml_v20_align.jsonl
    • data/sft_chatml_v20_release.jsonl
    • data/sft_chatml_v20_eval.jsonl
    • data/sft_chatml_v20_repair.jsonl
  • Active v20 shard directories:
    • data/sft_chatml_shards_v20_align
    • data/sft_chatml_shards_v20_release
    • data/sft_chatml_eval_shards_v20_release
    • data/sft_chatml_shards_v20_repair
  • v20 run outputs: checkpoints/sft_release_v20_* and reports/sft_release_v20_runs/*
  • Best observed v20 repair probe (not promotable): checkpoints/sft_release_v20_repair_gatebridge/sft_step_1000.pkl
    • broad raw: 106/120 scored checks, 68/80 scenarios, rewrite_rate=0.0
    • broad guarded: 107/120 scored checks, 69/80 scenarios, rewrite_rate=0.1333
    • blockers: raw arithmetic follow-up misses plus practical/factual lexical misses; guarded rewrite rate is above the 0.10 cap
  • v19 compatibility pin: checkpoints/sft_release_v19_repair/selected_for_future_work.pkl remains loadable evidence only until a v20 checkpoint passes broad raw/guarded eval, lineage checks, manifest hashes, and manual smoke prompts.
  • Eval commands, gate CLI, release bundle, and optional MLX smoke tests: docs/eval.md. Pin promotion, raw_reply vs reply, and gate_report.json retention: docs/sft_runbook.md (sections after Candidate Eval).
  • Mainline pin metadata for approved selections includes lineage fields: run_id, source_checkpoint, selected_step, gate_report_path, manifest_hash, and mainline_valid.
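The guarded rewrite-rate blocker above is a simple threshold check: 0.1333 exceeds the default 0.10 cap. A sketch of that check, assuming each scored turn records both raw_reply and the post-policy reply (field names as described in docs/sft_runbook.md):

```python
# Sketch: compute the guarded rewrite rate and apply the promotion cap.
# A turn counts as rewritten when the policy changed raw_reply into reply.
REWRITE_RATE_CAP = 0.10

def rewrite_rate(turns):
    """Fraction of turns where the guard rewrote the raw model reply."""
    rewritten = sum(1 for t in turns if t["reply"] != t["raw_reply"])
    return rewritten / len(turns)

turns = ([{"raw_reply": "a", "reply": "a"}] * 13
         + [{"raw_reply": "x", "reply": "y"}] * 2)
rate = rewrite_rate(turns)        # 2/15 ~ 0.133, like the gatebridge probe
assert rate > REWRITE_RATE_CAP    # above the cap -> blocked from promotion
```

This is why policy rewrites are a safety net rather than release proof: a candidate that leans on the guard too often fails the cap even if its guarded scores look good.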

Setup

python -m venv .venv
source .venv/bin/activate
pip install -e .
PYTHONPATH=src python scripts/setup_verification.py

Canonical Pretrain Continuation

Build the curated continuation corpus:

source .venv/bin/activate
PYTHONPATH=src python scripts/build_pretrain_quality2k.py

The active 138M continuation runtime contract is:

  • context: 2048 tokens
  • dropout: 0.0
  • compile: true
  • compile_granularity: microbatch
  • precision: bfloat16
  • micro_batch_size: 1
  • grad_accum_steps: 16
  • gradient_checkpointing: false
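Under this contract each optimizer step accumulates 16 microbatches of one 2048-token sequence, so the effective step size is 16 sequences, or 32,768 tokens. A quick sanity check using the values above:

```python
# Values copied from the continuation runtime contract.
context = 2048
micro_batch_size = 1
grad_accum_steps = 16

# One optimizer step = micro_batch_size * grad_accum_steps sequences.
sequences_per_step = micro_batch_size * grad_accum_steps  # 16
tokens_per_step = sequences_per_step * context            # 32768
print(tokens_per_step)  # 32768
```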

Run the continuation from Terminal:

cd /path/to/AnarchoBot
./scripts/run_pretrain_quality2k_terminal.sh

Start a fresh continuation explicitly:

cd /path/to/AnarchoBot
./scripts/run_pretrain_quality2k_terminal.sh --clean-run

Monitor the run:

source .venv/bin/activate
PYTHONPATH=src python scripts/metrics_window.py \
  --log-dir checkpoints/pretrain_mlx_138m_quality2k/logs \
  --config configs/pretrain_mlx_138m_quality2k.yaml

Validate the staged continuation checkpoints before extending the run:

source .venv/bin/activate
PYTHONPATH=src python scripts/validate_mainline_training.py grad-coverage \
  --config configs/pretrain_mlx_138m_quality2k.yaml \
  --checkpoint checkpoints/pretrain_mlx_138m_chatml/mlx_step_130000.pkl

PYTHONPATH=src python scripts/validate_mainline_training.py checkpoint-diff \
  --config configs/pretrain_mlx_138m_quality2k.yaml \
  --start-checkpoint checkpoints/pretrain_mlx_138m_chatml/mlx_step_130000.pkl \
  --end-checkpoint checkpoints/pretrain_mlx_138m_quality2k/mlx_step_11000.pkl

For the completed 12000-step continuation run, the preserved candidate pool is steps 8000, 9000, 10000, 11000, and 12000; earlier checkpoints rotated out under ckpt_keep: 5.
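The preserved pool follows directly from ckpt_keep: 5, assuming a checkpoint every 1000 steps (an interval consistent with the step numbers above, not stated explicitly in the config excerpt):

```python
# Sketch: which checkpoints survive rotation under ckpt_keep: 5,
# assuming a save every 1000 steps through step 12000.
ckpt_keep = 5
save_interval = 1000
final_step = 12000

saved_steps = list(range(save_interval, final_step + 1, save_interval))
preserved = saved_steps[-ckpt_keep:]  # rotation drops everything earlier
assert preserved == [8000, 9000, 10000, 11000, 12000]
```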

Canonical SFT Handoff

Select the checkpoint with the deterministic continuation handoff rule:

source .venv/bin/activate
PYTHONPATH=src python scripts/select_quality2k_checkpoint.py \
  --manifest examples/quality2k_selection_manifest.json \
  --print-pin-command

The selector uses held-out perplexity with earliest-step tie-break, and only blocks candidates for checkpoint-diff failure, non-finite/missing perplexity, or catastrophic plain-generation regression versus the base review.
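The selection rule above can be sketched as a single deterministic comparison (hypothetical candidate fields; the real selector reads them from the manifest):

```python
# Sketch: lowest held-out perplexity wins among non-blocked candidates,
# with earliest step breaking ties.
import math

def select_checkpoint(candidates):
    eligible = [
        c for c in candidates
        if not c.get("diff_failed")                      # checkpoint-diff passed
        and not c.get("catastrophic_regression")         # plain-gen vs base ok
        and math.isfinite(c.get("ppl", math.inf))        # ppl present and finite
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: (c["ppl"], c["step"]))

pool = [
    {"step": 8000, "ppl": 21.4},
    {"step": 9000, "ppl": 20.9},
    {"step": 10000, "ppl": 20.9},           # tie: earlier step 9000 wins
    {"step": 11000, "ppl": float("nan")},   # non-finite ppl -> blocked
    {"step": 12000, "ppl": 21.0, "diff_failed": True},
]
best = select_checkpoint(pool)
assert best["step"] == 9000
```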

Pin the chosen continuation checkpoint only after the clean rerun validations pass:

source .venv/bin/activate
PYTHONPATH=src python scripts/pin_quality2k_checkpoint.py \
  --checkpoint checkpoints/pretrain_mlx_138m_quality2k/mlx_step_11000.pkl \
  --mainline-valid \
  --artifact-role mainline_candidate \
  --validation-basis "base grad coverage + compile parity passed; checkpoint diff passed; held-out perplexity won preserved 8000-12000 pool; no catastrophic plain-generation regression vs base"

Export a Hugging Face token at runtime before rebuilding the canonical natural-chat slice:

export HF_TOKEN=...

Build the v20 SFT corpora:

source .venv/bin/activate
PYTHONPATH=src python scripts/build_sft_release.py --version v20 --clean-output

The standalone builder writes reports/sft_v20_release_build/build_summary.json. The shared runner writes per-run build reports under reports/sft_v20_release_builds/<run_id>/build_summary.json.

The current v20 build reports these manifest counts:

  • align: 6000 examples
  • release: 35000 examples
  • eval: 2000 examples
  • repair: 4000 examples

The shared runner now validates manifest_examples against these bands:

  • align: 5000-8000
  • release: 30000-45000
  • eval: >=1800
  • repair: 3500-5000
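The band check amounts to a per-split range validation, which the current build counts all pass. A sketch (hypothetical function name; the shared runner performs the equivalent check on manifest_examples):

```python
# Sketch: validate manifest example counts against the v20 bands.
BANDS = {
    "align": (5000, 8000),
    "release": (30000, 45000),
    "eval": (1800, None),   # lower bound only
    "repair": (3500, 5000),
}

def validate_manifest_counts(counts):
    """Return a list of human-readable band violations (empty = pass)."""
    failures = []
    for split, (lo, hi) in BANDS.items():
        n = counts.get(split, 0)
        if n < lo or (hi is not None and n > hi):
            failures.append(f"{split}: {n} outside [{lo}, {hi or 'inf'}]")
    return failures

current_build = {"align": 6000, "release": 35000, "eval": 2000, "repair": 4000}
assert validate_manifest_counts(current_build) == []
```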

Run the v20 curriculum:

cd /path/to/AnarchoBot
PYTHONPATH=src .venv/bin/python scripts/run_sft_release_v20.py

Default v20 release controls include:

  • v20 corpus rebuild and manifest validation before training
  • align 4000, full 16000, repair 1000
  • repair shards use numeric-token loss weighting for digit-containing assistant tokens (6.0) to give arithmetic/structured utility errors enough gradient without changing checkpoint format
  • exact broad arithmetic gate prompts are filtered from all train/eval/repair splits; repair may use near-holdout arithmetic and gate-bridge chat examples for failure classes, not exact broad-gate prompts
  • dual-track raw/guarded gating with broad raw multi-turn chat required for promotion
  • rewrite-rate cap (<=0.10 by default); policy rewrites are a secondary safety net, not release proof
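The numeric-token loss weighting in the repair shards can be sketched as follows (hypothetical function; the 6.0 weight, the digit criterion, and the assistant-only restriction match the control described above):

```python
# Sketch: per-token loss weights for repair shards. Tokens outside
# assistant turns get 0, assistant tokens whose decoded text contains
# a digit get 6.0, and all other assistant tokens get 1.0.
DIGIT_WEIGHT = 6.0

def token_loss_weights(token_texts, assistant_mask):
    weights = []
    for text, is_assistant in zip(token_texts, assistant_mask):
        if not is_assistant:
            weights.append(0.0)
        elif any(ch.isdigit() for ch in text):
            weights.append(DIGIT_WEIGHT)
        else:
            weights.append(1.0)
    return weights

tokens = ["2", "+", "2", " equals", " 4", "."]
mask = [False, False, False, True, True, True]   # last three are assistant
assert token_loss_weights(tokens, mask) == [0.0, 0.0, 0.0, 1.0, 6.0, 1.0]
```

Because only the loss weights change, not the parameters or serialization, this keeps repair checkpoints format-compatible with the rest of the v20 line.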

Current gate evidence says to rebuild the repair mix around the remaining raw failures rather than extend the same repair run blindly. The gatebridge probe improved over the failed v19 repair baseline, but it is still not a selected or release-ready checkpoint. The later sft_release_v20_repair_gatebridge_chatfix restart regressed to 102/120 raw scored checks and 65/80 raw scenarios, so it is also diagnostic-only.

Before the full run, use scripts/benchmark_sft_throughput.py for short checkpoint-compatible probes of micro-batch, compile, and prefetch settings.

selected_for_sft.pkl is now blocked from the canonical SFT path unless its sibling metadata file marks it mainline_valid: true.

Run the static dense-mainline audit at any time without touching training:

source .venv/bin/activate
PYTHONPATH=src python scripts/audit_dense_mainline.py \
  --json-output reports/pretrain_quality2k_review/static_dense_audit.json

Tests

source .venv/bin/activate
pip install pytest
PYTHONPATH=src pytest

Optional MLX checkpoint smoke tests (loads weights on GPU; set ANARCHOBOT_CANONICAL_CKPT to a v20 candidate once one exists, otherwise the legacy v19 repair pin remains the compatibility smoke default):

ANARCHOBOT_RUN_MLX_TESTS=1 PYTHONPATH=src pytest -m mlx_checkpoint tests/test_canonical_checkpoint.py

Generated Artifact Policy

Repo-tracked content is source, prompts, configs, tests, docs, and curated evidence.

Runtime artifacts are intentionally untracked:

  • continuation checkpoints
  • generated shard directories
  • runtime reports
  • transient build JSONL/message dumps

Preserved historical evidence lives under legacy_evidence/.
