Job Cannon

A personal job-search command center: aggregates listings from Gmail alerts and SERP APIs, proactively scrapes career pages from a curated company watchlist (5-platform ATS coverage — Greenhouse / Lever / Ashby / SmartRecruiters / Workday — plus a tier-4 AI navigator for custom sites), scores everything with a cascade-routed AI pipeline (free local + free cloud providers, with Anthropic as paid fallback), and tracks application state. Single-user, runs on localhost.

CI · Coverage · Python 3.13+ · Ruff · License: AGPL v3

Engineering Highlights

  • Single-tier ordinal scoring through a multi-provider cascade. Every job runs through one 'scoring' tier with a six-axis ordinal rubric. The cascade tries free providers first (Ollama local → Groq → Cerebras → Gemini) and falls through to Anthropic only when all free options are exhausted or rate-limited. Phase 33 shootout selected qwen2.5:14b (Ollama) as the production primary; typical monthly cost is ~$0. Classification (apply | consider | skip | reject) is derived in Python from the numeric sub-scores — never emitted by the LLM — which prevents classification drift across model swaps.
  • Schema-versioned SQLite migrations. 48 idempotent migrations applied via pragma user_version. Migration 41 introduces a backup-recency preflight that refuses destructive schema changes without a recent userdata snapshot (override via GSD_BACKUP_CONFIRMED=1 for alternate backup schemes).
  • Background scheduler with cross-process safety. APScheduler 3.x with a pidfile + psutil liveness check — survives Flask reloads, single-instance enforced. Auto-starts a local Ollama service for the nightly agentic-backfill tier.
  • HTMX-only frontend. No JS framework, no bundler, no build step. Inline expansion, partial fragments, server-driven UI. 36 Jinja2 templates, Tailwind via CDN, SortableJS for the kanban.
  • ATS coverage across 5 platforms with a tier-4 AI navigator. Greenhouse, Lever, Ashby, SmartRecruiters, and Workday have explicit scanners; the AI navigator caches Playwright recipes (16 active) for the long-tail of custom-built career sites (iCIMS, Phenom, UKG, bespoke).
  • Eval harness with paired MAE + BCa bootstrap 95% CIs for prompt-variant A/B testing across the full provider matrix (Ollama-local, Groq, Cerebras, Gemini, Anthropic).
  • Cost-gated execution. Configurable monthly budget cap; the cost-gate returns a bool and lets callers decide whether to fail-open or raise — the orchestrator and the scheduler choose differently and that's intentional.
  • 2163 tests (unit + integration + Playwright e2e) green on the CI matrix (Ubuntu + Windows × Python 3.13).
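The Python-derived classification described above can be sketched as follows. The axis names and thresholds here are illustrative assumptions, not the project's actual rubric; the point is that the LLM emits only numbers and the label boundaries live in deterministic code.

```python
# Hypothetical sketch: the LLM returns only numeric sub-scores; the label is
# derived here, so swapping models cannot shift classification boundaries.
# Axis names and cutoffs are illustrative, not the project's real rubric.

def classify(scores: dict[str, int]) -> str:
    """Map six 0-4 ordinal sub-scores to apply | consider | skip | reject."""
    total = sum(scores.values())   # 0..24 across six axes
    floor = min(scores.values())   # a single zeroed axis is disqualifying
    if floor == 0:
        return "reject"
    if total >= 20:
        return "apply"
    if total >= 14:
        return "consider"
    return "skip"

print(classify({"skills": 4, "seniority": 4, "comp": 3,
                "location": 4, "domain": 3, "stack": 3}))  # -> apply
```

Because the thresholds are ordinary Python, a model swap only changes the sub-score distribution, and any resulting drift shows up in the eval harness rather than in silently relabeled jobs.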
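The pragma user_version migration pattern from the highlights can be sketched like this. The migration statements below are placeholders, not the repo's 48 real migrations; only the versioning loop is the point.

```python
import sqlite3

# Placeholder migrations; the repo applies 48 of these the same way.
MIGRATIONS = [
    "CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, title TEXT)",
    "ALTER TABLE jobs ADD COLUMN company TEXT",
]

def migrate(conn: sqlite3.Connection) -> int:
    """Apply any migrations newer than the stored schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, sql in enumerate(MIGRATIONS, start=1):
        if version > current:
            conn.execute(sql)
            conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]

conn = sqlite3.connect(":memory:")
print(migrate(conn))  # -> 2
print(migrate(conn))  # idempotent: still 2, nothing re-applied
```

Because user_version is stored in the database header itself, re-running the loop is a no-op, which is what makes the 48 migrations safely idempotent.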
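The eval harness's paired comparison can be sketched as a paired bootstrap over per-item absolute-error differences. This sketch uses a simple percentile interval for brevity; the harness itself uses BCa, which additionally corrects for bias and skew.

```python
import random

def paired_bootstrap_ci(err_a, err_b, n_boot=2000, alpha=0.05, seed=0):
    """95% CI on mean(|err_a| - |err_b|) by resampling paired items.
    Percentile interval for brevity; the real harness uses BCa."""
    rng = random.Random(seed)
    diffs = [abs(a) - abs(b) for a, b in zip(err_a, err_b)]
    means = []
    for _ in range(n_boot):
        sample = [diffs[rng.randrange(len(diffs))] for _ in diffs]
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Variant B beats A on every item, so the CI sits entirely above zero.
lo, hi = paired_bootstrap_ci([2, 3, 2, 4, 3], [1, 1, 0, 2, 1])
print(lo > 0)  # -> True
```

Pairing matters: resampling items (not the two error lists independently) keeps each job's difficulty matched across prompt variants, which is what makes small eval sets usable.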

Quick Start

git clone https://github.com/Senkichi/job-cannon.git
cd job-cannon
uv sync --extra dev --extra eval

# First run only (PowerShell) — the guards skip any file that already exists:
if (-not (Test-Path config.yaml)) { Copy-Item config.example.yaml config.yaml }
if (-not (Test-Path .env))        { Copy-Item .env.example .env }
# Add ANTHROPIC_API_KEY to .env (https://console.anthropic.com/settings/keys)

uv run job-cannon
# Open http://localhost:5000
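The guarded copies above are PowerShell. Since macOS/Linux are also supported, a bash equivalent of the same first-run step might look like this (demonstrated in a throwaway directory with stand-in files):

```shell
cd "$(mktemp -d)"
# Stand-in files; in the repo these are config.example.yaml and .env.example
echo "placeholder" > config.example.yaml
echo "placeholder" > .env.example

# bash equivalent of the guarded PowerShell copies: skip if already present
[ -f config.yaml ] || cp config.example.yaml config.yaml
[ -f .env ]        || cp .env.example .env
ls config.yaml .env
```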

For Gmail OAuth setup and full configuration reference, see docs/SETUP.md.

Architecture

flowchart LR
  Gmail[Gmail Alerts<br/>LinkedIn / Glassdoor / ZipRecruiter] --> Parser
  SerpAPI --> Parser
  JSearch --> Parser
  Thordata --> Parser
  DataForSEO --> Parser
  ATS[ATS Scanners<br/>Greenhouse / Lever / Ashby / Workday / SmartRecruiters] --> Parser
  Parser[Source Parsers<br/>+ Normalize] --> DB[(SQLite + WAL)]
  DB --> Score[Cascade Scoring<br/>six-axis ordinal rubric]
  Score -->|tries in order| Cascade{{Ollama qwen2.5:14b<br/>→ Groq → Cerebras<br/>→ Gemini → Anthropic}}
  Cascade --> Classify[Python-derived<br/>classification]
  Classify --> Dashboard[Flask + HTMX<br/>localhost:5000]
  DB --> Pipeline[Application<br/>Pipeline Tracker]
  Pipeline --> Dashboard

For deeper subsystem detail, see docs/architecture/.
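The cascade's "tries in order" step from the diagram can be sketched as a simple fall-through loop. The provider names and exception type here are placeholders, not the project's actual client API:

```python
class ProviderUnavailable(Exception):
    """Placeholder for a rate-limit or provider error."""

def score_with_cascade(job: str, providers: list) -> tuple[str, dict]:
    """Try each (name, scorer) in cheapest-first order; return the first success."""
    for name, scorer in providers:
        try:
            return name, scorer(job)
        except ProviderUnavailable:
            continue  # fall through to the next provider in the chain
    raise RuntimeError("all providers exhausted")

def rate_limited(job):
    raise ProviderUnavailable

def ok(job):
    return {"skills": 4}

name, scores = score_with_cascade(
    "Staff Engineer", [("ollama", rate_limited), ("groq", ok)]
)
print(name)  # -> groq
```

Ordering the list free-first is what keeps the typical monthly cost near zero: the paid Anthropic entry sits last and is only reached when everything ahead of it raises.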

Tech Stack

| Layer    | Tooling |
| -------- | ------- |
| Runtime  | Python 3.13, Flask 3.1, APScheduler 3.x |
| Storage  | SQLite (WAL mode) — raw SQL, no ORM |
| Frontend | Jinja2 + jinja2-fragments, HTMX 2.x, Tailwind (CDN), SortableJS |
| AI       | Multi-provider cascade: Ollama (qwen2.5:14b primary) → Groq → Cerebras → Gemini → Anthropic SDK (paid fallback) |
| Sources  | Gmail API v1 (OAuth), SerpAPI, JSearch, Thordata, DataForSEO |
| Tooling  | uv (canonical), ruff, pre-commit, gitleaks, commitizen, pytest |
| CI       | GitHub Actions (Ubuntu + Windows matrix), Codecov upload |
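WAL mode, mentioned in the Storage row, is enabled per-database with a single pragma. A minimal sketch (file path is throwaway; WAL needs a file-backed database, so :memory: would not work):

```python
import os, sqlite3, tempfile

# WAL lets readers proceed while the single writer commits.
path = os.path.join(tempfile.mkdtemp(), "jobs.db")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # -> wal
```

For a localhost app with one writer (the scheduler) and concurrent readers (the Flask dashboard), WAL avoids the reader-blocks-writer stalls of the default rollback journal.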

Project Structure

job_finder/
|-- web/                    # Flask app (11 blueprints, scheduler, AI clients, ATS)
|-- parsers/                # Email parsers (LinkedIn, Glassdoor, ZipRecruiter, Indeed stub)
|-- sources/                # Data sources (Gmail, SerpAPI, JSearch, Thordata, DataForSEO)
|-- scoring/                # Single-tier ordinal scoring + six-axis rubric helpers
|-- eval/                   # Eval harness + bootstrap CIs
|-- models.py               # Job dataclass with dedup_key
|-- config.py               # YAML config loader + path discovery
|-- __main__.py             # `uv run job-cannon` entry point
`-- db/                     # SQLite persistence (raw SQL, no ORM); package since S7d (2026-05-06)
tests/                      # 2163 tests, unit + integration + e2e
docs/
|-- SETUP.md                # Gmail OAuth, config reference, troubleshooting
`-- architecture/           # Subsystem deep-dives

The 11 blueprints: admin, batch_scoring, companies, costs, dashboard, detections, jobs, pipeline, profile, settings, sync.
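The dedup_key on the Job dataclass (models.py) is presumably a stable hash over normalized fields; the field set and normalization below are guesses for illustration, not the repo's actual implementation:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Job:
    # Field set is illustrative; the real models.py may differ.
    title: str
    company: str
    location: str

    @property
    def dedup_key(self) -> str:
        """Stable hash over normalized fields, so the same posting seen via a
        Gmail alert and via an ATS scanner collapses to one row."""
        canon = "|".join(
            s.strip().lower() for s in (self.company, self.title, self.location)
        )
        return hashlib.sha256(canon.encode()).hexdigest()[:16]

a = Job("Staff Engineer", "Acme", "Remote")
b = Job("  staff engineer ", "ACME", "remote")
print(a.dedup_key == b.dedup_key)  # -> True
```

Deduplication matters here because the same listing routinely arrives from multiple sources (Gmail alerts, SERP APIs, and direct ATS scans).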

Cost Estimates

The cascade tries free providers first, so typical monthly AI cost is ~$0. Anthropic only enters the picture as a paid fallback when every free provider in the chain is exhausted or rate-limited.

| Provider | Cost | When |
| -------- | ---- | ---- |
| Ollama (qwen2.5:14b local) | $0 | Primary — runs locally |
| Groq / Cerebras / Gemini free tiers | $0 | Each gated by per-day request limits |
| Anthropic (paid fallback) | ~$0.05–0.15 per job scored | Only when all free providers exhausted |

A configurable budget cap (default $25/mo, set in config.yaml under scoring.monthly_budget_usd) limits the Anthropic fallback if it ever activates. The app stops paid scoring when the cap is reached and resumes the next month.
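The cost-gate contract described under Engineering Highlights (it returns a bool, and each caller decides whether to fail open or raise) might look like this sketch; all names here are illustrative, not the project's API:

```python
def within_budget(spent_usd: float, cap_usd: float = 25.0) -> bool:
    """True if another paid call still fits under the monthly cap."""
    return spent_usd < cap_usd

class BudgetExhausted(RuntimeError):
    pass

# Orchestrator-style caller: fail open, let the free-tier result stand.
def maybe_score_paid(job, spent):
    if not within_budget(spent):
        return None
    return "paid-score"

# Scheduler-style caller: fail loud, so the nightly run surfaces the cap.
def nightly_backfill(spent):
    if not within_budget(spent):
        raise BudgetExhausted("monthly cap reached; resumes next month")
    return "ran"

print(maybe_score_paid("job-1", spent=30.0))  # -> None
```

Keeping the gate a plain predicate, rather than having it raise internally, is what lets the two callers make different fail-open/fail-loud choices against the same budget state.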

Optional SERP sources: SerpAPI, JSearch, Thordata, and DataForSEO are all opt-in. Each has its own pricing tier β€” see config.example.yaml for details.

Platform Compatibility

  • Developed on Windows 11, tested with Python 3.13.
  • macOS / Linux supported (no Windows-only code paths). The repo's .githooks/ are bash; on Windows use Git Bash.
  • SQLite ships with Python — no separate database install.
  • No Docker, no cloud services, no deployment required.

Running Tests

uv run --active pytest -q --tb=short        # full suite
uv run --active pytest -m "not e2e"         # skip Playwright e2e tier
uv run --active pytest tests/test_db.py -v  # one file

Tests use temp SQLite databases and a mocked Anthropic client — no API keys needed for unit / integration. The e2e tier requires uv run --active playwright install chromium once.

Documentation

  • docs/SETUP.md: Gmail OAuth setup, configuration reference, troubleshooting
  • docs/architecture/: subsystem deep-dives

License

GNU AGPL v3.0 or later — see LICENSE.
