AtlasStack is an autonomous software engineering engine that analyzes GitHub repositories using Gemini 1.5 Flash and Qwen2.5-Coder. It finds bugs, explains architecture, and proposes fixes, and it ships with an embedded web IDE.
- Deep Repo Analysis: Clones any public GitHub repo and produces a structured report: architecture, data flow, important files, security fixes, and a health score.
- Embedded Web IDE: A full VS Code-like development environment directly in the browser.
- "Explain Like I'm 10": Toggleable ELI5 summaries for every codebase analysis.
- Lite Mode Backend: Runs entirely on Python + SQLite — no Docker needed.
- Premium CLI Tool: Analyze repositories and view history directly from your terminal.
- Autonomous Training Data Collection: Captures scan inputs and LLM outputs to build a high-quality fine-tuning dataset automatically.
- Enterprise Authentication: Secured by Clerk, providing out-of-the-box SSO, OAuth, and user management.
AtlasStack is built as a distributed system of specialized services coordinated via an API Gateway.
flowchart TD
A[Client / UI / CLI] -->|REST| B(API Gateway)
B -->|Queue| C(Analysis Worker)
C -->|Model calls| D(LLM Service)
C -->|Index| E(Knowledge Service)
E -->|Search| B
subgraph Data Stores
F[(PostgreSQL)]
G[(Redis)]
H[(Neo4j)]
I[(Weaviate)]
end
B --> F
B --> G
C --> F
C --> G
E --> H
E --> I
For more details, see the Architecture Deep Dive.
- Frontend: React, TypeScript, Vite, Monaco Editor, Tailwind CSS, Framer Motion
- Backend (Lite Mode): Python 3.12, FastAPI, SQLite, Gemini 1.5 Flash (Google) or Qwen2.5-Coder (HuggingFace)
- Message Broker: RabbitMQ
- Graph Database: Neo4j (Code dependency maps)
- Vector Database: Weaviate (Semantic code search)
- Caching & Sessions: Redis
- Primary Data: PostgreSQL
- Observability: Prometheus, Grafana, Jaeger (Tracing)
- Orchestration: Docker Compose, Kubernetes
- Python 3.12+
- Node.js 18+
- A Gemini API Key (Recommended) or a HuggingFace token
cp .env.example .env
cp clients/web/.env.example clients/web/.env.local
# Edit .env: at minimum set HF_TOKEN
# Edit clients/web/.env.local: set VITE_CLERK_PUBLISHABLE_KEY
# Recommended: Use a virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
pip install -e . # Registers the 'atlas' command
python test_app2.py
# → http://localhost:8005
# → API docs at http://localhost:8005/docs
cd clients/web
npm install
npm run dev
# → http://localhost:3000
- Open http://localhost:3000
- Click Sign In / Register → create an account
- Enter a public GitHub repo URL and click Start Analysis
- The AI will clone the repo, analyze it, and return a full report
No AI Key? The analysis endpoint falls back to a mock response. Set GEMINI_API_KEY or HF_TOKEN in .env to enable real AI analysis.
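The fallback described above can be pictured as a small selection function. This is a hypothetical sketch, not AtlasStack's actual code: the function name select_provider and the returned labels are assumptions, and only the variable names GEMINI_API_KEY and HF_TOKEN come from this README. Preferring Gemini when both keys are set is also an assumption, based on Gemini being marked as recommended.

```python
import os
from typing import Mapping, Optional


def select_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return which backend a scan would use, given environment variables.

    Hypothetical helper: the labels "gemini", "huggingface", and "mock"
    are illustrative and not taken from the AtlasStack codebase.
    """
    env = os.environ if env is None else env
    # Treat empty or whitespace-only values as "not set".
    if env.get("GEMINI_API_KEY", "").strip():
        return "gemini"
    if env.get("HF_TOKEN", "").strip():
        return "huggingface"
    # No key configured: fall back to the mock response.
    return "mock"
```

Calling select_provider() with no arguments reads the real process environment; passing a plain dict makes the logic easy to test.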
The AtlasStack VS Code extension brings AI-driven analysis directly to your workspace.
- Build/Install: Open clients/vscode and follow the README to build the .vsix.
- Connect: Open VS Code settings and set atlasstack.serverUrl to your backend (default: http://localhost:8005).
- Analyze: Use the AtlasStack icon in the Activity Bar to run deep repository scans.
The AtlasStack CLI is a powerful tool for developers who live in the terminal.
- Create Environment:
  python -m venv .venv
  .\.venv\Scripts\Activate.ps1 # Windows
  # source .venv/bin/activate # macOS/Linux
- Install CLI:
  pip install -r requirements.txt
  pip install -e .
| Command | Description |
|---|---|
| atlas scan . | Scan the current local directory |
| atlas analyze <url> | Analyze a remote GitHub repository |
| atlas history | View your past analysis records |
| atlas view <id> | See detailed results for a specific scan |
| atlas config | Update your Hugging Face credentials |
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /api/v1/analysis/mvp | No | Full repo analysis |
| GET | /api/v1/analyses | Yes | List your past analyses |
| GET | /api/v1/analyses/{id} | Yes | Get analysis detail |
| POST | /api/v1/repositories | Yes | Register a repo |
| GET | /api/v1/repositories | Yes | List your repos |
| POST | /api/v1/repositories/{id}/analyze | Yes | Trigger analysis |
| GET | /health | No | Health check |
Full interactive docs at http://localhost:8005/docs
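As a rough illustration of calling the unauthenticated analysis endpoint from Python, the sketch below builds the request without sending it. The helper name build_analysis_request and the JSON field name repo_url are assumptions; the actual request schema is whatever the interactive docs at /docs describe.

```python
import json
import urllib.request


def build_analysis_request(base_url: str, repo_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/analysis/mvp.

    The JSON field name "repo_url" is an assumption; check /docs for
    the real request schema before relying on it.
    """
    body = json.dumps({"repo_url": repo_url}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/analysis/mvp",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen while the Lite Mode backend is running on http://localhost:8005, and decode the response body as JSON.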
atlasstack/
├── clients/
│ ├── web/ # Vite + React frontend (landing page + IDE)
│ └── vscode/ # VS Code extension
├── services/
│ ├── api/ # FastAPI gateway (auth, repos, analysis)
│ ├── analysis/ # Celery workers (AST, security, perf)
│ ├── llm/ # LLM inference service
│ └── knowledge/ # Neo4j + Weaviate knowledge graph
├── shared/ # Shared Pydantic models + utilities
├── k8s/ # Kubernetes manifests
├── monitoring/ # Prometheus + Grafana + Jaeger configs
├── test_app2.py # Lite Mode entry point
└── docker-compose.yml # Full stack deployment
AtlasStack supports custom model fine-tuning via training/. You can run the pipeline to adapt the analysis engine to specific coding standards or languages.
# Run Supervised Fine-Tuning (SFT)
make train-sft
# Run RLHF pipeline
make train-rlhf
AtlasStack can automatically harvest its own analysis results to build a specialized dataset for future training.
- Storage: Data is saved in JSONL format for easy ingestion by the DatasetLoader.
- Default Path: training/datasets/collected_scans.jsonl
- Toggle: Enable or disable via COLLECT_TRAINING_DATA in your .env.
Every collected entry includes the system instruction, the repository context, and the structured LLM response, along with metadata like the repo URL and scan timestamp.
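To make the JSONL layout concrete, here is a minimal sketch of appending one entry per scan. The function name append_scan_record and every key name in the record are hypothetical; the README only states which pieces of information an entry contains, not how they are named.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_scan_record(path, system_instruction, repo_context, llm_response, repo_url):
    """Append one training example to a JSONL file (one JSON object per line).

    All key names below are assumptions made for illustration; the real
    collected_scans.jsonl schema may differ.
    """
    record = {
        "instruction": system_instruction,
        "context": repo_context,
        "response": llm_response,
        "metadata": {
            "repo_url": repo_url,
            "scanned_at": datetime.now(timezone.utc).isoformat(),
        },
    }
    path = Path(path)
    # Create the dataset directory on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Because each line is an independent JSON object, a loader can stream the file line by line without reading the whole dataset into memory.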
| Command | Description |
|---|---|
| make up | Start full Docker stack |
| make test | Run unit + integration tests |
| make lint | Run flake8 and pylint |
| make format | Format code with black/isort |
| make benchmark | Run performance test suite |
| make clean | Remove all containers & cache |
For a full enterprise-grade deployment with PostgreSQL, Redis, and RabbitMQ:
- Prepare Environment:
  cp .env.example .env
  # Fill in required secrets (HF_TOKEN, DATABASE_URL, etc.)
- Launch Stack:
  docker-compose up -d
- Run Migrations:
  docker-compose exec api alembic upgrade head
Deployment manifests are located in /k8s.
kubectl apply -f k8s/
| Issue | Solution |
|---|---|
| White Screen in IDE | Ensure Cross-Origin-Opener-Policy headers are set. Check browser console for Clerk errors. |
| Analysis Timed Out | Increase MAX_ANALYSIS_TIME in .env. Default is 300s. |
| "git clone" fails | Verify the repository is public or your GITHUB_TOKEN has proper scopes. |
| Database connection error | If using LITE_MODE=false, ensure PostgreSQL is running and DATABASE_URL is correct. |
| Alembic migration failed | Run ENVIRONMENT=development alembic upgrade head to see detailed logs. |
| Variable | Default | Description |
|---|---|---|
| HF_TOKEN | - | HuggingFace Token (required for real AI) |
| GEMINI_API_KEY | - | Google Gemini API Key (Recommended) |
| ENVIRONMENT | production | Set to development for verbose logging and to bypass strict security checks |
| JWT_SECRET | - | Secret for JWT signing (Required in production) |
| LITE_MODE | true | Toggle between SQLite and full Postgres stack |
| VITE_CLERK_PUBLISHABLE_KEY | - | Clerk Public Key for Frontend Auth |
| DEFAULT_MODEL | Qwen2.5-Coder | The LLM used for analysis |
| COLLECT_TRAINING_DATA | true | Toggle scan data harvesting |
| RATE_LIMIT_REQUESTS | 100 | Max requests per window |
| RATE_LIMIT_WINDOW | 60 | Window size in seconds |
See .env.example for the full list of configuration options.
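One way to read these variables with the defaults from the table is sketched below. The Settings class and load_settings function are hypothetical names, not AtlasStack's own configuration code, and the set of strings accepted as true is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    # Assumed convention: these strings count as "true", anything else as "false".
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    environment: str
    lite_mode: bool
    default_model: str
    collect_training_data: bool
    rate_limit_requests: int
    rate_limit_window: int  # seconds


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration, falling back to the defaults listed in the table."""
    env = os.environ if env is None else env
    return Settings(
        environment=env.get("ENVIRONMENT", "production"),
        lite_mode=_as_bool(env.get("LITE_MODE", "true")),
        default_model=env.get("DEFAULT_MODEL", "Qwen2.5-Coder"),
        collect_training_data=_as_bool(env.get("COLLECT_TRAINING_DATA", "true")),
        rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "100")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
    )
```

With the defaults, a client would be allowed 100 requests in each 60-second window; setting LITE_MODE=false switches the sketch's lite_mode flag off, which corresponds to the full Postgres stack.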
- Authentication: All user authentication, session management, and SSO are handled securely via Clerk (frontend) and JWT (backend).
- Vulnerability Scanning: CI/CD includes automated scans via bandit, safety, and semgrep.
- Execution Sandboxing: Repository analysis runs in isolated temporary directories with strict timeouts and size limits.
- LLM Data Privacy: Local LLM inference via Lite Mode ensures your codebase never leaves your infrastructure (unless using external HuggingFace/Google Inference APIs).
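The execution sandboxing item above (temporary directory, timeout, size limit) can be approximated with the Python standard library. The sketch below is an illustration under stated assumptions, not the project's implementation: run_isolated, directory_size, and the max_bytes value are hypothetical, and the 300-second default simply mirrors the documented MAX_ANALYSIS_TIME default. A temporary directory limits where files land but is not a security boundary by itself.

```python
import subprocess
import tempfile
from pathlib import Path


def directory_size(root: Path) -> int:
    """Total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def run_isolated(command, timeout_s=300.0, max_bytes=500 * 1024 * 1024):
    """Run a command inside a throwaway directory with a timeout and size cap.

    Raises subprocess.TimeoutExpired if the command exceeds timeout_s,
    subprocess.CalledProcessError if it exits non-zero, and ValueError if
    the directory ends up larger than max_bytes (an assumed limit).
    """
    with tempfile.TemporaryDirectory(prefix="scan-") as tmp:
        workdir = Path(tmp)
        result = subprocess.run(
            command,
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,
        )
        size = directory_size(workdir)
        if size > max_bytes:
            raise ValueError(f"workspace is {size} bytes, limit is {max_bytes}")
        # The directory and everything in it are deleted when the block exits.
        return result.stdout
```

For a repository scan, the command could be something like ["git", "clone", "--depth", "1", url, "."], with the analysis itself running before the with block exits and the directory is removed.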
Apache License 2.0 — see LICENSE for details.

