All new development of bioAF has moved to github.com/bioAF/bioAF.
This repository is preserved for historical reference only. Issues, pull requests, and updates should be directed to the new repository. The documentation below is retained as-is and may be out of date.
Computational Biology Automation Framework
A turnkey computational biology platform for small biotech companies (5-50 researchers), deployed on Google Cloud Platform. bioAF provides a web-based control plane for managing HPC clusters, notebook environments, pipeline engines, and data visualization tools -- all provisioned through UI-driven Terraform.
- Experiment Tracking - MINSEQE-compliant metadata, sample management, batch processing, project organization
- Compute Orchestration - Kubernetes (GKE) compute via the BioAF Adapter Layer, JupyterHub/RStudio notebooks, versioned compute environments, auto-scaling, Cloud Build image pipeline
- Pipeline Engine - Nextflow integration, custom pipelines, pipeline catalog, run monitoring, parameter management
- Data Management - File upload/download, dataset browser, GCS storage integration, GEO export, SuperSeries cross-experiment packaging
- Results & Visualization - QC dashboards, cellxgene single-cell viewer, plot archive, search
- SSH Access - One-click kubectl exec into running pipeline jobs and notebook sessions
- Notifications - Event-driven alerts via in-app, email (SMTP), and Slack (OAuth integration)
- Cost Center - GCP billing integration, budget alerts, component cost breakdown, projections
- Backup & Recovery - 4-tier GCS backups (pg_dump, GCS versioning, platform config, terraform state), restore with review period
- Session Credentials - Per-user RStudio credentials with PAM authentication, auto-generated usernames
- Role-Based Access - Permission-based RBAC with four built-in roles, custom role creation, and per-resource/action grants
- Upgrade System - GitHub-based version checking, managed upgrade flow with rollback
- Audit Log - Immutable audit trail with filtering, pagination, and human-readable descriptions
- GitOps - Version-controlled platform configuration with diff and rollback
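To make the per-resource/action grants in the Role-Based Access item concrete, here is a minimal sketch of how such a check could look. The class names, the `qc_reviewer` role, and the resource and action strings are all hypothetical; bioAF's actual permission model and built-in role names are not shown in this README.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """One permission: an action allowed on a resource type (names are illustrative)."""
    resource: str  # e.g. "experiment"
    action: str    # e.g. "read"


@dataclass
class Role:
    """A named bundle of grants; a custom role is just a Role with its own grant set."""
    name: str
    grants: set[Grant] = field(default_factory=set)


def is_allowed(roles: list[Role], resource: str, action: str) -> bool:
    """Return True if any of the user's roles carries the requested grant."""
    wanted = Grant(resource, action)
    return any(wanted in role.grants for role in roles)


# Hypothetical custom role: may read experiments and pipeline runs, nothing else.
qc_reviewer = Role(
    name="qc_reviewer",
    grants={Grant("experiment", "read"), Grant("pipeline_run", "read")},
)
```

With this shape, `is_allowed([qc_reviewer], "experiment", "read")` is true and `is_allowed([qc_reviewer], "experiment", "delete")` is false, because access is granted only by an explicit resource/action pair.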
A computational biologist registers an experiment, links FASTQ files (uploaded or auto-ingested from a sequencer drop), selects a pipeline from the catalog (nf-core/scrnaseq, rnaseq, or custom), and launches a run. The BioAF Adapter Layer handles everything below that: staging inputs from GCS, submitting Kubernetes Jobs to GKE Autopilot, monitoring execution via Nextflow trace parsing, collecting outputs back to GCS, and transitioning the experiment through its status lifecycle (registered -> library_prep -> sequencing -> fastq_uploaded -> processing -> pipeline_complete -> [reviewed ->] analysis -> complete). Pipeline completion triggers event-driven notifications (in-app, email, Slack), and results are browsable through the plot archive, cellxgene viewer, and GEO export tools. Jupyter and RStudio sessions run as Kubernetes Pods with GCS-backed home directories and SSH access. RStudio sessions use per-user PAM authentication (ADR-030), and notebook container images are managed as versioned environments (ADR-033), built automatically via Cloud Build (ADR-031).
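The status lifecycle above can be read as a small state machine in which `reviewed` is an optional step. The following is a sketch under that reading only: the status names come from the text, while `allowed_next`, `transition`, and the rule that only the next status (or the one after an optional step) is reachable are assumptions, not bioAF's implementation.

```python
from __future__ import annotations

# Status names are taken from the lifecycle described in the README.
LIFECYCLE = [
    "registered",
    "library_prep",
    "sequencing",
    "fastq_uploaded",
    "processing",
    "pipeline_complete",
    "reviewed",  # optional: shown in brackets in the lifecycle
    "analysis",
    "complete",
]
OPTIONAL = {"reviewed"}


def allowed_next(status: str) -> list[str]:
    """Return the statuses reachable from `status` in one step.

    Optional statuses may be skipped, so the result includes each optional
    status that follows plus the first mandatory one after it.
    """
    position = LIFECYCLE.index(status)  # raises ValueError for unknown statuses
    reachable = []
    for candidate in LIFECYCLE[position + 1:]:
        reachable.append(candidate)
        if candidate not in OPTIONAL:
            break
    return reachable


def transition(current: str, target: str) -> str:
    """Validate a status change and return the new status."""
    if target not in allowed_next(current):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

Under these assumptions, an experiment in `pipeline_complete` can move to either `reviewed` or `analysis`, while a jump from `registered` straight to `complete` raises `ValueError`.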
The adapter layer (ADR-020) abstracts compute, storage, and notebook providers behind clean interfaces, so all application logic is decoupled from infrastructure specifics. Today that means GKE + GCS (ADR-021, ADR-022).
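As an illustration of what "behind clean interfaces" can mean in practice, the sketch below defines two provider protocols and shows application code that depends only on them. Every name here (`ComputeProvider`, `StorageProvider`, `launch_run`, the in-memory fakes, the object path, and the image string) is hypothetical; the real interfaces are defined by ADR-020 and are not reproduced in this README.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class ComputeProvider(Protocol):
    """Hypothetical compute interface; a GKE-backed class would implement it."""
    def submit(self, image: str, command: list[str]) -> str: ...
    def status(self, job_id: str) -> str: ...


class StorageProvider(Protocol):
    """Hypothetical storage interface; a GCS-backed class would implement it."""
    def read(self, path: str) -> bytes: ...
    def write(self, path: str, data: bytes) -> None: ...


@dataclass
class InMemoryStorage:
    """Test double that keeps objects in a dict instead of a bucket."""
    objects: dict[str, bytes] = field(default_factory=dict)

    def read(self, path: str) -> bytes:
        return self.objects[path]

    def write(self, path: str, data: bytes) -> None:
        self.objects[path] = data


@dataclass
class FakeCompute:
    """Test double that records submitted jobs and marks them succeeded."""
    jobs: dict[str, str] = field(default_factory=dict)

    def submit(self, image: str, command: list[str]) -> str:
        job_id = f"job-{len(self.jobs) + 1}"
        self.jobs[job_id] = "succeeded"
        return job_id

    def status(self, job_id: str) -> str:
        return self.jobs[job_id]


def launch_run(compute: ComputeProvider, storage: StorageProvider,
               samplesheet: bytes) -> str:
    """Application logic: stage an input, then submit a job. No GKE or GCS imports."""
    storage.write("inputs/samplesheet.csv", samplesheet)
    # Image name and command are placeholders for illustration only.
    return compute.submit("nextflow/nextflow", ["nextflow", "run", "nf-core/rnaseq"])
```

Because `launch_run` sees only the protocols, the same function can run against the in-memory fakes in a unit test and against cloud-backed providers in production.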
Infrastructure is provisioned through UI-driven Terraform (ADR-007) -- researchers never touch HCL. All secrets live in Secret Manager (ADR-008), all actions are recorded in an immutable audit log (ADR-009), and data portability is guaranteed (ADR-012).
See all architecture decision records in decisions/README.md.
- Docker and Docker Compose
- Git
- openssl (for secret generation)
Run this on your local machine to provision a GCP VM and get started:
curl -fsSL https://raw.githubusercontent.com/not-that-guy-again/bioAF/main/install-gcp.sh | bash

The script sets up gcloud, creates a VM with Docker, and walks you through the process. Once the VM is ready, SSH in and run:
git clone https://github.com/not-that-guy-again/bioAF.git
cd bioAF
./bioaf setup

If you already have a Linux server with Docker installed:
git clone https://github.com/not-that-guy-again/bioAF.git
cd bioAF
./bioaf setup

The setup command handles everything: it checks prerequisites, generates secrets and TLS certs, pulls pre-built images, runs migrations, and prints a one-time setup code. Open the URL it shows in your browser and enter the code to create your admin account and configure the platform.
| Command | Description |
|---|---|
| `./bioaf setup` | First-run setup (pulls images, generates secrets, prints setup code) |
| `./bioaf start` | Start all services in dependency order |
| `./bioaf stop` | Stop all services |
| `./bioaf restart` | Restart all services |
| `./bioaf status` | Show service status |
| `./bioaf logs [service]` | Tail logs (all or one service) |
| `./bioaf build [service]` | Build container images locally (development only) |
| `./bioaf migrate` | Run database migrations |
| `./bioaf migrate-down <rev>` | Downgrade database to a specific revision |
| `./bioaf seed <script.py>` | Run a seed/data script in the backend container |
| `./bioaf backup` | Create a database backup |
| `./bioaf update [version]` | Update to latest (or specific) version |
| `./bioaf reset-db` | Destroy and recreate the database (with confirmation) |
| `./bioaf shell [service]` | Open a shell in a container (default: backend) |
| `./bioaf dbshell` | Open a psql session to the database |
| `./bioaf register-outputs` | Register pipeline output files from GCS |
| `./bioaf help` | Show all commands |
See the full Deployment Guide for detailed instructions.
- Quickstart - Documentation hub
- Deployment Guide - Full deployment walkthrough
- Bench Scientist Guide - Experiments, samples, results
- Computational Biologist Guide - Pipelines, notebooks, environments
- Admin Guide - User management, costs, backups, notifications
- Life After bioAF - Data portability after teardown
- ADR Index - Architecture Decision Records
- SSH Access Guide - Connecting to running workloads
- GEO Export Guide - Exporting to NCBI GEO
- Reference Data Guide - Managing reference genomes and annotations
- Compute Stack Setup - Kubernetes configuration
# Start backend, frontend, and PostgreSQL
docker compose -f docker/docker-compose.dev.yml up
# Backend: http://localhost:8000
# Frontend: http://localhost:3000
# Postgres: localhost:5432

# Backend
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt -r requirements-dev.txt
uvicorn app.main:app --reload
# Frontend
cd frontend
npm install
npm run dev
# Database (requires PostgreSQL 16)
cd backend
alembic upgrade head

# Backend tests (requires PostgreSQL)
docker compose -f docker/docker-compose.dev.yml up -d db
cd backend && python -m pytest tests/ -v
# Frontend tests
cd frontend && npm test

bioAF manages these infrastructure components through its UI:
| Component | Category | Compute Stack | Dependencies |
|---|---|---|---|
| GKE Cluster | Compute | Kubernetes | None |
| GCS Buckets | Storage | Kubernetes | GKE |
| JupyterHub | Notebooks | Kubernetes | Compute, Storage |
| RStudio Server | Notebooks | Kubernetes | Compute, Storage |
| Nextflow | Pipelines | Kubernetes | Compute |
| cellxgene | Visualization | Any | None |
| QC Dashboard | Visualization | Any | None |
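The Dependencies column implies a provisioning order. One way to derive it is a topological sort, sketched below with Python's standard `graphlib`. The dependency data is transcribed from the table; reading "Compute" and "GKE" as the GKE Cluster row and "Storage" as the GCS Buckets row is an assumption, and `provisioning_order` is a hypothetical helper, not part of bioAF.

```python
from __future__ import annotations

from graphlib import TopologicalSorter

# Component -> components it depends on, transcribed from the table above.
# Assumption: "Compute"/"GKE" refer to "GKE Cluster", "Storage" to "GCS Buckets".
DEPENDS_ON: dict[str, list[str]] = {
    "GKE Cluster": [],
    "GCS Buckets": ["GKE Cluster"],
    "JupyterHub": ["GKE Cluster", "GCS Buckets"],
    "RStudio Server": ["GKE Cluster", "GCS Buckets"],
    "Nextflow": ["GKE Cluster"],
    "cellxgene": [],
    "QC Dashboard": [],
}


def provisioning_order(graph: dict[str, list[str]]) -> list[str]:
    """Return components ordered so that each one follows its dependencies.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(graph).static_order())


order = provisioning_order(DEPENDS_ON)
```

The relative order of independent components (for example cellxgene and QC Dashboard) is not significant; only the constraint that dependencies come first is guaranteed.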
bioAF/
  backend/          FastAPI application
  frontend/         Next.js 14 application
  docker/           Dockerfiles, compose, and nginx config
  terraform/        GCP infrastructure as code
  helm/             Kubernetes deployment chart
  decisions/        Architecture Decision Records
  documentation/    Product and architecture specs
  docs/             User-facing documentation
  scripts/          Utility scripts (seed data, update agent)
  tests/shell/      BATS tests for install.sh and bioaf scripts
  bioaf             Management script (entry point)
  install.sh        First-time installer (prereq checks + env generation)
  install-gcp.sh    One-command GCP provisioning script
See the ADRs in decisions/ for architectural context before making changes. All infrastructure changes must go through the UI-driven Terraform workflow (ADR-007). The audit log is immutable by design (ADR-009).
