This repository was archived by the owner on May 11, 2026. It is now read-only.

brent-mills-engineering/bioAF

This repository has moved

All new development of bioAF has moved to github.com/bioAF/bioAF.

This repository is preserved for historical reference only. Issues, pull requests, and updates should be directed to the new repository. The documentation below is retained as-is and may be out of date.


bioAF

Computational Biology Automation Framework

A turnkey computational biology platform for small biotech companies (5-50 researchers), deployed on Google Cloud Platform. bioAF provides a web-based control plane for managing HPC clusters, notebook environments, pipeline engines, and data visualization tools -- all provisioned through UI-driven Terraform.

Features

  • Experiment Tracking - MINSEQE-compliant metadata, sample management, batch processing, project organization
  • Compute Orchestration - Kubernetes (GKE) compute via the BioAF Adapter Layer, JupyterHub/RStudio notebooks, versioned compute environments, auto-scaling, Cloud Build image pipeline
  • Pipeline Engine - Nextflow integration, custom pipelines, pipeline catalog, run monitoring, parameter management
  • Data Management - File upload/download, dataset browser, GCS storage integration, GEO export, SuperSeries cross-experiment packaging
  • Results & Visualization - QC dashboards, cellxgene single-cell viewer, plot archive, search
  • SSH Access - One-click kubectl exec into running pipeline jobs and notebook sessions
  • Notifications - Event-driven alerts via in-app, email (SMTP), and Slack (OAuth integration)
  • Cost Center - GCP billing integration, budget alerts, component cost breakdown, projections
  • Backup & Recovery - 4-tier GCS backups (pg_dump, GCS versioning, platform config, terraform state), restore with review period
  • Session Credentials - Per-user RStudio credentials with PAM authentication, auto-generated usernames
  • Role-Based Access - Permission-based RBAC with four built-in roles, custom role creation, and per-resource/action grants
  • Upgrade System - GitHub-based version checking, managed upgrade flow with rollback
  • Audit Log - Immutable audit trail with filtering, pagination, and human-readable descriptions
  • GitOps - Version-controlled platform configuration with diff and rollback

Architecture

[Figure: bioAF System Architecture]

How it works

A computational biologist registers an experiment, links FASTQ files (uploaded or auto-ingested from a sequencer drop), selects a pipeline from the catalog (nf-core/scrnaseq, rnaseq, or custom), and launches a run. The BioAF Adapter Layer handles everything below that: staging inputs from GCS, submitting Kubernetes Jobs to GKE Autopilot, monitoring execution via Nextflow trace parsing, collecting outputs back to GCS, and transitioning the experiment through its status lifecycle:

registered -> library_prep -> sequencing -> fastq_uploaded -> processing -> pipeline_complete -> [reviewed ->] analysis -> complete

Pipeline completion triggers event-driven notifications (in-app, email, Slack), and results are browsable through the plot archive, cellxgene viewer, and GEO export tools.

Jupyter and RStudio sessions run as Kubernetes Pods with GCS-backed home directories and SSH access. RStudio sessions use per-user PAM authentication (ADR-030), and notebook container images are managed as versioned environments (ADR-033), built automatically via Cloud Build (ADR-031).
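The status lifecycle above can be read as a small linear state machine. The following is a minimal sketch, not bioAF's actual implementation: the status names come from this README, while `LIFECYCLE`, `OPTIONAL`, `allowed_next`, and `advance` are hypothetical names, and treating the bracketed `reviewed` step as skippable is an assumption based on the bracket notation.

```python
# Status names are taken from the README; everything else here is illustrative.
LIFECYCLE = [
    "registered", "library_prep", "sequencing", "fastq_uploaded",
    "processing", "pipeline_complete", "reviewed", "analysis", "complete",
]
# Assumption: the bracketed step in "[reviewed ->]" may be skipped.
OPTIONAL = {"reviewed"}


def allowed_next(status: str) -> list[str]:
    """Return the statuses reachable from `status` in a single step."""
    i = LIFECYCLE.index(status)  # raises ValueError for an unknown status
    reachable = []
    for candidate in LIFECYCLE[i + 1:]:
        reachable.append(candidate)
        if candidate not in OPTIONAL:
            break  # stop at the first mandatory status
    return reachable


def advance(current: str, target: str) -> str:
    """Validate a transition and return the new status."""
    if target not in allowed_next(current):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

Under these assumptions, `allowed_next("pipeline_complete")` offers both `reviewed` and `analysis`, while any attempt to jump over a mandatory status raises an error.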

The adapter layer (ADR-020) abstracts compute, storage, and notebook providers behind clean interfaces, so all application logic is decoupled from infrastructure specifics. Today that means GKE + GCS (ADR-021, ADR-022).
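To illustrate what "behind clean interfaces" could look like in practice, here is a hedged sketch using Python `typing.Protocol` classes. None of these names (`StorageProvider`, `ComputeProvider`, `launch_run`, or the in-memory stand-ins) come from the bioAF codebase; they are hypothetical and only show the decoupling idea, with fake providers where GCS- and GKE-backed ones would sit.

```python
from typing import Protocol


class StorageProvider(Protocol):
    """Hypothetical storage interface: make an input available to a job."""
    def stage(self, uri: str) -> str: ...


class ComputeProvider(Protocol):
    """Hypothetical compute interface: submit a pipeline run, get a job id."""
    def submit(self, pipeline: str, inputs: list[str]) -> str: ...


class InMemoryStorage:
    """Stand-in for a GCS-backed provider; it only records staged URIs."""
    def __init__(self) -> None:
        self.staged: list[str] = []

    def stage(self, uri: str) -> str:
        self.staged.append(uri)
        return f"/work/{uri.rsplit('/', 1)[-1]}"


class FakeCompute:
    """Stand-in for a GKE-backed provider; it returns a made-up job id."""
    def __init__(self) -> None:
        self.jobs: list[tuple[str, list[str]]] = []

    def submit(self, pipeline: str, inputs: list[str]) -> str:
        self.jobs.append((pipeline, inputs))
        return f"job-{len(self.jobs)}"


def launch_run(storage: StorageProvider, compute: ComputeProvider,
               pipeline: str, input_uris: list[str]) -> str:
    """Application logic depends only on the two interfaces."""
    local_paths = [storage.stage(uri) for uri in input_uris]
    return compute.submit(pipeline, local_paths)
```

Because `launch_run` accepts anything that satisfies the two protocols, swapping in a different provider would not require touching the calling code, which is the property the adapter layer is described as providing.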

Infrastructure is provisioned through UI-driven Terraform (ADR-007) -- researchers never touch HCL. All secrets live in Secret Manager (ADR-008), all actions are recorded in an immutable audit log (ADR-009), and data portability is guaranteed (ADR-012).

See all architecture decision records in decisions/README.md.

Quick Start

Prerequisites

  • Docker and Docker Compose
  • Git
  • openssl (for secret generation)

Deploy on GCP (one command)

Run this on your local machine to provision a GCP VM and get started:

curl -fsSL https://raw.githubusercontent.com/not-that-guy-again/bioAF/main/install-gcp.sh | bash

The script sets up gcloud, creates a VM with Docker, and walks you through the process. Once the VM is ready, SSH in and run:

git clone https://github.com/not-that-guy-again/bioAF.git
cd bioAF
./bioaf setup

Deploy on an existing server

If you already have a Linux server with Docker installed:

git clone https://github.com/not-that-guy-again/bioAF.git
cd bioAF
./bioaf setup

The setup command handles everything: checks prerequisites, generates secrets and TLS certs, pulls pre-built images, runs migrations, and prints a one-time setup code. Open the URL it shows in your browser and enter the code to create your admin account and configure the platform.

Management Commands

Command                       Description
./bioaf setup                 First-run setup (pulls images, generates secrets, prints setup code)
./bioaf start                 Start all services in dependency order
./bioaf stop                  Stop all services
./bioaf restart               Restart all services
./bioaf status                Show service status
./bioaf logs [service]        Tail logs (all or one service)
./bioaf build [service]       Build container images locally (development only)
./bioaf migrate               Run database migrations
./bioaf migrate-down <rev>    Downgrade database to a specific revision
./bioaf seed <script.py>      Run a seed/data script in the backend container
./bioaf backup                Create a database backup
./bioaf update [version]      Update to latest (or specific) version
./bioaf reset-db              Destroy and recreate the database (with confirmation)
./bioaf shell [service]       Open a shell in a container (default: backend)
./bioaf dbshell               Open a psql session to the database
./bioaf register-outputs      Register pipeline output files from GCS
./bioaf help                  Show all commands

See the full Deployment Guide for detailed instructions.

Development Setup

Using Docker Compose (recommended)

# Start backend, frontend, and PostgreSQL
docker compose -f docker/docker-compose.dev.yml up

# Backend:  http://localhost:8000
# Frontend: http://localhost:3000
# Postgres: localhost:5432

Manual Setup

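# Backend dependencies (run from the repository root)
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt -r requirements-dev.txt

# Database (requires PostgreSQL 16; run in backend/ with the venv active)
alembic upgrade head

# Start the backend with auto-reload
uvicorn app.main:app --reload

# Frontend (in a second terminal, from the repository root)
cd frontend
npm install
npm run dev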

Running Tests

# Backend tests (requires PostgreSQL)
docker compose -f docker/docker-compose.dev.yml up -d db
cd backend && python -m pytest tests/ -v

# Frontend tests
cd frontend && npm test

Component Catalog

bioAF manages these infrastructure components through its UI:

Component         Category         Compute Stack    Dependencies
GKE Cluster       Compute          Kubernetes       None
GCS Buckets       Storage          Kubernetes       GKE
JupyterHub        Notebooks        Kubernetes       Compute, Storage
RStudio Server    Notebooks        Kubernetes       Compute, Storage
Nextflow          Pipelines        Kubernetes       Compute
cellxgene         Visualization    Any              None
QC Dashboard      Visualization    Any              None
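The Dependencies column implies a provisioning order. As a rough sketch of how such an order could be derived (this is not how bioAF's Terraform workflow is known to do it), the snippet below feeds the table into the standard-library `graphlib.TopologicalSorter`. Resolving "Compute" to the GKE cluster and "Storage" to the GCS buckets is an assumption of this example, and `DEPENDS_ON` and `provisioning_order` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Transcribed from the catalog table. Mapping the "Compute" and "Storage"
# categories onto specific components is an assumption of this sketch.
DEPENDS_ON = {
    "GKE Cluster": set(),
    "GCS Buckets": {"GKE Cluster"},
    "JupyterHub": {"GKE Cluster", "GCS Buckets"},
    "RStudio Server": {"GKE Cluster", "GCS Buckets"},
    "Nextflow": {"GKE Cluster"},
    "cellxgene": set(),
    "QC Dashboard": set(),
}


def provisioning_order(selected: set[str]) -> list[str]:
    """Return `selected` plus its transitive dependencies, dependencies first."""
    needed: set[str] = set()
    stack = list(selected)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(DEPENDS_ON[name])
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {name: DEPENDS_ON[name] for name in needed}
    return list(TopologicalSorter(graph).static_order())
```

For example, requesting only JupyterHub would pull in the GKE cluster and GCS buckets ahead of it, whereas cellxgene has no dependencies and stands alone.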

Project Structure

bioAF/
  backend/           FastAPI application
  frontend/          Next.js 14 application
  docker/            Dockerfiles, compose, and nginx config
  terraform/         GCP infrastructure as code
  helm/              Kubernetes deployment chart
  decisions/         Architecture Decision Records
  documentation/     Product and architecture specs
  docs/              User-facing documentation
  scripts/           Utility scripts (seed data, update agent)
  tests/shell/       BATS tests for install.sh and bioaf scripts
  bioaf              Management script (entry point)
  install.sh         First-time installer (prereq checks + env generation)
  install-gcp.sh     One-command GCP provisioning script

Contributing

See the ADRs in decisions/ for architectural context before making changes. All infrastructure changes must go through the UI-driven Terraform workflow (ADR-007). The audit log is immutable by design (ADR-009).