bioAF

This repository has moved

All new development of bioAF has moved to github.com/bioAF/bioAF.

This repository is preserved for historical reference only. Issues, pull requests, and updates should be directed to the new repository. The documentation below is retained as-is and may be out of date.

bioAF

Computational Biology Automation Framework

A turnkey computational biology platform for small biotech companies (5-50 researchers), deployed on Google Cloud Platform. bioAF provides a web-based control plane for managing HPC clusters, notebook environments, pipeline engines, and data visualization tools -- all provisioned through UI-driven Terraform.

Features

Experiment Tracking - MINSEQE-compliant metadata, sample management, batch processing, project organization
Compute Orchestration - Kubernetes (GKE) compute via the BioAF Adapter Layer, JupyterHub/RStudio notebooks, versioned compute environments, auto-scaling, Cloud Build image pipeline
Pipeline Engine - Nextflow integration, custom pipelines, pipeline catalog, run monitoring, parameter management
Data Management - File upload/download, dataset browser, GCS storage integration, GEO export, SuperSeries cross-experiment packaging
Results & Visualization - QC dashboards, cellxgene single-cell viewer, plot archive, search
SSH Access - One-click kubectl exec into running pipeline jobs and notebook sessions
Notifications - Event-driven alerts via in-app, email (SMTP), and Slack (OAuth integration)
Cost Center - GCP billing integration, budget alerts, component cost breakdown, projections
Backup & Recovery - 4-tier GCS backups (pg_dump, GCS versioning, platform config, terraform state), restore with review period
Session Credentials - Per-user RStudio credentials with PAM authentication, auto-generated usernames
Role-Based Access - Permission-based RBAC with four built-in roles, custom role creation, and per-resource/action grants
Upgrade System - GitHub-based version checking, managed upgrade flow with rollback
Audit Log - Immutable audit trail with filtering, pagination, and human-readable descriptions
GitOps - Version-controlled platform configuration with diff and rollback

Architecture

How it works

A computational biologist registers an experiment, links FASTQ files (uploaded or auto-ingested from a sequencer drop), selects a pipeline from the catalog (nf-core/scrnaseq, rnaseq, or custom), and launches a run. The BioAF Adapter Layer handles everything below that: staging inputs from GCS, submitting Kubernetes Jobs to GKE Autopilot, monitoring execution via Nextflow trace parsing, collecting outputs back to GCS, and transitioning the experiment through its status lifecycle (registered -> library_prep -> sequencing -> fastq_uploaded -> processing -> pipeline_complete -> [reviewed ->] analysis -> complete). Pipeline completion triggers event-driven notifications (in-app, email, Slack), and results are browsable through the plot archive, cellxgene viewer, and GEO export tools.

Jupyter and RStudio sessions run as Kubernetes Pods with GCS-backed home directories and SSH access. RStudio sessions use per-user PAM authentication (ADR-030), and notebook container images are managed as versioned environments (ADR-033), built automatically via Cloud Build (ADR-031).
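The experiment status lifecycle described above can be pictured as a small state machine. The following is a minimal sketch, not bioAF's actual implementation: the `Status` enum, `TRANSITIONS` table, and `advance` helper are hypothetical names, and it assumes transitions are strictly forward, with `reviewed` as the only optional step.

```python
from enum import Enum


class Status(str, Enum):
    """Experiment statuses, in the order listed in this README."""
    REGISTERED = "registered"
    LIBRARY_PREP = "library_prep"
    SEQUENCING = "sequencing"
    FASTQ_UPLOADED = "fastq_uploaded"
    PROCESSING = "processing"
    PIPELINE_COMPLETE = "pipeline_complete"
    REVIEWED = "reviewed"
    ANALYSIS = "analysis"
    COMPLETE = "complete"


# Assumption: only forward moves are allowed. "reviewed" is optional, so
# pipeline_complete may go to reviewed or straight to analysis.
TRANSITIONS = {
    Status.REGISTERED: {Status.LIBRARY_PREP},
    Status.LIBRARY_PREP: {Status.SEQUENCING},
    Status.SEQUENCING: {Status.FASTQ_UPLOADED},
    Status.FASTQ_UPLOADED: {Status.PROCESSING},
    Status.PROCESSING: {Status.PIPELINE_COMPLETE},
    Status.PIPELINE_COMPLETE: {Status.REVIEWED, Status.ANALYSIS},
    Status.REVIEWED: {Status.ANALYSIS},
    Status.ANALYSIS: {Status.COMPLETE},
    Status.COMPLETE: set(),
}


def can_transition(current: Status, target: Status) -> bool:
    """Return True if moving from current to target is an allowed step."""
    return target in TRANSITIONS[current]


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise ValueError for a disallowed move."""
    if not can_transition(current, target):
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the allowed moves in one table makes the optional review step explicit and gives a single place to reject out-of-order updates.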

The adapter layer (ADR-020) abstracts compute, storage, and notebook providers behind clean interfaces, so all application logic is decoupled from infrastructure specifics. Today that means GKE + GCS (ADR-021, ADR-022).
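To illustrate what "providers behind clean interfaces" can look like, here is a minimal sketch using Python's `typing.Protocol`. Every name in it (`StorageProvider`, `ComputeProvider`, `launch_run`, and the in-memory stand-ins) is hypothetical and chosen for illustration; the real interfaces are defined by ADR-020 and the backend code, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field
from typing import Protocol


class StorageProvider(Protocol):
    """Hypothetical storage interface: stage an input and return its path."""
    def stage(self, uri: str) -> str: ...


class ComputeProvider(Protocol):
    """Hypothetical compute interface: submit a job and return its ID."""
    def submit(self, pipeline: str, inputs: list[str]) -> str: ...


@dataclass
class InMemoryStorage:
    """Stand-in for a GCS-backed provider; records what was staged."""
    staged: list[str] = field(default_factory=list)

    def stage(self, uri: str) -> str:
        self.staged.append(uri)
        return "/staged/" + uri.rsplit("/", 1)[-1]


@dataclass
class InMemoryCompute:
    """Stand-in for a GKE-backed provider; records submitted jobs."""
    jobs: dict[str, list[str]] = field(default_factory=dict)

    def submit(self, pipeline: str, inputs: list[str]) -> str:
        job_id = f"job-{len(self.jobs) + 1}"
        self.jobs[job_id] = inputs
        return job_id


def launch_run(pipeline: str, input_uris: list[str],
               storage: StorageProvider, compute: ComputeProvider) -> str:
    """Application logic depends only on the two interfaces above."""
    staged_paths = [storage.stage(uri) for uri in input_uris]
    return compute.submit(pipeline, staged_paths)
```

Because `launch_run` only sees the protocols, swapping the in-memory stand-ins for GKE- and GCS-backed classes would not require changing it, which is the decoupling the adapter layer is meant to provide.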

Infrastructure is provisioned through UI-driven Terraform (ADR-007) -- researchers never touch HCL. All secrets live in Secret Manager (ADR-008), all actions are recorded in an immutable audit log (ADR-009), and data portability is guaranteed (ADR-012).

See all architecture decision records in decisions/README.md.

Quick Start

Prerequisites

Docker and Docker Compose
Git
openssl (for secret generation)

Deploy on GCP (one command)

Run this on your local machine to provision a GCP VM and get started:

curl -fsSL https://raw.githubusercontent.com/not-that-guy-again/bioAF/main/install-gcp.sh | bash

The script sets up gcloud, creates a VM with Docker, and walks you through the process. Once the VM is ready, SSH in and run:

git clone https://github.com/not-that-guy-again/bioAF.git
cd bioAF
./bioaf setup

Deploy on an existing server

If you already have a Linux server with Docker installed:

git clone https://github.com/not-that-guy-again/bioAF.git
cd bioAF
./bioaf setup

The setup command checks prerequisites, generates secrets and TLS certs, pulls pre-built images, runs migrations, and prints a one-time setup code. Open the URL it shows in your browser and enter the code to create your admin account and configure the platform.

Management Commands

Command	Description
`./bioaf setup`	First-run setup (pulls images, generates secrets, prints setup code)
`./bioaf start`	Start all services in dependency order
`./bioaf stop`	Stop all services
`./bioaf restart`	Restart all services
`./bioaf status`	Show service status
`./bioaf logs [service]`	Tail logs (all or one service)
`./bioaf build [service]`	Build container images locally (development only)
`./bioaf migrate`	Run database migrations
`./bioaf migrate-down <rev>`	Downgrade database to a specific revision
`./bioaf seed <script.py>`	Run a seed/data script in the backend container
`./bioaf backup`	Create a database backup
`./bioaf update [version]`	Update to latest (or specific) version
`./bioaf reset-db`	Destroy and recreate the database (with confirmation)
`./bioaf shell [service]`	Open a shell in a container (default: backend)
`./bioaf dbshell`	Open a psql session to the database
`./bioaf register-outputs`	Register pipeline output files from GCS
`./bioaf help`	Show all commands

See the full Deployment Guide for detailed instructions.

Documentation

Quickstart - Documentation hub
Deployment Guide - Full deployment walkthrough
Bench Scientist Guide - Experiments, samples, results
Computational Biologist Guide - Pipelines, notebooks, environments
Admin Guide - User management, costs, backups, notifications
Life After bioAF - Data portability after teardown
ADR Index - Architecture Decision Records
SSH Access Guide - Connecting to running workloads
GEO Export Guide - Exporting to NCBI GEO
Reference Data Guide - Managing reference genomes and annotations
Compute Stack Setup - Kubernetes configuration

Development Setup

Using Docker Compose (recommended)

# Start backend, frontend, and PostgreSQL
docker compose -f docker/docker-compose.dev.yml up

# Backend:  http://localhost:8000
# Frontend: http://localhost:3000
# Postgres: localhost:5432

Manual Setup

# Backend
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt -r requirements-dev.txt
uvicorn app.main:app --reload

# Frontend
cd frontend
npm install
npm run dev

# Database migrations (requires PostgreSQL 16; run from the repository root with the backend virtualenv active)
cd backend
alembic upgrade head

Running Tests

# Backend tests (requires PostgreSQL)
docker compose -f docker/docker-compose.dev.yml up -d db
cd backend && python -m pytest tests/ -v

# Frontend tests
cd frontend && npm test

Component Catalog

bioAF manages these infrastructure components through its UI:

Component	Category	Compute Stack	Dependencies
GKE Cluster	Compute	Kubernetes	None
GCS Buckets	Storage	Kubernetes	GKE
JupyterHub	Notebooks	Kubernetes	Compute, Storage
RStudio Server	Notebooks	Kubernetes	Compute, Storage
Nextflow	Pipelines	Kubernetes	Compute
cellxgene	Visualization	Any	None
QC Dashboard	Visualization	Any	None

Project Structure

bioAF/
  backend/           FastAPI application
  frontend/          Next.js 14 application
  docker/            Dockerfiles, compose, and nginx config
  terraform/         GCP infrastructure as code
  helm/              Kubernetes deployment chart
  decisions/         Architecture Decision Records
  documentation/     Product and architecture specs
  docs/              User-facing documentation
  scripts/           Utility scripts (seed data, update agent)
  tests/shell/       BATS tests for install.sh and bioaf scripts
  bioaf              Management script (entry point)
  install.sh         First-time installer (prereq checks + env generation)
  install-gcp.sh     One-command GCP provisioning script

Contributing

See the ADRs in decisions/ for architectural context before making changes. All infrastructure changes must go through the UI-driven Terraform workflow (ADR-007). The audit log is immutable by design (ADR-009).
