HeadlessX is a self-hosted scraping platform with a web dashboard, protected API, queue-backed workflows, and a remote MCP endpoint.
Current live surfaces:
- Website scraping: scrape, crawl, map, content extraction, screenshots
- Google SERP
- Tavily
- Exa
- YouTube
- Queue jobs, logs, API keys, proxy management, and config management
- Remote MCP over `/mcp`
- Simplified the dashboard around one global browser/runtime model
- Added Tavily, Exa, and YouTube workspaces
- Added queued crawl and job flows with Redis + worker support
- Added remote MCP secured with normal dashboard-created API keys
- Added setup and API guides aligned with the current route tree
- Node.js 22+
- pnpm 9+
- PostgreSQL
- Redis
- Python/uv for `yt-engine`
- Go for the HTML-to-Markdown sidecar
Recommended for most developers:
- PostgreSQL: Supabase or Docker
- Redis: Docker
- App runtime: `pnpm dev` or `mise run dev`
This keeps infrastructure simple while still running the app locally.
- Clone and install:
git clone https://github.com/saifyxpro/HeadlessX.git
cd HeadlessX
pnpm install
- Create root `.env` from the full example:
cp .env.example .env
Current root `.env.example`:
# ==============================================
# HEADLESSX V2.1.0 - LOCAL DEVELOPMENT
# ==============================================
# ------------------------------
# 1. DATABASE
# ------------------------------
DATABASE_URL="postgresql://postgres.xxxxx:YOUR_PASSWORD@aws-0-region.pooler.supabase.com:5432/postgres"
# ------------------------------
# 2. SERVER
# ------------------------------
PORT=8000
HOST=0.0.0.0
NODE_ENV=development
# ------------------------------
# 2A. SECURITY (REQUIRED)
# ------------------------------
# Used by the Next.js dashboard server to authenticate against the API.
DASHBOARD_INTERNAL_API_KEY=replace-with-a-long-random-string
# Used to encrypt stored credentials at rest.
CREDENTIAL_ENCRYPTION_KEY=replace-with-a-different-long-random-string
# ------------------------------
# 3. QUEUE / REDIS
# ------------------------------
# BullMQ uses Redis to persist async scrape, extract, and index jobs.
REDIS_URL=redis://localhost:6379
# Search providers and local engines
TAVILY_API_KEY=
EXA_API_KEY=
YT_ENGINE_URL=http://localhost:8090
YT_ENGINE_PORT=8090
YT_ENGINE_TIMEOUT_MS=45000
YT_ENGINE_TEMP_DIR=./tmp/yt-engine
YT_ENGINE_JOB_TTL_HOURS=12
HTML_TO_MARKDOWN_SERVICE_URL=http://localhost:8081
HTML_TO_MARKDOWN_PORT=8081
HTML_TO_MARKDOWN_TIMEOUT_MS=60000
# Optional queue tuning
BULLMQ_QUEUE_NAME=headlessx-jobs
QUEUE_WORKER_CONCURRENCY=2
QUEUE_JOB_ATTEMPTS=3
QUEUE_JOB_BACKOFF_MS=5000
QUEUE_STREAM_POLL_MS=1000
QUEUE_CONNECTION_RETRY_MS=10000
# Browser and anti-detection settings are managed from the dashboard
# ------------------------------
# 4. FRONTEND (Next.js)
# ------------------------------
WEB_PORT=3000
NEXT_PUBLIC_API_URL=http://localhost:8000
INTERNAL_API_URL=http://localhost:8000
# CORS: Add your frontend URL for custom deployments
FRONTEND_URL=http://localhost:3000
# ------------------------------
# 5. DEFAULT RUNTIME SETTINGS
# ------------------------------
BROWSER_TIMEOUT=60000
MAX_CONCURRENCY=5
STEALTH_MODE=advanced
If you are using Docker instead of local services, start from the complete Docker env too:
cp infra/docker/.env.example infra/docker/.env
- Prepare services:
pnpm db:push
pnpm camoufox:fetch
- Start the workspace:
pnpm dev
This starts:
- web
- api
- worker
- HTML-to-Markdown service
- yt-engine
Important:
- `pnpm dev` does not provision PostgreSQL or Redis
- Website Crawl requires both Redis and the worker
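Because `pnpm dev` assumes PostgreSQL and Redis are already reachable, a small preflight check can save a confusing startup. The following is a minimal sketch, not part of HeadlessX: the helper names (`endpointFromUrl`, `secretProblems`, `canConnect`) are hypothetical, the default ports are the conventional ones for each service, and the placeholder check assumes the `replace-with-` prefix used in `.env.example`.

```typescript
import * as net from "node:net";

// Hypothetical preflight helpers; none of these names exist in HeadlessX.
export interface Endpoint {
  host: string;
  port: number;
}

// Conventional default ports, used only when the URL omits one.
const DEFAULT_PORTS: Record<string, number> = {
  "postgresql:": 5432,
  "postgres:": 5432,
  "redis:": 6379,
  "rediss:": 6379,
};

// Extract host and port from DATABASE_URL or REDIS_URL.
export function endpointFromUrl(raw: string): Endpoint {
  const url = new URL(raw);
  const fallback: number | undefined = DEFAULT_PORTS[url.protocol];
  const port = url.port ? Number(url.port) : fallback;
  if (port === undefined) {
    throw new Error(`No port given and no default known for ${url.protocol}`);
  }
  return { host: url.hostname, port };
}

// Report required secrets that are unset or still hold the example placeholder.
export function secretProblems(env: Record<string, string | undefined>): string[] {
  const required = ["DASHBOARD_INTERNAL_API_KEY", "CREDENTIAL_ENCRYPTION_KEY"];
  const problems: string[] = [];
  for (const name of required) {
    const value = env[name];
    if (!value || value.startsWith("replace-with-")) {
      problems.push(`${name} is unset or still a placeholder`);
    }
  }
  return problems;
}

// Resolve true if a TCP connection opens within timeoutMs, false otherwise.
export function canConnect({ host, port }: Endpoint, timeoutMs = 2000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (ok: boolean) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}
```

A script could call `secretProblems(process.env)` and then `canConnect(endpointFromUrl(process.env.REDIS_URL ?? "redis://localhost:6379"))` before launching the workspace. A successful TCP connection only shows that something is listening on the port; it does not verify credentials.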
For the current Docker path:
cp infra/docker/.env.example infra/docker/.env
cd infra/docker
docker compose --profile all up --build -d
Important notes:
- use `--profile all`
- partial profile runs are not currently reliable because of `depends_on` relationships
- the core Docker stack does not yet define a `yt-engine` container, so YouTube may still need to run locally
See docs/setup-guide.md for the full matrix:
- no-Docker setup
- mixed local setup
- full Docker setup
- MCP client configuration
All backend routes except the health check require an `x-api-key` header.
Core backend surfaces:
- `GET /api/health`
- `GET/PATCH /api/config`
- `GET /api/dashboard/stats`
- `GET /api/logs`
- `GET/POST/PATCH/DELETE /api/keys`
- proxy CRUD under `/api/proxies`
- website routes under `/api/website/*`
- Google SERP routes under `/api/google-serp/*`
- Tavily routes under `/api/tavily/*`
- Exa routes under `/api/exa/*`
- YouTube routes under `/api/youtube/*`
- queue job routes under `/api/jobs/*`
- remote MCP endpoint at `/mcp`
See the full route reference in docs/api-endpoints.md.
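As a rough illustration of the `x-api-key` requirement, the sketch below builds a request for one of the listed routes. The helper names `buildApiRequest` and `callApi` are hypothetical, and the code assumes responses are JSON, which this README does not state; consult docs/api-endpoints.md for actual request and response shapes.

```typescript
// Hypothetical client helpers; not part of HeadlessX.
export interface ApiRequest {
  url: string;
  init: { method: string; headers: Record<string, string> };
}

// Build a request description, attaching x-api-key to every route except the health check.
export function buildApiRequest(
  baseUrl: string,
  path: string,
  apiKey: string,
  method = "GET",
): ApiRequest {
  const url = new URL(path, baseUrl).toString();
  const headers: Record<string, string> = { accept: "application/json" };
  // /api/health is the only route listed as unprotected.
  if (path !== "/api/health") {
    headers["x-api-key"] = apiKey;
  }
  return { url, init: { method, headers } };
}

// Send the request with the global fetch (Node.js 18+) and parse the body as JSON (assumed).
export async function callApi(req: ApiRequest): Promise<unknown> {
  const res = await fetch(req.url, req.init);
  if (!res.ok) {
    throw new Error(`${req.init.method} ${req.url} failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

For example, `callApi(buildApiRequest("http://localhost:8000", "/api/logs", key))` would send `GET /api/logs` with the key attached.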
HeadlessX exposes a remote MCP endpoint from the API:
http://localhost:8000/mcp
Use a normal API key created from the dashboard API Keys page.
Do not use DASHBOARD_INTERNAL_API_KEY for MCP clients.
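One way to enforce this key rule when generating client configuration is sketched here. `buildMcpConfig` is a hypothetical helper, not something HeadlessX ships; the object it returns mirrors the client config shape shown in this README, and the check against the internal key is an illustrative safeguard.

```typescript
// Hypothetical helper; mirrors the client config shape shown in this README.
export interface McpServerConfig {
  transport: "http";
  url: string;
  headers: { "x-api-key": string };
}

export interface McpClientConfig {
  mcpServers: { headlessx: McpServerConfig };
}

// Build an MCP client config, refusing the dashboard's internal key.
export function buildMcpConfig(
  apiBaseUrl: string,
  apiKey: string,
  internalKey?: string,
): McpClientConfig {
  if (!apiKey) {
    throw new Error("An API key created on the dashboard API Keys page is required");
  }
  if (internalKey && apiKey === internalKey) {
    throw new Error("DASHBOARD_INTERNAL_API_KEY must not be used for MCP clients");
  }
  return {
    mcpServers: {
      headlessx: {
        transport: "http",
        url: new URL("/mcp", apiBaseUrl).toString(),
        headers: { "x-api-key": apiKey },
      },
    },
  };
}
```

Passing `process.env.DASHBOARD_INTERNAL_API_KEY` as the third argument lets a setup script fail early if the wrong key is supplied.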
Example client config:
{
  "mcpServers": {
    "headlessx": {
      "transport": "http",
      "url": "http://localhost:8000/mcp",
      "headers": {
        "x-api-key": "hx_your_dashboard_created_key"
      }
    }
  }
}
apps/
  api/                     Express API + worker + MCP
  web/                     Next.js dashboard
  yt-engine/               Python YouTube engine
  go-html-to-md-service/   Go HTML-to-Markdown sidecar
docs/
  setup-guide.md
  api-endpoints.md
infra/docker/
- The dashboard uses the internal dashboard key for server-side internal requests
- MCP uses normal user-created API keys, not the dashboard internal key
- Queue-backed features return degraded/unavailable behavior when Redis is missing
- Docker support is available for the core stack, but yt-engine still needs separate Docker wiring
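For the queue-backed features noted above, `.env.example` exposes `QUEUE_JOB_ATTEMPTS` and `QUEUE_JOB_BACKOFF_MS`. The sketch below shows how such values translate into retry delays under BullMQ's two built-in backoff strategies. This README does not say which strategy HeadlessX configures, or that these variables map directly onto BullMQ's `attempts` and `backoff` options, so treat both as assumptions; `retryDelays` is a hypothetical helper.

```typescript
// Hypothetical helper; illustrates BullMQ's built-in "fixed" and "exponential" backoff.
export type BackoffType = "fixed" | "exponential";

// Return the wait before each retry, in milliseconds.
// `attempts` counts the first try, so a job gets attempts - 1 retries.
export function retryDelays(
  attempts: number,
  backoffMs: number,
  type: BackoffType,
): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < attempts; retry++) {
    // fixed: the same delay every time; exponential: delay * 2^(retry - 1).
    delays.push(type === "fixed" ? backoffMs : backoffMs * 2 ** (retry - 1));
  }
  return delays;
}
```

With the example defaults (3 attempts, 5000 ms), a fixed strategy waits 5000 ms before each of the two retries, while an exponential strategy waits 5000 ms and then 10000 ms.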
See CONTRIBUTING.md for the current contribution workflow, local setup expectations, pull request guidance, and commit message conventions.
MIT





