Chispasgg/HeadlessX

 
 

🚀 HeadlessX v1.2.0

Open Source Browserless Web Scraping API with Human-like Behavior

License: MIT Node.js Playwright GitHub Open Source

🎯 Unified Solution: Website + API on a single domain
🧠 Human-like Behavior: 40+ anti-detection techniques
🚀 Deploy Anywhere: Docker, Node.js+PM2, or Development


✨ Key Features

  • 🌐 Unified Architecture: Website and API on one domain
  • 🧠 Human-like Intelligence: Natural mouse movements, smart scrolling, behavioral randomization
  • 📊 Multiple Formats: HTML, text, screenshots, PDFs
  • ⚡ Batch Processing: Handle multiple URLs efficiently
  • 🔒 Production Ready: Docker, PM2, Nginx, SSL support
  • 🛡️ Anti-Detection: 40+ stealth techniques for reliable scraping

🎯 Quick Start

# 1. Clone and configure
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX

# Quick setup (makes scripts executable + creates .env)
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh
# Then edit: nano .env  # Update DOMAIN, SUBDOMAIN, and AUTH_TOKEN

Choose your deployment:

| Method | Command | Best For |
|---|---|---|
| 🐳 Docker | docker-compose up -d | Production, easy deployment |
| 🔧 Auto Setup | chmod +x scripts/setup.sh && sudo ./scripts/setup.sh | VPS/Server with full control |
| 💻 Development | npm install && npm start | Local development, testing |

Access your HeadlessX:

🌐 Website:  https://your-subdomain.yourdomain.com
🔧 Health:   https://your-subdomain.yourdomain.com/api/health
📊 Status:   https://your-subdomain.yourdomain.com/api/status?token=YOUR_AUTH_TOKEN

🏗️ New Modular Architecture v1.2.0

HeadlessX v1.2.0 introduces a completely refactored modular architecture for better maintainability, scalability, and development experience.

Key Improvements:

  • 🔧 Separation of Concerns: Distinct modules for configuration, services, controllers, and middleware
  • 🚀 Better Performance: Optimized browser management and resource usage
  • 🛠️ Developer Experience: Clear module boundaries and dependency injection
  • 📦 Production Ready: Enhanced error handling and logging with correlation IDs
  • 🔒 Security: Improved authentication and rate limiting
  • 📊 Monitoring: Structured logging and health monitoring

Architecture Overview:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Routes        │───▶│   Controllers   │───▶│   Services      │
│   (api.js)      │    │   (rendering.js)│    │   (browser.js)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Middleware    │    │   Utils         │    │   Config        │
│   (auth.js)     │    │   (logger.js)   │    │   (index.js)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Quick Migration from v1.1.0:

  • The original src/server.js (3079 lines) has been broken down into 20+ focused modules
  • Environment variable TOKEN is now AUTH_TOKEN
  • PM2 config moved from config/ecosystem.config.js to ecosystem.config.js
  • All functionality preserved with improved performance and maintainability
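
The TOKEN to AUTH_TOKEN rename can be applied to an existing .env file with a few lines of Python. This is a minimal sketch and not part of HeadlessX: the helper name `migrate_env_text` is hypothetical, and it assumes the old file stores the token on a line that begins with `TOKEN=`.

```python
def migrate_env_text(text: str) -> str:
    """Rename a v1.1.0 `TOKEN=` entry to `AUTH_TOKEN=` and leave other lines untouched."""
    migrated = []
    for line in text.splitlines():
        # Match only the exact key TOKEN, not keys that merely end in TOKEN.
        if line.startswith("TOKEN="):
            line = "AUTH_TOKEN=" + line[len("TOKEN="):]
        migrated.append(line)
    # Preserve the trailing newline if the input had one.
    return "\n".join(migrated) + ("\n" if text.endswith("\n") else "")


# Usage (back up .env first):
#   from pathlib import Path
#   env_file = Path(".env")
#   env_file.write_text(migrate_env_text(env_file.read_text()))
```

Lines that already use AUTH_TOKEN pass through unchanged, so running the helper twice should be harmless.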

📖 Detailed Documentation: MODULAR_ARCHITECTURE.md


🚀 Deployment Guide

🐳 Docker Deployment (Recommended)

# Install Docker (if needed)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER

# Deploy HeadlessX
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env  # Configure DOMAIN, SUBDOMAIN, AUTH_TOKEN

# Start services
docker-compose up -d

# Optional: Setup SSL
sudo apt install certbot
sudo certbot certonly --standalone -d your-subdomain.yourdomain.com

Docker Management:

docker-compose ps              # Check status
docker-compose logs headlessx  # View logs
docker-compose restart         # Restart services
docker-compose down            # Stop services

🔧 Node.js + PM2 Deployment

# Automated setup (recommended)
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env  # Configure environment
chmod +x scripts/setup.sh
sudo ./scripts/setup.sh  # Installs dependencies, builds website, starts PM2

🌐 Nginx Configuration (Auto-handled by setup script):

The setup script configures nginx automatically. To configure it manually instead:

# Copy and configure nginx site
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx

# Replace placeholders with your actual domain
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/your-subdomain.yourdomain.com/g' /etc/nginx/sites-available/headlessx

# Enable the site
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default

# Test and reload nginx
sudo nginx -t && sudo systemctl reload nginx

Manual setup (if not using setup script):

sudo apt update && sudo apt upgrade -y
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs build-essential
npm install && npm run build
sudo npm install -g pm2
npm run pm2:start

PM2 Management:

npm run pm2:status     # Check status
npm run pm2:logs       # View logs
npm run pm2:restart    # Restart server
npm run pm2:stop       # Stop server

💻 Development Setup

git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env  # Set AUTH_TOKEN, DOMAIN=localhost, SUBDOMAIN=headlessx

# Make scripts executable
chmod +x scripts/*.sh

# Install dependencies
npm install
cd website && npm install && npm run build && cd ..

# Start development server
npm start  # Access at http://localhost:3000

🌐 API Routes & Structure

HeadlessX Routes:
├── /favicon.ico        → Favicon
├── /robots.txt         → SEO robots file
├── /api/health         → Health check (no auth required)
├── /api/status         → Server status (requires token)
├── /api/render         → Full page rendering
├── /api/html           → HTML extraction
├── /api/content        → Clean text extraction
├── /api/screenshot     → Screenshot generation
├── /api/pdf            → PDF generation
└── /api/batch          → Batch URL processing

🔄 Request Flow:

  1. Nginx receives request on port 80/443
  2. Proxies to Node.js server on port 3000
  3. Server routes based on path:
    • /api/* → API endpoints
    • /* → Website files (built Next.js app)

🚀 API Examples & HTTP Integrations

Quick Health Check (No Auth)

curl https://your-subdomain.yourdomain.com/api/health

🔧 cURL Examples

Extract HTML Content

curl -X POST "https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "timeout": 30000}'

Generate Screenshot

curl "https://your-subdomain.yourdomain.com/api/screenshot?token=YOUR_AUTH_TOKEN&url=https://example.com&fullPage=true" \
  -o screenshot.png
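
The GET form above passes the target URL as a query value. That works for a plain address like https://example.com, but a target that has its own query string needs percent-encoding so that its `&` and `=` are not read as HeadlessX parameters. A minimal sketch using only the standard library follows. The helper name `build_screenshot_url` is hypothetical; the `token`, `url`, and `fullPage` parameters are taken from the curl example.

```python
from urllib.parse import urlencode


def build_screenshot_url(base_url: str, token: str, target_url: str,
                         full_page: bool = True) -> str:
    """Build a GET URL for /api/screenshot with every query value percent-encoded."""
    query = urlencode({
        "token": token,
        "url": target_url,  # encoded so its own ?, & and = survive intact
        "fullPage": "true" if full_page else "false",
    })
    return f"{base_url.rstrip('/')}/api/screenshot?{query}"


print(build_screenshot_url(
    "https://your-subdomain.yourdomain.com",
    "YOUR_AUTH_TOKEN",
    "https://example.com/search?q=a&b=1",
))
```

With curl alone, `-G` combined with `--data-urlencode "url=..."` achieves the same encoding.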

Extract Text Only

curl -X POST "https://your-subdomain.yourdomain.com/api/content?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "waitForSelector": "main"}'

Generate PDF

curl "https://your-subdomain.yourdomain.com/api/pdf?token=YOUR_AUTH_TOKEN&url=https://example.com&format=A4" \
  -o document.pdf

🤖 Make.com (Integromat) Integration

HTTP Request Module Configuration:

{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "headers": {
    "Content-Type": "application/json"
  },
  "qs": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "body": {
    "url": "{{url_to_scrape}}",
    "timeout": 30000,
    "waitForSelector": "{{optional_selector}}"
  }
}

⚡ Zapier Integration

Webhooks by Zapier Setup:

  • URL: https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN
  • Method: POST
  • Headers: Content-Type: application/json
  • Body:
{
  "url": "{{url_from_trigger}}",
  "timeout": 30000,
  "humanBehavior": true
}

🔗 n8n Integration

HTTP Request Node:

{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "authentication": "queryAuth",
  "query": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "headers": {
    "Content-Type": "application/json"
  },
  "body": {
    "url": "={{$json.url}}",
    "timeout": 30000,
    "humanBehavior": true
  }
}

Available via n8n Community Node:

🐍 Python Example

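import requests

def scrape_with_headlessx(url, token):
    """Fetch the rendered HTML for `url` through the HeadlessX /api/html endpoint."""
    response = requests.post(
        "https://your-subdomain.yourdomain.com/api/html",
        params={"token": token},
        json={
            "url": url,
            "timeout": 30000,
            "humanBehavior": True
        },
        timeout=60,  # client-side limit in seconds so a stalled request cannot hang forever
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()

# Usage
result = scrape_with_headlessx("https://example.com", "YOUR_TOKEN")
print(result['html'])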

🟨 JavaScript/Node.js Example

const axios = require('axios');

async function scrapeWithHeadlessX(url, token) {
  try {
    const response = await axios.post(
      `https://your-subdomain.yourdomain.com/api/html?token=${token}`,
      {
        url: url,
        timeout: 30000,
        humanBehavior: true
      }
    );
    return response.data;
  } catch (error) {
    console.error('Scraping failed:', error.message);
    throw error;
  }
}

// Usage
scrapeWithHeadlessX('https://example.com', 'YOUR_TOKEN')
  .then(result => console.log(result.html))
  .catch(error => console.error(error));

🔄 Batch Processing Example

curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example1.com",
      "https://example2.com",
      "https://example3.com"
    ],
    "timeout": 30000,
    "humanBehavior": true
  }'

Batch Processing

curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com", "https://httpbin.org"],
    "format": "text",
    "options": {"timeout": 30000}
  }'
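
When the URL list is long, it can help to split it into several smaller /api/batch requests. The sketch below only builds the JSON bodies and sends nothing. The helper name `batch_payloads` and the default batch size of 10 are assumptions for illustration, not documented HeadlessX limits; the body shape follows the first batch example above.

```python
import json
from typing import Iterator, List


def batch_payloads(urls: List[str], batch_size: int = 10,
                   timeout_ms: int = 30000) -> Iterator[str]:
    """Yield JSON request bodies for /api/batch, each holding at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield json.dumps({
            "urls": urls[start:start + batch_size],
            "timeout": timeout_ms,
            "humanBehavior": True,
        })


# Each body can be sent with the curl command above (-d "$BODY") or any HTTP client.
for body in batch_payloads(["https://example1.com", "https://example2.com",
                            "https://example3.com"], batch_size=2):
    print(body)
```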

📁 Project Structure

HeadlessX v1.2.0 - Modular Architecture/
├── 📂 src/                         # Modular application source
│   ├── 📂 config/                  # Configuration management
│   │   ├── index.js               # Main configuration loader
│   │   └── browser.js             # Browser-specific settings
│   ├── 📂 utils/                   # Utility functions
│   │   ├── errors.js              # Error handling & categorization
│   │   ├── logger.js              # Structured logging
│   │   └── helpers.js             # Common utilities
│   ├── 📂 services/                # Business logic services
│   │   ├── browser.js             # Browser lifecycle management
│   │   ├── stealth.js             # Anti-detection techniques
│   │   ├── interaction.js         # Human-like behavior
│   │   └── rendering.js           # Core rendering logic
│   ├── 📂 middleware/              # Express middleware
│   │   ├── auth.js                # Authentication
│   │   └── error.js               # Error handling
│   ├── 📂 controllers/             # Request handlers
│   │   ├── system.js              # Health & status endpoints
│   │   ├── rendering.js           # Main rendering endpoints
│   │   ├── batch.js               # Batch processing
│   │   └── get.js                 # GET endpoints & docs
│   ├── 📂 routes/                  # Route definitions
│   │   ├── api.js                 # API route mappings
│   │   └── static.js              # Static file serving
│   ├── app.js                     # Main application setup
│   ├── server.js                  # Entry point for PM2
│   └── rate-limiter.js            # Rate limiting implementation
├── 📂 website/                     # Next.js website (unchanged)
│   ├── app/                        # Next.js 13+ app directory
│   ├── components/                 # React components
│   ├── .env.example               # Website environment template
│   ├── next.config.js             # Next.js configuration
│   └── package.json               # Website dependencies
├── 📂 scripts/                     # Deployment & management scripts
│   ├── setup.sh                   # Automated installation (updated)
│   ├── update_server.sh           # Server update script (updated)
│   ├── verify-domain.sh           # Domain verification
│   └── test-routing.sh            # Integration testing
├── 📂 nginx/                       # Nginx configuration
│   └── headlessx.conf             # Nginx proxy config
├── 📂 docker/                      # Docker deployment (updated)
│   ├── Dockerfile                 # Container definition
│   └── docker-compose.yml         # Docker Compose setup
├── ecosystem.config.js            # PM2 configuration (moved to root)
├── .env.example                   # Environment template (updated)
├── package.json                   # Server dependencies (updated)
├── MODULAR_ARCHITECTURE.md        # Architecture documentation
└── README.md                      # This file

🛠️ Development

Local Development

# 1. Install dependencies
npm install

# 2. Build website
cd website
npm install
npm run build
cd ..

# 3. Set environment variables
export AUTH_TOKEN="development_token_123"
export DOMAIN="localhost"
export SUBDOMAIN="headlessx"

# 4. Start server
npm start  # Uses src/app.js

# 5. Access locally
# Website: http://localhost:3000
# API: http://localhost:3000/api/health

Testing Integration

# Test server and website integration
bash scripts/test-routing.sh localhost

# Test with environment variables
bash scripts/verify-domain.sh

⚙️ Configuration

🌐 Environment Variables (.env)

Create your .env file from the template:

cp .env.example .env
nano .env

Required configuration:

# Security Token (Generate a secure random string)
AUTH_TOKEN=your_secure_token_here

# Domain Configuration  
DOMAIN=yourdomain.com
SUBDOMAIN=headlessx

# Optional: Browser Settings
BROWSER_TIMEOUT=60000
MAX_CONCURRENT_BROWSERS=5

# Optional: Server Settings
PORT=3000
NODE_ENV=production
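
One way to produce the "secure random string" for AUTH_TOKEN is Python's standard `secrets` module. This is a sketch: the function name `generate_auth_token` is hypothetical, and the 32-byte length is an assumption, not a HeadlessX requirement.

```python
import secrets


def generate_auth_token(num_bytes: int = 32) -> str:
    """Return a random hex string for AUTH_TOKEN (two hex characters per byte)."""
    return secrets.token_hex(num_bytes)


# Prints a line that can be pasted into .env
print(f"AUTH_TOKEN={generate_auth_token()}")
```

A hex token contains only 0-9 and a-f, so it needs no escaping when passed as ?token=... in a URL.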

🌐 Nginx Domain Setup

Option 1: Automatic (Recommended)

# The setup script automatically replaces domain placeholders
sudo ./scripts/setup.sh

Option 2: Manual Configuration

# Copy nginx configuration
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx

# Replace the domain placeholder with your actual domain
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/headlessx.yourdomain.com/g' /etc/nginx/sites-available/headlessx

# Example: if your domain is "api.example.com", run this instead of the command above
# sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/api.example.com/g' /etc/nginx/sites-available/headlessx

# Enable site and reload nginx
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

Your final URLs will be:

  • Website: https://your-subdomain.yourdomain.com
  • API Health: https://your-subdomain.yourdomain.com/api/health
  • API Endpoints: https://your-subdomain.yourdomain.com/api/*

📊 API Reference

🔧 Core Endpoints

| Endpoint | Method | Description | Auth Required |
|---|---|---|---|
| /api/health | GET | Health check | ❌ |
| /api/status | GET | Server status | ✅ |
| /api/render | POST | Full page rendering (JSON) | ✅ |
| /api/html | GET/POST | Raw HTML extraction | ✅ |
| /api/content | GET/POST | Clean text extraction | ✅ |
| /api/screenshot | GET | Screenshot generation | ✅ |
| /api/pdf | GET | PDF generation | ✅ |
| /api/batch | POST | Batch URL processing | ✅ |

🔑 Authentication

All endpoints (except /api/health) require a token via:

  • Query parameter: ?token=YOUR_TOKEN
  • Header: X-Token: YOUR_TOKEN
  • Header: Authorization: Bearer YOUR_TOKEN
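
The three options above can be sketched as one small helper that returns the URL and headers to hand to any HTTP client. The function name `with_token` and its `style` values are hypothetical; only the parameter and header names come from the list above.

```python
from urllib.parse import urlencode


def with_token(url: str, token: str, style: str = "bearer") -> tuple:
    """Return (url, headers) carrying the token in one of the three documented places."""
    if style == "query":
        separator = "&" if "?" in url else "?"
        return url + separator + urlencode({"token": token}), {}
    if style == "x-token":
        return url, {"X-Token": token}
    if style == "bearer":
        return url, {"Authorization": f"Bearer {token}"}
    raise ValueError(f"unknown style: {style!r}")


url, headers = with_token(
    "https://your-subdomain.yourdomain.com/api/status", "YOUR_TOKEN")
print(url, headers)
```

Header-based styles keep the token out of the request URL, which is typically what ends up in web server access logs.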

📖 Complete Documentation

Visit your HeadlessX website for full API documentation with examples.


📊 Monitoring & Troubleshooting

🔍 Health Checks

curl https://your-subdomain.yourdomain.com/api/health
curl "https://your-subdomain.yourdomain.com/api/status?token=YOUR_TOKEN"
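
For scripted monitoring, the unauthenticated health endpoint can be probed from Python with the standard library. This sketch treats any HTTP 200 as healthy, because the response body format is not documented here. The function name `is_healthy` and its `opener` parameter (there so the check can be tested without a network) are hypothetical.

```python
import urllib.request


def is_healthy(base_url: str, timeout_s: float = 5.0,
               opener=urllib.request.urlopen) -> bool:
    """Return True if GET /api/health answers with HTTP 200, False on any failure."""
    try:
        with opener(f"{base_url.rstrip('/')}/api/health", timeout=timeout_s) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (4xx/5xx), and socket timeouts all derive from OSError.
        return False


# Usage:
#   if not is_healthy("https://your-subdomain.yourdomain.com"):
#       print("HeadlessX health check failed")
```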

📋 Log Management

# PM2 logs
npm run pm2:logs
pm2 logs headlessx --lines 100

# Docker logs
docker-compose logs -f headlessx

# Nginx logs
sudo tail -f /var/log/nginx/access.log

🔄 Updates

git pull origin main
npm run build          # Rebuild website
npm run pm2:restart     # PM2
# OR
docker-compose restart  # Docker

🔧 Common Issues

"npm ci" Error (missing package-lock.json):

chmod +x scripts/generate-lockfiles.sh
./scripts/generate-lockfiles.sh  # Generate lock files
# OR
npm install --production  # Use install instead

"Cannot find module 'express'":

npm install  # Install dependencies

System dependency errors (Ubuntu):

sudo apt update && sudo apt install -y \
  libatk1.0-0t64 libatk-bridge2.0-0t64 libcups2t64 \
  libatspi2.0-0t64 libasound2t64 libxcomposite1

PM2 not starting:

sudo npm install -g pm2
chmod +x scripts/setup.sh  # Make script executable
pm2 start ecosystem.config.js  # v1.2.0 moved this file from config/ to the repository root
pm2 logs headlessx  # Check errors

Script permission errors:

# Make all scripts executable
chmod +x scripts/*.sh

# Or use the quick setup
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh

Playwright browser installation errors:

# Use dedicated Playwright setup script
chmod +x scripts/setup-playwright.sh
./scripts/setup-playwright.sh

# Or install manually:
sudo apt update && sudo apt install -y \
  libgtk-3-0t64 libpangocairo-1.0-0 libcairo-gobject2 \
  libgdk-pixbuf-2.0-0 libdrm2 libxss1 libxrandr2 \
  libasound2t64 libatk1.0-0t64 libnss3

# Install only Chromium (most stable)
npx playwright install chromium

# Alternative: Use Docker (avoids dependency issues)
docker-compose up -d

🔐 Security Features

  • Token Authentication: Secure API access with custom tokens
  • Rate Limiting: Nginx-level request throttling
  • Security Headers: XSS, CSRF, and clickjacking protection
  • Bot Protection: Common attack vector blocking
  • SSL/TLS: Automatic HTTPS with Let's Encrypt

🤝 Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🆘 Support


🎯 Built by SaifyXPRO

HeadlessX v1.2.0 - The most advanced open-source browserless web scraping solution.

Made with ❤️ for the developer community.

About

A lightweight, self-hosted headless browser automation platform. Designed as an alternative to Browserless, built for speed, privacy, and scalability.
