Flask API service for automated job scraping with database backend
The OpenAPI Swagger UI for this API is available at:
View the API Docs
- Features
- Architecture
- Getting Started
- Usage
- Directory Structure
- Logging
- Testing
- CI/CD
- License
- Example Requests
- Headless scraping of Workday-powered job postings
- Persist scraped data in a SQLite database
- Expose a RESTful Flask API to query and trigger scrapes
- Configurable via environment variables and .env
- Structured logging to console and rotating log files
- Automated changelog and daily ingestion via GitHub Actions
- Core: Python 3.12, Flask
- Scraper: Vendored Workday-scraper logic under app/scraper_pkg
- Storage: SQLite (jobs.db)
- CLI: run.py, powered by Click, supports scrape & serve commands
- API: Blueprint jobs_bp exposes /jobs/... routes
- Config: python-dotenv + app/config.py, environment-driven
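As a rough illustration of the environment-driven configuration listed above, the sketch below reads the documented variables (JOBS_DB_PATH, SCRAPE_LIMIT, API_HOST, API_PORT, LOG_LEVEL) from an environment mapping. The `Settings` class and `load_settings` function are hypothetical names, and every fallback value other than 127.0.0.1:5000 is an assumption taken from the sample .env; the real app/config.py may be organized differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; the real app/config.py may differ."""
    jobs_db_path: str
    scrape_limit: int
    api_host: str
    api_port: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, falling back to assumed defaults."""
    return Settings(
        jobs_db_path=env.get("JOBS_DB_PATH", "./jobs.db"),
        scrape_limit=int(env.get("SCRAPE_LIMIT", "20")),
        api_host=env.get("API_HOST", "127.0.0.1"),
        api_port=int(env.get("API_PORT", "5000")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

With python-dotenv, calling `load_dotenv()` before `load_settings()` would copy the .env values into `os.environ` first. Passing a plain dict instead of `os.environ` keeps the function easy to test.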
- Python 3.12
- Git
- (Optional) Conda or virtualenv
git clone https://github.com/jharemza/workday-scraper-api.git
cd workday-scraper-api
# Using virtualenv
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
Create a .env file in the project root to override defaults (see app/config.py):
# Path to SQLite DB
JOBS_DB_PATH=./jobs.db
# Scrape settings
SCRAPE_LIMIT=20
# API server settings
API_HOST=127.0.0.1
API_PORT=5000
# Logging
LOG_LEVEL=INFO
On first run the table is auto-created. To reset or customize:
sqlite3 jobs.db << 'EOF'
DROP TABLE IF EXISTS job_postings;
# paste the CREATE TABLE DDL from app/db.py here
EOF
- Scrape all companies
python run.py scrape
- Scrape specific companies
python run.py scrape -c "M&T Bank" -c "Acme Corp"
- Start the API server
python run.py serve
All responses are JSON.
| Method | Path | Description |
|---|---|---|
| GET | /jobs/all | List all current job postings |
| GET | /jobs/today | Jobs scraped on the current date |
| GET | /jobs/company/{company} | All current jobs for a given company |
| GET | /jobs/company/{company}/new | Jobs added today for a given company |
| POST | /jobs/scrape | Trigger a fresh scrape (body: {"companies": [...]}) |
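To make the per-company and "added today" endpoints more concrete, here is a minimal sketch of how such queries could be backed by SQLite. Only the table name job_postings comes from this README; the columns (`job_id`, `company`, `title`, `first_seen`) and the helpers `upsert_job` and `jobs_for_company` are assumptions for illustration, and the actual DDL in app/db.py may look quite different.

```python
import sqlite3

# Hypothetical minimal schema; the real DDL lives in app/db.py and may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_postings (
    job_id     TEXT PRIMARY KEY,
    company    TEXT NOT NULL,
    title      TEXT NOT NULL,
    first_seen TEXT NOT NULL  -- ISO date string, e.g. 2025-01-31
)
"""


def upsert_job(conn, job_id, company, title, first_seen):
    """Insert a posting; if job_id already exists, refresh only its title."""
    conn.execute(
        """
        INSERT INTO job_postings (job_id, company, title, first_seen)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(job_id) DO UPDATE SET title = excluded.title
        """,
        (job_id, company, title, first_seen),
    )


def jobs_for_company(conn, company, only_date=None):
    """Return (job_id, title) rows for a company, optionally for one date."""
    sql = "SELECT job_id, title FROM job_postings WHERE company = ?"
    params = [company]
    if only_date is not None:
        sql += " AND first_seen = ?"
        params.append(only_date)
    sql += " ORDER BY job_id"
    return conn.execute(sql, params).fetchall()
```

In this sketch a re-scraped posting keeps its original `first_seen` date, so filtering on today's date returns only postings that are genuinely new. The `ON CONFLICT ... DO UPDATE` clause needs SQLite 3.24 or later.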
.
├── .github/
│   └── workflows/        # CI/CD (release & daily ingest)
├── app/
│   ├── main.py           # Flask app & logging setup
│   ├── routes.py         # API endpoints
│   ├── db.py             # SQLite schema & CRUD
│   ├── config.py         # env-driven settings
│   ├── scraper.py        # orchestrates vendored scraper + DB upserts
│   └── scraper_pkg/      # vendored workday_scraper modules
├── docs/
│   └── openapi.yaml      # (optional) OpenAPI spec
├── logs/
│   └── app.log           # auto-rotated logs
├── tests/                # pytest suite
├── .env                  # environment overrides (gitignored)
├── jobs.db               # SQLite DB (auto-generated)
├── README.md
├── requirements.txt
└── run.py                # CLI commands (scrape & serve)
- Console: Verbose, timestamped output
- File: logs/app.log (rotates at 10 MB, keeps 5 backups)
- Level: Controlled by LOG_LEVEL (DEBUG, INFO, etc.)
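The logging setup described above can be approximated with the standard library alone. The following is a sketch, not the project's actual code: the function name `build_logger`, the logger name, and the format string are assumptions, and "10 MB" is interpreted here as 10 * 1024 * 1024 bytes.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def build_logger(log_path: str, level_name: str = "INFO") -> logging.Logger:
    """Create a logger with a console handler and a rotating file handler."""
    logger = logging.getLogger("workday_scraper_api")  # hypothetical logger name
    # Unknown level names fall back to INFO.
    logger.setLevel(getattr(logging, level_name.upper(), logging.INFO))
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    console = logging.StreamHandler()
    console.setFormatter(fmt)
    logger.addHandler(console)

    # Make sure the target directory (e.g. logs/) exists before opening the file.
    os.makedirs(os.path.dirname(os.path.abspath(log_path)), exist_ok=True)
    file_handler = RotatingFileHandler(
        log_path, maxBytes=10 * 1024 * 1024, backupCount=5
    )
    file_handler.setFormatter(fmt)
    logger.addHandler(file_handler)
    return logger
```

A call such as `build_logger("logs/app.log", os.environ.get("LOG_LEVEL", "INFO"))` would then mirror the behaviour listed above.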
pytest --cov=app --cov-report=xml tests/
- .github/workflows/release.yml: Auto-update CHANGELOG.md on tags/schedule
- .github/workflows/ingest.yml: Daily or manual scrape & optional DB commit
This project is licensed under the MIT License.
You can interact with the API directly via curl or import these into Postman.
Replace {{API_HOST}} and {{API_PORT}} with your configured values (defaults: 127.0.0.1:5000).
- List all jobs
curl -X GET http://{{API_HOST}}:{{API_PORT}}/jobs/all
- Jobs scraped today
curl -X GET http://{{API_HOST}}:{{API_PORT}}/jobs/today
- All current jobs for a company
curl -X GET http://{{API_HOST}}:{{API_PORT}}/jobs/company/"M&T%20Bank"
- New jobs for a company (added today)
curl -X GET http://{{API_HOST}}:{{API_PORT}}/jobs/company/"M&T%20Bank"/new
- Trigger a scrape for one or more companies
curl -X POST http://{{API_HOST}}:{{API_PORT}}/jobs/scrape \
-H "Content-Type: application/json" \
-d '{"companies": ["M&T Bank", "Acme Corp"]}'
- Create a new Collection called "Workday Scraper API".
- Add a Request for each endpoint:
  - Method: GET or POST
  - URL: http://127.0.0.1:5000/jobs/all (or other endpoints)
  - Headers: for POST, set Content-Type: application/json
  - Body (raw JSON) for /jobs/scrape: { "companies": ["M&T Bank", "Acme Corp"] }
- Save and Send; you'll see the JSON response in Postman's response pane.
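The same requests can also be prepared from Python using only the standard library. This is a minimal sketch: `company_jobs_url` and `scrape_request` are hypothetical helper names, and the base URL assumes the default host and port. Unlike the curl examples, it percent-encodes the ampersand in company names as well (`%26`), which a Flask route should decode back to `&`.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def company_jobs_url(base_url: str, company: str, new_only: bool = False) -> str:
    """Build the /jobs/company/{company} URL, percent-encoding the company name."""
    path = f"/jobs/company/{quote(company, safe='')}"
    if new_only:
        path += "/new"
    return base_url.rstrip("/") + path


def scrape_request(base_url: str, companies: list) -> Request:
    """Prepare (but do not send) a POST /jobs/scrape request with a JSON body."""
    body = json.dumps({"companies": companies}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/jobs/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, the prepared request can be sent with `urllib.request.urlopen(scrape_request("http://127.0.0.1:5000", ["M&T Bank"]))` and the response body parsed with `json.load`.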