Tally AI is an autonomous, natural language Text-to-SQL engine that lets users query their Tally ERP accounting data conversationally. Built on a multi-tenant PostgreSQL vector architecture, it ensures complete data isolation between companies while securely translating informal questions (e.g., "What's my cash flow?") into accurate, executable PostgreSQL queries.
- Backend: FastAPI (Python 3.10+)
- Frontend: Vanilla HTML/CSS/JS with a futuristic dynamic UI.
- Database: PostgreSQL (with `pgvector` for similarity search)
- LLM Engine: Local Ollama instance (`qwen3-coder-next:cloud`)
- ORM / Drivers: SQLAlchemy and Psycopg2
```
/home/josh/tally/
├── app.py               # Main FastAPI application entry point.
├── requirements.txt     # Python dependency list.
├── api/
│   └── routes.py        # Core API endpoints (receives chat POST requests).
├── core/
│   ├── orchestrator.py  # Pipeline coordinator (ties together embedding + RAG + AI + execution).
│   ├── embedding.py     # Generates text embeddings for similarity search.
│   ├── retriever.py     # Connects to pgvector to find relevant schema & few-shot examples.
│   ├── generator.py     # The AI brain: constructs the strict prompt and calls Ollama.
│   ├── executor.py      # Execution and safety engine (handles column corrections, validation).
│   └── formatter.py     # Formats returned database rows into Markdown tables for the UI.
├── db/
│   ├── connection.py    # DB connection utilities.
│   └── schema_data.py   # Core DDL definitions + semantic human-language mappings.
├── scripts/
│   ├── setup_db.py      # Creates core database tables and sets up RLS (Row-Level Security).
│   ├── tally_import.py  # ETL pipeline that normalizes and imports Tally Excel exports.
│   └── sync_rag.py      # The learning engine: indexes few-shot templates and schema into pgvector.
└── static/
    └── index.html       # Chat interface UI.
```
Before beginning setup, ensure your system has the following installed:
- Python 3.10+ (with `pip`)
- PostgreSQL (running locally on port 5432)
- `pgvector` extension (must be installed and enabled in your PostgreSQL database)
- Ollama (local LLM runner for serving `qwen3-coder-next:cloud`)
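Before moving on, a quick preflight check can save debugging time later. The sketch below is illustrative and not part of the repository; the importable modules it probes (`psycopg2`, `sqlalchemy`) correspond to the stack listed above.

```python
import importlib.util
import sys

def meets_version(current: tuple, required: tuple = (3, 10)) -> bool:
    """Return True if `current` (e.g. sys.version_info) satisfies `required`."""
    return tuple(current[:2]) >= required

def module_available(name: str) -> bool:
    """Check whether a dependency is importable, without actually importing it."""
    return importlib.util.find_spec(name) is not None

if __name__ == "__main__":
    checks = {
        "Python 3.10+": meets_version(sys.version_info),
        "psycopg2": module_available("psycopg2"),
        "sqlalchemy": module_available("sqlalchemy"),
    }
    for label, ok in checks.items():
        print(f"{'OK     ' if ok else 'MISSING'} {label}")
```

Note that this only verifies the Python side; PostgreSQL, the `pgvector` extension, and Ollama still need to be checked from their own tooling (`psql`, `ollama list`).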
Install all required Python packages via the requirements.txt file:

```bash
pip install -r requirements.txt
```

Make sure you have PostgreSQL running locally with the pgvector extension installed.
Run the setup script to drop existing data, set up the schema, enable vector extensions, and enforce Row-Level Security:
```bash
export PYTHONPATH=$PYTHONPATH:.
python3 scripts/setup_db.py
```

Load your raw Tally Excel exports into the newly created database (this script normalizes columns and units natively):

```bash
python3 scripts/tally_import.py
```

To make the AI aware of the schema and the expertly crafted few-shot logic templates, we need to embed this memory into pgvector.
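Conceptually, what this sync enables is nearest-neighbour search over embeddings: pgvector does this in SQL with its distance operators (e.g. `<=>` for cosine distance). The pure-Python sketch below, with made-up 3-dimensional vectors, illustrates the idea only; real embeddings come from `core/embedding.py` and live in PostgreSQL.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the k corpus keys most similar to the query embedding."""
    ranked = sorted(corpus, key=lambda key: cosine_similarity(query, corpus[key]),
                    reverse=True)
    return ranked[:k]

# Toy 3-d "embeddings" for a few hypothetical few-shot templates.
templates = {
    "pending receivables": [0.9, 0.1, 0.0],
    "cash flow summary":   [0.1, 0.9, 0.2],
    "stock by godown":     [0.0, 0.2, 0.9],
}
print(top_k([0.85, 0.15, 0.05], templates, k=2))
# -> ['pending receivables', 'cash flow summary']
```

In production the same ranking happens inside PostgreSQL (with an index over the vector column), so the Python side only sends one query embedding and gets back the top matches.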
Run the RAG Sync script:
```bash
python3 scripts/sync_rag.py
```

Ensure you have Ollama up and running with the required model:

```bash
ollama serve
ollama pull qwen3-coder-next:cloud
```

Start the FastAPI server:
```bash
uvicorn app:app --host 0.0.0.0 --port 8000 --reload
# Or simply run: python3 app.py
```

When a user sends a message through the frontend, the following pipeline executes to return an accurate result:
- **Routing** (`app.py` → `api/routes.py`): A user accesses a tenant-specific URL (`http://localhost:8000/{UUID}`). All chat queries are automatically scoped to this specific UUID.
- **Orchestration** (`core/orchestrator.py`): The request drops into the orchestrator pipeline.
- **Retrieval** (`core/embedding.py` → `core/retriever.py`): The user's question (e.g., "Show pending receivables") is converted to an embedding. The system asks PostgreSQL/`pgvector` for the top 5 most semantically related few-shot logic templates (indexed by `sync_rag.py`) and schema definitions.
- **Generation** (`core/generator.py`): A strict prompt template is injected with:
  - The user's query.
  - The verified database schema (with semantic annotations).
  - "Hard rules" forbidding math operations on text columns.
  - A semantic alias mapping block.

  This prompt is sent to the local Ollama LLM (`qwen3-coder-next:cloud`).
- **Safety Engine & Execution** (`core/executor.py`): The returned raw SQL is intercepted by an auto-cleaner that strips out hallucinated column names (`account_group_name` → `group_name`), removes unwanted numeric functions like `to_number` applied to `unit` columns, and fixes collapsed wildcard `ILIKE` phrases. Finally, the executor runs the query on an enforced read-only connection scoped with `SET LOCAL app.current_tenant`.
- **Formatting** (`core/formatter.py`): The `psycopg2` output rows are formatted into Markdown tables. The frontend (`static/index.html`) renders the Markdown into the UI response.