Tally AI is an autonomous, natural language Text-to-SQL engine that lets users query their Tally ERP accounting data conversationally. Built on a multi-tenant PostgreSQL vector architecture, it ensures complete data isolation between companies while securely translating informal questions (e.g., "What's my cash flow?") into accurate, executable PostgreSQL queries.
- Backend: FastAPI (Python 3.10+)
- Frontend: Vanilla HTML/CSS/JS with a futuristic dynamic UI.
- Database: PostgreSQL (with `pgvector` for similarity search)
- LLM Engine: Local Ollama instance (`qwen3-coder-next:cloud`)
- ORM / Drivers: SQLAlchemy and Psycopg2
```
/home/josh/tally/
├── app.py               # Main FastAPI application entry point.
├── requirements.txt     # Python dependency list.
├── api/
│   └── routes.py        # Core API endpoints (receives chat POST requests).
├── core/
│   ├── orchestrator.py  # Pipeline coordinator (ties together embedding + RAG + AI + execution).
│   ├── embedding.py     # Generates text embeddings for similarity search.
│   ├── retriever.py     # Connects to pgvector to find relevant schema & few-shot examples.
│   ├── generator.py     # The AI brain: constructs the strict prompt and calls Ollama.
│   ├── executor.py      # Execution and safety engine (handles column corrections, validation).
│   └── formatter.py     # Formats returned database rows into Markdown tables for the UI.
├── db/
│   ├── connection.py    # DB connection utilities.
│   └── schema_data.py   # Core DDL definitions + semantic human-language mappings.
├── scripts/
│   ├── setup_db.py      # Creates core database tables and sets up RLS (Row-Level Security).
│   ├── tally_import.py  # ETL pipeline that normalizes and imports Tally Excel exports.
│   └── sync_rag.py      # The learning engine: indexes few-shot templates and schema into pgvector.
└── static/
    └── index.html       # Chat interface UI.
```
Before beginning setup, ensure your system has the following installed:
- Python 3.10+ (with `pip`)
- PostgreSQL (running locally on port 5432)
- `pgvector` extension (must be installed and enabled in your PostgreSQL database)
- Ollama (local LLM runner for serving `qwen3-coder-next:cloud`)
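Before moving on, a quick preflight check can save debugging time later. The sketch below is illustrative and not part of the repository; the importable modules it probes (`psycopg2`, `sqlalchemy`) correspond to the stack listed above.

```python
import importlib.util
import sys

def meets_version(current: tuple, required: tuple = (3, 10)) -> bool:
    """Return True if `current` (e.g. sys.version_info) satisfies `required`."""
    return tuple(current[:2]) >= required

def module_available(name: str) -> bool:
    """Check whether a dependency is importable, without actually importing it."""
    return importlib.util.find_spec(name) is not None

if __name__ == "__main__":
    checks = {
        "Python 3.10+": meets_version(sys.version_info),
        "psycopg2": module_available("psycopg2"),
        "sqlalchemy": module_available("sqlalchemy"),
    }
    for label, ok in checks.items():
        print(f"{'OK     ' if ok else 'MISSING'} {label}")
```

Note that this only verifies the Python side; PostgreSQL, the `pgvector` extension, and Ollama still need to be checked from their own tooling (`psql`, `ollama list`).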
Install all required Python packages via the requirements.txt file:

```bash
pip install -r requirements.txt
```

Make sure you have PostgreSQL running locally with the pgvector extension installed.
Run the setup script to drop existing data, set up the schema, enable vector extensions, and enforce Row-Level Security:
```bash
export PYTHONPATH=$PYTHONPATH:.
python3 scripts/setup_db.py
```

Load your raw Tally Excel exports into the newly created database (this script normalizes columns and units natively):

```bash
python3 scripts/tally_import.py
```

To make the AI aware of the schema and the expertly crafted few-shot logic templates, we need to embed this memory into pgvector.
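Conceptually, what this sync enables is nearest-neighbour search over embeddings: pgvector does this in SQL with its distance operators (e.g. `<=>` for cosine distance). The pure-Python sketch below, with made-up 3-dimensional vectors, illustrates the idea only; real embeddings come from `core/embedding.py` and live in PostgreSQL.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the k corpus keys most similar to the query embedding."""
    ranked = sorted(corpus, key=lambda key: cosine_similarity(query, corpus[key]),
                    reverse=True)
    return ranked[:k]

# Toy 3-d "embeddings" for a few hypothetical few-shot templates.
templates = {
    "pending receivables": [0.9, 0.1, 0.0],
    "cash flow summary":   [0.1, 0.9, 0.2],
    "stock by godown":     [0.0, 0.2, 0.9],
}
print(top_k([0.85, 0.15, 0.05], templates, k=2))
# -> ['pending receivables', 'cash flow summary']
```

In production the same ranking happens inside PostgreSQL (with an index over the vector column), so the Python side only sends one query embedding and gets back the top matches.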
Run the RAG Sync script:
```bash
python3 scripts/sync_rag.py
```

Ensure you have Ollama up and running with the required model:

```bash
ollama serve
ollama pull qwen3-coder-next:cloud
```

Start the FastAPI server:
```bash
uvicorn app:app --host 0.0.0.0 --port 8000 --reload
# Or simply run: python3 app.py
```

When a user sends a message through the frontend, the following pipeline executes to return an accurate result:
- **Routing** (`app.py` → `api/routes.py`): A user accesses a tenant-specific URL (`http://localhost:8000/{UUID}`). All chat queries are automatically scoped to this specific UUID.
- **Orchestration** (`core/orchestrator.py`): The request drops into the orchestrator pipeline.
- **Retrieval** (`core/embedding.py` → `core/retriever.py`): The user's question (e.g., "Show pending receivables") is converted to an embedding. The system asks PostgreSQL/`pgvector` for the top 5 most semantically related few-shot logic templates (indexed by `sync_rag.py`) and schema definitions.
- **Generation** (`core/generator.py`): A strict prompt template is injected with:
  - The user's query.
  - The verified database schema (with semantic annotations).
  - "Hard rules" forbidding math operations on text columns.
  - A semantic alias mapping block.

  This prompt is sent to the local Ollama LLM (`qwen3-coder-next:cloud`).
- **Safety Engine & Execution** (`core/executor.py`): The returned raw SQL is intercepted by an auto-cleaner that strips out hallucinated column names (`account_group_name` → `group_name`), removes unwanted numeric functions like `to_number` applied to `unit` columns, and fixes collapsed wildcard `ILIKE` phrases. Finally, the executor runs the query on an enforced read-only connection scoped with `SET LOCAL app.current_tenant`.
- **Formatting** (`core/formatter.py`): The `psycopg2` output rows are formatted into Markdown tables. The frontend (`static/index.html`) renders the Markdown into the UI response.