joshsoftware/tally.ai

Tally AI Multi-Tenant Chatbot

Tally AI is an autonomous, natural language Text-to-SQL engine that lets users query their Tally ERP accounting data conversationally. Built on a multi-tenant PostgreSQL vector architecture, it ensures complete data isolation between companies while securely translating informal questions (e.g., "What's my cash flow?") into accurate, executable PostgreSQL queries.

Architecture & Technology Stack

  • Backend: FastAPI (Python 3.10+)
  • Frontend: Vanilla HTML/CSS/JS with a futuristic dynamic UI.
  • Database: PostgreSQL (with pgvector for similarity search).
  • LLM Engine: Local Ollama instance (qwen3-coder-next:cloud).
  • ORM / Drivers: SQLAlchemy and Psycopg2.

Project Structure

/home/josh/tally/
├── app.py                  # Main FastAPI application entry point.
├── requirements.txt        # Python dependency list.
├── api/
│   └── routes.py           # Core API endpoints (receives chat POST requests).
├── core/
│   ├── orchestrator.py     # Pipeline coordinator (ties together embedding + RAG + AI + Execution).
│   ├── embedding.py        # Generates text embeddings for similarity search.
│   ├── retriever.py        # Connects to `pgvector` to find relevant schema & few-shot examples.
│   ├── generator.py        # The AI Brain: constructs the strict prompt and calls Ollama.
│   ├── executor.py         # Advanced execution and safety engine (handles column corrections, validation).
│   └── formatter.py        # Formats the returned database rows into Markdown tables for the UI.
├── db/
│   ├── connection.py       # DB connection utilities.
│   └── schema_data.py      # Core DDL definitions + semantic human-language mappings.
├── scripts/
│   ├── setup_db.py         # Creates core database tables and sets up RLS (Row-Level Security).
│   ├── tally_import.py     # ETL pipeline that normalizes and imports Tally Excel exports.
│   └── sync_rag.py         # The learning engine: indexes few-shot templates and schema into pgvector.
└── static/
    └── index.html          # Chat interface UI.

Prerequisites & Installation

Before beginning setup, ensure your system has the following installed:

  1. Python 3.10+ (with pip)
  2. PostgreSQL (running locally on port 5432)
  3. pgvector extension (Must be installed and enabled in your PostgreSQL database)
  4. Ollama (Local LLM runner for serving qwen3-coder-next:cloud)

Install Python Dependencies

Install all required Python packages via the requirements.txt file:

pip install -r requirements.txt

Setup Instructions

1. Database Setup

Make sure you have PostgreSQL running locally with the pgvector extension installed.

Run the setup script to drop existing data, set up the schema, enable vector extensions, and enforce Row-Level Security:

export PYTHONPATH=$PYTHONPATH:.
python3 scripts/setup_db.py

2. Import Data (ETL Phase)

Load your raw Tally Excel exports into the newly created database. (This script normalizes column names and units automatically.)

python3 scripts/tally_import.py
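To illustrate the kind of normalization this step performs, here is a minimal sketch of header cleanup. The function name `normalize_column` and the exact rules are assumptions for illustration, not the script's actual logic:

```python
import re

def normalize_column(raw: str) -> str:
    """Normalize a Tally Excel header into a snake_case column name.

    Hypothetical sketch of the cleanup tally_import.py might perform;
    the real script's rules may differ.
    """
    name = raw.strip().lower()
    name = re.sub(r"\(.*?\)", "", name)      # drop unit annotations like "(in Rs.)"
    name = re.sub(r"[^a-z0-9]+", "_", name)  # collapse non-alphanumerics to underscores
    return name.strip("_")

print(normalize_column("Closing Balance (in Rs.)"))  # closing_balance
```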

3. Train the AI (RAG Synchronization)

To make the AI aware of the schema and the few-shot logic templates, embed them into pgvector by running the RAG sync script:

python3 scripts/sync_rag.py
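The nearest-neighbour lookup this indexing enables (pgvector's cosine-distance search) can be sketched in pure Python with toy vectors. `cosine` and `top_k` are illustrative names, not the project's API, and real embeddings would have hundreds of dimensions:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def top_k(query_vec, indexed, k=5):
    """Return the k (text, vector) pairs most similar to query_vec.

    `indexed` stands in for rows sync_rag.py would store in a pgvector column.
    """
    return sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:k]

indexed = [
    ("pending receivables template", [1.0, 0.0, 0.0]),
    ("cash flow template",           [0.0, 1.0, 0.0]),
    ("payables template",            [0.9, 0.1, 0.0]),
]
print(top_k([1.0, 0.05, 0.0], indexed, k=2)[0][0])  # pending receivables template
```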

4. Run the Local LLM Engine

Ensure you have Ollama up and running with the required model:

ollama serve
ollama pull qwen3-coder-next:cloud

5. Launch the Server

Start the FastAPI server:

uvicorn app:app --host 0.0.0.0 --port 8000 --reload
# Or simply run: python3 app.py

Detailed Execution Flow

When a user sends a message through the frontend, the following pipeline executes to return an accurate result:

  1. Routing (app.py β†’ api/routes.py): A user accesses a tenant-specific URL (http://localhost:8000/{UUID}). All chat queries are automatically scoped to this specific UUID.

  2. Orchestration (core/orchestrator.py): The request drops into the orchestrator pipeline.

  3. Retrieval (core/embedding.py β†’ core/retriever.py): The user's question (e.g., "Show pending receivables") is converted to an embedding. The system asks PostgreSQL/pgvector to find the top 5 most semantically related Few-Shot logic templates (from sync_rag.py) and schema definitions.

  4. Generation (core/generator.py): A strict prompt template is injected with:

    • The user's query.
    • The verified database schema (with semantic annotations).
    • "Hard rules" forbidding mathematical operations on text columns.
    • A semantic alias mapping block.

  The assembled prompt is then sent to the local Ollama LLM (qwen3-coder-next:cloud).

  5. Safety Engine & Execution (core/executor.py): The raw SQL returned by the LLM is intercepted by an auto-cleaner, which strips out hallucinated column names (account_group_name → group_name), removes numeric functions such as to_number applied to unit columns, and fixes collapsed wildcard ILIKE phrases. Finally, the executor runs the query on an enforced read-only connection, scoped to the current tenant via SET LOCAL app.current_tenant.

  6. Formatting (core/formatter.py): The rows returned by psycopg2 are formatted into Markdown tables. The frontend (static/index.html) renders the Markdown as the chat response.
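The safety-engine step (5) can be sketched as a small SQL auto-cleaner. The mapping table, regexes, and function name below are illustrative assumptions, not the executor's actual rules; in the real pipeline the cleaned query would also run under `SET LOCAL app.current_tenant`:

```python
import re

# Assumed mapping of commonly hallucinated column names to real ones.
HALLUCINATED_COLUMNS = {"account_group_name": "group_name"}

def clean_sql(sql: str) -> str:
    """Illustrative sketch of core/executor.py's auto-cleaner."""
    # Enforce read-only: only SELECT statements are allowed through.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are permitted")
    # Rewrite hallucinated column names to their real counterparts.
    for wrong, right in HALLUCINATED_COLUMNS.items():
        sql = re.sub(rf"\b{wrong}\b", right, sql)
    # Strip numeric casts wrongly applied to text/unit columns,
    # e.g. to_number(qty_unit, '999') -> qty_unit.
    sql = re.sub(r"to_number\(\s*(\w+)\s*(?:,[^)]*)?\)", r"\1", sql)
    return sql
```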
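The formatting step (6) amounts to rendering database rows as a Markdown table. A minimal sketch, with the function name assumed rather than taken from the codebase:

```python
def rows_to_markdown(columns, rows):
    """Render column names and row tuples as a Markdown table,
    as core/formatter.py does conceptually."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(str(v) for v in row) + " |" for row in rows]
    return "\n".join([header, divider, *body])

print(rows_to_markdown(["ledger", "balance"], [("Cash", 1200), ("Sales", -300)]))
```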
