Data Analyst & ML Developer who builds and deploys real, working AI systems — not just notebooks.
I'm a Computer Science undergraduate at ITER, SOA University (graduating 2027), selected for the McKinsey Forward Program, the Google Gen AI Academy APAC, and the Guidewire Hackathon (2026). I was also a Round 2 qualifier at the L&T Technology Services Techgium national hackathon (Oct 2025).
My work spans RAG-based LLM systems, ML pipelines, business intelligence dashboards, and funnel analytics — all deployed end-to-end.
- 🔭 Currently building Churn Autopsy — a production-style churn prediction system with a FastAPI inference layer and SHAP explainability
- 🧠 Exploring LLMs, vector databases, and agentic AI architectures
- 📊 Building analytics tools that translate raw data into business decisions
- 📍 Based in India — open to Data Analyst / Junior Data Scientist internships & fresher roles (India & International)
| Programme | Organisation | Year |
|---|---|---|
| Forward Program — Selected Participant | McKinsey & Company | 2026 |
| Gen AI Academy APAC — Selected Participant | Google | 2026 |
| Guidewire Hackathon — Seed 2 Phase | Guidewire Software | 2026 |
| Techgium National Hackathon — Round 2 Qualifier | L&T Technology Services | Oct 2025 |
Why customers leave — predicted before they do
Stack: Python · Scikit-learn · SMOTE · SHAP · FastAPI · Streamlit · Joblib · Pydantic
Highlights:
- 🧠 Trained Logistic Regression vs Random Forest vs Gradient Boosting — best model selected via 5-fold stratified CV (ROC-AUC: 0.744)
- ⚖️ SMOTE oversampling runs inside the pipeline to handle the 33% class imbalance, so synthetic samples come from training data only and the test set is never resampled
- 🔌 Served predictions via a FastAPI REST endpoint (`POST /predict`) returning churn probability + top-3 SHAP explanations + business retention action per customer
- 📊 Interactive Streamlit dashboard consuming the API: risk banner, probability bar, SHAP reason cards, recommended intervention
- 🏗️ Production-style separation: `src/train.py` → `models/churn_pipeline.pkl` → `api/main.py` → `app.py`
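
As a rough illustration of the response described above (probability, top-3 reasons, retention action), here is a minimal standard-library sketch. It is not the project's actual code: the feature names, thresholds, action labels, and the `build_response` helper are all hypothetical, and the contribution values stand in for SHAP values computed elsewhere.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Explanation:
    feature: str
    contribution: float  # signed, SHAP-style contribution to the churn score


def top_explanations(contributions: dict[str, float], k: int = 3) -> list[Explanation]:
    """Return the k features with the largest absolute contribution."""
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return [Explanation(name, value) for name, value in ranked[:k]]


def retention_action(probability: float) -> str:
    """Map a churn probability to an action (hypothetical thresholds and labels)."""
    if probability >= 0.7:
        return "priority outreach call"
    if probability >= 0.4:
        return "targeted discount offer"
    return "no action"


def build_response(probability: float, contributions: dict[str, float]) -> dict:
    """Assemble the JSON-style payload a /predict handler might return."""
    return {
        "churn_probability": round(probability, 3),
        "top_reasons": [
            {"feature": e.feature, "contribution": e.contribution}
            for e in top_explanations(contributions)
        ],
        "recommended_action": retention_action(probability),
    }


# Hypothetical customer: contributions would normally come from a SHAP explainer.
example = build_response(
    0.82,
    {
        "tenure_months": -0.05,
        "monthly_charges": 0.21,
        "support_tickets": 0.30,
        "contract_type": -0.12,
    },
)
```

Ranking by absolute value keeps both churn-increasing and churn-reducing factors eligible for the top three, which is one reasonable choice for per-customer reason cards.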
Ask questions over any PDF in natural language — powered by a full RAG pipeline
Stack: Python · LangChain · FAISS · Sentence Transformers · Groq LLM · Streamlit · PyPDF
Highlights:
- 📄 Multi-PDF ingestion with semantic chunking
- 🔍 FAISS vector store for fast similarity search
- 💬 Context-aware Q&A with source-page citation
- ⚡ Groq LLM for low-latency inference — answers across 100-page documents in under 5 seconds
- 🖥️ Deployed chat-style UI via Streamlit
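
The retrieval step behind source-page citation can be sketched without any libraries. The snippet below is a plain-Python stand-in for the FAISS similarity search, not the FAISS API itself; the three-dimensional vectors, chunk texts, and the `retrieve` helper are made up for illustration, whereas a real pipeline would use Sentence Transformers embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, k=2):
    """Return the k chunks most similar to the query, best match first."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:k]


# Toy index: each chunk keeps its page number so answers can cite it.
chunks = [
    {"page": 3, "text": "Refunds are issued within 14 days.", "vector": [0.9, 0.1, 0.0]},
    {"page": 7, "text": "Standard shipping takes 5 days.", "vector": [0.1, 0.9, 0.0]},
    {"page": 12, "text": "Refunds exclude clearance items.", "vector": [0.8, 0.2, 0.1]},
]

hits = retrieve([1.0, 0.0, 0.0], chunks, k=2)
cited_pages = [c["page"] for c in hits]
```

Storing the page number alongside each chunk is what lets the retrieved context be passed to the LLM together with a citation.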
Production-deployed ML pipeline with real-time cluster predictions
Stack: Python · Scikit-learn · KMeans · Pandas · Joblib · Streamlit · Matplotlib
Highlights:
- 🧮 StandardScaler + KMeans (5 clusters) pipeline with Joblib model persistence — iterated across 72 commits
- 📈 Named business profiles: VIP · Upsell Targets · Retention Risk · Budget Loyalists · Discount Seekers
- 💡 Dashboard auto-generates segment-specific marketing recommendations at prediction time
- 📥 Downloadable CSV prediction reports
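
Prediction time for a scaler + KMeans pipeline comes down to two steps: standardise the input with the stored statistics, then assign the nearest centroid. The sketch below shows that logic in plain Python under stated assumptions: two features, and means, standard deviations, and centroid positions that are invented for illustration. Only the five profile names come from the project.

```python
import math

# Hypothetical statistics a fitted StandardScaler would have stored
# for two illustrative features: annual income and spending score.
FEATURE_MEANS = [60.0, 50.0]
FEATURE_STDS = [25.0, 25.0]

# Hypothetical centroids in standardised space, keyed by profile name.
CENTROIDS = {
    "VIP": [1.2, 1.2],
    "Upsell Targets": [1.2, -1.0],
    "Retention Risk": [0.0, 0.0],
    "Budget Loyalists": [-1.0, -1.0],
    "Discount Seekers": [-1.0, 1.2],
}


def standardise(row):
    """Apply z-score scaling with the stored means and standard deviations."""
    return [(x - m) / s for x, m, s in zip(row, FEATURE_MEANS, FEATURE_STDS)]


def predict_profile(row):
    """Return the profile whose centroid is closest to the scaled row."""
    scaled = standardise(row)
    return min(CENTROIDS, key=lambda name: math.dist(scaled, CENTROIDS[name]))


profile_a = predict_profile([95.0, 85.0])
profile_b = predict_profile([35.0, 80.0])
```

In the real pipeline, scikit-learn's fitted objects would do both steps after being loaded with Joblib; the point here is only that scaling must use the training-time statistics, not statistics of the incoming row.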
5,000-row employee survey → 6 publication-quality charts → HR policy brief
Stack: Python · Pandas · Seaborn · Plotly · Jupyter Notebooks
Highlights:
- 📋 76.1% of employees reported a mental health condition; 51.1% lacked employer resources
- 📉 Stress rates highest in Finance (35.6%), Healthcare (35.3%), Education (34.5%)
- 📝 Findings structured as a 4-recommendation HR policy brief written for a non-analytical audience
Track user drop-off across 4 product lifecycle stages
Stack: Python · Pandas · Plotly
Highlights:
- 📉 63.7% of users dropped at Site Visit → Sign-Up — identified as the single highest-leverage intervention point
- 📱 Mobile conversion (0.6%) trailed desktop (4.2%) by 3.6 percentage points — representing $8,828 in recoverable monthly revenue
- 📊 Output structured as a one-page prioritisation brief written for a product manager
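
The drop-off figure and the percentage-point gap above are simple ratios, sketched below. The stage counts are hypothetical and chosen only so the first transition reproduces the reported 63.7%; the "Activation" and "Purchase" stage names are also assumptions, since only Site Visit and Sign-Up are named in the summary.

```python
def stage_dropoff(stage_counts):
    """Return (from_stage, to_stage, drop_rate) for each consecutive pair."""
    transitions = []
    for (src, n_src), (dst, n_dst) in zip(stage_counts, stage_counts[1:]):
        transitions.append((src, dst, 1 - n_dst / n_src))
    return transitions


# Hypothetical counts for a four-stage funnel.
stages = [
    ("Site Visit", 10000),
    ("Sign-Up", 3630),
    ("Activation", 2400),
    ("Purchase", 1500),
]

transitions = stage_dropoff(stages)
worst = max(transitions, key=lambda t: t[2])  # largest drop-off

# Gap between desktop and mobile conversion, in percentage points.
gap_pp = round(4.2 - 0.6, 1)
```

Reporting the gap in percentage points rather than as a relative change avoids ambiguity when both rates are small.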
Revenue seasonality, top-SKU identification, and regional patterns
Stack: Python · Pandas · Plotly · SQL
Highlights:
- 📦 SQL window functions and CTEs to identify top-margin SKUs driving disproportionate returns across 8 categories
- 🗺️ Regional revenue pattern analysis via interactive Plotly dashboards
- 💰 Monthly trend and category-mix views structured for a category manager audience
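
A CTE plus a window function is enough to pick the top-margin SKU per category. The sketch below runs against an in-memory SQLite database through Python's standard `sqlite3` module; the `sales` table, its columns, and the rows are hypothetical, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, sku TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Audio", "A-100", 500.0, 300.0),
        ("Audio", "A-200", 400.0, 150.0),
        ("Audio", "A-100", 200.0, 120.0),
        ("Home", "H-10", 900.0, 700.0),
        ("Home", "H-20", 300.0, 60.0),
    ],
)

QUERY = """
WITH sku_margin AS (
    -- Total margin per SKU within each category
    SELECT category, sku, SUM(revenue - cost) AS margin
    FROM sales
    GROUP BY category, sku
),
ranked AS (
    -- Rank SKUs by margin inside their own category
    SELECT category, sku, margin,
           RANK() OVER (PARTITION BY category ORDER BY margin DESC) AS margin_rank
    FROM sku_margin
)
SELECT category, sku, margin
FROM ranked
WHERE margin_rank = 1
ORDER BY category
"""

top_skus = conn.execute(QUERY).fetchall()
conn.close()
```

Note that `H-10` has the highest revenue in its category but `H-20` wins on margin, which is the kind of difference a revenue-only view would hide.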
Automated dataset profiling — no code required for the end user
Stack: Python · Pandas · Streamlit · Plotly
Highlights:
- 🔍 Flags implicit nulls, mixed column types, and silent zero-inflation — anomalies visual inspection misses
- 📊 One-click stakeholder-ready Excel/CSV report generation
- ⏱️ Reduces a 30+ minute manual profiling process to a single file upload
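
The three checks named above can be sketched per column in plain Python. This is an illustrative approximation, not the tool's implementation: the placeholder tokens in `IMPLICIT_NULLS`, the 50% zero-share threshold, and the `profile_column` helper are all assumptions.

```python
# Strings that often stand in for missing values (hypothetical list).
IMPLICIT_NULLS = {"", "na", "n/a", "null", "none", "-", "?"}


def is_implicit_null(value):
    """True for placeholder strings that really mean 'missing'."""
    return isinstance(value, str) and value.strip().lower() in IMPLICIT_NULLS


def kind(value):
    """Coarse type label; ints and floats both count as 'number'."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    return type(value).__name__


def profile_column(values, zero_share=0.5):
    """Return a list of data-quality flags for one column."""
    flags = []
    if any(is_implicit_null(v) for v in values):
        flags.append("implicit_nulls")
    present = [v for v in values if v is not None and not is_implicit_null(v)]
    if len({kind(v) for v in present}) > 1:
        flags.append("mixed_types")
    numbers = [v for v in present if kind(v) == "number"]
    if numbers and sum(1 for v in numbers if v == 0) / len(numbers) >= zero_share:
        flags.append("zero_inflated")
    return flags


flags_nulls = profile_column([12, "N/A", 7, "-"])
flags_mixed = profile_column([1, "two", 3.0])
flags_zeros = profile_column([0, 0, 0, 5, 0])
```

Placeholder strings are excluded before the type check so that a numeric column with a few "N/A" entries is reported as having implicit nulls rather than mixed types.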
| Strength | Evidence |
|---|---|
| Ships production-style systems | Churn Autopsy: training pipeline → REST API → dashboard — 3 independent layers |
| Explains ML decisions | SHAP per-customer explanations — not just global feature importance |
| Translates data into decisions | Mental Health EDA → HR policy brief; Funnel tool → $8,828 revenue opportunity identified |
| Selected by global organisations | McKinsey · Google · Guidewire · L&T Technology Services |
| Builds for real users | All deployed apps have clean UIs designed for non-technical audiences |
| End-to-end ML engineering | train.py → .pkl → FastAPI endpoint → Streamlit UI |
Actively seeking Data Analyst / Junior Data Scientist / ML Engineer internships and fresher roles — open to India and international opportunities.