[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
97% token reduction for AI coding sessions. Zero deps, 31 languages, MCP server
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV 2025
CLI proxy that reduces LLM token usage by 60-90%. Declarative YAML filters for Claude Code, Cursor, Copilot, Gemini. rtk alternative in Go.
A smart context filter that removes noise, refines and enhances responses, and slashes token usage by up to 90%.
Make your agents leaner and faster. It's not just about saving time; it's about the feeling of not wasting it.
A discovery and compression tool for your Python codebase. Creates a knowledge graph for an LLM context window, efficiently outlining your project | Code structure visualization | LLM Context Window Efficiency | Static analysis for AI | Large Language Model tooling #LLM #AI #Python #CodeAnalysis #ContextWindow #DeveloperTools
AI-powered text compression library for RAG systems and API calls. Reduce token usage by up to 50-60% while preserving semantic meaning with advanced compression strategies.
A lightweight tool to optimize your JavaScript / TypeScript project for LLM context windows by using a knowledge graph | AI code understanding | LLM context enhancement | Code structure visualization | Static analysis for AI | Large Language Model tooling #LLM #AI #JavaScript #TypeScript #CodeAnalysis #ContextWindow #DeveloperTools
[CVPR 2025] PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
ZON: 35-70% cheaper LLM prompts than JSON/TOON. Zero overhead.
[AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
A lightweight tool to optimize your C# project for LLM context windows by using a knowledge graph | Code structure visualization | Static analysis for AI | Large Language Model tooling | .NET ecosystem support #LLM #AI #CSharp #DotNet #CodeAnalysis #ContextWindow #DeveloperTools
token-ninja routes deterministic shell commands locally. Zero LLM calls, ~19µs latency. Works silently inside AI tools via MCP.
A discovery and compression tool for your Java codebase. Creates a knowledge graph for an LLM context window, efficiently outlining your project #LLM #AI #Java #CodeAnalysis #ContextWindow #DeveloperTools #StaticAnalysis #CodeVisualization
An open-source intelligent codebase visualizer for JavaScript, React, Next.js, and Node.js for easy PR review, fast onboarding, and deep architectural understanding
CLI proxy for coding agents that cuts noisy terminal output while preserving command behavior
Cut Claude token usage by 90%+. Free, open-source, local-first context compression for Claude Code. Hybrid RAG (BM25 + ONNX vectors), AST chunking, reranking. No API needed.
Token-compression skill. An adaptation of caveman: short common words, trust context, say just enough, be laconic.
Awesome papers on token redundancy reduction