A comprehensive framework for optimizing deep learning models through advanced compression techniques
Key Features • Architecture • Installation • Usage • Results • Roadmap • Contributing
The Model Compression Pipeline is an end-to-end framework designed to compress state-of-the-art deep learning models while preserving accuracy. This project implements various compression techniques including pruning, quantization, and knowledge distillation, allowing researchers and practitioners to optimize models for deployment on resource-constrained devices.
- Modular Architecture: Easily extensible framework for experimenting with different compression techniques
- Multiple Compression Techniques:
  - Pruning: Remove redundant weights and connections
  - Quantization: Convert weights to lower-precision formats
  - Knowledge Distillation: Train smaller "student" models from larger "teacher" models
  - Lottery Ticket Hypothesis: Find sparse subnetworks that train effectively from their original initialization
- Model Support:
  - CNN architectures (ResNet, MobileNet, EfficientNet)
  - Vision Transformers (ViT variants)
- Dataset Integration:
  - CIFAR-10/100, ImageNet, Oxford Flowers102
- Comprehensive Evaluation:
  - Accuracy, model size, inference latency, and memory usage benchmarks (see the measurement sketch after this list)
  - Detailed visualization and reporting
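As a minimal sketch of how size and latency numbers of this kind can be collected in plain PyTorch (the `benchmark` helper below is illustrative only and is not part of the pipeline's evaluation module):

```python
import time
import torch
import torchvision.models as models

def benchmark(model, input_shape=(1, 3, 224, 224), runs=50, device="cpu"):
    """Rough size/latency benchmark; illustrative only."""
    model = model.to(device).eval()
    # Model size: total parameter bytes, reported in MB
    size_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6
    x = torch.randn(*input_shape, device=device)
    with torch.no_grad():
        for _ in range(5):          # warm-up iterations
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        latency_ms = (time.perf_counter() - start) / runs * 1000
    return size_mb, latency_ms

size_mb, latency_ms = benchmark(models.resnet18(weights=None))
print(f"size: {size_mb:.1f} MB, latency: {latency_ms:.1f} ms")
```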
The pipeline consists of the following key components:
```
model_compression_pipeline/
├── src/                    # Source code
│   ├── data/               # Data loading and preprocessing
│   ├── models/             # Model architectures and training
│   ├── compression/        # Compression techniques implementation
│   ├── evaluation/         # Benchmarking and comparison
│   └── utils/              # Helper utilities
├── experiments/            # Jupyter notebooks for experiments
├── docs/                   # Documentation
└── results/                # Saved models and metrics
```
| Technique | Description | Benefits |
|---|---|---|
| Pruning | Removes weights based on magnitude, importance, or structure | Reduces model size and computation with minimal accuracy impact |
| Quantization | Converts 32-bit floats to lower-precision (8-bit, 4-bit, 2-bit) | Significantly decreases model size and improves inference speed |
| Knowledge Distillation | Trains compact models using the output of larger models | Creates smaller, faster models that retain knowledge from larger ones |
| Lottery Ticket Hypothesis | Finds sparse subnetworks with comparable performance to full networks | Identifies highly efficient subnetworks that train effectively from initialization |
- Python 3.8+
- CUDA-compatible GPU (recommended)
```bash
# Clone the repository
git clone https://github.com/1Utkarsh1/model-compression-pipeline.git
cd model-compression-pipeline

# Install dependencies
pip install -r requirements.txt
```

```bash
# Train a baseline model and apply all compression techniques
python src/main.py --mode full --model resnet50 --dataset cifar10
```
**Baseline Model Training**

```bash
# Train baseline ResNet50 on CIFAR-10
python src/main.py --mode baseline --model resnet50 --dataset cifar10 --epochs 100
```
**Pruning**

```bash
# Apply magnitude-based pruning with 50% sparsity
python src/main.py --mode prune --model resnet50 --dataset cifar10 --prune_rate 0.5 --prune_method magnitude
```
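Under the hood, magnitude pruning zeroes the weights with the smallest absolute values. A minimal sketch of the idea using PyTorch's built-in pruning utilities (the pipeline's own implementation in `src/compression/` may differ):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune
from torchvision.models import resnet50

model = resnet50(weights=None)

# Prune 50% of the smallest-magnitude weights in every conv/linear layer
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make pruning permanent (bake zeros into the tensor)
```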
**Quantization**

```bash
# Apply 8-bit post-training quantization
python src/main.py --mode quantize --model resnet50 --dataset cifar10 --bits 8 --quantize_method post_training
```
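As a rough illustration of post-training quantization, PyTorch's dynamic quantization converts the weights of selected layer types to int8 after training. This is a simplified stand-in, not the pipeline's actual quantizer; conv-heavy models typically use static post-training quantization (prepare/convert) instead, and the 4-/2-bit modes are not shown here:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights=None).eval()

# Dynamic post-training quantization: Linear weights stored as int8,
# activations quantized on the fly at inference time
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```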
**Knowledge Distillation**

```bash
# Distill from ResNet50 to ResNet18
python src/main.py --mode distill --model resnet50 --dataset cifar10 --student resnet18
```
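The core of knowledge distillation is a loss that mixes hard-label cross-entropy with a KL term on temperature-softened teacher/student logits. A minimal sketch of such a loss (the temperature and alpha values are illustrative, not the pipeline's defaults):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```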
**Lottery Ticket Hypothesis**

```bash
# Apply Lottery Ticket pruning with 5 iterations
python src/main.py --mode lottery_ticket --model resnet50 --dataset cifar10 --lottery_iterations 5 --lottery_prune_percent 0.2
```
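Iterative lottery-ticket pruning alternates training, magnitude pruning, and rewinding the surviving weights to their original initialization. A compact sketch of that loop (`train_one_round` is a placeholder for your own training code, not a pipeline function):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def find_lottery_ticket(model, train_one_round, iterations=5, prune_percent=0.2):
    initial_state = copy.deepcopy(model.state_dict())  # weights at initialization
    for _ in range(iterations):
        train_one_round(model)                          # train the (partially pruned) network
        for module in model.modules():                  # prune smallest-magnitude weights per layer
            if isinstance(module, (nn.Conv2d, nn.Linear)):
                prune.l1_unstructured(module, name="weight", amount=prune_percent)
        # Rewind surviving weights to their original initialization, keeping the masks
        with torch.no_grad():
            for name, module in model.named_modules():
                if hasattr(module, "weight_orig"):
                    module.weight_orig.copy_(initial_state[f"{name}.weight"])
    return model
```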
**Generate Comparison Report**

```bash
# Generate a comprehensive HTML report comparing all techniques
python src/main.py --mode report --model resnet50 --dataset cifar10
```

Here's a comparison of different compression techniques applied to ResNet50 on CIFAR-10:
| Model | Accuracy | Size (MB) | Inference Time (ms) | Memory Usage (MB) |
|---|---|---|---|---|
| Baseline (ResNet50) | 92.5% | 97.8 | 125 | 550 |
| Pruned (50%) | 91.8% | 49.2 | 110 | 320 |
| Quantized (8-bit) | 92.1% | 24.6 | 85 | 210 |
| Distilled (ResNet18) | 89.3% | 44.7 | 60 | 290 |
| Lottery Ticket (5 iter) | 91.2% | 31.5 | 105 | 280 |
You can easily extend the pipeline to support custom models:
```python
# In src/models/custom_model.py
import torch.nn as nn

class MyCustomModel(nn.Module):
    def __init__(self, num_classes, feature_dim=512):
        super().__init__()
        # Define your model architecture (feature extractor + classifier head)
        self.features = nn.Sequential(
            nn.Conv2d(3, feature_dim, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(feature_dim, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)

# Then register it in the load_baseline_model function
```
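Registration might look something like the sketch below; the actual signature of `load_baseline_model` isn't shown in this README, so treat the wiring here as an illustrative assumption:

```python
# Hypothetical wiring -- adapt to the real load_baseline_model in src/models/
from src.models.custom_model import MyCustomModel

def load_baseline_model(model_name, num_classes):
    if model_name.lower() == 'my_custom_model':
        return MyCustomModel(num_classes=num_classes)
    # ... existing branches (resnet50, mobilenet, ...) ...
    raise ValueError(f"Unknown model: {model_name}")
```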
Support for new datasets can be added as follows:

```python
# In src/data/data_loader.py
# Add to the get_transforms and load_dataset functions
elif dataset_name.lower() == 'my_dataset':
    # Define transforms
    train_transforms = transforms.Compose([...])
    test_transforms = transforms.Compose([...])

    # Load dataset splits
    train_dataset = MyDataset(...)
    val_dataset = MyDataset(...)
    test_dataset = MyDataset(...)
```

The modular architecture allows adding new compression techniques:
```python
# In src/compression/my_technique.py
def apply_my_compression(model, **kwargs):
    # Implement your compression logic and return the compressed model
    ...
    return model
```

- Implement basic pruning techniques
- Implement quantization (8-bit, 4-bit, 2-bit)
- Implement knowledge distillation
- Add Vision Transformer support
- Add Lottery Ticket Hypothesis implementation
- Support for hardware-aware compression
- Add NLP model support (BERT, GPT variants)
- Deploy compressed models to mobile/edge devices
- Add AutoML for finding optimal compression strategies
- Support for continuous compression during training
If you use this work in your research, please cite:
```bibtex
@software{model_compression_pipeline,
  author = {Utkarsh Rajput},
  title  = {Model Compression Pipeline},
  year   = {2025},
  url    = {https://github.com/1Utkarsh1/model-compression-pipeline}
}
```

Contributions are welcome! Please see CONTRIBUTING.md for detailed guidelines.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- PyTorch for the deep learning framework
- TensorFlow Model Optimization Toolkit for inspiration on compression techniques
- Hugging Face Transformers for transformer model implementations
- Jonathan Frankle and Michael Carbin for the Lottery Ticket Hypothesis