The Rossmann Sales Prediction Project aims to forecast the daily sales of retail stores based on historical sales data, promotions, holidays, and store-related attributes.
Accurate forecasting helps the company optimize inventory planning, workforce allocation, and promotional strategies, ultimately improving revenue management.
This project is inspired by the Rossmann Store Sales Forecasting Kaggle competition.
Predict daily sales for each store for the next 6 weeks using machine learning models that learn patterns from past data and business features.
Retail chains like Rossmann operate thousands of stores, each influenced by local factors such as holidays, promotions, store type, competition, and seasonality.
The task is to build a predictive model that can estimate future daily sales for each store given these dynamic conditions.
- Understand sales behavior through exploratory data analysis (EDA).
- Engineer meaningful time-based and business features.
- Build and compare multiple regression and ensemble models.
- Deploy a simple prediction app or API for real-time forecasting.
| Feature | Description |
|---|---|
| Store | Unique ID for each store |
| Date | Date of observation |
| DayOfWeek | Day of the week (1 = Monday, 7 = Sunday) |
| Sales | Target variable: daily revenue of the store |
| Customers | Number of customers that visited the store |
| Open | Whether the store was open (0 = closed, 1 = open) |
| Promo | Whether the store was running a promotion |
| StateHoliday | Indicates a state or public holiday |
| SchoolHoliday | Indicates if a (school) holiday affected store operation |
| StoreType | Categorical variable describing the store model |
| Assortment | Level of product variety |
| CompetitionDistance | Distance to the nearest competitor store |
| CompetitionOpenSince[Month/Year] | When the nearest competition opened |
| Promo2 | Whether the store participates in a continuing promotion |
| Promo2Since[Year/Week] | When the store started participating in Promo2 |
The following patterns and insights were explored:
- Sales trends over time (seasonality, weekends, holidays)
- Effect of promotions and competition distance
- Correlation between store type and average sales
- Impact of holidays and school closures
EDA revealed that:
- Stores experience weekly seasonality.
- Promotions significantly increase sales.
- Stores closer to competitors tend to have slightly lower sales.
- Certain store types have consistently higher sales volumes.
Key engineered features:
- Temporal features: `day`, `month`, `year`, `weekofyear`, `is_weekend`
- Cyclical encoding: `sin_cos` transformation for time features
- Competition & Promo features: `competition_age`, `promo_duration`
- Categorical encoding: One-hot or label encoding for `StoreType`, `Assortment`, etc.
- Log transformation of target variable (`Sales`) for normalization
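The features listed above could be derived roughly as follows. This is a minimal sketch using only the standard library, not the project's actual code: the `row` dict, the `engineer_features` and `cyclical` helpers, the output names such as `month_sin`, and the choice to measure `competition_age` in whole months are all assumptions made for illustration. The column names `CompetitionOpenSinceYear` and `CompetitionOpenSinceMonth` are the expanded forms of `CompetitionOpenSince[Month/Year]`.

```python
import math
from datetime import date


def cyclical(value, period):
    """Map a periodic value onto the unit circle so both ends of the cycle are adjacent."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def engineer_features(row):
    """Build time-based and business features for one observation.

    `row` is a hypothetical dict keyed by dataset column names.
    """
    d = row["Date"]
    day_of_week = d.isoweekday()  # 1 = Monday, 7 = Sunday, matching DayOfWeek
    month_sin, month_cos = cyclical(d.month - 1, 12)
    dow_sin, dow_cos = cyclical(day_of_week - 1, 7)

    # Assumption: competition_age is counted in whole months and floored at 0.
    competition_age = max(
        0,
        (d.year - row["CompetitionOpenSinceYear"]) * 12
        + (d.month - row["CompetitionOpenSinceMonth"]),
    )

    return {
        "day": d.day,
        "month": d.month,
        "year": d.year,
        "weekofyear": d.isocalendar()[1],
        "is_weekend": int(day_of_week >= 6),
        "month_sin": month_sin,
        "month_cos": month_cos,
        "dow_sin": dow_sin,
        "dow_cos": dow_cos,
        "competition_age": competition_age,
        # log1p stays finite when Sales is 0; invert predictions with math.expm1.
        "log_sales": math.log1p(row["Sales"]),
    }


# Illustrative values only, not taken from the dataset.
example = {
    "Date": date(2015, 7, 31),
    "Sales": 5000,
    "CompetitionOpenSinceYear": 2008,
    "CompetitionOpenSinceMonth": 9,
}
features = engineer_features(example)
```

The sine/cosine pair keeps December next to January and Sunday next to Monday, which a plain integer encoding would not. Predictions made on the log scale need `math.expm1` applied before they are compared with actual sales.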
Models used:
- Linear Regression (baseline)
- Decision Tree
- Random Forest Regressor
- Gradient Descent
- XGBoost
- Time-based train-test split (to avoid data leakage)
- Cross-validation using TimeSeriesSplit
- Feature importance analysis from tree-based models
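To illustrate why the split is made on time rather than at random, here is a small sketch of a date cutoff split and of expanding-window folds. The folds are hand-rolled to show the idea behind scikit-learn's `TimeSeriesSplit`, not that class itself; the helper names, the toy rows, and the cutoff date are hypothetical.

```python
from datetime import date, timedelta


def time_based_split(rows, cutoff):
    """Train on everything before `cutoff`, test on everything from `cutoff` onward."""
    train = [r for r in rows if r["Date"] < cutoff]
    test = [r for r in rows if r["Date"] >= cutoff]
    return train, test


def expanding_window_folds(dates, n_splits, test_size):
    """Yield (train_dates, test_dates) pairs in which training always precedes testing.

    Each fold tests on the next `test_size` dates and trains on all earlier ones.
    """
    ordered = sorted(set(dates))
    for i in range(n_splits):
        test_end = len(ordered) - (n_splits - 1 - i) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough dates for the requested folds")
        yield ordered[:test_start], ordered[test_start:test_end]


# Toy data: 10 consecutive days for one hypothetical store.
start = date(2015, 1, 1)
rows = [
    {"Store": 1, "Date": start + timedelta(days=i), "Sales": 100 + i}
    for i in range(10)
]

train, test = time_based_split(rows, cutoff=date(2015, 1, 8))
folds = list(
    expanding_window_folds([r["Date"] for r in rows], n_splits=3, test_size=2)
)
```

Because every test date is later than every training date in the same fold, the model is never scored on days that precede what it was trained on, which is the leakage a random split would allow.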
- Root Mean Squared Error (RMSE)
- Mean Absolute Percentage Error (MAPE)
- R² Score
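For reference, the three metrics can be computed as in the sketch below. It is a plain-Python illustration of the standard definitions, with one assumption labeled in the code: MAPE skips observations whose actual value is zero (for example, days when a store is closed), since the percentage error is undefined there. The sample values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumption: rows with a zero actual value are skipped.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 330.0, 380.0]

scores = {
    "rmse": rmse(y_true, y_pred),
    "mape": mape(y_true, y_pred),
    "r2": r2_score(y_true, y_pred),
}
```

If the model is trained on log-transformed sales, the predictions should be converted back to the original scale before these metrics are computed, otherwise RMSE and MAPE describe errors in log units.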
Best model (XGBoost) achieved:
The model was deployed as:
- A FastAPI endpoint that accepts `store_id` and `date range` to return predicted sales.
- A Streamlit dashboard for interactive visualizations and predictions.
Example API endpoint:
POST /predict
{
  "store_id": 102,
  "date": "2015-08-01"
}
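A client could send that request along these lines. This is a sketch using only the standard library: the URL assumes the API is running locally on uvicorn's default port 8000, the helper names are hypothetical, and the response is returned as parsed JSON without assuming any particular schema, since the response format is not documented here.

```python
import json
import urllib.request
from datetime import date

# Assumption: the API is served locally on uvicorn's default port.
API_URL = "http://127.0.0.1:8000/predict"


def build_payload(store_id, day):
    """Return the JSON request body from the example above, encoded as bytes."""
    body = {"store_id": store_id, "date": day.isoformat()}
    return json.dumps(body).encode("utf-8")


def request_prediction(store_id, day, url=API_URL):
    """POST the payload and return the decoded JSON response as-is."""
    req = urllib.request.Request(
        url,
        data=build_payload(store_id, day),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the payload needs no running server.
payload = build_payload(102, date(2015, 8, 1))
```

Calling `request_prediction(102, date(2015, 8, 1))` requires the API server to be running first.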
git clone https://github.com/Vaishnavi-vi/rossmann-sales-prediction.git
cd rossmann-sales-prediction
code .
- FastAPI: `uvicorn main.main:app --reload`
- Streamlit: `streamlit run frontend/frontend.py`