A Machine Learning project based on the famous Titanic dataset from Kaggle.
This project predicts whether a passenger survived, using several classification algorithms, and compares their performance.
The Titanic dataset is one of the most popular beginner datasets in Machine Learning.
In this project:
- Data preprocessing and cleaning were performed
- Multiple ML models were trained
- Model accuracy and precision were compared
- Best-performing models were identified
Dataset used:
- Titanic Dataset from Kaggle: https://www.kaggle.com/competitions/titanic
Technologies used:
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
| Algorithm | Accuracy | Precision |
|---|---|---|
| BaggingClassifier | 0.837838 | 0.947368 |
| Decision Tree | 0.837838 | 0.947368 |
| AdaBoost | 0.837838 | 0.904762 |
| Gradient Boosting | 0.810811 | 0.900000 |
| Random Forest | 0.756757 | 0.791667 |
| Logistic Regression | 0.729730 | 0.740741 |
| KNN | 0.702703 | 0.750000 |
| SVC | 0.621622 | 0.636364 |
| Extra Trees Classifier | 0.621622 | 0.636364 |
| Naive Bayes | 0.594595 | 0.611111 |
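A comparison table like the one above could be produced roughly as follows. This is a minimal sketch, not the project's actual code: the synthetic data from `make_classification` stands in for the preprocessed Titanic features, and the model subset, split size, and `random_state` values are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed Titanic features (assumption).
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# A subset of the classifiers listed in the table (illustrative settings).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Naive Bayes": GaussianNB(),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rows.append({
        "Algorithm": name,
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred, zero_division=0),
    })

# Sort so the best-scoring models appear first.
results = (
    pd.DataFrame(rows)
    .sort_values(["Accuracy", "Precision"], ascending=False)
    .reset_index(drop=True)
)
print(results)
```

The scores printed by this sketch come from synthetic data and will not match the table above.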
Best-performing models:
- BaggingClassifier
- Decision Tree
- AdaBoost
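These three models tied for the highest accuracy (0.837838). BaggingClassifier and Decision Tree also shared the highest precision (0.947368), while AdaBoost reached 0.904762.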
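Two models can tie on accuracy and still differ on precision, because the metrics count different things: accuracy is the share of all predictions that are correct, while precision is the share of predicted survivors who actually survived. The sketch below computes both by hand and checks them against scikit-learn. The labels are hypothetical and are not taken from the project's results.

```python
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical labels for illustration only (1 = survived, 0 = did not survive).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count true positives, false positives, and all correct predictions.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
correct = sum(t == p for t, p in zip(y_true, y_pred))

accuracy = correct / len(y_true)   # correct predictions / all predictions
precision = tp / (tp + fp)         # true positives / predicted positives

# The manual values agree with scikit-learn's implementations.
assert accuracy == accuracy_score(y_true, y_pred)
assert precision == precision_score(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}")
```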
- Data Cleaning
- Missing Value Handling
- Exploratory Data Analysis (EDA)
- Feature Encoding
- Model Training
- Model Evaluation
- Accuracy & Precision Comparison
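The missing value handling and feature encoding steps above might look something like the sketch below. It uses a few hypothetical rows with column names from the Kaggle `train.csv` file. The imputation choices (median age, most frequent port) and the encodings are assumptions for illustration, not necessarily what this project does.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using column names from the Kaggle Titanic train.csv.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 3, 2, 1],
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan],
    "Fare": [7.25, 71.28, 7.92, 13.0, 53.1],
    "Embarked": ["S", "C", None, "S", "S"],
})

# Missing value handling: median for Age, most frequent value for Embarked.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Feature encoding: binary map for Sex, one-hot columns for Embarked.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df = pd.get_dummies(df, columns=["Embarked"], dtype=int)

# Split into feature matrix and target for model training.
X = df.drop(columns="Survived")
y = df["Survived"]
print(X.head())
```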
Clone the repository:
git clone https://github.com/eddiebrock911/Titanic-Machine-Learning.git
Move into the project folder:
cd Titanic-Machine-Learning
Install dependencies:
pip install -r requirements.txt
Run the project:
python app.py
Planned improvements:
- Hyperparameter Tuning
- Cross Validation
- Feature Engineering
- XGBoost Integration
- Model Deployment
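As a rough idea of how cross validation and hyperparameter tuning could be added, the sketch below uses scikit-learn's `cross_val_score` and `GridSearchCV`. The synthetic data, the choice of Random Forest, and the parameter grid are all assumptions for illustration. Nothing here is part of the current project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for the preprocessed Titanic features (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Cross validation: 5-fold accuracy for a fixed baseline model.
baseline = RandomForestClassifier(n_estimators=50, random_state=0)
cv_scores = cross_val_score(baseline, X, y, cv=5, scoring="accuracy")
print(f"baseline mean accuracy: {cv_scores.mean():.3f}")

# Hyperparameter tuning: try every grid combination with 5-fold CV.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)
print("best parameters:", search.best_params_)
print(f"best mean accuracy: {search.best_score_:.3f}")
```

Scoring with cross validation rather than a single train/test split gives a steadier estimate on a small dataset, which matters when comparing models whose scores are close.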
Contributions are welcome.
Feel free to fork this repository and submit pull requests.
This project is licensed under the MIT License.
Author: Ankit Kumar
GitHub: https://github.com/eddiebrock911