
Predicting Credit Card Default

Eight predictive classification techniques are trialled, and an Extreme Gradient Boosted model is then refined to deliver default predictions that top the associated Kaggle competition leaderboard.

Following the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology, classification techniques including Random Forest, Support Vector Machines and Neural Networks are assessed for their predictive utility in a credit card default setting.




Extreme Gradient Boosted and Random Forest techniques show the greatest early promise, and both are then tested with a variety of feature engineering interventions and degrees of hyperparameter tuning. The Extreme Gradient Boosted model is shown to deliver materially better predictive performance than its Random Forest counterpart. Statistical confidence in this outperformance exceeds the 95% threshold.
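One common way to check whether one model's advantage over another is more than noise is a paired t-test on per-fold cross-validation scores. The sketch below is illustrative only: this summary does not state which test the project used, and the function name `paired_t_statistic` and the fold scores are hypothetical, not project results.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired scores from the same CV folds.

    Assumption: both lists hold one score per fold, in the same fold order.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 folds")
    # Work on per-fold differences so fold-to-fold difficulty cancels out.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Made-up per-fold AUCs for illustration only (not the project's figures).
    xgb_auc = [0.781, 0.779, 0.784, 0.776, 0.783, 0.780, 0.778, 0.785, 0.782, 0.779]
    rf_auc = [0.772, 0.770, 0.777, 0.769, 0.771, 0.774, 0.768, 0.776, 0.773, 0.771]
    t, df = paired_t_statistic(xgb_auc, rf_auc)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```

With 10 folds there are 9 degrees of freedom, and the two-sided 5% critical value is about 2.262, so a larger absolute t would clear a 95% threshold. Cross-validation folds share training data, so a plain paired t-test tends to be optimistic; treat it as a rough check.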


The refined Extreme Gradient Boosted model tops the leaderboard in the associated Kaggle competition table; see the Area Under the Curve (AUC) score of 0.79097 listed for 'The Algorithmic Ensemble'.
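For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen defaulter receives a higher predicted score than a randomly chosen non-defaulter. The following is a minimal pure-Python sketch of that definition; the function name `roc_auc` and the toy inputs are hypothetical, and the pairwise loop is written for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for default, 0 for no default (assumed encoding).
    scores: predicted default scores, higher meaning more likely to default.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 3 of the 4 positive/negative pairs are ranked correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for a leaderboard score such as 0.79097.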


An executive presentation of the modelling, commercially valuable data insights into the performance and current state of the credit card lending portfolio, and a detailed technical report were delivered as part of this project.


crCardDefault_execPresentation.pdf (PDF, 878KB)

crCardDefault_detailedTechReport.pdf (PDF, 2.44MB)


