Catching a Welcher: Classifying a Credit Card Defaulter (2024)


Before we begin, as always, the code for my project can be found on my GitHub, and should you have anything to ask about this project, please feel free to contact me via LinkedIn.

What it means to default on a credit card (from a credit card user's point of view)

In case you’re unclear on what defaulting on a credit card means, here’s the gist: After you’ve failed to make a payment on your credit card for 180 days (or as decided by your credit card company), your issuer assumes you’re probably never going to. At this point, the issuer can (and usually does) close your card, write off what you owe as bad debt and sell your account to a collections agency.

Why is it important?

Now your credit card issuer is out of the picture. Your debt belongs to a collector, and you’ll start getting flooded with phone calls asking you to pay the bill. Although you can send written notice asking them to stop calling you and there are laws limiting what debt collectors can say and do when they contact you, you’ll still owe the debt. And if you don’t deal with it one way or another, you could get sued.

Also, the default will be reported to the three major credit bureaus (or more, depending on your country). Your credit score will take a nosedive, and the blemish will stay on your credit report for up to seven years.

In short, defaulting on a credit card bill has serious consequences. You should make it a priority to handle this matter as soon as you can.

What does it mean from a Bank’s Point of View?

Well, it's analogous to someone asking to borrow some money from you and never paying you back (I am severely oversimplifying, but you get the idea). You're going to be 100 bucks short and the other person 100 bucks happier (I really need to work on my analogies).

A bad loan can therefore lead to a huge financial loss for the bank in the event of a default, and thus requires heavy risk mitigation. Furthermore, the bank's loaning process can be quite a manual and tiresome task when it comes to identifying who might be a potential defaulter. The process is also quite dependent on the loan processors themselves, leaving a lot to subjectivity, which might turn into a bad loan in the future.

But then again, on the flipside, banks can also earn a lot from people who actually pay back their loans. Additionally, we don't want to be turning away people who could really use that money to get their lives back on track or achieve their dreams (e.g., setting up a business they've always wanted).

How then, do we properly assess if someone is going to be a credit risk or not?

What are the telltale signs of a person's behavior that make him/her a consistent welcher? (yes, I am aware that I am quoting Urban Dictionary)

This project therefore aims to bridge this gap of uncertainty with a data-driven approach, using past data of credit card customers in conjunction with machine learning to predict whether or not a consumer will default on their credit card.


The goal behind using this model is to achieve two things:

  • Bring more consistency to the loaning process and;
  • Investigate what the key drivers are behind a potential defaulter

In addition to answering these pressing questions, I personally wanted to focus on the learning process as well; to become better/more efficient with a data science project workflow, and also enter a domain I was unfamiliar with (and uncomfortable with, having come from a Mechanical Engineering background) to step out of my comfort zone. With that being said, let’s delve into how one may embark on a Classifier Data Science Project.

Section 1.1: Dataset Description

The dataset considered in this analysis is the “Default of Credit Card Clients” dataset, released under a Creative Commons public license and available on Kaggle.

This dataset contains 30,000 observations of 25 variables from a bank (a cash and credit card issuer in Taiwan), where each observation corresponds to a particular credit card client. Among the 30,000 observations, 6,636 (22.1%) are cardholders with a default payment.

The 25 variables in this dataset comprise demographic variables (gender, education level, marital status, and age) and financial variables covering six months of payment data from April 2005 to September 2005 (amount of given credit, monthly repayment statuses, monthly bill statement amounts, and monthly amounts of previous payments). The breakdown of what each variable means is given in the next section.

Section 1.2: Understanding the Variables

ID: ID of each client

LIMIT_BAL: Amount of given credit in NT dollars (includes individual and family/supplementary credit)

SEX: Gender (1=male, 2=female)

EDUCATION: Education level (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)

MARRIAGE: Marital status (1=married, 2=single, 3=others)

AGE: Age in years

— —

PAY_0: Repayment status in September, 2005 (-1=pay duly, 1=payment delay for one month, 2=payment delay for two months, … 8=payment delay for eight months, 9=payment delay for nine months and above)

PAY_2: Repayment status in August, 2005 (scale same as above)

…

PAY_6: Repayment status in April, 2005 (scale same as above)

— —

BILL_AMT1: Amount of bill statement in September, 2005 (NT dollar)

BILL_AMT2: Amount of bill statement in August, 2005 (NT dollar)

…

BILL_AMT6: Amount of bill statement in April, 2005 (NT dollar)

— —

PAY_AMT1: Amount of previous payment in September, 2005 (NT dollar)

PAY_AMT2: Amount of previous payment in August, 2005 (NT dollar)

…

PAY_AMT6: Amount of previous payment in April, 2005 (NT dollar)

default.payment.next.month: Default payment (1=yes, 0=no)

Section 1.3: Data Cleaning

Some of the columns didn’t make sense to me, so I decided to rename them into more understandable terms.


We can also see that the columns contain no null values — and a clean DataFrame is always more than welcome.


While all is fine and dandy in the no-null neighborhood, things aren't going so smoothly in other areas (they never do; that's just a fact of data cleaning).

[Figure: output of df.describe().T]

We see from a simple df.describe().T that two things stand out: Education and Marriage.

Education

[Figure: EDUCATION value counts]

Marriage

[Figure: MARRIAGE value counts]

These values don't align with the variable breakdown given for the dataset, so I decided to group the unexpected entries as 'others' instead.

Summarising data preprocessing

Here is a quick recap of the pre-processing manipulations we did.

  1. Renamed default.payment.next.month to DEFAULT
  2. Renamed SEX to GENDER
  3. Renamed PAY_0 to PAY_1
  4. Grouped unknown EDUCATION categories (0, 5, 6) and re-assigned them to 4 (others)
  5. Grouped the unknown MARRIAGE category (0) and re-assigned it to 3 (others)
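The recap above can be sketched in pandas. The column names follow the Kaggle dataset, but the toy DataFrame here is synthetic, purely for illustration:

```python
import pandas as pd

# Toy rows mimicking the Kaggle schema (synthetic values)
df = pd.DataFrame({
    "SEX": [1, 2, 2],
    "EDUCATION": [0, 2, 6],   # 0, 5, 6 are undocumented categories
    "MARRIAGE": [0, 1, 2],    # 0 is undocumented
    "PAY_0": [-1, 2, 0],
    "default.payment.next.month": [0, 1, 0],
})

# Steps 1-3: rename columns to clearer names
df = df.rename(columns={
    "default.payment.next.month": "DEFAULT",
    "SEX": "GENDER",
    "PAY_0": "PAY_1",
})

# Step 4: fold undocumented EDUCATION codes (0, 5, 6) into 4 ("others")
df["EDUCATION"] = df["EDUCATION"].replace({0: 4, 5: 4, 6: 4})

# Step 5: fold the undocumented MARRIAGE code (0) into 3 ("others")
df["MARRIAGE"] = df["MARRIAGE"].replace({0: 3})

print(sorted(df["EDUCATION"].unique()))  # [2, 4]
```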

Section 2.1: Mapping the Target

[Figure: distribution of the DEFAULT target classes]

Upon closer inspection of our dataset, we can see that there is some class imbalance, something we have to keep in mind when evaluating the efficacy of our model (i.e., we cannot use accuracy to measure our model's effectiveness). Additionally, an imbalanced class will also affect the performance of the trained model. There are, however, a few ways to deal with class imbalance:

  1. Oversample the Dataset
  2. SMOTE the Dataset
  3. Undersample the Dataset
  4. Do nothing (yes, you heard me right: do nothing. Sometimes it gives you better results than fiddling with the dataset)
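As a sketch of option 1 (oversampling), here is the minority class resampled up to the majority size with scikit-learn's `resample`, on a small synthetic frame echoing the dataset's roughly 22% default rate. SMOTE (option 2) would instead come from the imbalanced-learn package (`imblearn.over_sampling.SMOTE`), not shown here:

```python
import pandas as pd
from sklearn.utils import resample

# Synthetic imbalanced data: 2 defaulters out of 10 clients
df = pd.DataFrame({
    "LIMIT_BAL": range(10),
    "DEFAULT":   [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
})

majority = df[df["DEFAULT"] == 0]
minority = df[df["DEFAULT"] == 1]

# Oversample the minority class (with replacement) to match the majority
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up])

print(balanced["DEFAULT"].value_counts())
```

Undersampling (option 3) is the mirror image: resample the majority class down to `len(minority)` with `replace=False`.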

Section 3.1: Vanilla (untouched) Dataset and Models Used

Now that we have our dataset prepped and features finalized, we are ready to do some training/cross-validation to get a baseline feel of how our models perform on the data.

The models I have selected for use are:

  1. Gaussian Naive Bayes
  2. Logistic Regression
  3. K-Nearest Neighbors
  4. Decision Tree
  5. Random Forest
  6. Linear SVC
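To get that baseline feel, each model can be run through cross-validation with a shared scorer. A minimal sketch with three of the six models, on synthetic data standing in for the credit dataset (roughly 22% positives), is:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the credit data (imbalanced, ~22% positives)
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.78], random_state=42)

models = {
    "GaussianNB": GaussianNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(random_state=42),
}

# 5-fold cross-validated F1 for each model
scores = {name: cross_val_score(m, X, y, cv=5, scoring="f1").mean()
          for name, m in models.items()}
for name, f1 in scores.items():
    print(f"{name}: {f1:.3f}")
```

The remaining models (k-NN, Decision Tree, Linear SVC) slot into the same dictionary unchanged.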

Section 3.2: Model Evaluation Metric

As mentioned above, accuracy would not be an accurate (pun intended) metric for comparing how the models perform against one another. The F1 score will be used instead.

F1 = 2 × (Precision × Recall) / (Precision + Recall)

I have chosen to evaluate the models based on their F1 score because it considers both precision and recall. In essence, the F1 score is the harmonic mean of precision and recall.

In layman's terms, recall, in the case of credit defaulters, refers to:

Out of ALL the defaulters (True Positive and False Negatives), how many did our model actually get correct?

Precision refers to:

How correct is our model based on its own predictions (True Positives and False Positives)?

Additionally, recall holds more importance than precision in the case of classifying defaulters, because we want to catch as many potential defaulters as possible so as not to incur losses for the bank. This will be made clearer in the following sections.
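The two definitions reduce to simple ratios. Using the counts from the confusion matrix discussed later (570 of the 1,019 actual defaulters caught), and a made-up False Positive count purely to illustrate the precision formula:

```python
# Counts from the confusion matrix discussed later: the model catches
# 570 of the 1,019 actual defaulters. FP is hypothetical, for illustration.
TP = 570          # defaulters correctly flagged
FN = 1019 - TP    # defaulters the model missed
FP = 400          # non-defaulters wrongly flagged (hypothetical)

recall = TP / (TP + FN)       # share of all defaulters caught
precision = TP / (TP + FP)    # share of flags that were correct
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(recall, 2))  # 0.56
```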

Section 3.3: Preliminary Model Testing — Obtaining a Baseline Performance using Vanilla Dataset

[Figure: baseline F1 scores on the vanilla dataset]

From the preliminary results we get from testing our models on the vanilla dataset (no transformation or scaling done on the features), we can see that the F1 score comes out pretty low.

Now that we have a baseline, we can proceed to work on improving the F1 score.

Section 3.4: Model Selection using F1 Score


Each of the six models was trained on different variations of the dataset: scaled and unscaled versions of the oversampled, undersampled, SMOTE, and vanilla datasets. As mentioned previously, there are a few ways to deal with imbalanced datasets; oversampling, undersampling, and SMOTE were all applied in this project.

[Figure: F1 scores across models and dataset variants]

After doing all the train/cross-validation tests, the dataset that gave us the highest F1 score was the scaled vanilla dataset, with Gaussian Naive Bayes taking the lead at 0.518. However, the models were all trained with their default settings. Therefore, I chose to further hyper-parameter tune the next two highest-scoring models (k-NN at 0.417 and Random Forest at 0.41) in a bid to beat the F1 score of 0.518.

[Figure: F1 scores after hyper-parameter tuning]

Despite hyper-parameter tuning, k-NN and Random Forest each still gave an F1 score lower than Gaussian Naive Bayes' 0.518. Therefore, Gaussian Naive Bayes was chosen as the model to classify defaulters.
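The post doesn't list the actual parameter grids that were searched; a minimal GridSearchCV sketch with assumed ranges, scored on F1 over synthetic stand-in data, might look like:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, weights=[0.78], random_state=42)

# Assumed search spaces -- the post doesn't specify the actual grids
grids = {
    "kNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 11]}),
    "RandomForest": (RandomForestClassifier(random_state=42),
                     {"n_estimators": [100, 200], "max_depth": [5, None]}),
}

best = {}
for name, (model, grid) in grids.items():
    search = GridSearchCV(model, grid, cv=5, scoring="f1")
    search.fit(X, y)
    best[name] = search.best_score_
    print(name, search.best_params_, round(search.best_score_, 3))
```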

Section 3.5: Optimizing Gaussian Naive Bayes for Credit Card Default Classification

[Figure: confusion matrix at the default 0.5 threshold]

Upon plotting the confusion matrix of the Gaussian Naive Bayes model at its default threshold of 0.5 (F1 score of 0.517), we observe that it gives a very poor recall score. The matrix tells us that the model only catches 56% of all the defaulters (570 out of 1,019), which means it misses 44% of them. This does not spell good news for our hypothetical bank (very similar to the credit card fraud case), because we are letting a lot of defaulters go undetected. Therefore, there is a need to further optimize the threshold to increase the model's recall.

[Figure: confusion matrix at the 0.25 threshold]

After performing some optimization, it was found that a threshold of 0.25 gives a better recall score of 0.76 (778 out of 1,019) at the expense of decreased precision (the False Positives increased to 1,847).

[Figure: precision-recall tradeoff across thresholds]

However, we have to understand that this is an unavoidable tradeoff. A higher recall on the same model will lead to a lower precision (more False Positives), since we are essentially just adjusting the threshold, not changing the model itself.
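Threshold adjustment works on the predicted probabilities rather than the hard labels. A sketch with Gaussian Naive Bayes on synthetic data (0.25 being the threshold chosen above) shows recall rising as the threshold drops:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score, precision_score

X, y = make_classification(n_samples=2000, weights=[0.78], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

model = GaussianNB().fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]   # P(default) per client

# Lowering the threshold flags more clients: recall up, precision down
for threshold in (0.5, 0.25):
    pred = (proba >= threshold).astype(int)
    print(threshold,
          "recall:", round(recall_score(y_te, pred), 2),
          "precision:", round(precision_score(y_te, pred), 2))
```

Because every client flagged at 0.5 is also flagged at 0.25, recall can only go up (or stay equal) as the threshold drops.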

What does a higher False Positive mean?

A higher False Positive count means that the model classifies more people as defaulters despite them not being actual defaulters. From our hypothetical bank's point of view, this means we will be inconveniencing people by labelling them as potential non-repayers.

The key consideration here is — is this tradeoff justifiable? How do we justify it as a good tradeoff? This actually depends on the goals of the business that you are building this model for, and the level of accuracy desired.

Since we don't have an actual business advocating a target score, we will decide on a hypothetical case for ourselves: that we want to be more risk-averse than precise. Therefore, having a higher recall to catch more defaulters outweighs the cost of inconveniencing more people (lower precision).

Now, with our finalized configuration (a Gaussian Naive Bayes model with a threshold of 0.25 on a scaled dataset), we are ready to do our final testing by training our model on 80% of the original data (which we did our train/cross-validation on) and testing on the other 20% of unseen holdout data.

[Figure: confusion matrix on the 20% holdout set]

Final tests show that our model actually performs even better on unseen data, giving us a higher recall score of 0.79 (1,058 out of 1,341). This means our model accurately catches around 80% of all the defaulters.

This project was completed over a span of 2 weeks whilst juggling daily classes from 8am to 5pm. So of course, there is more that we can do to improve our model and analysis of this credit card defaulter case.

Given more time and resources, I'd like to work on the following:

  • Feature Engineering; coming up with new features given the current dataset that could potentially be better predictors of credit card defaulters
  • Hyper-parameter tune other models
[Figure: feature importances]

From this graph, we can derive some interesting insights into the behaviour of a defaulter. The top three drivers of whether someone is a defaulter boil down to their:

  • Repayment status in just 1 month (which means that just by looking at how late someone is on their payments in 1 month, we can tell if they will default or not)
  • Age
  • Limit Balance

Feature Manipulation

In this case, categorical columns like PAY_1, …, PAY_6, MARRIAGE, and EDUCATION may not have been represented in the dataset in the best way. A better approach could have been one-hot encoding (creating dummy variables) instead. Classifiers like Random Forests are great at splitting on columns like these, which could have resulted in a model with better prediction of credit card defaulters.
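A quick sketch of that suggested encoding with pandas' `get_dummies`, on toy values:

```python
import pandas as pd

# Toy values for the categorical columns (integer codes from the dataset)
df = pd.DataFrame({
    "EDUCATION": [1, 2, 3],
    "MARRIAGE": [1, 2, 3],
    "PAY_1": [-1, 0, 2],
})

# One-hot encode the categorical columns instead of treating the
# integer codes as ordinal values
encoded = pd.get_dummies(df, columns=["EDUCATION", "MARRIAGE", "PAY_1"])
print(list(encoded.columns))
```

Each original column becomes one indicator column per distinct category, which tree-based models can then split on independently.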

As always, you can find the code for this project here, and my LinkedIn here. Please feel free to drop me a message should you have any questions about this project. Till next time!
