Understanding the Data
Banks need to predict accurately whether a new customer is likely to repay a loan. In finance, mathematical models are often used to assign each person a credit score (e.g., the Schufa score). We assume that the credit score was calculated from reliable data, and we treat it as a meaningful measure of a person's creditworthiness.
In this learning module, we use fictional but realistic data on many past loan applicants. Each of these individuals has a credit score between 0 (loan is unlikely to be repaid) and 100 (loan is very likely to be repaid). We also know whether each person actually repaid their loan.
Before working with a larger dataset, you will first explore the structure of the data and how it can be visualized using scatter plots. In the plots, each point represents one person.
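As a minimal sketch of this idea, the Python snippet below turns a handful of applicants into scatter-plot points, one point per person. The applicant values are invented for illustration and are not taken from the module's tables. The names `applicants`, `to_points`, and `plot_points` are hypothetical, and the axis layout (credit score on the horizontal axis, repayment as 0 or 1 on the vertical axis) is an assumption that may differ from the plots used in the module.

```python
# Hypothetical applicants: (credit score from 0 to 100, loan repaid?).
# These values are invented for illustration, not taken from the module's tables.
applicants = [
    (15, False),
    (30, False),
    (45, True),
    (60, False),
    (75, True),
    (90, True),
]


def to_points(records):
    """Map each applicant to one (x, y) point: x = credit score, y = 1 if repaid, else 0."""
    return [(score, 1 if repaid else 0) for score, repaid in records]


def plot_points(points):
    """Draw the scatter plot. Needs matplotlib, imported here so the rest runs without it."""
    import matplotlib.pyplot as plt

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    plt.scatter(xs, ys)
    plt.xlabel("Credit score")
    plt.ylabel("Loan repaid (1 = yes, 0 = no)")
    plt.yticks([0, 1])
    plt.xlim(0, 100)
    plt.show()


# One point per person, in the same order as the applicants list.
points = to_points(applicants)
```

Calling `plot_points(points)` would display the plot if matplotlib is installed. With this layout, people who repaid appear in the upper row of points and people who did not appear in the lower row, so it is easy to see how repayment relates to the credit score.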
Dataset 1
Here is a table with fictional but realistic data.
Option A
Option B
Option C
Dataset 2
Here is another table.
Option A
Option B
Option C
Dataset 3
Here is another table.