Understanding the Data

For banks, it's important to accurately predict whether a new customer is likely to repay a loan. In finance, mathematical models are often used to assign each person a credit score (e.g., the Schufa score). We assume that reliable data was used to calculate the credit score, and we treat the credit score as a meaningful measure of a person's creditworthiness.

In this learning module, we use fictional but realistic data on many past loan applicants. Each applicant has a credit score between 0 (the loan is unlikely to be repaid) and 100 (the loan is very likely to be repaid). We also know whether each person actually repaid their loan.

Before working with a larger dataset, you will first explore the structure of the data and how it can be visualized using scatter plots. In the plots, each point represents one person.
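To make the structure of the data concrete, here is a minimal Python sketch of how one table row could become one point in a plot. Everything in it is an assumption made for illustration: the field names (name, credit_score, repaid), the four example rows, and the layout with the credit score on the horizontal axis and the repayment outcome on the vertical axis. The plots in this module may be arranged differently.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    name: str          # hypothetical field name
    credit_score: int  # between 0 and 100
    repaid: bool       # True if the past loan was repaid


# Made-up example rows, not taken from the tables in this module.
applicants = [
    Applicant("Alex", 15, False),
    Applicant("Bo", 40, False),
    Applicant("Chris", 55, True),
    Applicant("Dana", 90, True),
]


def to_points(people):
    """Turn each person into one (x, y) point: x = score, y = 1 if repaid, else 0."""
    return [(p.credit_score, 1 if p.repaid else 0) for p in people]


def text_scatter(points, width=51):
    """Draw a rough text scatter plot with one row per repayment outcome."""
    rows = {1: [" "] * width, 0: [" "] * width}
    for x, y in points:
        # Map a score in 0..100 onto a column in 0..width-1.
        col = round(x / 100 * (width - 1))
        rows[y][col] = "*"
    lines = [
        "repaid     |" + "".join(rows[1]),
        "not repaid |" + "".join(rows[0]),
    ]
    return "\n".join(lines)


points = to_points(applicants)
print(text_scatter(points))
```

Each star in the printed output stands for exactly one person, so a table with four rows produces four stars. Comparing the position of each point with the matching table row is the same check you can use in the exercises that follow.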

Dataset 1

Here is a table with fictional but realistic data.

Which of the following scatter plots A, B, or C represents the data from the table? You can sort the table by name, credit score, or past payment reliability by clicking the column headers. Scroll through the table to see all entries. Write down your answer.

Option A

Option B

Option C

Dataset 2

Here is another table.

Which scatter plot represents the data in the table? Write down your answer.

Option A

Option B

Option C

Dataset 3

Here is another table.

Which scatter plot represents the data in the table? Write down your answer.

Option A

Option B

Option C

Now check your answers