mcm

MCM

mcm

Pull Requests MIT License Stars

Introduction

The mcm function is a simple tool for analyzed different metrics from a confusion matrix. Can be very useful to analyze the performance of a binary classification model.

This function depends of pandas and the function confusion_matrix from sklearn.metrics. With confusion_matrix the True Negative (TN) cases, True Positive (TP) cases, False Negative (FN) cases and False Positive (FP cases) are calculated.

The following table categorizes predicted and actual conditions into four fundamental outcomes: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).

Here are some key sections and their purpose:

The table is designed to help in the interpretation of test results, particularly in the context of diagnostic tests or in the performance evaluation of classification algorithms. Each metric provides a different insight into the accuracy and reliability of the test or model, allowing practitioners to assess its effectiveness comprehensively.

The table below illustrates the confusion matrix used in our analysis. This matrix visualizes the performance of the classification algorithm, highlighting the true positives, true negatives, false positives, and false negatives. Understanding these values is crucial for evaluating the accuracy and reliability of our model.

confusion matrix

The following Python code demonstrates how to generate a confusion matrix using the pandas and sklearn.metrics libraries. This matrix is pivotal for calculating various performance metrics, such as precision and recall, which help further assess the effectiveness of the model.

import pandas as pd
from sklearn.metrics import confusion_matrix
?confusion_matrix
Compute confusion matrix to evaluate the accuracy of a classification.

By definition a confusion matrix :math:`C` is such that :math:`C_{i, j}`
is equal to the number of observations known to be in group :math:`i` and
predicted to be in group :math:`j`.

Thus in binary classification, the count of true negatives is
:math:`C_{0,0}`, false negatives is :math:`C_{1,0}`, true positives is
:math:`C_{1,1}` and false positives is :math:`C_{0,1}`.

Read more in the :ref:`User Guide <confusion_matrix>`.

Parameters
----------
y_true : array-like of shape (n_samples,)
    Ground truth (correct) target values.

y_pred : array-like of shape (n_samples,)
    Estimated targets as returned by a classifier.

labels : array-like of shape (n_classes), default=None
    List of labels to index the matrix. This may be used to reorder
    or select a subset of labels.
    If ``None`` is given, those that appear at least once
    in ``y_true`` or ``y_pred`` are used in sorted order.

sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.

normalize : {'true', 'pred', 'all'}, default=None
    Normalizes confusion matrix over the true (rows), predicted (columns)
    conditions or all the population. If None, confusion matrix will not be
    normalized.

Returns
-------
C : ndarray of shape (n_classes, n_classes)
    Confusion matrix.

References
----------
.. [1] `Wikipedia entry for the Confusion matrix
       <https://en.wikipedia.org/wiki/Confusion_matrix>`_
       (Wikipedia and other references may use a different
       convention for axes)

Examples
--------
>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])

>>> y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
>>> y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
>>> confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])

In the binary case, we can extract true positives, etc as follows:

>>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
>>> (tn, fp, fn, tp)
(0, 2, 1, 1)
data = pd.DataFrame({
    'y_true': ['Positive']*47 + ['Negative']*18,
    'y_pred': ['Positive']*37 + ['Negative']*10 + ['Positive']*5 + ['Negative']*13})
confusion_matrix(y_true = data.y_true, 
                 y_pred = data.y_pred, 
                 labels = ['Negative', 'Positive'])
array([[13,  5],
       [10, 37]], dtype=int64)

In this case, from the confusion matrix we have the following results:

tn, fp, fn, tp = confusion_matrix(y_true = data.y_true, 
                                  y_pred = data.y_pred, 
                                  labels = ['Negative', 'Positive']).ravel()
(tn, fp, fn, tp)
(13, 5, 10, 37)

Metrics

Metrics are statistical measures of the performance of a binary classification model.

Sensitivity, Recall and True Positive Rate (TPR)

Sensitivity refers to the test’s ability to correctly detect negative class who do have the condition.

\[\Large \mbox{Sensitivity} = \dfrac{TP}{TP + FN}\]

Specificity and True Negative Rate (TNR)

Specificity relates to the test’s ability to correctly reject positive class without a condition.

\[\Large \mbox{Specifity} = \dfrac{TN}{TN + FP}\]

Precision and Positive Predictive Value (PPV)

\[\Large \mbox{Positive Predictive Value} = \dfrac{TP}{TP + FP}\]

Negative Predictive Value (NPV)

\[\Large \mbox{Negative Predictive Value} = \dfrac{TN}{TN + FN}\]

False Negative Rate (FNR)

\[\Large \mbox{False Negative Rate} = \dfrac{FN}{FN + TP}\]

False Positive Rate (FPR)

\[\Large \mbox{False Positive Rate} = \dfrac{FP}{FP + TN}\]

False Discovery Rate (FDR)

\[\Large \mbox{False Discovery Rate} = \dfrac{FP}{FP + TP}\]

Rate of Positive Predictions (RPP) or Acceptance Rate

\[\Large \mbox{Rate of Positive Predictions} = \dfrac{FP + TP}{TN + TP + FN + FP}\]

Rate of Negative Predictions (RNP)

\[\Large \mbox{Rate of Negative Predictions} = \dfrac{FN + TN}{TN + TP + FN + FP}\]

Accuracy

\[\Large \mbox{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}\]

F1 Score

\[\Large \mbox{F1 Score} = \dfrac{2\cdot TP}{2\cdot TP + FP + FN}\]

MCM function

The mcmhas been developed as:

def mcm(tn, fp, fn, tp):
    """Let be a confusion matrix like this:
    
    
      N    P
    +----+----+
    |    |    |
    | TN | FP |
    |    |    |
    +----+----+
    |    |    |
    | FN | TP |
    |    |    |
    +----+----+
    
    The observed values by columns and the expected values by rows and the positive class in right column. With these definitions, the TN, FP, FN and TP values are that order.
    
    
    Parameters
    ----------
    TN : integer
         True Negatives (TN) is the total number of outcomes where the model correctly predicts the negative class.
    FP : integer
         False Positives (FP) is the total number of outcomes where the model incorrectly predicts the positive class.
    FN : integer
         False Negatives (FN) is the total number of outcomes where the model incorrectly predicts the negative class.
    TP : integer
         True Positives (TP) is the total number of outcomes where the model correctly predicts the positive class.

    Returns
    -------
    sum : float
        Sum of values
    
    Notes
    -----
    https://en.wikipedia.org/wiki/Confusion_matrix
    https://developer.lsst.io/python/numpydoc.html
    https://www.mathworks.com/help/risk/explore-fairness-metrics-for-credit-scoring-model.html
    
    Examples
    --------
    data = pd.DataFrame({
    'y_true': ['Positive']*47 + ['Negative']*18,
    'y_pred': ['Positive']*37 + ['Negative']*10 + ['Positive']*5 + ['Negative']*13})
    
    tn, fp, fn, tp = confusion_matrix(y_true = data.y_true, 
                                  y_pred = data.y_pred, 
                                  labels = ['Negative', 'Positive']).ravel()
    
    """
    mcm = []
    
    mcm.append(['Sensitivity', tp / (tp + fn)])
    mcm.append(['Recall', tp / (tp + fn)])
    mcm.append(['True Positive rate (TPR)', tp / (tp + fn)])
    mcm.append(['Specificity', tn / (tn + fp)])
    mcm.append(['True Negative Rate (TNR)', tn / (tn + fp)])
    
    mcm.append(['Precision', tp / (tp + fp)])
    mcm.append(['Positive Predictive Value (PPV)', tp / (tp + fp)])
    mcm.append(['Negative Predictive Value (NPV)', tn / (tn + fn)])
        
    mcm.append(['False Negative Rate (FNR)', fn / (fn + tp)])
    mcm.append(['False Positive Rate (FPR)', fp / (fp + tn)])
    mcm.append(['False Discovery Rate (FDR)', fp / (fp + tp)])

    df_mcm.append(['Rate of Positive Predictions (PRR)'], (fp + tp) / (tn + tp + fn + fp))
    df_mcm.append(['Rate of Negative Predictions (RNP)'], (fn + tn) / (tn + tp + fn + fp))
    
    mcm.append(['Accuracy', (tp + tn) / (tp + tn + fp + fn)])
    mcm.append(['F1 Score', 2*tp / (2*tp + fp + fn)])
    
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    
    fnr = fn / (fn + tp)
    tnr = tn / (tn + fp)
    
    mcm.append(['Positive Likelihood Ratio (LR+)', tpr / fpr])
    mcm.append(['Negative Likelihood Ratio (LR-)', fnr / tnr])
    
    return pd.DataFrame(mcm, columns = ['Metric', 'Value'])


?mcm
Let be a confusion matrix like this:


  N    P
+----+----+
|    |    |
| TN | FP |
|    |    |
+----+----+
|    |    |
| FN | TP |
|    |    |
+----+----+

The observed values by columns and the expected values by rows and the positive class in right column. With these definitions, the TN, FP, FN and TP values are that order.


Parameters
----------
TN : integer
     True Negatives (TN) is the total number of outcomes where the model correctly predicts the negative class.
FP : integer
     False Positives (FP) is the total number of outcomes where the model incorrectly predicts the positive class.
FN : integer
     False Negatives (FN) is the total number of outcomes where the model incorrectly predicts the negative class.
TP : integer
     True Positives (TP) is the total number of outcomes where the model correctly predicts the positive class.

Returns
-------
sum : DataFrame
      DataFrame with several metrics

Notes
-----
https://en.wikipedia.org/wiki/Confusion_matrix
https://developer.lsst.io/python/numpydoc.html
https://www.mathworks.com/help/risk/explore-fairness-metrics-for-credit-scoring-model.html

Examples
--------
data = pd.DataFrame({
'y_true': ['Positive']*47 + ['Negative']*18,
'y_pred': ['Positive']*37 + ['Negative']*10 + ['Positive']*5 + ['Negative']*13})

tn, fp, fn, tp = confusion_matrix(y_true = data.y_true, 
                              y_pred = data.y_pred, 
                              labels = ['Negative', 'Positive']).ravel()

The arguments of mcm function are: true positive (tn), false positive (fp), false negative (fn) and true positive (tp) is this order.

mcm(tn, fp, fn, tp)
Metric Value
0 Sensitivity 0.787234
1 Recall 0.787234
2 True Positive rate (TPR) 0.787234
3 Specificity 0.722222
4 True Negative Rate (TNR) 0.722222
5 Precision 0.880952
6 Positive Predictive Value (PPV) 0.880952
7 Negative Predictive Value (NPV) 0.565217
8 False Negative Rate (FNR) 0.212766
9 False Positive Rate (FPR) 0.277778
10 False Discovery Rate (FDR) 0.119048
11 Rate of Positive Predictions (PRR) 0.646154
12 Rate of Negative Predictions (RNP) 0.353846
13 Accuracy 0.769231
14 F1 Score 0.831461
15 Positive Likelihood Ratio (LR+) 2.834043
16 Negative Likelihood Ratio (LR-) 0.294599

Bootstrap

We can call the function mcm several times with a percentage of samples and estimate the distribution of each metrics.

In the following example, a 80% of the samples has been used in each iteration.

data = pd.DataFrame({
    'y_true': ['Positive']*47 + ['Negative']*18,
    'y_pred': ['Positive']*37 + ['Negative']*10 + ['Positive']*5 + ['Negative']*13})
mcm_bootstrap = []

for i in range(100):
    aux = data.sample(frac = 0.8) # 80% of the samples
    tn, fp, fn, tp =\
        confusion_matrix(y_true = aux.y_true, 
                         y_pred = aux.y_pred, 
                         labels = ['Negative', 'Positive']).ravel()

    mcm_bootstrap.append(mcm(tn, fp, fn, tp))

After 100 iterations we can be evaluate the mean, median, minimum, maximum and standar deviation for each metric.

pd\
    .concat(mcm_bootstrap)\
    .groupby('Metric')\
    .agg({'Value' : ['mean', 'median', 'min', 'max', 'std']})
Value
mean median min max std
Metric
Accuracy 0.769423 0.769231 0.711538 0.826923 0.026687
F1 Score 0.830781 0.828571 0.769231 0.883117 0.022230
False Discovery Rate (FDR) 0.120614 0.121212 0.055556 0.166667 0.024679
False Negative Rate (FNR) 0.212029 0.210526 0.131579 0.285714 0.031126
False Positive Rate (FPR) 0.278755 0.285714 0.142857 0.454545 0.053325
Negative Likelihood Ratio (LR-) 0.295672 0.294840 0.184211 0.416667 0.049432
Negative Predictive Value (NPV) 0.568801 0.571429 0.375000 0.722222 0.054842
Positive Likelihood Ratio (LR+) 2.942235 2.855263 1.824390 5.710526 0.641786
Positive Predictive Value (PPV) 0.879386 0.878788 0.833333 0.944444 0.024679
Precision 0.879386 0.878788 0.833333 0.944444 0.024679
Rate of Negative Predictions (RNP) 0.354346 0.365385 0.250000 0.423077 0.030707
Rate of Positive Predictions (PRR) 0.645654 0.634615 0.576923 0.750000 0.030707
Recall 0.787971 0.789474 0.714286 0.868421 0.031126
Sensitivity 0.787971 0.789474 0.714286 0.868421 0.031126
Specificity 0.721245 0.714286 0.545455 0.857143 0.053325
True Negative Rate (TNR) 0.721245 0.714286 0.545455 0.857143 0.053325
True Positive rate (TPR) 0.787971 0.789474 0.714286 0.868421 0.031126

Display

Loading the matplotlib and seaborn to display the results.

import matplotlib.pyplot as plt
import seaborn as sns

For example, if we want to display the distribution of accuracy we can execute the follwoing code.

aux = pd.concat(mcm_bootstrap)
aux\
    .query('Metric == "Accuracy"')\
    .plot(kind = 'density', label = 'Accuracy')
plt.show()

png

With seaborn is easy to display the distribution of all metrics.

g = sns.FacetGrid(pd.concat(mcm_bootstrap),
                  col = 'Metric',
                  col_wrap = 5,
                  sharey = False,
                  sharex = False)
g = g.map(sns.histplot, 'Value', bins = 12, kde = True, ec = 'white', )
g = g.set_titles('{col_name}', size = 12)

png

The mcm function can help us to analyse a confusion matrix. If this confusion matrix come from a performed model, we can evaluate it with this function: Sensitivity and Specificity as principal metrics.

pylint

$ pylint mcm.py --good-names=tn,fp,fn,tp
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)