Chapter 9: Explaining Models and Model Predictions to a Layman

9.1: Introduction

Interpretability for machine learning models is the degree to which a layman can understand the cause of a specific prediction.

Whenever prediction results drive actions that cost time or money, it helps to understand the root cause behind the algorithm's predictions. Unless a model is trained on a toy dataset, or its output will never be used for any insight or action, its predictions deserve justification. In particular, we may want to know the exact pattern the model captured in order to produce a specific prediction.

Even when predictions come from a sophisticated, state-of-the-art algorithm, they still warrant justification. Explainability helps identify incorrect predictions more clearly, which is especially useful during periodic model retraining. We usually retrain models when covariate shift occurs, or when the model mislabels a class that was under-represented in the training data; once enough labeled examples of that class become available, retraining is worth considering. Explainable models also make false positives easier to spot, since such cases are hard to justify. This in turn helps build robust, high-quality retraining datasets that further improve the model.

Some models are inherently explainable, while others require dedicated explanation techniques applied after training.
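To make the distinction concrete, here is a minimal sketch of an inherently explainable model: a single-feature linear regression fit by ordinary least squares. The dataset (house size versus price) and the `fit_ols` helper are illustrative inventions for this sketch, not part of any particular library.

```python
def fit_ols(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house size (in units of 100 sq ft) vs price (in $1000s).
sizes = [10, 15, 20, 25, 30]
prices = [200, 250, 300, 350, 400]

slope, intercept = fit_ols(sizes, prices)
print(f"price ~= {slope:.1f} * size + {intercept:.1f}")
```

The explanation here is the model itself: the slope says exactly how much each extra unit of size changes the predicted price, so a layman can read the "cause" of any prediction directly from the coefficients. A black-box model such as a gradient-boosted ensemble or a deep network offers no such direct reading, which is where post-hoc explanation techniques come in.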