Chapter 1: Introduction

Machine learning projects are often shelved even when labeled data is available. This happens either because the model's predictive performance falls short of what is needed, or because its predictions are difficult to explain to non-statistician decision-makers. Building a machine learning model for a real-world problem is far more difficult than developing one for a toy dataset. Real-world problems come with inadequate features and messy, unclean data. Beyond data cleaning, we also need to do feature engineering, feature extraction, and feature reduction. Finding the combination of features that gives the best predictive performance is important for the success of the project. It is equally important to explain to non-statistician business owners how the model works and why it predicts certain values for certain inputs. Doing both improves the chances that the project succeeds, that the model is finally deployed and used, and that it delivers value to the business.

The scope of the book is limited to tabular data. Toward the end of the book, we will briefly cover natural language processing from the perspective of traditional machine learning. An additional chapter on signal processing is provided for readers whose work overlaps signal processing and machine learning. We will also briefly discuss ensemble learning and how feature selection for ensemble models can reduce the complexity and computational power needed to deploy them. We will explain the theory behind each technique, and for most of the methods discussed we will end with a worked Python example.

By the time you have finished reading the book, you will approach each machine learning project as a combinatorial problem. You will start each project by looking for the combination of feature engineering and feature selection that achieves the best possible model performance. You will also be able to explain your model's predictions, that is, why it predicts certain values when it sees certain feature values. This book will equip you with the tools and methods to increase the likelihood of success of your machine learning projects.