Chapter 1: Introduction
Machine learning projects are quite
often shelved even when labeled data is
available. This happens either because
the model's predictive performance is
lower than desired or because it is
difficult to explain the model's
predictions to non-statistician
decision-makers.
Building a machine learning model for
real-world problems is far more
difficult than developing a model for
toy datasets. Real-world machine
learning problems involve inadequate
features and messy, unclean data.
Beyond data cleaning, we also need to do
feature engineering, feature extraction,
and feature reduction.
Finding the combination of features that
gives the best predictive performance
is important for the success of the
project. It is equally important to
explain to non-statistician business
owners how the model works and why it
predicts certain values for given input
values. Doing so makes it more likely
that the model is actually deployed and
used, and that it delivers value to the
business.
The scope of the book is limited to
tabular data. Toward the end of the
book, we will briefly cover natural
language processing from the perspective
of traditional machine learning. An
additional chapter on signal processing
is provided for readers whose work
overlaps signal processing and machine
learning. We will briefly discuss
ensemble learning and how we can do
feature selection for ensemble models to
reduce the complexity and computational
power needed while deploying such an
ensemble model. We will explain the
theory behind each technique, and for
most of the methods discussed, we will
end with a worked Python example.
By the time you have finished reading
the book, you will approach each
machine learning project as a
combinatorial problem. You will begin
your projects by identifying the
combination of feature engineering and
feature selection that achieves the
best possible model performance. You
will also be able to explain your
model's predictions, that is, why it
predicts certain values when it sees
certain feature values. This book will
equip you with the tools and methods
needed to increase the likelihood of
success of your machine learning
projects.
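
To make the idea of a combinatorial problem concrete, here is a minimal sketch that scores every non-empty subset of features and keeps the best one. It is an illustration only, not a method taken from later chapters. The synthetic dataset, the feature names, the 1-nearest-neighbour classifier, and the helper functions loo_accuracy and best_feature_subset are all assumptions made for this example, and only the Python standard library is used.

```python
from itertools import combinations
import random


def loo_accuracy(rows, labels, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the feature columns listed in `cols`."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = float("inf"), None
        for j, other in enumerate(rows):
            if i == j:
                continue  # never compare a row with itself
            dist = sum((row[c] - other[c]) ** 2 for c in cols)
            if dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def best_feature_subset(rows, labels, names):
    """Score all 2**n - 1 non-empty feature subsets (hypothetical helper).
    Ties are broken in favour of the subset with fewer features."""
    n = len(names)
    scored = []
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            scored.append((loo_accuracy(rows, labels, cols), cols))
    best_score, best_cols = max(scored, key=lambda t: (t[0], -len(t[1])))
    return [names[c] for c in best_cols], best_score


# Synthetic data (an assumption for illustration): two columns carry
# information about the label, two columns are pure noise.
rng = random.Random(0)
names = ["signal_a", "signal_b", "noise_a", "noise_b"]
labels = [i % 2 for i in range(60)]
rows = [
    [
        label + rng.gauss(0, 0.3),
        -label + rng.gauss(0, 0.3),
        rng.gauss(0, 1),
        rng.gauss(0, 1),
    ]
    for label in labels
]

subset, score = best_feature_subset(rows, labels, names)
print(subset, round(score, 3))
```

With n features there are 2**n - 1 subsets to score, so an exhaustive search like this is only practical for a handful of features. A score chosen as the best of many candidates is also optimistic, so in practice it should be confirmed on data that played no part in the search.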