1.1: Terminology
Throughout the rest of this chapter and the subsequent chapters, we will use a few terms frequently. Let's understand what these terms mean.
1.1.1 Dataset, Variable, and Observation
If we are considering a CSV file that contains the relevant data for a machine learning project, we will refer to it as a dataset. Each column holds data values that are of use to the project; we will refer to such a column as a variable. A variable is called the dependent, target, or outcome variable if it is the central focus of the project. The variables that we want to use for modeling the behavior of the target variable are referred to as features or independent variables. Each row in the dataset is referred to as an observation or record.
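As a minimal sketch of this terminology in code, assume a hypothetical CSV file named customer_churn.csv whose target column is churn; the file name and column name are purely illustrative.

import pandas as pd

# Load the dataset: a hypothetical CSV file with a "churn" target column.
df = pd.read_csv("customer_churn.csv")

# Each column is a variable; each row is an observation (record).
print(df.shape)    # (number of observations, number of variables)
print(df.columns)  # names of the variables

# The target (dependent/outcome) variable is the central focus of the project.
y = df["churn"]

# The remaining variables are the features (independent variables).
X = df.drop(columns=["churn"])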
1.1.2 Feature Engineering
Feature engineering, or feature construction, is the process of creating new features in the dataset using domain knowledge, as well as creating higher-order features from the original features.
A bank employee responsible for maintaining the minimum cash reserve in a branch might be aware that more customers visit the branch for cash withdrawals in the few days just before a holiday than on other days. Similarly, fewer customers might come to the branch for cash withdrawals on the day after the festival. This information can be used to create two separate binary (1/0) indicator features representing the days before and after the festival. This is an example of feature engineering using domain knowledge.
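A rough sketch of how such indicator features could be built is shown below, assuming a hypothetical DataFrame with a date column and an illustrative list of festival dates; the three-day window before a festival is also an assumption made only for this example.

import pandas as pd

# Hypothetical daily data and festival dates (illustrative values only).
df = pd.DataFrame({"date": pd.date_range("2023-01-01", "2023-01-31", freq="D")})
festivals = pd.to_datetime(["2023-01-14", "2023-01-26"])

# Dates that fall within the three days before any festival.
before_days = set()
for f in festivals:
    before_days.update(pd.date_range(f - pd.Timedelta(days=3), f - pd.Timedelta(days=1)))

# The day immediately after each festival.
after_days = set(festivals + pd.Timedelta(days=1))

# Two binary (1/0) indicator features built from this domain knowledge.
df["is_pre_festival"] = df["date"].isin(before_days).astype(int)
df["is_post_festival"] = df["date"].isin(after_days).astype(int)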
In addition to domain knowledge, we can also use data-driven approaches for feature construction, such as exploratory data analysis (EDA). For example, if we observe anomalies in the data during EDA, we can probe them with subject matter experts; if they turn out to be a valid pattern in the data that is useful for the machine learning project, we can create features that represent the pattern in the raw data.
We can also create new features from existing features by transforming the original features. For example, instead of using the original values, we can take the square of each value to increase the magnitude. If the body weights recorded for adults are 71.9, 72, and 69.9, we can instead use 5169.61, 5184, and 4886.01, which are the squares of the original values. After squaring, the differences in weight are more pronounced than in the original values.
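A minimal sketch of this transformation, using the weight values quoted above, could look as follows.

import pandas as pd

# Original body-weight values from the example above.
weights = pd.DataFrame({"weight": [71.9, 72.0, 69.9]})

# Higher-order feature: the square of the original value.
weights["weight_squared"] = weights["weight"] ** 2

print(weights)
#    weight  weight_squared
# 0    71.9         5169.61
# 1    72.0         5184.00
# 2    69.9         4886.01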
1.1.3 Feature Extraction
Feature extraction is the process of finding alternative representations for the original features created during feature engineering. It converts the features into a lower-dimensional representation, which makes the structure of the data clearer for the model to learn.
For example, if there are 500 features, a technique such as principal component analysis (PCA) can convert them into a smaller number of principal components, say 20. It can be easier and faster for the model to learn from fewer features that retain the maximum amount of information.
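A minimal sketch of this reduction, assuming a feature matrix X with 500 columns (filled with random values here purely for illustration), could use scikit-learn's PCA.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in feature matrix with 500 features (random data for illustration).
X = np.random.rand(1000, 500)

# PCA works best on standardized features.
X_scaled = StandardScaler().fit_transform(X)

# Reduce the 500 features to 20 principal components.
pca = PCA(n_components=20)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                      # (1000, 20)
print(pca.explained_variance_ratio_.sum())  # share of variance retained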
1.1.4 Feature Selection
Feature selection deals with choosing a subset of features from the given list of features. It helps identify the features that are rich in useful information for the model, ultimately resulting in a less complex model with high predictive power.
For example, if we have 5,000 features, by performing feature selection we can identify a very small number of features that gives the highest performance. With a small set of features, it also becomes easier to explain the model's predictions to a lay audience.
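A minimal sketch of one common approach is shown below, using scikit-learn's SelectKBest with a univariate scoring function on synthetic data; the data, the scoring function, and the choice of k are assumptions made purely for illustration.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data standing in for a wide dataset (500 features here for speed).
X, y = make_classification(n_samples=1000, n_features=500,
                           n_informative=10, random_state=42)

# Keep only the 10 features with the strongest univariate relationship to y.
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)                     # (1000, 10)
print(selector.get_support(indices=True))   # indices of the chosen features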
1.1.5 Cost Function
Cost function, model metric, model performance, and predictive performance are used interchangeably as synonyms in this book. For a regression model, example cost functions are the root mean square error (RMSE) and the mean absolute error (MAE). For a classification model, example cost functions could be the F1 score, precision, and recall.
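A minimal sketch of computing these metrics with scikit-learn, using small made-up prediction vectors, could look as follows.

import numpy as np
from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             f1_score, precision_score, recall_score)

# Regression example (made-up values).
y_true_reg = np.array([3.0, 5.0, 7.5])
y_pred_reg = np.array([2.5, 5.5, 7.0])
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))
mae = mean_absolute_error(y_true_reg, y_pred_reg)

# Classification example (made-up values).
y_true_clf = np.array([1, 0, 1, 1, 0])
y_pred_clf = np.array([1, 0, 0, 1, 0])
f1 = f1_score(y_true_clf, y_pred_clf)
precision = precision_score(y_true_clf, y_pred_clf)
recall = recall_score(y_true_clf, y_pred_clf)

print(rmse, mae, f1, precision, recall)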