1.1: Terminology

Throughout the rest of this chapter and the subsequent chapters, we will use a few terms frequently. Let's understand what these terms mean.

1.1.1  Dataset, Variable, and Observation

If we are considering a CSV file that contains relevant data for a machine learning project, we will refer to it as a dataset. Each column holds data values that are of value for the project; we will refer to each column as a variable. A variable is considered the dependent, target, or outcome variable if it is the central point of focus of the project. The variables that we want to use for modeling the behavior of the target variable are referred to as features or independent variables. Each row in the dataset is referred to as an observation or record.
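To make the terminology concrete, here is a minimal sketch using pandas on a tiny, made-up dataset; the column names and values are assumptions used purely for illustration.

```python
import pandas as pd

# A tiny, made-up dataset: each row is an observation (record) and each
# column is a variable. Column names are hypothetical.
df = pd.DataFrame({
    "age":       [34, 52, 29],
    "income":    [58000, 91000, 42000],
    "defaulted": [0, 1, 0],      # target / dependent / outcome variable
})

target = df["defaulted"]                    # the variable we want to predict
features = df.drop(columns=["defaulted"])   # features / independent variables

print(df.shape)      # (3, 3): 3 observations, 3 variables
print(df.iloc[0])    # a single observation (row)
```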

1.1.2    Feature Engineering

Feature engineering or feature construction is the process of creating new features in the data set using domain knowledge, as well as creating higher order features from original features.

A bank employee responsible for maintaining the minimum cash reserve in a bank branch might know that more customers visit the branch for cash withdrawals in the few days before a festival than on other days. Similarly, fewer customers might come to the branch for cash withdrawals on the day after the festival. This information can be used to create two separate binary 1|0 indicator features representing the days before and after festivals. This is an example of feature engineering using domain knowledge.
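As a sketch of how such indicator features might be built, the snippet below assumes a hypothetical date column and a hypothetical list of festival dates; both are illustrative, not drawn from any real dataset.

```python
import pandas as pd

# Hypothetical festival dates and a hypothetical range of branch-visit dates.
festivals = pd.to_datetime(["2023-01-26", "2023-08-15", "2023-10-24"])
df = pd.DataFrame({"date": pd.date_range("2023-08-10", "2023-08-20")})

# Days from each date to the nearest festival (positive = before, negative = after).
def days_to_festival(d):
    deltas = (festivals - d).days
    return deltas[abs(deltas).argmin()]

offset = df["date"].apply(days_to_festival)

# Two binary 1|0 indicator features built from domain knowledge:
df["pre_festival"] = ((offset > 0) & (offset <= 3)).astype(int)   # a few days before
df["post_festival"] = (offset == -1).astype(int)                  # day after a festival
```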

In addition to domain knowledge, we can also use data-driven approaches for feature construction, such as exploratory data analysis (EDA). For example, if we observe data anomalies during EDA, we can probe them with subject matter experts; if an anomaly turns out to be a valid pattern in the data that is useful for the machine learning project, we can create features that represent that pattern in the raw data.

We can also create new features from existing features by applying transformations to them. For example, instead of using the original values, we can take their squares to magnify the differences between them. If the body weights recorded for adults are 71.9, 72, and 69.9, we can instead use 5169.61, 5184, and 4886.01, which are the squares of the original values. After squaring, the differences in weight are more noticeable than in the original values.
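A minimal sketch of this square transformation on the body-weight values, using NumPy:

```python
import numpy as np

# Original body weights and their squares.
weights = np.array([71.9, 72.0, 69.9])
weights_squared = weights ** 2
print(weights_squared)          # [5169.61 5184.   4886.01]

# Differences between adjacent values become more pronounced after squaring.
print(np.diff(weights))         # [ 0.1 -2.1]
print(np.diff(weights_squared)) # [ 14.39 -297.99]
```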

1.1.3    Feature Extraction

Feature extraction is the process of finding alternative representations of the original features created during feature engineering. It converts the features into a lower-dimensional representation, which makes the structure of the data clearer for the model to learn.

For example, if there are 500 features, a technique such as principal component analysis (PCA) can convert these features into a smaller number of principal components, say 20. It might be easier and faster for the model to learn from fewer features that retain the maximum information.
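A minimal sketch of this reduction using scikit-learn's PCA, with randomly generated data standing in for the 500 original features:

```python
import numpy as np
from sklearn.decomposition import PCA

# Random data purely for illustration: 1000 observations, 500 features.
X = np.random.rand(1000, 500)

# Reduce the 500 original features to 20 principal components.
pca = PCA(n_components=20)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (1000, 20)
print(pca.explained_variance_ratio_.sum())  # share of variance retained
```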

1.1.4    Feature Selection

Feature selection deals with choosing a subset of features from a list of given features. This helps identify the features that are rich in useful information for the model, ultimately resulting in a less complex model with high predictive power.

For example, if we have 5000 features, by performing feature selection we can identify a much smaller subset of features that gives the highest performance. Having a small set of features also makes it easier to explain the model's predictions to a lay audience.
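A minimal sketch of feature selection using scikit-learn's SelectKBest on synthetic data; the univariate F-test and the choice of k=10 are assumptions made for illustration, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic classification data with 5000 features, of which only a few are informative.
X, y = make_classification(n_samples=500, n_features=5000, n_informative=10,
                           random_state=0)

# Keep the 10 features that score highest on a univariate F-test.
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)                     # (500, 10)
print(selector.get_support(indices=True))   # indices of the selected features
```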

1.1.5    Cost Function

Cost function, model metric, model performance, and predictive performance are used interchangeably as synonyms in this book. For a regression model, example cost functions are root mean square error (RMSE) and mean absolute error (MAE). For a classification model, example cost functions are the F1 score, precision, and recall.
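A minimal sketch computing these cost functions with scikit-learn on tiny made-up values:

```python
from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             f1_score, precision_score, recall_score)

# Regression: RMSE and MAE on made-up true and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
rmse = mean_squared_error(y_true, y_pred) ** 0.5
mae = mean_absolute_error(y_true, y_pred)

# Classification: F1 score, precision, and recall on made-up labels.
y_true_cls = [1, 0, 1, 1, 0, 1]
y_pred_cls = [1, 0, 0, 1, 0, 1]
f1 = f1_score(y_true_cls, y_pred_cls)
precision = precision_score(y_true_cls, y_pred_cls)
recall = recall_score(y_true_cls, y_pred_cls)

print(rmse, mae, f1, precision, recall)
```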