Artificial Intelligence - Feature Engineering
One of the fundamental stages of implementing a successful Artificial Intelligence solution is the feature engineering phase. As the term suggests, feature engineering is the process of applying engineering methodologies to transform raw data into a dataset of features that can be used for AI inference. Feature engineering can be a very computationally intensive process and requires a great deal of creativity and domain knowledge. During the process, the data scientist collaborates closely with business domain experts, who help define candidate features that model the problem space. The goal of the process is to maximise the predictive power of the AI solution by engineering highly informative features that are well correlated with the desired inference space. The algorithms and actions involved range from simple decisions, such as record or feature removals and imputations, to complex machine learning algorithms that enrich the raw data space by constructing entirely new data. One important aspect of this phase is traceability across the entire pipeline: every transformation applied during feature engineering must be readily available so that it can be replicated during inference as new data flows through the AI solution.
Numerous algorithms and methodologies can be utilised during this phase, and we present here a taxonomy of these methodologies, along with examples of algorithms belonging to each group. Under each group, you will find a list of links to the relevant Blueprints that explain the methodology in more detail.
Basic data transformations
These are some of the simpler transformations that can be applied to a dataset. Derived features can be created from the initial schema of the dataset using several feature engineering methodologies. For example, the AI engineer may decide to use simple mathematical transformations such as rolling averages. Combining features is also a very common operation; for instance, a new column may be introduced that is the result of a computation between two existing columns. Finally, the AI engineer may need to filter on an existing feature, e.g. specific timestamps may be used to filter the entire dataset.
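The three basic transformations mentioned above can be sketched with pandas. The dataset and column names here are hypothetical, used only for illustration.

```python
import pandas as pd

# Hypothetical daily sales data used only for illustration.
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=6, freq="D"),
    "units_sold": [10, 12, 9, 15, 11, 14],
    "unit_price": [2.0, 2.0, 2.5, 2.5, 3.0, 3.0],
})

# Simple mathematical transformation: a 3-day rolling average.
df["units_rolling_avg"] = df["units_sold"].rolling(window=3).mean()

# Combining features: a new column computed from two existing ones.
df["revenue"] = df["units_sold"] * df["unit_price"]

# Filtering on an existing feature, e.g. keeping specific timestamps.
recent = df[df["timestamp"] >= "2024-01-04"]
```

Each derived column is an ordinary transformation that can be recorded and replayed at inference time, which is what the traceability requirement above demands.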
Compression & feature space transformations
These are more complex techniques that aim to transform the feature space in such a way that the resulting space is more easily navigable by the AI algorithms. One of the most characteristic examples is applying kernel transformations to non-linearly separable spaces to enable linear separation. Imagine a data space of two classes such as the one shown on the left side of the graph below.
This is a one-dimensional space and, as we can see, no linear classifier can distinguish between the two classes with 100% accuracy. However, if we apply an appropriate kernel to transform the space into the two-dimensional space shown on the right side of the graph, the problem becomes linear and even a simple classifier, such as a Support Vector Classifier, can easily separate the two classes. Other feature space transformations, such as PCA, aim to compress the dataset while retaining as much of the information in the original space as possible. Another frequently applied category of transformations is encodings. Aside from being able to enhance the predictive power of a feature, encodings are often a prerequisite for specific AI algorithms. Such methodologies include binning and quantisation, as well as labelling. Often, categorical features need to be converted into numerical ones, which can be achieved using various encoding schemes such as one-hot encoding, hash maps, or vectorisations.
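The one-dimensional example above can be sketched in a few lines of NumPy. The data points are hypothetical; the feature map x → (x, x²) plays the role of the kernel transformation, lifting the points into a two-dimensional space where a single hyperplane separates them.

```python
import numpy as np

# Hypothetical 1-D dataset: class 0 lies inside [-1, 1], class 1 outside,
# so no single threshold on x can separate the classes.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])

# Kernel-style feature map: lift x into 2-D as (x, x^2).
phi = np.column_stack([x, x ** 2])

# In the lifted space the hyperplane x2 = 2, i.e. w = (0, 1), b = -2,
# separates the two classes perfectly.
w, b = np.array([0.0, 1.0]), -2.0
pred = (phi @ w + b > 0).astype(int)
assert (pred == y).all()  # linear separation achieved in the lifted space
```

In practice a kernelised classifier never materialises the lifted space explicitly; this sketch only illustrates why the separation becomes linear after the transformation.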
Data cleaning
Often, our predictive features are not in perfect condition and cannot be used directly in the subsequent steps of the AI pipeline. Many issues regarding data quality can arise during the data engineering phase and should be dealt with during feature engineering. A typical problem is that of missing values. The AI engineer needs to decide on a strategy for the imputation of these missing values. Perhaps the rows with missing values are few enough that the feature can be kept simply by dropping those rows. In other situations, the missing values are numerous enough to weaken the predictive strength of the feature, and the feature may have to be discarded. At other times the feature is essential given the business context, and a more sophisticated strategy needs to be defined to fill in the missing values. Scaling is another important matter that needs to be addressed during this stage, especially if neural networks are to be used during the Modelling and Evaluation phase. Another example of a data problem is outliers. Outlier detection is a complex process in itself, and many techniques can be deployed, from simple statistical ones to fully fledged AI pipelines.
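A minimal sketch of the three cleaning steps discussed above, using a hypothetical feature series: median imputation for missing values, standardisation for scaling, and a simple z-score rule for outlier flagging.

```python
import numpy as np
import pandas as pd

# Hypothetical feature with a missing value and one extreme outlier.
s = pd.Series([4.0, 5.0, np.nan, 6.0, 5.5, 120.0])

# Imputation: fill missing values with the median of the observed ones.
filled = s.fillna(s.median())

# Scaling: standardise to zero mean and unit variance
# (particularly important when neural networks are used downstream).
scaled = (filled - filled.mean()) / filled.std()

# Simple statistical outlier detection: flag |z-score| above a threshold.
outliers = scaled.abs() > 2.0
```

The imputation value and the scaling parameters (mean, standard deviation) must be stored so the identical transformation can be replayed on new data at inference time.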
Advanced feature engineering
Sometimes the task at hand requires more complicated data manipulation; perhaps the data is too noisy or too sparse. In this situation the AI engineer has many advanced feature engineering techniques at hand, ranging from complex algorithms to Artificial Intelligence architectures that try to discover patterns within the initial dataset and enhance it. An example of such an advanced AI methodology is the use of Generative Adversarial Networks on sparse or small datasets to generate new synthetic data that follows the distribution of the existing sample. Other examples, amongst many, are class rebalancing using SMOTE techniques, the use of convolutions to generate possible new data from existing datasets, and the use of auto-encoders to produce interim features that deconstruct existing features and can then themselves be used as features describing the dataset.
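The class-rebalancing idea can be illustrated with a minimal SMOTE-style sketch, assuming a hypothetical two-dimensional minority class. Real SMOTE interpolates towards k-nearest neighbours; for brevity this sketch interpolates between random pairs of minority samples, which captures the core idea of placing synthetic points on the segments joining existing ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced 2-D dataset: only a few minority-class points.
minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]])

def smote_like(samples, n_new, rng):
    """SMOTE-style oversampling sketch: each synthetic point lies on the
    line segment between two randomly chosen minority samples."""
    new = []
    for _ in range(n_new):
        i, j = rng.choice(len(samples), size=2, replace=False)
        lam = rng.random()  # interpolation factor in [0, 1)
        new.append(samples[i] + lam * (samples[j] - samples[i]))
    return np.array(new)

synthetic = smote_like(minority, n_new=5, rng=rng)
```

Because every synthetic point is a convex combination of two real minority samples, the new data stays within the region occupied by the existing sample, following its distribution rather than inventing arbitrary points.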
Feature selection
Feature selection can be a very important step of the feature engineering phase and is usually applied iteratively as more and more features are engineered. The reason for this iterative nature is that newly introduced features may affect the predictive strength of existing ones. While the output of this stage differs from prior stages, in the sense that it does not act upon the features but rather selects the important features in the set, we still include it as part of this phase of the pipeline. Calculating feature importance using statistical or AI techniques allows us to choose the most prominent features and maximise AI inference performance in production. The most frequently used techniques for this task include forward and backward feature selection, feature elimination, and univariate statistics, e.g. information gain and correlation coefficients. More complex methodologies derive from the decision-tree family of machine learning algorithms and their derivatives, such as random forest feature importance.
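A univariate filter, one of the simpler selection techniques named above, can be sketched as follows. The dataset is synthetic and hypothetical: three candidate features, of which only the first actually drives the target, are ranked by absolute Pearson correlation with the target.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dataset: 3 candidate features, only f0 drives the target.
n = 200
f0 = rng.normal(size=n)
f1 = rng.normal(size=n)  # pure noise
f2 = rng.normal(size=n)  # pure noise
y = 2.0 * f0 + 0.1 * rng.normal(size=n)

X = np.column_stack([f0, f1, f2])

# Univariate filter: rank features by |Pearson correlation| with the target.
scores = np.array(
    [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
)
selected = np.argsort(scores)[::-1][:1]  # keep the single strongest feature
```

Univariate filters are cheap but blind to feature interactions, which is one reason the selection step is rerun as new engineered features enter the set.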