
Artificial Intelligence - Feature Engineering

Overview

Feature engineering is one of the fundamental stages of a successful Artificial Intelligence solution. As the term suggests, it is the process of applying engineering methodologies to transform raw data into a dataset that can be used for inference. Feature engineering can be very labor-intensive and demands considerable creativity and domain knowledge from the data scientist. Its goal is to maximize the predictive power of the AI solution by providing highly informative features and well-designed feature sets. The algorithms involved at this stage of the AI pipeline range from simple decisions, such as record or feature removal and imputation, to complex machine learning algorithms that enhance the raw data, such as random forests that rank the features of a dataset by predictive power. One important aspect of this phase is traceability: the exact same feature engineering steps applied during training must be readily available to apply again during inference.

Basic data transformations

These are the simpler transformations that can be applied on top of a dataset. Derived data can be created from the initial schema of the dataset using many feature engineering methodologies, including:

  • Mathematical transformations
  • Combination of features
  • Filtering
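The three kinds of basic transformation above can be sketched with pandas; the column names and thresholds here are hypothetical, purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [30000, 60000, 120000],
                   "debt": [10000, 15000, 20000]})

# Mathematical transformation: log-scale a skewed feature
df["log_income"] = np.log1p(df["income"])

# Combination of features: a derived ratio
df["debt_to_income"] = df["debt"] / df["income"]

# Filtering: keep only records above a threshold
high_earners = df[df["income"] > 50000]
```

Each derived column becomes a new feature in the dataset's schema, available to later pipeline stages.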

Compression & feature space transformations

These are more complex techniques that aim to transform the feature space so that the resulting space is more easily navigable by the AI cores. Techniques include:

  • Principal Component Analysis
  • Kernel transformations
  • Data distribution transformations
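As a minimal sketch of the first technique, Principal Component Analysis can compress a feature space with correlated columns into fewer components; this example uses scikit-learn (assumed available) on synthetic data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Make one column strongly correlated with another, so the
# effective dimensionality is lower than the raw column count
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Project the five raw features onto the two strongest components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
```

The `explained_variance_ratio_` attribute shows how much of the original variance each retained component captures, which guides how many components to keep.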

Feature Enhancement

Often our predictive features are not in perfect condition for the following steps of the AI pipeline. Many data quality issues can arise during the data engineering phase and should be dealt with during feature engineering, including:

  • Imputation of missing values
  • Scaling
  • Outlier Detection    
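The first two enhancement steps can be sketched with scikit-learn's preprocessing utilities; the values below are synthetic.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, np.nan],
              [2.0, 10.0],
              [3.0, 20.0]])

# Imputation: replace missing values with the column mean
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Scaling: transform each column to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X_imputed)
```

Fitting these transformers on the training data and reusing the fitted objects at inference time is one way to get the traceability mentioned in the overview.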

Encodings

Another frequent step during feature engineering is encoding existing features to generate new feature sets. These techniques are often a prerequisite for specific AI cores, or are used simply to enhance the predictive power of a feature:

  • Binning, quantisation, labeling
  • Binary, One-hot encodings
  • Hashes, Vectorisations
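Binning and one-hot encoding can be sketched with pandas; the age bands and city values here are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"age": [15, 35, 70],
                   "city": ["Paris", "Tokyo", "Paris"]})

# Binning / quantisation: map a numeric feature onto labeled buckets
df["age_band"] = pd.cut(df["age"], bins=[0, 18, 65, 120],
                        labels=["minor", "adult", "senior"])

# One-hot encoding: expand a categorical feature into binary columns
encoded = pd.get_dummies(df, columns=["city"])
```

One-hot encoding is typically needed for AI cores that cannot consume categorical values directly, such as linear models and neural networks.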

Feature importance

This is a key step of the feature engineering phase, since we usually have numerous candidate features to pick from when building and training our AI models. Using quantitative and even AI-based feature engineering techniques, we can choose the most prominent features to maximise the predictive power of the AI models. Such techniques include:

  • Statistical tests
  • Recursive Feature Elimination
  • Principal Components
  • Random forests
  • Extra trees
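As a minimal sketch of the tree-based approach, a random forest can rank features by importance; the dataset below is synthetic, with only the first feature carrying signal.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)  # label depends on feature 0 only

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Indices of the features, from most to least important
ranking = np.argsort(model.feature_importances_)[::-1]
```

The same `feature_importances_` scores can drive Recursive Feature Elimination, which repeatedly drops the weakest feature and refits.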

Advanced techniques

This category contains advanced data manipulation techniques that can themselves be complex Artificial Intelligence structures, trying to understand and act on the data and possibly enhance the dataset by generating new synthetic data. In this section you can find tutorials, deep dives, and related use case articles on feature engineering techniques such as:

  • Generative Adversarial Networks
  • Autoencoders
  • Convolutional Networks
  • SMOTE rebalancing
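To illustrate the idea behind SMOTE rebalancing, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. This is a plain-NumPy illustration of the principle; real projects typically use the `SMOTE` class from the imbalanced-learn library.

```python
import numpy as np

def smote_like(X_minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from the minority class."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_minority))
        # Distances from sample i to every minority sample
        d = np.linalg.norm(X_minority - X_minority[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]  # skip the sample itself
        j = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_minority[i] + lam * (X_minority[j] - X_minority[i]))
    return np.array(synthetic)

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_new = smote_like(X_min, n_new=5)
```

Because each synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies rather than being drawn at random.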