Feature Engineering vs Feature Scaling
What is Feature Engineering?
Feature engineering is an informal topic, and there are many possible definitions. We define feature engineering as the creation of new features from existing ones to improve model performance.
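As a rough illustration of that definition, the sketch below derives two new features from existing columns. The records, the column names (`sqft`, `bedrooms`, `listed_on`) and the helper `add_engineered_features` are all hypothetical, chosen only to make the idea concrete.

```python
from datetime import date

# Hypothetical raw records; the column names are made up for illustration.
listings = [
    {"sqft": 1500, "bedrooms": 3, "listed_on": date(2023, 7, 1)},
    {"sqft": 1800, "bedrooms": 4, "listed_on": date(2023, 7, 5)},
]

def add_engineered_features(row):
    """Return a copy of `row` with two features derived from existing ones."""
    out = dict(row)
    # Ratio feature: combines two existing columns into a single signal.
    out["sqft_per_bedroom"] = row["sqft"] / row["bedrooms"]
    # Date-part feature: weekday() is 0 for Monday through 6 for Sunday.
    out["listed_on_weekend"] = row["listed_on"].weekday() >= 5
    return out

engineered = [add_engineered_features(r) for r in listings]
```

Neither new column adds outside information. Both only re-express what the table already contains, in a form a model may find easier to use.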
A typical data science process might look like this:
- Project Scoping / Data Collection
- Exploratory Analysis
- Data Cleaning
- Feature Engineering
- Model Training (including cross-validation to tune hyper-parameters)
- Project Delivery / Insights
What is not Feature Engineering?
Initial data collection is not feature engineering.
Creating the target variable is not feature engineering.
Removing duplicates, handling missing values, or fixing mislabelled classes is not feature engineering. These tasks fall under data cleaning.
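Scaling or normalization is not feature engineering because these steps belong inside the cross-validation loop (i.e. after we’ve already built our analytical base table). Scaling parameters such as the mean and standard deviation are learned from the data, so they should be fit on each training fold only and then applied to the matching validation fold.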
Lastly, feature selection is not feature engineering. It also belongs inside the cross-validation loop.
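To make "inside the cross-validation loop" concrete, here is a minimal sketch that refits a scaler on every training fold and reuses those parameters on the validation fold. The helpers `fit_scaler`, `apply_scaler` and `k_fold_indices` are hypothetical names written for this example, the feature values are made up, and the simple contiguous folds stand in for whatever splitting strategy a real project would use.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Learn standardization parameters from the given values only."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        sigma = 1.0  # guard against constant columns
    return mu, sigma

def apply_scaler(values, params):
    """Standardize values with previously learned parameters."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_size = n // k
    for i in range(k):
        start = i * fold_size
        stop = n if i == k - 1 else start + fold_size
        val_idx = list(range(start, stop))
        train_idx = [j for j in range(n) if j < start or j >= stop]
        yield train_idx, val_idx

feature = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # made-up single feature column

folds = []
for train_idx, val_idx in k_fold_indices(len(feature), k=3):
    train = [feature[j] for j in train_idx]
    val = [feature[j] for j in val_idx]
    params = fit_scaler(train)               # fit on the training fold only
    train_scaled = apply_scaler(train, params)
    val_scaled = apply_scaler(val, params)   # reuse the training parameters
    folds.append((params, train_scaled, val_scaled))
    # A model would be trained on train_scaled and scored on val_scaled here.
```

Because the parameters are recomputed per fold, the validation rows never influence the scaling applied to them. Feature selection would slot into the same loop in the same way: select on the training fold, then apply that selection to the validation fold.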
What is Feature Scaling?
Feature scaling is a method used to standardize the range of the independent variables (features) in a dataset. In data processing, it is also known as data normalization and is generally performed during the data pre-processing step.
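One common way to standardize a range is min-max rescaling, which maps each feature onto [0, 1]. The sketch below is only an illustration: `min_max_scale` is a hypothetical helper, the `ages` and `incomes` values are made up, and other schemes (such as subtracting the mean and dividing by the standard deviation) are equally valid choices.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant column carries no range to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# Two made-up features measured on very different scales.
ages = [20, 30, 40, 60]
incomes = [20000, 50000, 80000, 120000]

ages_scaled = min_max_scale(ages)
incomes_scaled = min_max_scale(incomes)
```

After rescaling, both features span the same interval, so neither dominates a distance-based or gradient-based model purely because of its units.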