Feature Selection

Feature selection is the process of choosing which features to use in a model. Unlike feature engineering, which creates new features, feature selection identifies which of the available features are most relevant and should therefore be included. The goal is to exclude irrelevant features from the model. This can be done manually or with algorithms that automatically select the most relevant variables.
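As a rough illustration of what automatic selection can look like, the sketch below scores each feature by the absolute value of its Pearson correlation with the target and keeps the k highest-scoring ones. This is only one simple approach among many, and it only captures linear relationships. The data, the feature names and the helper select_top_k are hypothetical and exist purely for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(features, target, k):
    """Return the names of the k features with the largest |correlation| with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: two informative features and one that is mostly noise.
features = {
    "size_m2": [50, 70, 90, 110, 130],
    "rooms": [2, 3, 3, 4, 5],
    "day_of_week": [3, 1, 5, 1, 3],
}
price = [150, 210, 260, 330, 390]

selected = select_top_k(features, price, k=2)
print(selected)
```

On this toy data the two size-related features score far higher than day_of_week, so they are the ones that get kept. In practice the value of k is itself a choice that is usually validated against model performance on held-out data.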

Having many variables does not mean you have to use all of them. In fact, adding more variables often hurts rather than helps a model's performance. Restricting the model to the most relevant variables reduces the risk of overfitting, of collinearity, and of running into the curse of dimensionality, and it makes the model easier to interpret.
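The collinearity point can be made concrete with a small sketch: a feature is dropped when it is almost perfectly correlated with a feature that has already been kept. The threshold of 0.95, the data and the helper drop_collinear are assumptions made for this example, not a recommended setting.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_collinear(features, threshold=0.95):
    """Keep a feature only if its |correlation| with every kept feature is below threshold."""
    kept = []
    for name, col in features.items():
        if all(abs(pearson(col, features[other])) < threshold for other in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: size_sqft repeats the information in size_m2 in other units.
features = {
    "size_m2": [50, 70, 90, 110, 130],
    "size_sqft": [538, 753, 969, 1184, 1399],
    "age_years": [30, 5, 12, 40, 8],
}

kept = drop_collinear(features)
print(kept)
```

Because features are visited in insertion order, the first of two redundant features is the one that survives. Which one to keep is a judgment call that this sketch does not try to make.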
