One of the challenges I regularly face in my data science work is high dimensionality, which means dealing with what we call the curse of dimensionality. This phenomenon refers to the fact that as the number of dimensions increases, the volume of the space grows so fast that the available data become sparse. To obtain a reliable result, the amount of data needed often grows exponentially with the dimensionality.
Basically, if you face this kind of problem, you have two options for optimizing your machine learning solution:
- Increase the data available
- Use dimensionality reduction to avoid this phenomenon
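To make the sparsity effect concrete, here is a minimal sketch (my own illustration, not from the article) that samples uniform points in the unit hypercube and measures how few of them remain "close" to the center as the dimension grows:

```python
import numpy as np

def fraction_near_center(n_points, dim, radius=0.5, seed=0):
    """Fraction of uniform samples in [0, 1]^dim that lie within
    `radius` of the cube's center -- a rough proxy for data density."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    dists = np.linalg.norm(points - 0.5, axis=1)
    return float(np.mean(dists <= radius))

# The same ball covers less and less of the space as dim grows
for d in (2, 5, 10):
    print(f"dim={d:2d}  fraction near center = {fraction_near_center(100_000, d):.4f}")
```

In 2 dimensions roughly 78% of the points fall inside the ball; by 10 dimensions almost none do, so the same sample size leaves most of the space empty. That is the sparsity the curse of dimensionality describes.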
In this article, we will show you how to use PCA to reduce dimensions based on World Cup data, and we will introduce a second method that we will detail in the next article.
PCA Goals
PCA, short for principal component analysis, is a popular technique for analyzing large datasets, meaning datasets with a high number of features (variables/dimensions).
PCA is a statistical method for reducing the dimensionality of your dataset. It allows you to create new features (called principal components) as linear combinations of the initial ones, and each new feature is built with the objective of maximizing the variance it captures from the data.
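The idea above can be sketched with scikit-learn's `PCA`. Since the article's World Cup dataset is not reproduced here, this example uses synthetic data whose 10 features are driven by 2 underlying factors, so 2 principal components should capture most of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the article's dataset: 200 samples,
# 10 correlated features generated from 2 latent factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))              # 2 underlying factors
mixing = rng.normal(size=(2, 10))               # how factors mix into features
X = latent @ mixing + 0.1 * rng.normal(size=(200, 10))

# Project the 10-dimensional data onto 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                          # (200, 2)
print(pca.explained_variance_ratio_.sum())      # near 1.0 for this data
```

Each column of `X_reduced` is a principal component: a linear combination of the original 10 features, chosen so the first component captures the largest possible variance and each subsequent one the largest remaining variance, orthogonal to the previous ones.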