World Cup 2022: Become a Real Soccer Data Consultant!

Elfao
8 min read · Dec 9, 2022
Image generated by DALL-E

One task I keep running into in my data science journey is working with high-dimensional data. That means dealing with what we call the curse of dimensionality. This phenomenon refers to the fact that as the dimensionality increases, the volume of the space grows so fast that the available data become sparse. To obtain a reliable result, the amount of data needed often grows exponentially with the dimensionality.
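To make the exponential growth concrete, here is a minimal sketch. It assumes a simplified picture that is not from the original discussion: the feature space is the unit hypercube, cut into a regular grid with a fixed number of cells per axis. The helper names `cells_needed` and `max_occupied_fraction` are hypothetical and chosen only for this illustration.

```python
# Simplifying assumption: the feature space is the unit hypercube [0, 1]^d,
# split into a regular grid with `cells_per_axis` cells along each axis.

def cells_needed(cells_per_axis: int, n_dims: int) -> int:
    """Number of grid cells in the hypercube; grows exponentially with n_dims."""
    return cells_per_axis ** n_dims


def max_occupied_fraction(n_samples: int, cells_per_axis: int, n_dims: int) -> float:
    """Upper bound on the share of cells that can contain at least one sample."""
    return min(1.0, n_samples / cells_needed(cells_per_axis, n_dims))


if __name__ == "__main__":
    # With a fixed budget of 1000 samples, coverage collapses as d grows.
    for d in (1, 2, 3, 5, 10):
        print(d, cells_needed(10, d), max_occupied_fraction(1000, 10, d))
```

With the number of samples held fixed, the share of the space that the data can possibly cover shrinks by a factor of `cells_per_axis` for every dimension added, which is the sparsity described above.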

Basically, if you face this kind of problem, you have two choices to optimize your machine learning solution:

  • Increase the data available
  • Use dimension reduction to avoid this phenomenon

In this article, we will show you how to use PCA to reduce dimensions, based on World Cup data, and introduce a second method that we’ll detail in the next article.

PCA Goals

PCA, short for principal component analysis, is a popular technique for analyzing large datasets. Here, a large dataset means a dataset with a high number of features (variables/dimensions).

PCA is a statistical method for reducing the dimensionality of your dataset. It lets you create new features (called principal components) as linear combinations of the initial ones, and each new feature is built with the objective of maximizing…
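As a rough illustration of "a new feature built as a linear combination of the initial ones", here is a minimal pure-Python sketch. It relies on the standard textbook formulation of PCA, in which the first principal component is the direction of largest variance in the centered data, and it finds that direction with power iteration on the covariance matrix. Power iteration is a choice made here for brevity; in practice a library such as scikit-learn would typically be used instead. The three columns and all values in `rows` are made up for the example and are not real World Cup statistics.

```python
import math


def center(rows):
    """Subtract the column mean from every feature."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (features x features) of centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, n_iter=200):
    """Leading eigenvector of `cov`, approximated by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, v):
    """New feature: a linear combination of the original features."""
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


def variance(values):
    """Sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


# Toy, made-up data: 4 observations, 3 hypothetical features.
rows = [[2.0, 1.9, 0.5], [4.0, 4.1, 0.4], [6.0, 6.2, 0.6], [8.0, 7.8, 0.5]]
centered = center(rows)
cov = covariance(centered)
pc1 = first_component(cov)      # weights of the linear combination
scores = project(centered, pc1)  # one new value per observation

if __name__ == "__main__":
    print("weights:", pc1)
    print("variance along the first component:", variance(scores))
```

The weights in `pc1` say how much each original feature contributes to the new one, and `scores` holds the value of that new feature for each observation. In this toy data the first two columns move together, so a single component captures more variance than any individual column.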

Elfao

Data scientist with 4 years of experience. I have worked in different fields such as digital marketing and consulting, and I currently work for a start-up in finance.