World Cup 2022: Clustering Explained Step By Step

Elfao
Published in Analytics Vidhya
10 min read, Jan 4, 2023

Clustering is a method that consists of grouping data points (clients, texts, images…) based on their similarities. It is an unsupervised machine learning problem: the goal is to find structure in a dataset that has no target values (no labels).
Clusters are groups of similar elements that differ from the elements in other clusters.
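
To make the idea of grouping by similarity concrete, here is a minimal sketch of a bare-bones K-means loop (K-means is one of the algorithms named later in this article). The toy points, the `kmeans` helper and its naive initialisation are assumptions made for illustration, not the article's own code.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def kmeans(points, k, n_iter=10):
    """Group points into k clusters with a bare-bones K-means loop."""
    # Naive initialisation (an assumption for brevity): use the first k points.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return labels, centroids

# Hypothetical toy data: two visibly separate groups, with no labels provided.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8),
          (0.9, 1.1), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # points sharing a label belong to the same cluster
```

No label is ever given to the algorithm: the two groups emerge only from the distances between points, which is what "unsupervised" means here.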

[Figure: Clustering results]

The benefits of clustering are many and vary depending on the field:

  • Client clustering: optimize and adapt strategy based on client behavior
  • Increased company productivity: deal with groups of clients rather than individual clients (reduced workload)

Clustering Types

  • Hierarchical Clustering (e.g., CAH, the French acronym for agglomerative hierarchical clustering)
  • Centroid-based Clustering (e.g., K-means)
  • Density-based Clustering (e.g., DBSCAN)
  • Distribution-based Clustering (e.g., DBCLASD)
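
As a rough illustration of these four families, the sketch below fits one scikit-learn estimator per family on the same toy data. The data and the parameter values (`eps`, `min_samples`, `n_clusters`) are assumptions chosen for this example. DBCLASD is not part of scikit-learn, so `GaussianMixture` is swapped in here as a different, but also distribution-based, method.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.mixture import GaussianMixture

# Hypothetical toy data: two tight, well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

results = {
    # Hierarchical: merges the closest groups until 2 clusters remain.
    "hierarchical": AgglomerativeClustering(n_clusters=2).fit_predict(X),
    # Centroid-based: assigns each point to the nearest of 2 centroids.
    "centroid": KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X),
    # Density-based: groups points that have neighbours within eps.
    "density": DBSCAN(eps=1.0, min_samples=2).fit_predict(X),
    # Distribution-based stand-in for DBCLASD: a 2-component Gaussian mixture.
    "distribution": GaussianMixture(n_components=2, random_state=0).fit_predict(X),
}

for family, labels in results.items():
    print(family, labels)
```

The numeric label given to each cluster is arbitrary and can differ between estimators; what matters is which points end up sharing a label.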

Clustering Workflow

[Figure: Clustering workflow]

In this article, we will cover several clustering algorithms: K-means, CAH, OPTICS, DBSCAN…

Elfao

Data scientist with 4 years of experience. I have worked in different fields such as digital marketing and consulting, and I currently work for a finance start-up.