Clustering consists of grouping data points (clients, texts, images…) based on their similarities. It is an unsupervised machine learning problem: the goal is to find structure in a dataset that has no target values (no labels).
Clusters are groups of similar elements that differ from the elements in other clusters.
The benefits of clustering are many and vary depending on the field:
- Client clustering: optimize and adapt strategy based on behavior
- Increase company productivity: deal with groups of clients rather than individual clients (reduced workload)
Clustering Types
- Hierarchical Clustering (e.g., CAH, the French acronym for agglomerative hierarchical clustering)
- Centroid-based Clustering (e.g., K-means)
- Density-based Clustering (e.g., DBSCAN)
- Distribution-based Clustering (e.g., DBCLASD)
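To make the centroid-based family more concrete, here is a minimal sketch of a K-means-style procedure in plain Python. It is an illustration under stated assumptions, not a reference implementation: the toy points are made up, the function name `kmeans` is hypothetical, and the starting centroids are fixed by hand so the result is reproducible. Note that no labels are given to the algorithm; the groups emerge from the distances alone.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Minimal K-means sketch (Lloyd's algorithm) on tuples of floats."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # empty cluster: keep centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids


# Toy, made-up data: two visibly separated groups, with no labels provided.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Assumption: start from the first and last points instead of a random init.
labels, centroids = kmeans(points, centroids=[points[0], points[-1]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the initial centroids are usually chosen at random or with a dedicated seeding strategy, and the number of clusters has to be picked beforehand, which is one of the main differences with density-based methods such as DBSCAN.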
Clustering Workflow
In this article, we will cover several clustering algorithms: K-means, CAH, OPTICS, DBSCAN…