DBSCAN is a different type of clustering algorithm with some unique advantages. As the name (Density-Based Spatial Clustering of Applications with Noise) indicates, this method forms clusters based on the proximity and density of observations. This is very different from K-Means, where an observation becomes part of the cluster represented by the nearest centroid. DBSCAN can also identify outliers: observations that do not belong to any cluster. And since DBSCAN discovers the number of clusters on its own, it is very useful for unsupervised exploration of data where we don't know in advance how many clusters to expect.
K-Means clustering may group loosely related observations together: every observation eventually becomes part of some cluster, even if it lies far from the others in the vector space. And because each cluster is defined by the mean of its members, every data point influences the result, so a slight change in the data can noticeably alter the clustering outcome. This problem is greatly reduced in DBSCAN due to the way its clusters are formed.
In DBSCAN, clustering happens based on two important parameters (a sketch mapping them onto scikit-learn follows the list):

- n (eps in scikit-learn): the radius of the neighborhood around a point; two points are neighbors if the distance between them is at most n.
- m (min_samples in scikit-learn): the minimum number of points (the point itself included) that must fall inside a point's n-neighborhood for it to count as a core point.
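As an illustration, here is a minimal sketch of how these two parameters map onto scikit-learn's DBSCAN; the make_moons toy dataset and the specific values eps=0.2 and min_samples=5 are only assumptions for demonstration, not recommended defaults:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Toy two-moons data; any (n_samples, n_features) array works here.
X, _ = make_moons(n_samples=300, noise=0.08, random_state=42)

# eps is the neighborhood radius n; min_samples is the minimum
# neighborhood size m (the point itself is counted).
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

print(db.labels_[:10])  # one cluster index per point; -1 marks an outlier
```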
There are three types of points after DBSCAN clustering is complete (the snippet below shows how to recover each type from a fitted model):

- Core points: points that have at least m neighbors within distance n.
- Border points: points that have fewer than m neighbors themselves but lie within distance n of a core point.
- Noise points (outliers): points that are neither core nor border; they belong to no cluster.
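Scikit-learn exposes enough information to recover all three point types from a fitted model. A sketch building on the fit above; core_sample_indices_ and the -1 noise label are scikit-learn conventions:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.08, random_state=42)
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

core = np.zeros(len(X), dtype=bool)
core[db.core_sample_indices_] = True  # indices of the core points
noise = db.labels_ == -1              # noise points are labelled -1
border = ~core & ~noise               # in a cluster, but not core

print(f"core: {core.sum()}, border: {border.sum()}, noise: {noise.sum()}")
```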
DBSCAN clustering can be summarized in the following steps (a from-scratch sketch follows the list):

1. Pick an unvisited point. If it has at least m neighbors within distance n, it is a core point and seeds a new cluster; otherwise mark it as noise for now.
2. Add all of the core point's neighbors to the cluster. Each neighbor that is itself a core point contributes its own neighbors in turn, so the cluster keeps expanding until no new points can be reached.
3. A point marked as noise earlier may be re-labelled as a border point if it turns out to lie inside some core point's neighborhood.
4. Repeat from step 1 until every point has been visited. Points that remain unassigned are the outliers.
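These steps translate almost directly into code. The following is a naive from-scratch sketch (an O(n²) distance computation and plain breadth-first expansion) meant only to make the algorithm concrete, not to replace sklearn.cluster.DBSCAN:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Label each point with a cluster id; -1 means noise."""
    n_points = len(X)
    labels = np.full(n_points, -1)          # start with everything as noise
    # Precompute each point's neighborhood (naive O(n^2) distances).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n_points)]

    cluster_id = 0
    visited = np.zeros(n_points, dtype=bool)
    for i in range(n_points):
        if visited[i] or len(neighbors[i]) < min_pts:
            continue                        # skip visited points and non-cores
        # i is an unvisited core point: grow a new cluster from it.
        visited[i] = True
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id      # reachable, so it joins the cluster
            if not visited[j]:
                visited[j] = True
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])  # core neighbors expand further
        cluster_id += 1
    return labels
```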
Since clustering depends entirely on the parameters n and m (above), choosing these values correctly is very important. Good domain knowledge helps in picking sensible values, but there are also heuristics that approximate these parameters fairly well without deep expertise in the domain.
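One such heuristic is the k-distance plot: sort every point's distance to its k-th nearest neighbor and look for the "elbow" where the curve rises sharply; that distance is a reasonable starting value for n (eps). A sketch, assuming scikit-learn and matplotlib, with k set to the intended m (min_samples):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, _ = make_moons(n_samples=300, noise=0.08, random_state=42)

k = 5  # use the intended min_samples value here
nn = NearestNeighbors(n_neighbors=k).fit(X)
distances, _ = nn.kneighbors(X)     # shape: (n_samples, k)
# Column 0 is each point's distance to itself (0), so the last column
# is the distance to its (k-1)-th other neighbor; close enough for an
# elbow estimate.
k_dist = np.sort(distances[:, -1])

plt.plot(k_dist)
plt.xlabel("Points sorted by k-distance")
plt.ylabel(f"Distance to {k}th nearest neighbor")
plt.title("k-distance plot: choose eps near the elbow")
plt.show()
```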
See the DBSCAN demo in the scikit-learn examples and try it yourself with sklearn.cluster.DBSCAN.