

K-Means vs KNN

K-Means (K-Means Clustering) and KNN (K-Nearest Neighbours) are often confused with each other in Machine Learning. In this post, I’ll explain some attributes of, and differences between, these two popular Machine Learning techniques.

| K-Means | KNN |
| --- | --- |
| It is an Unsupervised learning technique. | It is a Supervised learning technique. |
| It is used for Clustering. | It is used mostly for Classification, and sometimes even for Regression. |
| ‘K’ in K-Means is the number of clusters the algorithm is trying to identify/learn from the data. The clusters are unknown to begin with, since this is an Unsupervised technique. | ‘K’ in KNN is the number of nearest neighbours used to classify a test sample, or to predict its value in the case of a continuous variable (regression). |
| It is typically used for scenarios like understanding population demographics, market segmentation, social media trends, anomaly detection, etc., where the clusters are unknown to begin with. | It is used for classification and regression on known (labelled) data, where the target attribute/variable is known beforehand. |
| In the training phase of K-Means, K observations are arbitrarily selected as initial centroids, and clusters are formed around (similar to) them. Once the clusters are formed, each centroid is updated to the mean of its cluster members, and cluster formation restarts with the new centroids. This repeats until the centroids converge. The prediction for a test observation is then the cluster of its nearest centroid (sketched below). | KNN doesn’t have a training phase as such. The prediction for a test observation is made from its K nearest neighbours (often by Euclidean distance), using a majority vote or a weighted average (sketched below). |
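
To make the last row of the table concrete, here is a minimal from-scratch sketch of the K-Means training loop in Python. The function name `k_means` and its parameters are illustrative, not taken from any particular library; a production implementation (e.g. scikit-learn’s `KMeans`) adds smarter initialisation and handles edge cases such as empty clusters.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Minimal K-Means sketch: pick K observations as starting centroids,
    assign every observation to its nearest centroid, move each centroid to
    the mean of its cluster members, and repeat until the centroids settle."""
    rng = np.random.default_rng(seed)
    # Arbitrarily select K observations as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each observation to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update each centroid to the mean of its cluster members.
        # (Empty clusters are not handled in this sketch.)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return centroids, labels

# Prediction for a new observation: the cluster of the nearest centroid.
# X = np.random.rand(200, 2)
# centroids, labels = k_means(X, k=3)
# cluster = np.linalg.norm(centroids - np.array([0.5, 0.5]), axis=1).argmin()
```

And a correspondingly minimal sketch of KNN classification (again, `knn_predict` is an illustrative name): there is no training phase, only a distance computation and a vote among the K nearest neighbours at prediction time. For regression, the vote would be replaced by a (weighted) average of the neighbours’ values.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=5):
    """Minimal KNN classifier sketch: find the K training observations closest
    to x_test (Euclidean distance) and return the majority vote of their labels."""
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nearest = dists.argsort()[:k]        # indices of the K nearest neighbours
    votes = Counter(y_train[nearest])    # count the neighbours' labels
    return votes.most_common(1)[0][0]

# Example: two well-separated groups of points with labels 0 and 1.
# X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [6, 5]])
# y_train = np.array([0, 0, 0, 1, 1])
# knn_predict(X_train, y_train, np.array([5, 6]), k=3)  # -> 1
```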

Learn more about K-Means Clustering and K-Nearest Neighbors

Abhijit Annaldas
avannaldas [at] hotmail [dot] com