Use of different distance measures in KNN algorithm

knn
distance

#1

While studying the KNN algorithm I came across three distance measures:
1-Euclidean
2-Manhattan
3-Minkowski
I am not able to understand which distance measure should be used, and when.


#2

Minkowski is the generalized distance formula. Using its parameter p we get both of the others as special cases: p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.
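
To make the relationship concrete, here is a minimal sketch in plain Python. The function name `minkowski` and the two points are my own choices for illustration, not something from a particular library.

```python
def minkowski(u, v, p):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

# Hypothetical points, chosen only for illustration.
a = (0.0, 0.0, 0.0)
b = (3.0, 4.0, 0.0)

manhattan = minkowski(a, b, 1)  # p = 1: sum of absolute coordinate differences
euclidean = minkowski(a, b, 2)  # p = 2: straight-line distance
print(manhattan, euclidean)
```

For these two points the Manhattan distance is 3 + 4 + 0 = 7 and the Euclidean distance is sqrt(9 + 16 + 0) = 5, so the same pair of points is "further apart" under Manhattan than under Euclidean.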

Let us take an example

I have 5 rows with x, y, z coordinates, with the Manhattan and the Euclidean distances calculated w.r.t. the test point.

Now, if we set K=2 and find the 2 closest fruits:
TEST Fruit = Apple by Euclidean
TEST Fruit = mix of Apple and Orange by Manhattan

If we set K=3 then:
TEST Fruit = mix of Apple and Orange by Euclidean
TEST Fruit = mix of Apple and Orange by Manhattan
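
The same effect can be reproduced with a small sketch. The five training points below are made up for illustration (they are not the rows from my table), and `k_nearest_labels` is just a hypothetical helper name, but they are chosen so that the neighbours behave the same way as described above.

```python
def minkowski(u, v, p):
    """Minkowski distance: p = 1 is Manhattan, p = 2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def k_nearest_labels(train, query, k, p):
    """Return the labels of the k training rows closest to query."""
    ranked = sorted(train, key=lambda row: minkowski(row[0], query, p))
    return [label for _, label in ranked[:k]]

# Made-up (x, y, z) rows, NOT the original data.
train = [
    ((1, 0, 0), "Apple"),
    ((3, 3, 0), "Apple"),
    ((5, 0, 0), "Orange"),
    ((0, 7, 2), "Orange"),
    ((6, 6, 6), "Orange"),
]
test_point = (0, 0, 0)

for k in (2, 3):
    for name, p in (("Manhattan", 1), ("Euclidean", 2)):
        print(k, name, k_nearest_labels(train, test_point, k, p))
```

The row (3, 3, 0) is at Manhattan distance 6 but Euclidean distance of about 4.24, while (5, 0, 0) is at distance 5 under both. So the two measures rank these rows in opposite order, and with K=2 the neighbours are all Apple by Euclidean but a mix by Manhattan.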

You see where I am going.
Suppose you plot the Euclidean and Manhattan distances for the 5 points w.r.t. the test point.

The following is MANHATTAN

The following is EUCLID

As you increase the number of dimensions, things become more complex. You need to figure out from the plots whether row 3 or row 4 of the training data should be closer to the test point, and choose the Manhattan or the Euclidean distance accordingly. This will not be a feasible solution for large data sets, though.

Let me know if this helps


#3

I was working on something else when I came across this list of distance measures:

EuclideanDistance
SquaredEuclideanDistance
ManhattanDistance
ChessboardDistance or Chebyshev distance
CanberraDistance
CosineDistance
CorrelationDistance: the correlation distance 1-(u-Mean[u]).(v-Mean[v])/(Norm[u-Mean[u]] Norm[v-Mean[v]]), where Norm is the Euclidean length of the vector
BrayCurtisDistance

Just thought it would be a good idea to find out the use cases for each of them.
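
As a starting point for comparing them, here is a rough plain-Python sketch of the less familiar measures in the list, written from their usual textbook definitions. The function names are my own, and the handling of 0/0 terms in the Canberra distance (treated as 0) is an assumption; library implementations may differ in such edge cases.

```python
import math

def sq_euclidean(u, v):
    """Squared Euclidean: sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def chebyshev(u, v):
    """Chessboard / Chebyshev: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))

def canberra(u, v):
    """Canberra: sum of |a - b| / (|a| + |b|); 0/0 terms count as 0 (assumption)."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom != 0:
            total += abs(a - b) / denom
    return total

def cosine(u, v):
    """Cosine distance: 1 - cos(angle between u and v)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def correlation(u, v):
    """Correlation distance: cosine distance of the mean-centred vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])

def bray_curtis(u, v):
    """Bray-Curtis: sum of |a - b| divided by sum of |a + b|."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den

# Hypothetical vectors: v is exactly 2 * u.
u = (1.0, 2.0, 3.0)
v = (2.0, 4.0, 6.0)
for fn in (sq_euclidean, chebyshev, canberra, cosine, correlation, bray_curtis):
    print(fn.__name__, fn(u, v))
```

Because v is a scaled copy of u, the cosine and correlation distances come out as (numerically) zero even though the Euclidean-style distances are not. That already hints at one use case: cosine and correlation care about direction or shape, not magnitude.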