# What is a matrix or vector norm?


To compute the Euclidean distance between data points and cluster centroids, `numpy.linalg.norm` is commonly used.
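As a minimal sketch (the two points are made up for illustration), the Euclidean distance between two points is the 2-norm of their difference vector, i.e. the square root of the sum of the squared coordinate differences:

``````python
import math

import numpy as np

# Two hypothetical 2-D points, chosen only for illustration
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Euclidean distance as the 2-norm of the difference vector
d_norm = np.linalg.norm(a - b)

# The same quantity written out by hand: sqrt((1-4)^2 + (2-6)^2) = sqrt(9 + 16)
d_manual = np.sqrt(np.sum((a - b) ** 2))

# Standard-library equivalent (Python 3.8+)
d_math = math.dist(a, b)

print(d_norm, d_manual, d_math)  # 5.0 5.0 5.0
``````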

Specifically, in the context of the K-Means implementation presented here, we have the following:

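``````
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Importing the dataset
# (`data` is a pandas DataFrame with columns V1 and V2; the loading code is not shown)
print(data.shape)
# Output: (3000, 2)
``````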

Get the `V1` and `V2` columns into the variables `f1` and `f2`:

``````
# Getting the values and plotting them
f1 = data['V1'].values
f2 = data['V2'].values

# Pair each value of f1 with the value of f2 at the same index,
# giving one [x, y] row per data point (shape (3000, 2))
X = np.array(list(zip(f1, f2)))

# Array X
print(X)
# [[  2.072345  -3.241693]
#  [ 17.93671   15.78481 ]
#  [  1.083576   7.319176]
#  ...
#  [ 64.46532  -10.50136 ]
#  [ 90.72282  -12.25584 ]
#  [ 64.87976  -24.87731 ]]

# Plot the data as a scatter diagram
plt.scatter(f1, f2, c='black', s=7)
``````
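To make the pairing step concrete, here is a small sketch with made-up values standing in for `data['V1'].values` and `data['V2'].values`. `zip` pairs the i-th element of `f1` with the i-th element of `f2`, and `np.array` turns the list of pairs into a 2-D array with one row per point; `np.column_stack` builds the same array directly:

``````python
import numpy as np

# Hypothetical stand-ins for data['V1'].values and data['V2'].values
f1 = np.array([2.0, 17.9, 1.1])
f2 = np.array([-3.2, 15.8, 7.3])

# zip pairs f1[i] with f2[i]: (2.0, -3.2), (17.9, 15.8), (1.1, 7.3)
X_zip = np.array(list(zip(f1, f2)))

# column_stack places f1 and f2 side by side as columns, without a Python-level loop
X_stack = np.column_stack((f1, f2))

print(X_zip.shape)                     # (3, 2)
print(np.array_equal(X_zip, X_stack))  # True
``````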

## Euclidean distance calculator

``````
# Euclidean distance calculator
# https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.linalg.norm.html
# Getting the Euclidean distance between data points
def dist(a, b, ax=1):
    return np.linalg.norm(a - b, axis=ax)
``````
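With `ax=1`, `np.linalg.norm` takes the norm of each row separately, so `dist` can measure the distance from every data point to one centroid in a single call; NumPy broadcasting subtracts the centroid from each row. A sketch with made-up points (the `dist` function is copied from above):

``````python
import numpy as np

def dist(a, b, ax=1):
    return np.linalg.norm(a - b, axis=ax)

# Three hypothetical data points (one per row) and one hypothetical centroid
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])
centroid = np.array([0.0, 0.0])

# X - centroid has shape (3, 2); the norm along axis=1 gives one distance per row
d = dist(X, centroid)
print(d)  # [ 0.  5. 10.]
``````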

How should I understand the vector norm in the context of the Euclidean distance?
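For reference, these are the standard definitions, with $n$ the number of coordinates. The 2-norm (Euclidean norm) of a vector and the Euclidean distance between two points are:

$$
\|\mathbf{v}\|_2 = \sqrt{\sum_{i=1}^{n} v_i^2},
\qquad
d(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_2 = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}
$$

For a matrix $A$ with entries $a_{ij}$, the Frobenius norm, which is what `np.linalg.norm` returns for a 2-D array when both `axis` and `ord` are `None`, is:

$$
\|A\|_F = \sqrt{\sum_{i}\sum_{j} a_{ij}^2}
$$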

``````
# Number of clusters
k = 3

# Generate random integers between 0 and (maximum value in X) - 20.
# These become the x and y coordinates at which the initial
# centroids are positioned.

# x coordinates of the random centroids
C_x = np.random.randint(0, np.max(X)-20, size=k)
# y coordinates of the random centroids
C_y = np.random.randint(0, np.max(X)-20, size=k)

print(" x coordinates" +'\n', C_x)
print("*****")
print(" y coordinates" +'\n', C_y)
print("*****")

# Example output (random, so it differs between runs):
#  x coordinates
#  [51 41 25]
# *****
#  y coordinates
#  [18 76 53]
``````

We combine the lists `C_x` and `C_y` with `zip` so that a single array holds the (x, y) location of each centroid in the scatter plot:

``````
C = np.array(list(zip(C_x, C_y)), dtype=np.float32)
print("Coordinates pair x,y associated inside list to" +'\n', "INITIALIZE RANDOM CENTROIDS" +'\n', C)

# Output:
# Coordinates pair x,y associated inside list to
#  INITIALIZE RANDOM CENTROIDS
#  [[51. 18.]
#  [41. 76.]
#  [25. 53.]]

# Plotting along with the centroids
plt.scatter(f1, f2, c='#050505', s=7)
plt.scatter(C_x, C_y, marker='*', s=200, c='g')

# To store the previous centroid values when they are updated
C_old = np.zeros(C.shape)
``````

Why is it necessary to store the old coordinates of the centroids?
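In a typical K-Means loop, the old centroids are kept so that, after each update, the algorithm can measure how far the centroids moved and stop once they no longer move. A minimal sketch with made-up centroid values (the update step here is hypothetical); note that the old values must be a copy, because `C_old = C` would only create a second name for the same array and the measured shift would always be zero:

``````python
import numpy as np

def dist(a, b, ax=1):
    return np.linalg.norm(a - b, axis=ax)

# Hypothetical centroids before an update step
C = np.array([[51.0, 18.0],
              [41.0, 76.0]])

# Keep an independent copy of the current centroids before updating them
C_old = C.copy()

# Hypothetical update step: the first centroid moves by (3, 4), the second stays put
C[0] += np.array([3.0, 4.0])

# Total shift of all centroids (axis=None -> one number for the whole array)
shift = dist(C, C_old, None)
print(shift)  # 5.0 -> the centroids moved, so another iteration would run

# Pitfall: without a copy, both names refer to the same array,
# so the measured shift is always 0 and a loop would stop immediately
C_alias = C
C[1] += np.array([1.0, 1.0])
print(dist(C, C_alias, None))  # 0.0
``````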

``````
# Cluster labels (0, 1, 2)
clusters = np.zeros(len(X))
``````
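The `clusters` array holds, for each point, the index of its nearest centroid. A sketch of how such labels are commonly filled in, assuming the `dist` function above and made-up data: compute the distance from each point to every centroid, then take the index of the smallest one with `np.argmin`:

``````python
import numpy as np

def dist(a, b, ax=1):
    return np.linalg.norm(a - b, axis=ax)

# Hypothetical data points and centroids
X = np.array([[1.0, 1.0],
              [9.0, 9.0],
              [0.0, 2.0],
              [10.0, 8.0]])
C = np.array([[0.0, 0.0],
              [10.0, 10.0]])

clusters = np.zeros(len(X))
for i in range(len(X)):
    # Distance from point i to each centroid: one value per centroid (axis=1)
    distances = dist(X[i], C)
    # Label = index of the closest centroid
    clusters[i] = np.argmin(distances)

print(clusters)  # [0. 1. 0. 1.]
``````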

How does computing this distance work, in terms of a vector or matrix norm?

``````
# Error function: distance between the new centroids and the old centroids
error = dist(C, C_old, None)
``````
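Because the third argument is `None`, `np.linalg.norm` receives `axis=None`; for a 2-D array such as `C - C_old` (shape `(k, 2)`) it then returns the Frobenius norm, the square root of the sum of all squared entries. That single number is zero only when no centroid moved. A sketch with made-up centroids, comparing it with the per-centroid distances given by `axis=1`:

``````python
import numpy as np

def dist(a, b, ax=1):
    return np.linalg.norm(a - b, axis=ax)

# Hypothetical new and old centroid positions (k = 3)
C = np.array([[54.0, 22.0],
              [41.0, 76.0],
              [25.0, 65.0]])
C_old = np.array([[51.0, 18.0],
                  [41.0, 76.0],
                  [20.0, 53.0]])

# axis=1: one Euclidean distance per centroid
per_centroid = dist(C, C_old)  # [ 5.  0. 13.]

# axis=None: one number for the whole matrix (Frobenius norm), sqrt(194) here
error = dist(C, C_old, None)

# Same value computed by hand, and as the 2-norm of the per-centroid distances
manual = np.sqrt(np.sum((C - C_old) ** 2))
print(per_centroid, error, manual, np.linalg.norm(per_centroid))
``````

So checking `error == 0` amounts to checking that every centroid stayed exactly where it was.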

I have been following the author's implementation, but the concept of a vector norm is new to me.