Plotting scatter graph x and y same size

numpy
python

#1

I have 37 numpy arrays with some data. Using zip, I group them into two arrays (X and Y) as follows:

X = np.array(list(zip(f1,f2,f3,f4,f5,f6,f7,f8,f9,
                  f10,f11,f12,f13,f14,f15,f16,f17,f18)))
print( " ARRAY X" +'\n', X, '\n' )

Y = np.array(list(zip(f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,
                  f30,f31,f32,f33,f34,f35,f36,f37)))
print( " ARRAY Y" +'\n', Y, '\n' )

This gives me the elements associated row by row:

ARRAY X
 [[0.27726829 0.         0.70441255 ... 0.17073171 0.53012048 0.39759036]
 [0.03315646 0.02175    0.19936204 ... 0.42073171 0.71686747 0.65060241]
 [0.28709117 0.         0.15948963 ... 0.41463415 0.3313253  0.38554217]
 ...
 [0.19825924 0.         0.33371671 ... 0.37849168 0.63201559 0.61497241]
 [0.19825924 0.         0.33371671 ... 0.37849168 0.63201559 0.61497241]
 [0.19825924 0.         0.33371671 ... 0.37849168 0.63201559 0.61497241]] 

 ARRAY Y
 [[0.43902439 0.52258065 0.33774834 ... 0.41975309 0.51315789 0.26060606]
 [0.69512195 0.68387097 0.59602649 ... 0.68518519 0.53947368 0.50909091]
 [0.67682927 0.66451613 1.         ... 0.56790123 0.44736842 0.08484848]
 ...
 [0.64469909 0.65824665 0.5558981  ... 0.6195577  0.5980531  0.44742931]
 [0.64469909 0.65824665 0.5558981  ... 0.6195577  0.5980531  0.44742931]
 [0.64469909 0.65824665 0.5558981  ... 0.6195577  0.5980531  0.44742931]]

I want to plot the X and Y array values, but I get this error:

plt.scatter(X, Y, c='black', s=17)
~/anaconda3/envs/sioma/lib/python3.6/site-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   4241         y = np.ma.ravel(y)
   4242         if x.size != y.size:
-> 4243             raise ValueError("x and y must be the same size")
   4244 
   4245         if s is None:

ValueError: x and y must be the same size

I know that the scatter function needs two iterable parameters, x and y, as point coordinates. With the zip function I group 18 elements/columns into X and 19 into Y.
Logically, both need the same number of elements so that every value in X has its counterpart in Y.

Of course, X and Y have different numbers of columns:

print(X.shape)
(13807, 18)
print(Y.shape)
(13807, 19)
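
A minimal sketch of the size check shown in the traceback, using small zero-filled stand-ins for X and Y. The 5-row arrays and the names x_flat, y_flat, sizes_match, x_col and y_col are assumptions for illustration, not the real data. It flattens both arrays with np.ma.ravel, as the traceback lines do, and compares the total element counts:

```python
import numpy as np

n_rows = 5                      # stand-in for the real 13807 rows
X = np.zeros((n_rows, 18))      # same column count as the real X
Y = np.zeros((n_rows, 19))      # same column count as the real Y

# scatter flattens both inputs before comparing their sizes
x_flat = np.ma.ravel(X)
y_flat = np.ma.ravel(Y)
sizes_match = x_flat.size == y_flat.size   # 90 vs 95 elements

# A single column from each array does pass the same check
x_col, y_col = X[:, 0], Y[:, 0]
```

So the mismatch comes from the total number of elements (rows times columns), not from the number of rows.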

My Objective

I want to apply the K-Means algorithm. As a first step, my objective is to place the points of my dataset, which I have gathered in the X and Y arrays (37 columns in total), on a scatter chart, so that I can check their location before I apply clustering algorithms.

Because of this, I tried to reduce the number of elements in my X array by combining f1 and f2 into f1_f2:

f1_f2 = np.array(list(zip(f1,f2)))

X = np.array(list(zip(f1_f2,f3,f4,f5,f6,f7,f8,f9,f10,
              f11,f12,f13,f14,f15,f16,f17,f18,f19)))
Y = np.array(list(zip(f20,f21,f22,f23,f24,f25,f26,f27,
              f28,f29,f30,f31,f32,f33,f34,f35,f36,f37)))

… But … of course the f1_f2 array is different from my other fi arrays: it is a column that contains arrays instead of floats (like my other fi columns), and those arrays are the result of grouping two columns into one.

This answer seems to partly explain my situation, I think: an array element is being set with a sequence, and the conversion fails because a tuple can't be converted into a numpy array element.

print( " ARRAY X" +'\n', X, '\n' )
ARRAY X
        # ------- TUPLE ?--------
 [[array([0.27726829, 0.        ]) 0.7044125465178095 0.08053691275000001
  ... 0.5301204819277109 0.3975903614457832 0.4390243902439024]
 [array([0.03315646, 0.02175   ]) 0.19936204146730474 0.11162790698333333
  ... 0.7168674698795181 0.6506024096385544 0.6951219512195121]
 [array([0.28709117, 0.        ]) 0.1594896331738438 0.08988764045 ...
  0.3313253012048193 0.3855421686746988 0.6768292682926829]
 ...
 [array([0.19825924, 0.        ]) 0.3337167140951549 0.030059729580028987
  ... 0.6320155908754577 0.6149724073440681 0.6446990878406867]
 [array([0.19825924, 0.        ]) 0.3337167140951549 0.030059729580028987
  ... 0.6320155908754577 0.6149724073440681 0.6446990878406867]
 [array([0.19825924, 0.        ]) 0.3337167140951549 0.030059729580028987
  ... 0.6320155908754577 0.6149724073440681 0.6446990878406867]]

Then, for obvious reasons, when I try to plot the values I get this error:

plt.scatter(X, Y, c='black', s=17) 
~/anaconda3/envs/sioma/lib/python3.6/site-packages/numpy/core/numeric.py 
in asanyarray(a, dtype, order)
    542 
    543     """
--> 544     return array(a, dtype, copy=False, order=order, subok=True)
    545 
    546 

ValueError: setting an array element with a sequence.
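
A small sketch of why this error appears, using three short made-up arrays in place of f1, f2 and f3 (the values and the names rows, conversion_failed and flat are assumptions for illustration). Zipping the two-column f1_f2 with a plain column gives rows that mix a 2-element array with a float, and such rows cannot be converted into a regular float array. np.column_stack is shown only as one way to keep every value as a plain float column:

```python
import numpy as np

f1 = np.array([0.1, 0.2, 0.3])
f2 = np.array([0.0, 0.5, 1.0])
f3 = np.array([0.7, 0.8, 0.9])

f1_f2 = np.array(list(zip(f1, f2)))   # shape (3, 2): each row is a pair
rows = list(zip(f1_f2, f3))           # each row: (array of 2 floats, float)

# Converting the mixed rows to floats fails with a ValueError
try:
    np.array(rows, dtype=float)
    conversion_failed = False
except ValueError:
    conversion_failed = True

# column_stack keeps all values as float columns instead of nesting arrays
flat = np.column_stack((f1_f2, f3))   # shape (3, 3)
```

Note that column_stack simply puts f1 and f2 back as two separate columns, so it avoids the error but does not reduce the column count.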

How can I show the whole of my X and Y arrays/matrices, that is, all 37 columns, in a scatter chart?

I know I should find a way for the X and Y arrays to have the same number of elements, but at the moment I don't know how. What I want is to display all my data on a scatter plot before looking for clusters or data groups.

What I want is to show the data like this, only here I am deleting the array f37 in Y so that X and Y have the same number of elements:

X = np.array(list(zip(f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18)))
Y = np.array(list(zip(f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36))) # only goes up to f36

And the scatter plot is this:

image
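
A sketch of what a plot like this pairs together, assuming scatter flattens both arrays row by row as the ravel lines in the first traceback suggest. The tiny integer arrays and the name pairs are made up for illustration. With equal shapes, the value in row i, column j of X is matched with the value in row i, column j of Y:

```python
import numpy as np

# Two columns each, standing in for (f1, f2) and (f19, f20)
X = np.array([[1, 2],
              [3, 4],
              [5, 6]])
Y = np.array([[10, 20],
              [30, 40],
              [50, 60]])

# Flatten row by row and pair the values position by position
pairs = list(zip(np.ravel(X).tolist(), np.ravel(Y).tolist()))
```

In other words, each point is (f1, f19), (f2, f20) and so on for one row, all drawn on the same axes.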

Here I need to add a column or data array f37, which is important to keep in mind.
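
One possible way to see all 37 columns in a single scatter plot, sketched under assumptions: treat every fi as a feature column of one matrix and project the rows to two dimensions with PCA before plotting. PCA is not mentioned in the post, it is only one common choice for this kind of preview. Random data stands in for the real arrays, and the names features, centered and points_2d are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for np.column_stack((f1, f2, ..., f37)): 100 rows, 37 columns
features = rng.random((100, 37))

# PCA via SVD: center the columns, then project onto the top two directions
centered = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
points_2d = centered @ vt[:2].T       # shape (100, 2)

# With matplotlib available, the projection could then be plotted as:
# plt.scatter(points_2d[:, 0], points_2d[:, 1], c='black', s=17)
```

Here no column has to be dropped or merged, since the x and y coordinates both come from the same 37-column matrix.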