How does Word2Vec work?

data_science
python

#1

Hi Friends,

I am trying to understand the mechanism behind Word2Vec word embeddings. I have gone through the following links:
https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/

https://machinelearningmastery.com/develop-word-embeddings-python-gensim/

But I still don't have a clear idea of what's behind Word2Vec or how it works.

Here is what I understand so far:

  1. It takes sentences and splits them into words.
  2. It builds a vocabulary of all those words.
  3. But when I do model['word in vocabulary'], it gives back a numerical vector.
    Questions:
    1) What is this vector representation?
    2) What does each numerical value represent?

I am confused about this.
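For reference, here is roughly what I mean (a minimal sketch using gensim, as in the second link above; I am assuming gensim 4.x, where the embedding dimension is set with vector_size):

```python
from gensim.models import Word2Vec

# toy corpus: a list of tokenised sentences (made up for illustration)
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]

# train a tiny model; vector_size is the embedding dimension N
model = Word2Vec(sentences, vector_size=10, min_count=1)

# this is the step I am asking about:
# looking up a word in the vocabulary returns a numerical vector
print(model.wv["cat"])  # a 10-dimensional array of floats
```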

If anyone has a tutorial or a link that explains this better, it would be very helpful for me.

Thank you so much in advance


#2

Let me address your queries one by one.

Given below is a figure of the Word2Vec model with a single-word context window. The input layer takes a word in one-hot encoded form. The weights between the input layer and the hidden layer can be represented by a V x N matrix, where V = vocabulary size and N = number of hidden units.

Each row of this matrix is the N-dimensional vector representation of the associated word of the input layer. Hence, each word in the vocabulary would have an N-dimensional vector representation.

[figure: Word2Vec model with a single-word context window — one-hot input layer, hidden layer of N units, output layer]

The numbers in a word vector are nothing but the weights learned by the model mentioned above.
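To make the lookup concrete, here is a small NumPy sketch (with toy sizes and random numbers standing in for trained weights) showing that multiplying a one-hot vector by the V x N weight matrix simply selects one row of that matrix, and that row is the word's vector:

```python
import numpy as np

V, N = 5, 3                # vocabulary size and no. of hidden units (toy values)
W = np.random.rand(V, N)   # input-to-hidden weight matrix, one row per word

word_index = 2             # suppose this is the index of our word in the vocabulary
one_hot = np.zeros(V)
one_hot[word_index] = 1.0

# multiplying the one-hot vector by W just picks out row `word_index` of W
hidden = one_hot @ W
assert np.allclose(hidden, W[word_index])
print(hidden)              # this row is the word's N-dimensional vector representation
```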

Regards


#3

@pjoshi15 Thank you so much for your very nice explanation!