What is word embedding and character embedding?

What are word embeddings and character embeddings? Please explain in simple terms with a simple example. Why are words represented as vectors of such a large size, and what do the values in those vectors represent for a particular embedded word?

Hey @spiel

To answer in simple terms: your ML model doesn’t understand text, it only understands numbers, which is why you need a way to represent text as numbers. One such representation is an embedding.

Take this sentence as an example: “I love Analytics Vidhya”

Now, you can convert text at word level by assigning a number to each word:

["I", "love", "Analytics", "Vidhya"] = [0,1,2,3]
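As a rough sketch of that word-level mapping in Python (the variable names here are just illustrative, and splitting on whitespace is a simplifying assumption, not how a real tokenizer works):

```python
# Assign each distinct word an integer index in order of first appearance.
sentence = "I love Analytics Vidhya"
words = sentence.split()  # naive whitespace tokenization (assumption)

word_to_index = {}
for word in words:
    if word not in word_to_index:
        word_to_index[word] = len(word_to_index)

indices = [word_to_index[w] for w in words]
print(indices)  # [0, 1, 2, 3]
```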

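The index on its own is just an identifier. In an actual embedding, each index points to a vector of numbers, and those values are usually not selected randomly but “learned” during training so that they represent the text in the best way possible. You can learn in-depth about word embeddings in this article.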
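To make the "vector per word" idea concrete, here is a minimal sketch of an embedding lookup table in plain Python. The dimension of 5 and the random starting values are assumptions for illustration only; in a real model the table is much wider (often hundreds of values per word) and the values get adjusted during training rather than staying random.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

vocab = ["I", "love", "Analytics", "Vidhya"]
word_to_index = {word: i for i, word in enumerate(vocab)}

embedding_dim = 5  # hypothetical size, chosen small for readability

# One row of numbers per word. Training would update these values;
# here they are only randomly initialised.
embedding_table = [
    [random.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
    for _ in vocab
]

def embed(word):
    """Look up the vector stored for a word."""
    return embedding_table[word_to_index[word]]

vector = embed("love")
print(len(vector))  # 5
```

So the "embedding" of a word is simply the row of the table that its index points to.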

Similarly, you can create an embedding at character level too:

["I", "l", "o", "v", "e"..."V","i","d","h","y","a"] = [0,1,2,3...]
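The character-level version can be sketched the same way. This assumes spaces are skipped and that upper and lower case count as different characters, which matches the example above but is only one possible choice:

```python
# Assign each distinct character an integer index in order of first appearance.
sentence = "I love Analytics Vidhya"
chars = [c for c in sentence if c != " "]  # skip spaces (assumption)

char_to_index = {}
for c in chars:
    if c not in char_to_index:
        char_to_index[c] = len(char_to_index)

char_indices = [char_to_index[c] for c in chars]
print(char_indices[:5])  # [0, 1, 2, 3, 4] for "I", "l", "o", "v", "e"
```

Notice that the character vocabulary stays small no matter how many different words appear, which is one reason character-level models can cope with words they have never seen.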

We do this because in some use cases character-level representations have been shown to perform better, but it depends entirely on your project. Flair, from Zalando Research, is a great example of character-level embeddings.

Hi mohdsanadzakirizvi,
Thank you for the information. I went through the Flair paper and the GitHub repository.
They give tutorials for word and document embeddings, but not for character embeddings.
How does character embedding work?
Does document embedding embed the whole document? If document A has 100 sentences, will everything be represented as a single vector?

© Copyright 2013-2019 Analytics Vidhya