What are word embeddings and character embeddings? Please explain in simple terms with a simple example. Why are words represented as vectors of such large size, and what do the values in those vectors represent for a particular embedded word?
To answer in simple terms: your ML model doesn’t understand text; it only understands numbers. That’s why you need a way to represent text as numbers, and an embedding is one such representation.
Take this sentence as an example: “I love Analytics Vidhya”
Now, you can convert text at word level by assigning a number to each word:
["I", "love", "Analytics", "Vidhya"] = [0,1,2,3]
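As a rough illustration of that mapping, here is a minimal sketch in Python. It assumes the text is split into words on whitespace (real tokenizers also handle punctuation) and gives each distinct word the next free ID in order of first appearance; `build_word_vocab` is a hypothetical helper, not part of any library.

```python
def build_word_vocab(sentence):
    """Map each distinct word to an integer ID, in order of first appearance."""
    vocab = {}
    for word in sentence.split():  # assumption: whitespace tokenization
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

sentence = "I love Analytics Vidhya"
word_to_id = build_word_vocab(sentence)
ids = [word_to_id[w] for w in sentence.split()]
print(word_to_id)  # {'I': 0, 'love': 1, 'Analytics': 2, 'Vidhya': 3}
print(ids)         # [0, 1, 2, 3]
```

A repeated word reuses its existing ID, so `build_word_vocab("I love I")` gives only two entries.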
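These integer IDs are only lookup indices. In an actual word embedding, each ID points to a vector of numbers, and those values are usually not chosen by hand but “learned” during training so that they represent the text as well as possible. You can learn in-depth about word embeddings in this article.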
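To connect this to the question about vector size, here is a minimal sketch of an embedding lookup table, assuming a tiny hypothetical dimension of 4 (trained models commonly use 50 to 300 or more). Every word ID selects one row of numbers. The rows start out random here; in a real model, training adjusts them so that words used in similar contexts end up with similar vectors. Individual dimensions usually have no human-readable meaning on their own, and more dimensions give the model more room to encode relationships between words. The helper names are illustrative only.

```python
import math
import random

EMBEDDING_DIM = 4  # hypothetical, deliberately tiny

def init_embedding_table(vocab_size, dim, seed=0):
    """One row of `dim` random numbers per word ID; training would adjust these values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]

def cosine_similarity(u, v):
    """How closely two vectors point in the same direction (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

word_to_id = {"I": 0, "love": 1, "Analytics": 2, "Vidhya": 3}
table = init_embedding_table(len(word_to_id), EMBEDDING_DIM)

def embed(word):
    # The integer ID is only a row index; the row itself is the embedding.
    return table[word_to_id[word]]

print(embed("love"))  # 4 numbers, random until the table is trained
print(cosine_similarity(embed("love"), embed("Analytics")))
```

With an untrained table, the similarity score is meaningless. It only becomes informative after the vectors have been learned from data.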
Similarly, you can create an embedding at character level too:
["I", "l", "o", "v", "e"..."V","i","d","h","y","a"] = [0,1,2,3...]
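The same idea at the character level can be sketched like this, assuming spaces are skipped and characters are case-sensitive (so "I" and "i" get different IDs, matching the list above). Each distinct character gets one ID, and a word then becomes a sequence of character IDs. The helpers are hypothetical.

```python
def build_char_vocab(text):
    """Map each distinct character to an ID in order of first appearance (spaces skipped)."""
    vocab = {}
    for ch in text:
        if ch == " ":
            continue
        if ch not in vocab:
            vocab[ch] = len(vocab)
    return vocab

def encode(word, vocab):
    """Turn a word into its sequence of character IDs."""
    return [vocab[ch] for ch in word]

sentence = "I love Analytics Vidhya"
char_to_id = build_char_vocab(sentence)
print(len(char_to_id))               # 16
print(encode("love", char_to_id))    # [1, 2, 3, 4]
print(encode("Vidhya", char_to_id))  # [13, 10, 14, 15, 8, 7]
```

Because letters repeat across words, the character vocabulary stays small compared with a word vocabulary.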
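Character-level embeddings have been shown to perform better in some use cases, but whether they help depends entirely on your project. Flair, the NLP library from Zalando Research, is a great example: its contextual string embeddings are built from a character-level language model.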
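One simple way to see how character vectors can produce a word vector is a minimal sketch that averages the vectors of a word's characters. This averaging is a swapped-in simplification, not what Flair does (Flair runs a character-level language model); the random character vectors, the tiny dimension, and the helper names are all assumptions for illustration.

```python
import random
import string

DIM = 3  # hypothetical, tiny for readability

def make_char_table(chars, dim, seed=0):
    """Random vector per character; a real model would learn these."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for ch in chars}

char_table = make_char_table(string.ascii_letters, DIM)

def word_vector_from_chars(word):
    """Average the word's character vectors into one word vector."""
    vecs = [char_table[ch] for ch in word]
    return [sum(component) / len(vecs) for component in zip(*vecs)]

known = word_vector_from_chars("Vidhya")
unseen = word_vector_from_chars("Vidhyaa")  # never seen as a word, still gets a vector
print(known)
print(unseen)
```

The benefit is that any word made of known characters, including misspellings and rare words, gets a vector. The drawback of plain averaging is that it ignores character order ("listen" and "silent" end up with the same vector), which is why practical character models use sequence models over the characters instead.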
Hi mohdsanadzakirizvi,
Thank you for the information. I went through the Flair documentation, the paper, and the GitHub repository.
They have tutorials for word and document embeddings, but not for character embeddings?
How does character embedding work?
Does document embedding embed the whole document? If document A has 100 sentences, will everything be represented as a single vector?
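For reference, one common way a document embedding can end up as a single vector is mean pooling: average the vectors of every word in the document. This is a minimal sketch under that assumption, not necessarily how Flair's document embeddings work; the 2-dimensional word vectors are made-up toy values.

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Hypothetical 2-dimensional word vectors for a tiny two-sentence "document".
toy_word_vectors = {
    "I": [1.0, 0.0], "love": [0.0, 1.0], "Analytics": [1.0, 1.0], "Vidhya": [0.0, 0.0],
    "embeddings": [2.0, 0.0], "help": [0.0, 2.0],
}
document = ["I love Analytics Vidhya", "embeddings help"]

all_word_vectors = [toy_word_vectors[w] for s in document for w in s.split()]
doc_vector = mean_pool(all_word_vectors)
print(doc_vector)  # one vector, same length as each word vector
```

The output length is fixed no matter how many sentences go in, so 100 sentences would also collapse into one vector. The trade-off is that averaging a long document dilutes details, which is why per-sentence vectors are sometimes kept instead.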