How to interpret clusters after doing Latent Semantic Analysis on review data

lsa
clustering
wordcloud
text_analytics

#1

Hello,

I am trying to learn about latent semantic analysis and am working on some movie review data. After running LSA on the review data, I clustered the Tk matrix into 2 clusters to find words that go together concept-wise (dimension-wise). I am not sure this is the right interpretation. The image below contains 3 wordclouds: the first is from the TDM, and the other two are from each of the clusters after LSA.

So can we say the process has weeded out the unimportant words that do not relate to the concepts generated by the algorithm?
Is this the right way to read it, or can someone please help me with this? @Lesaffrea


#2

Hi @pagal_guy
Sorry, I was busy. I will check this today.
Alain


#3

Hi @pagal_guy

So, if I understand correctly, you built your clusters from the Tk matrix after LSA. What happens there: LSA makes one assumption, that words with similar meaning are mapped in the same direction in the latent space (the synthetic dimensions built during the SVD). Therefore some words will disappear from the clusters, on the expectation that other words with similar meaning stand in for them. Keep in mind this is only an assumption, and because you are doing a dimension reduction you will lose some information, which in this case means some meaning.
From what I have read, the assumption usually holds for text. What you can say is that if a word does not appear, its meaning is still there (somewhere!).
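
To make this concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the four tiny "reviews", the vocabulary, and the choice of k = 2 are made up and are not taken from your data. It builds a small term-document matrix, keeps the first k SVD dimensions (the Tk, Sk and Dk pieces), and compares the words "film" and "movie", which never occur in the same review.

```python
import numpy as np

# Hypothetical toy corpus: "film" and "movie" never share a document.
terms = ["film", "movie", "actor", "plot", "great", "boring"]
docs = [
    "film actor great",
    "movie actor great",
    "film plot boring",
    "movie plot boring",
]

# Term-document matrix: rows are terms, columns are documents.
tdm = np.array([[doc.split().count(t) for doc in docs] for t in terms],
               dtype=float)

# Full SVD, then keep only the first k latent dimensions.
U, s, Vt = np.linalg.svd(tdm, full_matrices=False)
k = 2  # assumed number of latent dimensions
Tk, Sk, Dk_t = U[:, :k], s[:k], Vt[:k, :]

# Term coordinates in the latent space, scaled by the singular values.
term_vecs = Tk * Sk


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


i, j = terms.index("film"), terms.index("movie")
raw_sim = cosine(tdm[i], tdm[j])                 # similarity in the raw TDM
latent_sim = cosine(term_vecs[i], term_vecs[j])  # similarity after LSA

# Rank-k reconstruction and the share of the matrix that was thrown away.
tdm_k = Tk @ np.diag(Sk) @ Dk_t
info_lost = np.linalg.norm(tdm - tdm_k) / np.linalg.norm(tdm)

print(f"raw similarity film/movie:    {raw_sim:.2f}")
print(f"latent similarity film/movie: {latent_sim:.2f}")
print(f"relative information lost:    {info_lost:.2f}")
print(f"weight of 'film' in doc 2:    {tdm_k[i, 1]:.2f}")
```

In this toy case the raw similarity is 0 because the two words never co-occur, while in the latent space they point in the same direction. The reconstructed matrix also gives "film" a non-zero weight in the second review, which only contains "movie", so the meaning is still there even though the word is not. The price is the non-zero reconstruction error, which is the information lost by the reduction.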
Hope this helps. If you want to understand more about SVD, check this book: Mining Large Data Set (PDF available). The SVD chapter is really good.
Have a good day.
Alain