I’m working on a dataset for language identification. Building a document-term matrix has produced a large number of features, because I split the words into n-grams of up to length 5. Please tell me how to better predict the language of each word. I tried neural networks in R but could not succeed.
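For readers following along, here is a minimal sketch of the feature setup described above, assuming the "grams" are character n-grams taken inside each word. It is written in plain Python rather than R purely for illustration, and the function names `char_ngrams` and `document_term_matrix` are hypothetical, not from any library.

```python
from collections import Counter

def char_ngrams(word, max_n=5):
    """Count every character n-gram of length 1..max_n inside one word."""
    word = word.lower()
    return Counter(
        word[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(word) - n + 1)
    )

def document_term_matrix(words, max_n=5):
    """Return (vocab, matrix): one row per word, one column per n-gram."""
    rows = [char_ngrams(w, max_n) for w in words]
    # Sorted union of all n-grams seen in any word gives the column order.
    vocab = sorted(set().union(*rows))
    # A Counter returns 0 for n-grams the word does not contain.
    matrix = [[row[g] for g in vocab] for row in rows]
    return vocab, matrix

vocab, matrix = document_term_matrix(["ab", "ba"], max_n=2)
print(vocab)
print(matrix)
```

The number of columns grows quickly with `max_n`, which is one reason a dense matrix like this becomes hard to feed to a neural network; a sparse representation or a cap on rare n-grams is a common workaround.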
Could you describe your problem a bit more precisely? Your approach seems correct.
I have a few questions:
- What dataset are you using? How is it structured (i.e. contents of dataset)?
- Could you compare your results with some benchmarks (other people’s results)?
- Have you tried algorithms other than neural networks?
Thank you for your reply jalFaizy.
I’m working on subtask 1 of the task at http://research.microsoft.com/en-us/events/fire13_st_on_transliteratedsearch/fire15st.aspx
Yes, I got a maximum of 78% sentence-level accuracy with naive Bayes using the MALLET tool. I want to improve on that, preferably using R.
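To make the naive Bayes baseline concrete, here is a rough sketch of a word-level multinomial naive Bayes over character n-grams with add-one smoothing. It does not reproduce MALLET's implementation or the 78% figure; the class name `NgramNaiveBayes`, the labels, and the tiny training list are all made up for illustration, and it is in Python rather than R.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(word, max_n=5):
    """List every character n-gram of length 1..max_n inside one word."""
    word = word.lower()
    return [word[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(word) - n + 1)]

class NgramNaiveBayes:
    """Hypothetical helper: multinomial naive Bayes with additive smoothing."""

    def __init__(self, max_n=5, alpha=1.0):
        self.max_n = max_n
        self.alpha = alpha                      # smoothing strength (assumed value)
        self.label_counts = Counter()           # training words per language
        self.gram_counts = defaultdict(Counter) # n-gram counts per language
        self.vocab = set()

    def fit(self, words, labels):
        for word, label in zip(words, labels):
            grams = char_ngrams(word, self.max_n)
            self.label_counts[label] += 1
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, word):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(count / total)     # log prior
            for g in char_ngrams(word, self.max_n):
                # Unseen n-grams get only the smoothing mass.
                score += math.log((counts[g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy, invented training data: English vs. romanized Hindi words.
model = NgramNaiveBayes().fit(
    ["the", "this", "that", "kya", "kyun", "kaise"],
    ["en", "en", "en", "hi", "hi", "hi"],
)
print(model.predict("they"), model.predict("kyon"))
```

Since this classifies each word in isolation, one possible direction for improvement is to add the neighbouring words' predictions as context, because a sentence-level score penalises every single-word mistake.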
I have not worked on NLP, so I may not be the right person to advise you. But you could refer to the approaches used in NLP competitions like this one