I have a corpus of complaints about a mobile product, and I need to find entities in it such as "heat problem", "battery problem", and so on. My question is: how can I train a model to find these entities without annotation?
If you want to “train” a machine learning model, then you need a labeled (annotated) dataset. One option is to annotate a large amount of data and train a named-entity-recognition (NER) model on it from scratch. This approach is both time-consuming and tedious.
Another approach is to use a pre-trained NER model from spaCy and fine-tune it on an annotated sample of your data. Annotating a sample dataset should be doable, and fine-tuning a pre-trained model would be much quicker than training a model from scratch.
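To give an idea of what an annotated sample could look like, here is a minimal sketch that turns known complaint phrases into character-offset entity annotations. As far as I know, spaCy v3 accepts annotations of this shape (for example through `Example.from_dict`), but the helper `annotate`, the `PHRASES` dictionary, and the label names are my own assumptions for illustration, not part of spaCy.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def annotate(text: str, phrases: Dict[str, str]) -> Tuple[str, Dict[str, List[Span]]]:
    """Return (text, {"entities": [...]}) with character offsets of each known phrase.

    Hypothetical helper: it only does whole-word, case-insensitive phrase lookup.
    """
    spans: List[Span] = []
    for phrase, label in phrases.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    # Sort by position. Overlapping spans are not resolved here, so keep the
    # phrase list free of phrases that contain one another.
    spans.sort()
    return text, {"entities": spans}


# Hypothetical phrases and labels, for illustration only.
PHRASES = {
    "heat problem": "HEAT_PROBLEM",
    "battery problem": "BATTERY_PROBLEM",
}

sample = annotate("Battery problem after the update, plus a heat problem.", PHRASES)
print(sample)
```

A phrase list like this only pre-fills the obvious cases; you would still review and correct the sample by hand before fine-tuning on it.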
Hope it helps. Thanks.
Thank you, I got some insight.
But is there any other way to annotate without human involvement?
I think it could be possible if we sample data from the corpus and annotate that sample first, then train on it and predict on the rest of the corpus in steps.
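A rough sketch of that idea, assuming a small hand-annotated seed set and a confidence threshold for accepting the model's own predictions. Everything here is hypothetical: `train_toy_model` just memorises annotated phrases and stands in for real NER training, and `bootstrap`, `batch_size`, and `threshold` are names I made up.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]            # (start_char, end_char, label)
Labeled = Tuple[str, List[Span]]       # (text, spans)
Predictor = Callable[[str], Tuple[List[Span], float]]  # text -> (spans, confidence)


def train_toy_model(labeled: List[Labeled]) -> Predictor:
    """Stand-in for real NER training: memorise every annotated phrase."""
    known = {text[s:e].lower(): label for text, spans in labeled for s, e, label in spans}

    def predict(text: str) -> Tuple[List[Span], float]:
        lowered = text.lower()
        spans: List[Span] = []
        for phrase, label in known.items():
            start = lowered.find(phrase)
            if start != -1:
                spans.append((start, start + len(phrase), label))
        # Toy confidence: 1.0 if a known phrase matched, otherwise 0.0.
        return sorted(spans), (1.0 if spans else 0.0)

    return predict


def bootstrap(seed: List[Labeled], unlabeled: List[str],
              train: Callable[[List[Labeled]], Predictor],
              batch_size: int = 2, threshold: float = 0.8):
    """Label the corpus in steps, retraining before each batch."""
    labeled = list(seed)
    needs_review: List[str] = []
    for i in range(0, len(unlabeled), batch_size):
        predict = train(labeled)  # retrain on everything accepted so far
        for text in unlabeled[i:i + batch_size]:
            spans, confidence = predict(text)
            if spans and confidence >= threshold:
                labeled.append((text, spans))   # accept the model's own label
            else:
                needs_review.append(text)       # leave for a human to check
    return labeled, needs_review


seed_text = "The phone has a heat problem"
start = seed_text.find("heat problem")
seed = [(seed_text, [(start, start + len("heat problem"), "HEAT_PROBLEM")])]
corpus = [
    "Heat problem while charging",
    "Screen cracked after a week",
    "Huge heat problem in games",
]
labeled, needs_review = bootstrap(seed, corpus, train_toy_model)
print(len(labeled), needs_review)
```

Note that the low-confidence texts still end up with a human, so this reduces the annotation work rather than removing it, and wrong predictions that pass the threshold get trained on in the next step.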