I have a dataset of invoices, each with a free-text invoice description.
Variable value: Plants and Animals; Animals; Livestock; Chicken
Explanation: hierarchical structure of Plants and Animals > Animals > Livestock > Cattle
Each category has its unique identifier.
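To make the structure concrete, here is a minimal sketch of how one reference entry could be represented. The `Category` class, the `parse_category` helper, and the identifier `C-0001` are made up for illustration; the real identifier format is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    category_id: str  # hypothetical identifier format
    levels: tuple     # hierarchy from root to leaf


def parse_category(category_id, value):
    """Split a semicolon-separated variable value into its hierarchy levels."""
    levels = tuple(part.strip() for part in value.split(";") if part.strip())
    return Category(category_id, levels)


example = parse_category("C-0001", "Plants and Animals; Animals; Livestock; Chicken")
print(" > ".join(example.levels))
```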
I also have training and test data whose invoice descriptions are similar to the reference values but not necessarily identical.
For example, the same variable can appear as simply "Beef".
The descriptions differ so much from the reference because the data comes from various vendors, each of which sells its own products and writes its own invoice descriptions.
I have about 50,000 categories and need to train a model so that any new invoice description can be tagged with our category id.
Any suggestions are appreciated.
My initial thought is to use something like an information retrieval system.
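To show what I mean by the retrieval idea, here is a rough sketch that indexes each category's text with character-trigram TF-IDF weights and ranks categories by cosine similarity to a new description. It is only one possible baseline, not a tested solution: the `NgramIndex` class, the category ids, the third category, and the query string are all made up for illustration, and it uses only the Python standard library.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Lowercase the text and return its overlapping character n-grams."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramIndex:
    """Tiny TF-IDF index over category texts (illustrative only)."""

    def __init__(self, categories):
        # categories: dict mapping category id -> reference text
        self.ids = list(categories)
        docs = [Counter(char_ngrams(categories[cid])) for cid in self.ids]
        # Document frequency: number of categories containing each n-gram.
        df = Counter(g for doc in docs for g in doc)
        n_docs = len(docs)
        self.idf = {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}
        self.vectors = [self._normalize(doc) for doc in docs]

    def _normalize(self, counts):
        # Weight by TF-IDF, drop n-grams unseen at index time, scale to unit length.
        weights = {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        return {g: w / norm for g, w in weights.items()} if norm else {}

    def search(self, description, top_k=3):
        """Return up to top_k (category id, cosine similarity) pairs, best first."""
        query = self._normalize(Counter(char_ngrams(description)))
        scored = []
        for cid, vec in zip(self.ids, self.vectors):
            score = sum(w * vec.get(g, 0.0) for g, w in query.items())
            scored.append((score, cid))
        scored.sort(reverse=True)
        return [(cid, round(score, 3)) for score, cid in scored[:top_k]]


# Hypothetical ids and reference texts, for illustration only.
index = NgramIndex({
    "C-0001": "Plants and Animals; Animals; Livestock; Chicken",
    "C-0002": "Plants and Animals; Animals; Livestock; Cattle",
    "C-0003": "Plants and Animals; Plants; Grains; Wheat",
})
results = index.search("frozen chicken breast")
print(results)
```

A purely lexical index like this cannot link "Beef" to a category whose text only says "Cattle", since the two share no character n-grams. One possible workaround is to also index the labeled vendor descriptions from the training data under the same category id.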