Today I want to pose a fundamental problem in AI and machine learning. It is a real-world problem I'm working on for a client, so every relevant answer or suggestion may end up in the production implementation.
Here is the objective, and then the challenge!
I have data from the shipping industry containing free text: the comments a field inspector writes in a report about the quantity and quality of the inspected shipment. Inspectors verify that the shipped item has arrived properly at the destination, so they check the weight and perform other analyses to confirm the delivery is as expected. If a shipment does not meet the expected delivery, they raise an issue against the relevant metric and also write down its root cause.
To give insight into all the shipments and their respective issues, we extracted the information manually and with keyword matching. For example, depending on whether a comment mentions issue ‘X’ or not, we labelled it 1 or 0. A comment can have multiple issues (i.e. multiple labels). This approach works for some comments but not for all, because it fails to detect context. Since I don’t have pre-labelled data, I can’t go directly to supervised modelling. As I said, we labelled a few rows manually or by keyword match, so I now have some labelled data and could do text mining and train a classifier on it. But I feel that this labelled data (produced by us) is not enough for a model to learn the context and the unstructured information. Further, even if I train a model and make predictions, how do I automate the process so that the model learns incrementally over time, that is, gets retrained on data containing new patterns?
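To make the current keyword approach concrete, here is a minimal sketch in Python. The issue names, keyword lists, and sample comments are made up for illustration and are not from the client data, and `label_comment` is a hypothetical helper, not our actual code.

```python
import re

# Hypothetical issue names and keywords, for illustration only.
ISSUE_KEYWORDS = {
    "weight_shortage": ["short weight", "underweight", "shortage"],
    "damage": ["damaged", "damage", "broken"],
}

def label_comment(comment, issue_keywords=ISSUE_KEYWORDS):
    """Return a 1/0 flag per issue based on whole-word keyword matches."""
    text = comment.lower()
    labels = {}
    for issue, keywords in issue_keywords.items():
        # One alternation pattern per issue; \b keeps matches on word boundaries.
        pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
        labels[issue] = 1 if re.search(pattern, text) else 0
    return labels

# Made-up inspector comments.
comments = [
    "Cargo found underweight and two crates damaged on arrival.",
    "No damage observed, weight as declared.",
]

for c in comments:
    print(c, "->", label_comment(c))
```

The first comment gets both labels, which is what we want. The second one is also flagged for damage, even though the inspector wrote "No damage observed": the keyword is present but the context negates it. That is the kind of miss I mean.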
I hope you can see the difficulty: it’s a catch-22 problem!
Thanks in advance!