Pattern matching, Text Mining



Here is my problem statement. I need to process a huge number of records.

Input:

JOHN C/O ABEGE LOT 45 BLK 39 41547 Nonpareil Dr Palmdale, CA 93551-2802

I need to extract multiple values from this, such as name, c/o name, lot, block, direction, road/street name, state, contact number, PIN code, etc.

Right now we are using regular expressions and KDD to find patterns. Is there a better way, using R or any other analytics program, to process the records faster and get better-quality results?


You did not mention which patterns you want to match. The first problem in your case is extracting the variables.


Thanks for the reply.

Yes, I am looking to extract data: all the values I mentioned in the problem.

For example, LOT values can start with different patterns: LOT, LT, or LOTS.
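
To make the LOT example concrete, here is a minimal sketch in Python of how a few of these fields might be pulled out with precompiled regular expressions. Only the LOT, LT and LOTS variants and the sample record come from this thread. The BLK/BLOCK variants, the assumption that the state and ZIP sit at the end of the record, and the names extract_fields and SAMPLE are illustrative assumptions, not a tested solution for real data.

```python
import re

# Sample record copied from the question.
SAMPLE = "JOHN C/O ABEGE LOT 45 BLK 39 41547 Nonpareil Dr Palmdale, CA 93551-2802"

# Compile each pattern once so it can be reused across many records.
# LOT / LT / LOTS are the variants mentioned in the thread.
LOT_RE = re.compile(r"\b(?:LOTS?|LT)\s+(\d+[A-Z]?)\b", re.IGNORECASE)
# BLK / BLOCK is an assumed set of variants (only BLK appears in the sample).
BLOCK_RE = re.compile(r"\b(?:BLK|BLOCK)\s+(\d+[A-Z]?)\b", re.IGNORECASE)
# Assumes the care-of name is the single token after "C/O".
CARE_OF_RE = re.compile(r"\bC/O\s+(\S+)", re.IGNORECASE)
# Assumes the record ends with a two-letter state and a ZIP or ZIP+4.
STATE_ZIP_RE = re.compile(r"\b([A-Z]{2})\s+(\d{5}(?:-\d{4})?)\s*$")


def extract_fields(record):
    """Return a dict of the fields found in one record (missing ones are None)."""
    lot = LOT_RE.search(record)
    block = BLOCK_RE.search(record)
    care_of = CARE_OF_RE.search(record)
    state_zip = STATE_ZIP_RE.search(record)
    return {
        "lot": lot.group(1) if lot else None,
        "block": block.group(1) if block else None,
        "care_of": care_of.group(1) if care_of else None,
        "state": state_zip.group(1) if state_zip else None,
        "zip": state_zip.group(2) if state_zip else None,
    }


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

Each field gets its own small pattern instead of one large expression, so a new keyword variant only touches one line. Fields that are not found come back as None, which makes it easy to count how many records each pattern fails on and to judge quality before adding more variants.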