Extracting data from strings (in R)




How can I extract the data from these strings given in the test file?

What I want to do is map these incomplete addresses to approximately correct structured addresses, i.e. House Number, Locality, Area, City, Pincode.

Which R packages or APIs can I use to accomplish this?

test_address.csv (6.8 KB)


After some research, I found out that I need to use the Google Geocoding API. How can I use it in R?

Any help would be appreciated.


Can someone please help me with this? I am really stuck.


Can you paste a few rows of the data here? I am unable to download the CSV file.



1 #3, Sr no 33/6, saifitness building, near icchapurti temple, behind bharti vidyapeeth, ambegaon bk, pune

2 1027 sector 28 ground floor faridabad haryana

3 106b U&V block shalimar bagh delhi

201 west enclave pitampura

202 wz-865 rani bagh pitampura

203 yamunagar

You might find this helpful:-


It will take me some time before I can work on this.

By the way, given the licensing issues around using the Google API, take a look at Nominatim: http://wiki.openstreetmap.org/wiki/Nominatim
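To make the Nominatim suggestion a bit more concrete, here is a rough sketch of how a search request could be put together from R. It only builds the URL with base R. `build_nominatim_url` is a made-up helper name, the `country_code = "in"` default is my assumption based on the sample addresses, and the actual fetch is left commented out because it needs network access and the jsonlite package.

```r
# Build a Nominatim search URL for one free-text address.
# `build_nominatim_url` is a hypothetical helper, not part of any package.
build_nominatim_url <- function(address, country_code = "in") {
  # Percent-encode the address so spaces and symbols like "&" are URL-safe.
  query <- utils::URLencode(address, reserved = TRUE)
  paste0(
    "https://nominatim.openstreetmap.org/search",
    "?q=", query,
    "&format=json",        # ask for a JSON response
    "&addressdetails=1",   # include a breakdown of the address into parts
    "&countrycodes=", country_code,
    "&limit=1"             # keep only the best match
  )
}

url <- build_nominatim_url("west enclave pitampura delhi")
print(url)

# Fetching the result (assumes jsonlite is installed and you are online):
# result <- jsonlite::fromJSON(url)
# result$address   # address parts, if a match was found
```

If you loop over the whole file, keep to the public server's usage policy: at most one request per second (for example `Sys.sleep(1)` between calls) and an identifying User-Agent. Very short rows such as "yamunagar" may return no match or the wrong one, so the results still need checking.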


Hi Anant,

Any luck with the problem?


Hey Sarthak,

Dude, apologies. This slipped my mind, man. Give me 2 days; I will be able to work something up and share some git code with you.



Thank you. I will keep an eye out. It will be a huge help if you can come up with something.


@sarthakgirdhar I checked the test file you’ve shared. The data is entirely unstructured. I don’t think any R package will handle it out of the box, since the house number, street name, state name etc. have no fixed length or position. In fact, some of the observations are incomplete. In such cases, your first attempt should be to collect more data. If that’s not possible, use Excel.

The data is small, so Excel will be a quick and easy way to do this.
If you are ready for a challenge, this is a good regular expression problem for R. You can use the stringr package combined with regular expressions for pattern matching.
If neither of the above two works, you can do it manually, since there aren’t that many observations.
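For the regular expression route, something along these lines could be a starting point. It is only a sketch: it uses base R (`regexpr` / `regmatches`) so it runs without extra packages, and `stringr::str_extract` would do the same job as the `extract_first` helper. The helper name, the `known_cities` lookup, and the last sample row (with a pincode) are my own additions for illustration; the other rows are from the data pasted above.

```r
# Rows from the thread, plus one invented row to show pincode matching.
addresses <- c(
  "1027 sector 28 ground floor faridabad haryana",
  "106b U&V block shalimar bagh delhi",
  "wz-865 rani bagh pitampura",
  "12 mg road pune 411001"   # hypothetical row, not from the test file
)

# Hypothetical lookup of city names to search for; extend as needed.
known_cities <- c("pune", "faridabad", "delhi")

# Return the first match of `pattern` in each string, or NA when there is none.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, ignore.case = TRUE, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

city_pattern <- paste0("\\b(", paste(known_cities, collapse = "|"), ")\\b")

parsed <- data.frame(
  address = addresses,
  # Leading token that contains digits, e.g. "1027", "106b", "wz-865".
  house_number = extract_first(addresses, "^[a-z]*-?\\d+[a-z]?\\b"),
  # Indian pincodes are six digits and never start with 0.
  pincode = extract_first(addresses, "\\b[1-9]\\d{5}\\b"),
  # Match against lower-cased text so the output is in a consistent case.
  city = extract_first(tolower(addresses), city_pattern),
  stringsAsFactors = FALSE
)

print(parsed)
```

Rows where a field cannot be found come back as `NA`, which gives you a short list of addresses to fix by hand or to send to a geocoder.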


Thank you @Manish for your response. I will take it from here.