How to group similar college names into one group?


Hi all,

The data set has UG and PG college names, and the same college is written in multiple ways (for example, IIM Ahmedabad also appears as Indian Institute of Management, Ahmedabad). My hypothesis is that college would be the most significant variable. Any idea how to put similar college names into one group?



I just combined all the IITs and NITs into a single category using very crude, mostly manual logic.

The classes definitely slope with this grouping.



You can use the grep function in R to match a college under its different names and put all the matches into the same category.
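A minimal sketch of that idea, using grepl (the variant of grep that returns TRUE/FALSE per row). The data frame, the column name `college`, and the patterns are made up for illustration; the real list of patterns would need to cover the spellings actually present in your data.

```r
# Hypothetical sample data; the column name `college` is an assumption.
df <- data.frame(
  college = c("IIM Ahmedabad",
              "Indian Institute of Management, Ahmedabad",
              "IIT Bombay",
              "Indian Institute of Technology Delhi",
              "NIT Trichy",
              "Some Other College"),
  stringsAsFactors = FALSE
)

# One regular expression per group: the abbreviation as a whole word,
# or the spelled-out name. Illustrative only, not an exhaustive list.
patterns <- c(
  IIM = "\\bIIM\\b|Indian Institute of Management",
  IIT = "\\bIIT\\b|Indian Institute of Technology",
  NIT = "\\bNIT\\b|National Institute of Technology"
)

# Start with everything in "Other", then assign the first matching group.
df$college_group <- "Other"
for (grp in names(patterns)) {
  hit <- grepl(patterns[[grp]], df$college, ignore.case = TRUE, perl = TRUE)
  df$college_group[hit & df$college_group == "Other"] <- grp
}

print(df)
```

Running table(df$college_group) afterwards is a quick way to see how many names are still in "Other" and may need another pattern.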

Hope it helps.