How to group similar college names into one group?


#1

Hi all,

The data set has UG and PG college names, and the same college is written in multiple ways (for example, IIM Ahmedabad also appears as Indian Institute of Management, Ahmedabad). My hypothesis is that college would be the most significant variable. Any ideas on how to group similar college names into one category?

Mark


#2

I just combined all the IITs and NITs into a single category using very crude, mostly manual logic.

The target classes definitely show a trend across this grouping.


#3

@Mark

You can use the grep function in R to match the different spellings of a college and put them all into the same category.

https://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html
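Here is a rough sketch of that idea, using grepl (the logical variant documented on the same help page). The sample data, the column names `ug_college` and `college_group`, and the patterns themselves are only assumptions for illustration; you would need to adapt them to the spellings that actually occur in your data.

```r
# Hypothetical sample data; the column name `ug_college` is an assumption.
colleges <- data.frame(
  ug_college = c("IIM Ahmedabad",
                 "Indian Institute of Management, Ahmedabad",
                 "IIT Bombay",
                 "Indian Institute of Technology Delhi",
                 "NIT Trichy",
                 "Some Other College"),
  stringsAsFactors = FALSE
)

# One regular expression per group. These patterns are illustrative,
# not an exhaustive list of how each institute may be spelled.
patterns <- c(
  IIM = "\\bIIM\\b|Indian Institute of Management",
  IIT = "\\bIIT\\b|Indian Institute of Technology",
  NIT = "\\bNIT\\b|National Institute of Technology"
)

# Start with a catch-all category, then overwrite rows that match a pattern.
colleges$college_group <- "Other"
for (group in names(patterns)) {
  hits <- grepl(patterns[[group]], colleges$ug_college,
                ignore.case = TRUE, perl = TRUE)
  colleges$college_group[hits] <- group
}

# Count how many rows fell into each group.
print(table(colleges$college_group))
```

Checking what ends up in "Other" after each pass is a quick way to spot spellings the patterns missed.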

Hope it helps.