I am working on an H1B visa practice project.
It has 1000+ unique occupations under the SOC_NAME field/feature. Most of them differ only by a small change in the name, e.g. teacher, teacher maths, teacher maths post studies, teacher maths high school, etc.
I need to map them to standard feature names so that their number comes down and becomes more manageable.
I can use .loc with str.contains, one command per group of keywords, like the following:
df.loc[df['SOC_NAME'].str.contains('computer|programmer'), 'OCCUPATION'] = 'computer occupations'
df.loc[df['SOC_NAME'].str.contains('software|web developer'), 'OCCUPATION'] = 'computer occupations'
but this is a cumbersome method and a repetitive process.
Is there any other way to achieve the end result of mapping the 1000+ values, for example by using regex?
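To make the question concrete, here is a rough sketch of the kind of thing I have in mind: a single dictionary of regex patterns applied in a loop, instead of one hand-written line per keyword group. The sample rows, the `occupation_patterns` name, the "education occupations" label and the "other" fallback are all made up for illustration; only SOC_NAME and OCCUPATION come from my data.

```python
import pandas as pd

# Hypothetical sample rows; the real SOC_NAME column has 1000+ distinct values.
df = pd.DataFrame({
    "SOC_NAME": [
        "Computer Programmer",
        "software developer, applications",
        "Web Developer",
        "teacher maths high school",
        "Teacher",
        "chief executive",
    ]
})

# Hypothetical mapping: regex pattern -> standard occupation label.
occupation_patterns = {
    r"computer|programmer|software|web developer": "computer occupations",
    r"teacher|professor": "education occupations",
}

# Start with a fallback label, then overwrite the rows matching each pattern.
df["OCCUPATION"] = "other"
for pattern, label in occupation_patterns.items():
    # case=False ignores capitalisation; na=False treats missing names as no match.
    mask = df["SOC_NAME"].str.contains(pattern, case=False, na=False, regex=True)
    df.loc[mask, "OCCUPATION"] = label

print(df)
```

With this layout, if a name matches more than one pattern, the pattern listed later in the dictionary wins, so the order of the entries matters.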