Hackathon - How to handle skills?

text_mining

#1

Guys, any good tips for handling the Skills column? How do I split the comma-separated values and choose the best skills to include in my model?

I can use R as well as Python.

@Karthik_Ramasubraman I was facing the same problem as well, so I started this discussion.


#2

@RoveR

You can use

strsplit(as.character(train$Skills), ",")

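to split the strings on commas, and then use the grepl() function to flag the skills which seem most important to you. grepl() returns TRUE or FALSE for each element, depending on whether it matches the pattern.
Usage of grepl():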

grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
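Since Python was mentioned as an option too, here is a rough Python sketch of the same idea: split each cell on commas, count how often each skill shows up, and turn the most frequent ones into 0/1 columns. The sample rows, the names `skills_column` and `split_skills`, and the cutoff of 2 skills are all made up for illustration; the real data will need its own cutoff.

```python
from collections import Counter

# Hypothetical sample standing in for train$Skills; the real column is not shown here.
skills_column = [
    "Python, SQL, Excel",
    "R, SQL",
    "python,Tableau",
    "",
]

def split_skills(cell):
    """Split one comma-separated cell into trimmed, lower-cased skill names."""
    return [part.strip().lower() for part in cell.split(",") if part.strip()]

split_rows = [split_skills(cell) for cell in skills_column]

# Count in how many rows each skill appears (set() avoids double counting within a row).
skill_counts = Counter(skill for row in split_rows for skill in set(row))

# Keep the N most frequent skills as candidate features; N = 2 is an arbitrary choice.
top_skills = [skill for skill, _ in skill_counts.most_common(2)]

# One 0/1 indicator per kept skill, similar in spirit to flagging rows with grepl().
features = [{skill: int(skill in row) for skill in top_skills} for row in split_rows]
```

Trimming and lower-casing before counting keeps "Python" and " python" from being treated as two different skills. Frequency is only one way to pick skills; you may prefer to pick them by how well they relate to the target.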

Hope it helps.


#3

You could use the tm package in R and apply the ‘Bag of Words’ technique, treating the individual words in the Skills column as predictors.
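To show what bag of words produces, here is a minimal Python sketch using only the standard library (it is not the tm package, just the same idea). The sample rows and the names `tokenize`, `vocabulary` and `doc_term_matrix` are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical sample rows; the actual Skills data is not shown in this thread.
skills_column = [
    "Machine Learning, Python",
    "Deep Learning, Python, SQL",
    "SQL",
]

def tokenize(text):
    """Lower-case the text and pull out word tokens (letters, digits, + and #)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())

# One word-count table per row.
row_counts = [Counter(tokenize(text)) for text in skills_column]

# Vocabulary: every distinct word, sorted so the column order is stable.
vocabulary = sorted(set(word for counts in row_counts for word in counts))

# Document-term matrix: rows are records, columns follow `vocabulary`.
doc_term_matrix = [[counts[word] for word in vocabulary] for counts in row_counts]
```

Note that this works on single words, so "Machine Learning" and "Deep Learning" both contribute to one shared "learning" column. If whole skills matter more than words, splitting on commas first may suit the data better.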


#4

You are right, nalin… bag of words can yield wonderful insights,
but I feel we need more time to work on this data.
It would have been better if we had been given 24 hours to work on this.