Extract skills from resumes using NLP

Hi Team

How can I extract skills from resume data in Python using NLP?



Here are a few sources I found that might be helpful:

  • A resume parser
  • The reply to this post, which covers some text mining basics (how to deal with text data, what operations to perform on it, etc., since you said you had no prior experience with that)
  • This paper on skills extraction. I haven’t read it, but it could give you some ideas

Some personal suggestions:

I’m going to assume you don’t want to label those resumes by hand, so I’ll first explain how I would proceed. It might not be the best solution, as it relies on a lot of heuristics, but it’s a starting point that’s mostly aimed at studying your data and gaining some insight into it.

  • You could see if the structure of the document helps: the skills you’re looking for might often appear in a section with a specific title.
  • You could also work with gazetteers, which are lists of keywords of interest, and combine these with collocations (groups of words that frequently appear together). If, for example, a keyword from your gazetteer appears in a resume, you could use n-grams to see which words appear around it most often, and turn your single-word match into a multi-word skill (e.g. machine learning (2-gram), natural language processing (3-gram), etc.)
  • Once you have an interesting list of keywords, you could move to ML and try word vectors, maybe with word2vec, and build vectors around your words of interest. If new skills come in, their vectors might be similar to the ones you have already defined.
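
To make the first two suggestions a bit more concrete, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: the `GAZETTEER` contents, the heading pattern, the function names and the two toy resumes are hypothetical, and real resumes will need a more forgiving heading pattern and a much larger corpus before the counts mean anything.

```python
import re
from collections import Counter

# Hypothetical gazetteer of single-word seeds (illustrative only).
GAZETTEER = {"learning", "processing", "python"}

# Hypothetical heading pattern: a line that is just "Skills" or "Technical skills".
SKILL_HEADING = re.compile(r"^\s*(technical\s+)?skills\s*:?\s*$", re.IGNORECASE)


def skills_section(lines):
    """Return the lines after a skills heading, up to the next blank line."""
    collected, inside = [], False
    for line in lines:
        if SKILL_HEADING.match(line):
            inside = True
            continue
        if inside:
            if not line.strip():
                break
            collected.append(line)
    return collected


def tokenize(text):
    """Lowercase and split into word tokens; punctuation is dropped,
    so n-grams may span commas and line breaks."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def ngrams_with_seed(tokens, n, gazetteer):
    """Yield every n-gram that contains at least one gazetteer word."""
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if any(tok in gazetteer for tok in gram):
            yield gram


def candidate_skills(texts, n, gazetteer, min_count=2):
    """Count seed-containing n-grams over all texts and keep the frequent ones."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams_with_seed(tokenize(text), n, gazetteer))
    return [" ".join(gram) for gram, c in counts.most_common() if c >= min_count]


# Toy data, made up for this example.
resumes = [
    "Skills:\nmachine learning, natural language processing\n\nHobbies: chess",
    "Worked on machine learning pipelines and natural language processing tools",
]

section = skills_section(resumes[0].splitlines())
bigrams = candidate_skills(resumes, 2, GAZETTEER)
trigrams = candidate_skills(resumes, 3, GAZETTEER)
print(section)
print(bigrams)
print(trigrams)
```

The `min_count` threshold is what filters out accidental neighbours such as "learning pipelines": only n-grams that recur across resumes are kept. Candidates like "language processing" still need a manual look, since it is really a fragment of the 3-gram.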

This is no silver bullet, but it should get you started.

Good luck!


Thanks a ton

Thanks for the nice suggestions. I would also add the Python packages below, which are worth exploring for extracting text from PDFs.

  1. minecart: provides a Pythonic interface for extracting text, images, and shapes from PDF documents.
    This package depends on pdfminer for low-level parsing.
  2. pdfminer: https://github.com/euske/pdfminer
  3. pdfminer.six: a fork of pdfminer that is compatible with both Python 2 and Python 3.
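
As a rough sketch of how the PDF step could feed the text processing, the snippet below uses `extract_text` from `pdfminer.high_level`, which is available in recent pdfminer.six releases. The wrapper names `pdf_to_text` and `clean_lines` and the file name `resume.pdf` are hypothetical, and the import is done lazily so the cleaning helper can be tried without the package installed.

```python
import re


def pdf_to_text(path):
    """Extract the raw text of a PDF with pdfminer.six (pip install pdfminer.six)."""
    # Imported here so clean_lines() stays usable without pdfminer.six.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def clean_lines(text):
    """Collapse runs of whitespace and drop empty lines from extracted text."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in text.splitlines())
    return [line for line in lines if line]


# Example with a hypothetical file name:
# lines = clean_lines(pdf_to_text("resume.pdf"))

sample = "Skills:\n  Python,\tSQL \n\nExperience"
cleaned = clean_lines(sample)
print(cleaned)
```

Text extracted from PDFs often comes with irregular spacing and stray blank lines, so a small normalisation pass like this is usually worth doing before looking for section titles or keywords.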

Hope this helps.

© Copyright 2013-2019 Analytics Vidhya