Identifying dictionary words in email addresses using SAS




I am trying to identify dictionary words in email addresses using SAS.

For example, given any email address like - the program should identify the dictionary words: am, lucky.

I am stuck on this problem and don’t know how to proceed. Can someone guide me through it?


You can use regex operations in SAS (the PRX functions, such as PRXMATCH).



Thanks for the response.

I guess you are suggesting that I store all the dictionary words in a separate data file and match the text of the email handle against that dictionary data set.
But the dictionary data set would be huge. Won’t that make the program messy?

  1. Tokenize your data first.
  2. This will give you a list of words with their frequencies.
  3. You can use this list as a lookup table too.
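
The steps above can be sketched roughly as follows. This is in Python rather than SAS, only to show the logic, and it is a minimal sketch, not a tested solution for your data. The sample text, the address, and the function names `build_lookup` and `words_in_handle` are all made up for illustration; in practice the lookup would come from a real word list.

```python
import re
from collections import Counter

def build_lookup(text):
    """Steps 1 and 2: tokenize text into lowercase words and count each one."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

def words_in_handle(email, lookup, min_len=2):
    """Step 3: return lookup words that occur as substrings of the handle.

    The handle is the part of the address before the '@'. Words shorter
    than min_len are skipped so single letters do not match.
    """
    handle = email.split("@", 1)[0].lower()
    found = []
    for start in range(len(handle)):
        for end in range(start + min_len, len(handle) + 1):
            piece = handle[start:end]
            if piece in lookup and piece not in found:
                found.append(piece)
    return found

# Hypothetical sample text standing in for a real dictionary or corpus.
corpus = "I am lucky. You are lucky too. I am here."
lookup = build_lookup(corpus)

# Hypothetical address, not the one from the question.
print(words_in_handle("iamlucky99@example.com", lookup))  # ['am', 'lucky']
```

Because the lookup is a hash-based table, each check costs about the same no matter how many words it holds, so a large dictionary should not make the matching logic itself any messier. A real word list will also return short or overlapping matches, so you would probably want to filter by length or frequency.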

I don’t know how to achieve this in SAS, but you can refer to the following link

Hope this helps!