Identifying dictionary words in email addresses using SAS

sas
text_analytics

#1

Hi,

I am trying to identify dictionary words in email addresses using SAS.

For example, given an email address like iamlucky@gmail.com, the program should identify the dictionary words "am" and "lucky".

I am stuck on this problem and don't know how to proceed. Can someone guide me through it?


#2

You can use the regex (PRX) functions in SAS, such as PRXMATCH.

Thanks.


#3

Hi,
Thanks for the response.

I guess you are suggesting that I store all the dictionary words in a separate data set and match the text of the email handle against it.
But the dictionary data set would be huge. Won't that make the program messy?


#4
  1. Tokenize your data first.
  2. This will give you a list of words with their frequencies.
  3. You can use this list as a lookup table too.
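
To make the steps above concrete, here is a rough sketch in Python rather than SAS. The function names (build_lookup, words_in_handle), the tiny sample text, and the minimum word length of 2 are all made up for illustration; a real run would tokenize a much larger text or a proper word list.

```python
import re
from collections import Counter


def build_lookup(corpus):
    """Steps 1 and 2: tokenize text into lowercase words and count them."""
    return Counter(re.findall(r"[a-z]+", corpus.lower()))


def words_in_handle(email, lookup, min_len=2):
    """Step 3: return lookup words found inside the part before '@'."""
    handle = email.split("@", 1)[0].lower()
    found = []
    # Check every substring of at least min_len characters against the table.
    for start in range(len(handle)):
        for end in range(start + min_len, len(handle) + 1):
            piece = handle[start:end]
            if piece in lookup and piece not in found:
                found.append(piece)
    return found


# Hypothetical sample text standing in for a real corpus or dictionary.
sample_corpus = "I am lucky today and I am happy"
lookup = build_lookup(sample_corpus)
print(words_in_handle("iamlucky@gmail.com", lookup))
```

The check is a plain substring scan, so with a full dictionary it would also report shorter words contained in longer ones (for example "luck" inside "lucky"). The frequencies from step 2 could be used to rank or filter such overlapping matches.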

I don't know how to achieve this in SAS, but you can refer to the following link.

Hope this helps!

Thanks