Text categorization

text_mining
classification
categorical
text_analytics

#1

Hi everyone,

I need help categorizing text. I have a list of merchants like the one
below: the first few belong to CENTURYLINK, the next ones to SMART ATT,
and so on. Is there a way to give each of these strings a single label,
or to group them according to the pool they fall into?

Thanks in advance

001 CENTURYLINK IREP

003 CENTURYLINK MY ACCOUNT

003-ClearTalk Wireless

004 CENTURYLINK IVR

005 CENTURYLINK RECURRING

006 CENTURYLINK WIFI

007 CENTURYLINK CABLE

111 SMART ATT

112 SMART ATT

113 - SMART - ATT

114 SMART ATT

120 - SMART - ATT

131 - SMART - ATT

137 - SMART - ATT

A WIRELESS AMERY

A WIRELESS ANNA

A WIRELESS APTOS

A WIRELESS ARCADIA

A WIRELESS ARNOLDS PAR

A WIRELESS ASHLAND

A WIRELESS ATHENS


#2

The most naive way to classify these strings, without any complications:

Step 1: Manually look for similarities.
Step 2: Once you've found a similarity between the data points, identify or strip the common portion of the string using a regex (regexpr in R).
Step 3: Flag the data points according to the required buckets.
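
A rough sketch of steps 2 and 3, written in Python with the re module rather than R's regexpr. The bucket names and patterns (BUCKET_PATTERNS, label_merchant, the "OTHER" fallback) are my own assumptions, picked by eye from the sample list in the question; a real list would likely need more patterns.

```python
import re

# Hypothetical buckets, chosen by eye from the sample list (step 1).
BUCKET_PATTERNS = [
    ("CENTURYLINK", re.compile(r"CENTURYLINK", re.IGNORECASE)),
    # \W+ allows spaces or " - " between the two words.
    ("SMART ATT", re.compile(r"SMART\W+ATT", re.IGNORECASE)),
    ("A WIRELESS", re.compile(r"^A\s+WIRELESS\b", re.IGNORECASE)),
]


def label_merchant(name, default="OTHER"):
    """Return the first bucket whose pattern matches the name (steps 2 and 3)."""
    for label, pattern in BUCKET_PATTERNS:
        if pattern.search(name):
            return label
    return default


merchants = [
    "001 CENTURYLINK IREP",
    "003-ClearTalk Wireless",
    "113 - SMART - ATT",
    "111 SMART ATT",
    "A WIRELESS AMERY",
]

# Map each raw merchant string to its bucket label.
labels = {m: label_merchant(m) for m in merchants}

for merchant, label in labels.items():
    print(f"{merchant!r} -> {label}")
```

Anything that matches none of the patterns, such as "003-ClearTalk Wireless" here, falls into the fallback bucket, which gives you a list of strings to inspect when deciding on the next pattern.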

Thanks!


#3

@Sunil0108
Hi there, I would suggest using Google Refine (now called OpenRefine) for this task. It has a GUI and can cluster similar strings at the click of a button.
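
To give an idea of what this kind of clustering does, here is a minimal Python sketch of key-collision grouping: each string is reduced to a normalized key, and strings that share a key land in the same cluster. This is only loosely modelled on the idea and is not the tool's actual code. The function names (fingerprint, cluster) and the decision to drop a leading numeric code are my own assumptions.

```python
import re
from collections import defaultdict


def fingerprint(name):
    """Reduce a merchant string to a normalized key."""
    # Assumption: a leading numeric code such as "113" is not part of the name.
    name = re.sub(r"^\s*\d+", "", name)
    # Lowercase, then turn every run of non-alphanumeric characters into a space.
    name = re.sub(r"[^a-z0-9]+", " ", name.lower())
    # Sort the unique tokens so that word order does not matter.
    return " ".join(sorted(set(name.split())))


def cluster(names):
    """Group the strings that share the same fingerprint key."""
    groups = defaultdict(list)
    for n in names:
        groups[fingerprint(n)].append(n)
    return dict(groups)


merchants = [
    "111 SMART ATT",
    "113 - SMART - ATT",
    "001 CENTURYLINK IREP",
    "004 CENTURYLINK IVR",
]

for key, members in cluster(merchants).items():
    print(key, "->", members)
```

Note that a key like this only merges variants that differ in case, punctuation, word order, or the leading code, such as the SMART ATT entries. Strings that share a brand but have different suffixes (the CENTURYLINK entries) still end up in separate clusters, so they need a looser method or a manual merge.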

Hope this helps.

Neeraj