Find the number of unique words in a string or text file using Python

text_mining
ipython
python
text_analytics

#1

Hi,

I want to count the number of unique words in a string or text file.

For example, the sentence "If you are a Python programmer or you are looking for a robust library you can use to bring machine learning into a production system" has 20 unique words.

Which Python methods or libraries can perform this task?

Thanks,
Steve


#2

Hi @Steve,

Here is one approach that works:

# In[1]:

a = "If you are a Python programmer or you are looking for a robust library you can use to bring machine learning into a production system"


# In[2]:

a


# In[3]:

b = a.split()


# In[4]:

b


# In[5]:

myset = set(b)


# In[6]:

print(len(myset))


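Basically, I converted your string to a list of words using the split() function, which splits on whitespace. Then I converted that list to a set, which keeps only the unique values. Note that the comparison is case-sensitive and punctuation stays attached to the words, so "Python" and "python," would count as two different words.

Once you have the set, its length is the number of unique words. Hope this helps. You can of course do this in a single statement: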

print(len(set(a.split())))
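
Since the question also mentions text files, here is a minimal sketch of how the same set-based idea could be extended. It assumes Python 3 and a UTF-8 encoded file, and it additionally lowercases each word and strips punctuation from its ends, which the split() approach alone does not do. The function names unique_words and count_unique_words_in_file are made up for illustration, not part of any library.

```python
import string


def unique_words(text):
    """Return the set of distinct words in text, ignoring case and edge punctuation."""
    words = set()
    for token in text.split():
        # Strip punctuation from both ends and lowercase, so "Python," matches "python".
        word = token.strip(string.punctuation).lower()
        if word:
            words.add(word)
    return words


def count_unique_words_in_file(path, encoding="utf-8"):
    """Count distinct words in a text file, reading it one line at a time."""
    seen = set()
    with open(path, encoding=encoding) as handle:
        for line in handle:
            # Merge the words of this line into the running set.
            seen |= unique_words(line)
    return len(seen)


sentence = ("If you are a Python programmer or you are looking for a robust "
            "library you can use to bring machine learning into a production system")
print(len(unique_words(sentence)))
```

For a file, you would call count_unique_words_in_file("notes.txt"), where notes.txt is just a placeholder path. Reading line by line keeps memory use low for large files, because only the set of distinct words is held in memory.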