What does the stemmer do in the Python NLTK library?




from nltk.stem.porter import *
stemmer = PorterStemmer()

While participating in a Kaggle competition, I came across the above library in one of the scripts, where it is used as shown below:

df_all['search_term'] = df_all['search_term'].map(lambda x:str_stem(x))
df_all['product_title'] = df_all['product_title'].map(lambda x:str_stem(x))
df_all['product_description'] = df_all['product_description'].map(lambda x:str_stem(x))

But I can't understand what is being done here. Can someone please help me with this?


Hi @pagal_guy

Stemming is common in text processing. It means, more or less, reducing a word to its root; see the Wikipedia explanation.
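To make that concrete, here is a small sketch of what the Porter stemmer does to a few related words. Note that `str_stem` is not shown in the snippet you posted, so the version below is only my guess at a minimal equivalent (lowercase, split on whitespace, stem each word); the real one in the Kaggle script may do more cleaning.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()


def str_stem(text):
    # Hypothetical stand-in for the script's str_stem (an assumption, not
    # the original): lowercase the text, split it on whitespace, stem each
    # word, and join the stems back into one string.
    return " ".join(stemmer.stem(word) for word in str(text).lower().split())


# Several inflected forms collapse to the same stem.
words = ["connected", "connecting", "connection", "connections"]
stems = [stemmer.stem(w) for w in words]
print(stems)

# The same idea applied to a whole string, as the script does per cell.
print(str_stem("Connecting connections"))
```

With a helper like this, `df_all['search_term'].map(str_stem)` would do the same as the `lambda` version in your snippet: every cell in the column is replaced by its stemmed text, so a search term and a product title can match even when they use different forms of a word.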

Have fun with Home Depot :slightly_smiling:


Hi @pagal_guy

I was just going through a document and found this definition, which is certainly better than mine:

Once a character stream has been segmented into a sequence of tokens, the next
possible step is to convert each of the tokens to a standard form, a process usually
referred to as stemming or lemmatization. Whether or not this step is necessary is
application-dependent. For the purpose of document classification, stemming can
provide a small positive benefit in some cases. Notice that one effect of stemming is
to reduce the number of distinct types in a text corpus and to increase the frequency
of occurrence of some individual types.
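The last point in that definition (fewer distinct types, higher counts per type) is easy to check yourself. This is just a rough sketch with a made-up token list, assuming the same `PorterStemmer` from NLTK as in the question:

```python
from collections import Counter

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

# Made-up tokens for illustration: eight distinct surface forms.
tokens = [
    "connect", "connected", "connecting", "connection", "connections",
    "run", "runs", "running",
]

# Count distinct types before and after stemming.
before = Counter(tokens)
after = Counter(stemmer.stem(t) for t in tokens)

print(len(before), "distinct types before stemming")
print(len(after), "distinct types after stemming")
print(after)
```

Each surface form occurs once before stemming, while afterwards the counts pile up on just two stems, which is exactly the effect the quoted passage describes.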

A lot better, I guess. Still, have fun with Home Depot.