How to create a TermDocument matrix in Python

termdocumentmatrix
python

#1

Hello,
I am trying to replicate the code below for a single column of a DataFrame in Python:

# Create initial documents list:
doc = []
doc.append( 'It is a far, far better thing I do, than I have ever done' )
doc.append( 'Call me Ishmael' )
doc.append( 'Is this a dagger I see before me?' )
doc.append( 'O happy dagger' )

This is what I have so far:

import pandas as pd
# read_csv already returns a DataFrame, so no extra pd.DataFrame() call is needed:
review_df = pd.read_csv('/home/text/Downloads/reviews.csv')
# See the dimensions of the data frame:
review_df.shape

# Drop rows where 'col1' is NA before building the TDM (the result still holds every column):
trgt_col = review_df[pd.notnull(review_df['col1'])]

Ultimately, I would like to create a TermDocument matrix of the words in the specified column.
Can someone please help me with this?


#2

Hi @pagal_guy,
You can use scikit-learn's text feature extraction tools, for example:

from sklearn.feature_extraction.text import TfidfTransformer
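
Note that TfidfTransformer expects a matrix of term counts as input rather than raw text, so the counts themselves usually come from CountVectorizer in the same module. Here is a minimal sketch, not a definitive solution. It assumes the text lives in a pandas Series (the hard-coded Series below is a stand-in for a column such as review_df['col1'] from the question) and it uses the default tokenizer, which lowercases everything and drops one-character tokens such as "I" and "a":

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Stand-in for a DataFrame column such as review_df['col1'] (assumption);
# the None entry plays the role of an NA value and is removed by dropna().
docs = pd.Series([
    'It is a far, far better thing I do, than I have ever done',
    'Call me Ishmael',
    'Is this a dagger I see before me?',
    'O happy dagger',
    None,
]).dropna()

# Count term occurrences; the result is a sparse matrix with
# one row per document and one column per term.
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)

# Recover the terms in column order from the fitted vocabulary
# (a dict mapping each term to its column index).
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# Transpose so that terms are rows and documents are columns.
tdm = pd.DataFrame(dtm.T.toarray(), index=terms, columns=docs.index)
print(tdm)
```

fit_transform returns documents as rows, so the transpose is what puts terms on the rows. For a large set of reviews it may be better to keep the sparse matrix and skip toarray(), since the dense table can get very big.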

Here are some links:
http://scikit-learn.org/stable/modules/feature_extraction.html
http://textminingonline.com/dive-into-nltk-part-i-getting-started-with-nltk

Thanks,
Monica