How to create a term vector in Python containing all the terms from a document list

termdocumentmatrix
python

#1

Hello,

I am trying to create a term vector in Python so that I can ultimately build a corpus for a word cloud. Below is my code:

import pandas as pd
import gensim
from gensim import corpora, models, similarities
import nltk
import re
import string

survey = pd.read_csv('/home//downloads/survey.csv',dtype = str)
survey_df = pd.DataFrame(survey)
# See the dimensions of the data frame:
survey_df.shape

# Create tdm for each column after removing NA:
doc_df = pd.DataFrame(survey_df['col1'],dtype = str)
doc_df = doc_df[pd.notnull(doc_df['col1'])]
doc_list = doc_df.values


term_vec = [ ]

for d in doc_list:
    d = str(d)
    d = d.translate(None, string.punctuation)
    d = nltk.word_tokenize(d)
    term_vec.append(d)

# Print resulting term vectors
for vec in term_vec:
    print vec

What I am getting is:
['Clarity', 'on', 'product', 'conceptualization']
['credit', 'card']

This is a list of lists, but what I want is a single vector containing all the terms, so that I can create a word cloud and ultimately a TDM from it.
I am very new to Python, so please pardon me if these are very basic questions. Can someone please help me with this?

@Sunil sir, @kunal sir.


#3

@pagal_guy I am not sure why a single vector would help you more than a list.
However, you can merge the per-document lists into one list and convert it to a NumPy array:

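import numpy as np

totalVec = []
for vec in term_vec:
    # Append each document's tokens to one flat list
    totalVec = totalVec + vec

# Outside the loop, convert the flat list to a NumPy array
totalArray = np.asarray(totalVec)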
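
If the end goal is a word cloud, a flat list plus term counts may be enough on its own. Here is a minimal sketch using only the standard library. The sample `term_vec` is made-up data shaped like the output printed in the question, and the names `all_terms` and `term_counts` are just illustrative:

```python
from collections import Counter
from itertools import chain

# Hypothetical sample data, shaped like the term_vec printed in the question
term_vec = [
    ["Clarity", "on", "product", "conceptualization"],
    ["credit", "card"],
]

# Flatten the list of lists into a single list of terms
all_terms = list(chain.from_iterable(term_vec))

# Count how often each term occurs, ignoring case
term_counts = Counter(term.lower() for term in all_terms)

print(all_terms)
print(term_counts.most_common(3))
```

`chain.from_iterable` avoids building a new list on every loop iteration, which `totalVec = totalVec + vec` does. A frequency mapping like `term_counts` is also the kind of input that word cloud libraries typically accept, though the exact call depends on the package you use.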