I am trying to create a term vector in Python so that I can ultimately build a corpus for a wordcloud. Below is my code:
    import pandas as pd
    import gensim
    from gensim import corpora, models, similarities
    import nltk
    import re
    import string

    survey = pd.read_csv('/home//downloads/survey.csv', dtype=str)
    survey_df = pd.DataFrame(survey)

    # See the dimensions of the data frame:
    survey_df.shape

    # Create tdm for each column after removing NA:
    doc_df = pd.DataFrame(survey_df['col1'], dtype=str)
    doc_df = doc_df[pd.notnull(doc_df['col1'])]
    doc_list = doc_df.values

    term_vec = []
    for d in doc_list:
        d = str(d)
        d = d.translate(None, string.punctuation)
        d = nltk.word_tokenize(d)
        d = term_vec.append(d)

    # Print resulting term vectors
    for vec in term_vec:
        print vec
What I am getting is:
    ['Clarity', 'on', 'product', 'conceptualization']
This is a list of lists (one token list per response), but what I want is a single vector containing all the terms, so that I can create a wordcloud and ultimately a TDM (term-document matrix) from it.
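To make the goal concrete, here is a minimal sketch of the shape I am after. The tokens in the second response are made up for illustration (they are not from my survey data), and flattening with `itertools.chain.from_iterable` is only my guess at one possible approach, not something I know to be the right way to do this:

```python
from itertools import chain

# Made-up stand-in for term_vec: one token list per survey response.
term_vec = [
    ['Clarity', 'on', 'product', 'conceptualization'],
    ['Better', 'documentation'],
]

# Flatten the list of lists into one flat list of terms.
all_terms = list(chain.from_iterable(term_vec))

print(all_terms)
```

In other words, every term from every response should end up in one flat list, in order, rather than nested one list per response.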
I am very new to Python, so please pardon me if this is a very basic question. Can someone please help me with this?