I have been trying to build a beer recommendation engine. After looking around on Stack Overflow, I decided to keep it simple and use tf-idf with cosine similarity.
My code so far looks like this:

```python
import re

import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

wnlzer = WordNetLemmatizer()
# Build the stop-word set once instead of once per token.
stop_words = set(stopwords.words('english'))

train = pd.read_csv("labeledTrainData.tsv", header=0,
                    delimiter='\t', quoting=3)


def raw_string_to_list_clean_string(raw_train_review):
    # Strip HTML tags, then drop everything except letters and spaces.
    remove_html = BeautifulSoup(raw_train_review, "html.parser").text
    remove_punch = re.sub('[^A-Za-z ]', "", remove_html)
    token = remove_punch.lower().split()
    # Remove stop words and lemmatize the remaining tokens.
    srm_token = [wnlzer.lemmatize(i) for i in token if i not in stop_words]
    clean_text = " ".join(srm_token)
    return clean_text


ready_train_list = []
length = len(train['review'])
for i in range(0, length):
    if i % 100 == 0:
        print("doing %d of %d of training data set" % (i + 1, length))
    a = raw_string_to_list_clean_string(train['review'][i])
    ready_train_list.append(a)

vectorizer = TfidfVectorizer(analyzer="word", tokenizer=None, preprocessor=None,
                             stop_words=None, max_features=20000)
training_our_vectorizer = vectorizer.fit_transform(ready_train_list)
```
I know how to compute cosine similarity, but I can't figure out:
1. How to use the matrix that cosine similarity generates
2. How to restrict the recommendations to a maximum of 5 beers
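
To make the two points concrete, here is a minimal sketch of what I think the missing step might look like. The beer names, the toy descriptions, and the `recommend` helper are all made up for illustration and are not part of my real data or code. The idea is that row `i` of the similarity matrix holds the similarity of item `i` to every other item, so sorting that row and keeping the first 5 entries (after excluding the item itself) would give the recommendations.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data standing in for the cleaned beer descriptions.
beer_names = ["Hoppy IPA", "Citrus IPA", "Dark Stout", "Coffee Stout",
              "Light Lager", "Crisp Pilsner", "Wheat Ale"]
beer_texts = [
    "bitter hoppy pine citrus ale",
    "citrus hoppy grapefruit bitter ale",
    "dark roasted malt chocolate coffee",
    "coffee roasted dark malt creamy",
    "light crisp clean refreshing lager",
    "crisp clean light bitter pilsner lager",
    "wheat banana clove light cloudy ale",
]

tfidf = TfidfVectorizer().fit_transform(beer_texts)
# Square matrix: sim[i, j] is the cosine similarity between beer i and beer j.
sim = cosine_similarity(tfidf)


def recommend(index, sim_matrix, names, top_n=5):
    """Return up to top_n (name, score) pairs most similar to names[index]."""
    scores = sim_matrix[index].copy()
    scores[index] = -1.0  # push the query beer itself to the end of the ranking
    best = np.argsort(scores)[::-1][:top_n]  # highest scores first
    return [(names[i], float(scores[i])) for i in best]


for name, score in recommend(0, sim, beer_names):
    print("%-15s %.3f" % (name, score))
```

The `top_n` argument is what caps the list at 5. With the real data, `training_our_vectorizer` would presumably take the place of `tfidf`, although a full pairwise matrix can get large for many rows.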