TF-IDF features on non-NLP problems


Could you please tell me how TF-IDF features are used in the case of non-textual (non-NLP) data? I was going through this repo's solution to BlackFridayDataHack, and I did not understand how the author used the TF-IDF features.



Hi kris_ml,

In the past, I have used TF-IDF weighting on non-textual data. The effect of TF-IDF weighting is the same as one-hot encoding if we set the parameters (ngram_range=(1, 1), min_df=1, max_features=None), since each distinct value then gets its own column.
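
To make the one-hot comparison concrete, here is a minimal sketch using scikit-learn's TfidfVectorizer. The `city` column and its values are made up for illustration. It assumes each row holds exactly one categorical value; with the default L2 normalization, a row with a single token ends up with weight 1, so the matrix matches one-hot encoding. When a row holds several values, the non-zero positions still line up with a multi-hot encoding, but the weights are no longer all 1.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical categorical column; each row holds exactly one value.
city = ["delhi", "mumbai", "delhi", "pune"]

vectorizer = TfidfVectorizer(ngram_range=(1, 1), min_df=1, max_features=None)
tfidf = vectorizer.fit_transform(city).toarray()

# Build a one-hot matrix with the same column order for comparison.
columns = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
one_hot = np.array(
    [[1.0 if value == col else 0.0 for col in columns] for value in city]
)

# True when the TF-IDF matrix equals the one-hot matrix.
same_as_one_hot = np.allclose(tfidf, one_hot)
print(same_as_one_hot)
```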

Here is the link to my GitHub code (line 121), where I used TF-IDF weighting in one of my past Kaggle competitions. In this code, I treated each event from e1 to en as a word and concatenated them so that each observation in the data represents a sentence, then I applied TF-IDF weighting to these sentences.
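
The idea of turning events into a sentence can be sketched roughly as follows. This is not the code from the linked repository; the event codes and the `events_per_row` name are invented for illustration, and the custom `token_pattern` is my own assumption so that every whitespace-separated code is kept as one token.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical event logs: one list of event codes per observation.
events_per_row = [
    ["e1", "e3", "e3"],
    ["e2"],
    ["e1", "e2", "e4"],
]

# Join each row's events into one space-separated "sentence".
sentences = [" ".join(events) for events in events_per_row]

# token_pattern=r"\S+" treats every whitespace-separated code as a word.
vectorizer = TfidfVectorizer(
    ngram_range=(1, 1),
    min_df=1,
    max_features=None,
    token_pattern=r"\S+",
)

# Sparse matrix: one row per observation, one column per distinct event.
features = vectorizer.fit_transform(sentences)

vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
print(features.shape)
```

The resulting sparse matrix can be stacked next to the other numeric features before training a model. Events that occur in many observations get a lower weight than rare ones, which is the main difference from plain counts.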