I’ve read this article:
The author’s explanation of content-based recommendation is pretty awesome.
Most of the readers’ questions are about how the author calculates the IDF scores. I noticed he mentioned that he used an assumed number, such as 10, for the total number of documents. One interesting point is that the author interprets the prediction scores as how likely the user is to like a specific article, and all of his calculated scores seem to fall in [-1, 1]. I’m wondering whether the prediction score has any fixed range. In his example, the total number of articles is just assumed. In practice, should the prediction scores we calculate also fall within an interpretable range? I’m asking because I implemented a project following this tutorial, and my prediction scores do not fall into this range. I also read a reference by another author who used the same method, and his results are not in the range either: https://github.com/youonf/recommendation_system/blob/master/content_based_filtering/content_based_recommender_approach2_v2.ipynb (though I think he might have used a different method to calculate the DF and IDF scores).
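To make the question concrete, here is a minimal sketch of the scoring as I understand it: the prediction is a dot product of the user profile, the IDF weights, and the article's attribute vector. The numbers, the function names (`idf`, `predict`, `score_bound`), and the base-10 log are all assumptions for illustration, not the article's or the notebook's actual data.

```python
import math

def idf(total_docs, doc_freq, base=10):
    """Inverse document frequency, log(N / DF). The log base is an assumption."""
    return math.log(total_docs / doc_freq, base)

def predict(user_profile, item_vector, idf_weights):
    """Prediction score as an IDF-weighted dot product."""
    return sum(u * w * x for u, w, x in zip(user_profile, idf_weights, item_vector))

def score_bound(user_profile, item_vector, idf_weights):
    """Cauchy-Schwarz upper bound on the absolute value of the score."""
    weighted_norm = math.sqrt(sum((u * w) ** 2 for u, w in zip(user_profile, idf_weights)))
    item_norm = math.sqrt(sum(x ** 2 for x in item_vector))
    return weighted_norm * item_norm

# Made-up toy data: three attributes, one article, one user.
doc_freq = [2, 5, 1]                      # documents containing each attribute
user_profile = [0.5, 0.1, 0.4]            # hypothetical user weights per attribute
item_vector = [1 / math.sqrt(2), 0.0, 1 / math.sqrt(2)]  # unit-length article vector

def score_for(total_docs):
    """Score and bound for the same user and article under a given corpus size."""
    weights = [idf(total_docs, df) for df in doc_freq]
    return (predict(user_profile, item_vector, weights),
            score_bound(user_profile, item_vector, weights))

score_small, bound_small = score_for(10)      # assumed N = 10, as in the tutorial
score_large, bound_large = score_for(1000)    # same data, larger assumed N

print(f"N=10:   score={score_small:.3f}, bound={bound_small:.3f}")
print(f"N=1000: score={score_large:.3f}, bound={bound_large:.3f}")
```

With these toy numbers, the same user and article land inside [-1, 1] when N is assumed to be 10 and outside it when N is 1000. As far as I can tell, nothing in this formula caps the score, since it scales with the IDF weights and with the size of the user profile.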
Does anyone have thoughts or comments on whether prediction scores should fall within a specific range that we could use to validate our results?
Thank you in advance!