This is in reference to an article published here -
A Guide to Sequence Prediction using Compact Prediction Tree (with codes in Python)
In the prediction phase, step 3 gives the logic to compute the score as follows:
If the item is not present in the dictionary, then:
*score = 1 + (1/number of similar sequences) + (1/(number of items currently in the countable dictionary + 1)) * 0.001
Otherwise:
*score = (1 + (1/number of similar sequences) + (1/(number of items currently in the countable dictionary + 1)) * 0.001) * oldscore
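For concreteness, here is a minimal sketch of how I read that update rule. The function name `update_score` and the use of a plain dict as the "countable dictionary" are my own assumptions, not from the article:

```python
def update_score(score_dict, item, n_similar_sequences):
    """Apply the article's score update for one candidate item.

    score_dict: the "countable dictionary" mapping item -> score (assumption:
    a plain dict). n_similar_sequences: number of sequences similar to the
    sequence being predicted.
    """
    # Common factor from the formula:
    # 1 + (1 / number of similar sequences)
    #   + (1 / (number of items currently in the dictionary + 1)) * 0.001
    weight = 1 + (1 / n_similar_sequences) + (1 / (len(score_dict) + 1)) * 0.001
    if item not in score_dict:
        # First time we see this item: the score is the weight itself.
        score_dict[item] = weight
    else:
        # Item already present: multiply the old score by the weight.
        score_dict[item] *= weight
```

Under this reading, repeated occurrences of an item across similar sequences compound multiplicatively, so frequently co-occurring items accumulate higher scores.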
Can anyone explain how this score formulation was derived?
This is the original paper -
Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction
where the score-computing logic is:
"The primary scoring measure is the support. But in the case where the support of two items is equal, the confidence is used. We define the support of an item s_i as the number of times s_i appears in sequences similar to S, where S is the sequence to predict. The confidence of an item s_i is defined as the support of s_i divided by the total number of training sequences that contain s_i (the cardinality of the bitset of s_i in the II)."
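The paper's scoring can be sketched directly from those two definitions: rank by support, and break ties by confidence. The function name `best_candidate` and the list-of-lists representation of training sequences (in place of the paper's inverted-index bitsets) are my assumptions for illustration:

```python
def best_candidate(support, training_sequences):
    """Pick the predicted item per the paper's rule.

    support: dict mapping item -> number of times it appears in sequences
    similar to the sequence to predict. training_sequences: all training
    sequences (stand-in for the inverted index's bitsets).
    """
    def key(item):
        sup = support[item]
        # Cardinality of the item's bitset: how many training sequences
        # contain the item at all.
        containing = sum(1 for seq in training_sequences if item in seq)
        confidence = sup / containing if containing else 0.0
        # Primary key: support; tie-breaker: confidence.
        return (sup, confidence)

    return max(support, key=key)
```

Note that the article's additive/multiplicative formula above is not the same as this support-then-confidence rule; it appears to be the blog author's own heuristic approximation of it, which is presumably what the question is asking about.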