The Apriori algorithm is used to find frequent itemsets, i.e. groups of items that occur together in many transactions.
An association rule X → Y is a pattern stating that when X occurs, Y also occurs with a certain probability.
The search proceeds level by level: frequent itemsets with 1 item are found first, then those with 2 items, then 3, and so on.
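To make the level-by-level idea concrete, here is a minimal Python sketch of how candidate k-itemsets could be built from the frequent (k-1)-itemsets of the previous level. The function name `generate_candidates` and the example itemsets are hypothetical. It uses a simple pairwise-union join instead of the sorted-prefix join found in many textbook presentations, then prunes any candidate that has an infrequent (k-1)-subset; after pruning, both joins should leave the same candidates.

```python
from itertools import combinations
from typing import FrozenSet, Set

def generate_candidates(frequent_prev: Set[FrozenSet[str]], k: int) -> Set[FrozenSet[str]]:
    """Build candidate k-itemsets from the frequent (k-1)-itemsets (hypothetical helper)."""
    prev = list(frequent_prev)
    # Join step: union every pair of frequent (k-1)-itemsets; keep unions of exactly k items.
    joined = {
        prev[i] | prev[j]
        for i in range(len(prev))
        for j in range(i + 1, len(prev))
        if len(prev[i] | prev[j]) == k
    }
    # Prune step: a k-itemset can only be frequent if all of its (k-1)-subsets are frequent.
    return {
        c for c in joined
        if all(frozenset(sub) in frequent_prev for sub in combinations(c, k - 1))
    }

# Hypothetical frequent 2-itemsets from a previous pass.
frequent_2 = {
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"b", "d"}),
}
print(sorted(sorted(c) for c in generate_candidates(frequent_2, 3)))  # [['a', 'b', 'c']]
```

The join also produces {a, b, d} and {b, c, d}, but both are pruned because {a, d} and {c, d} are not frequent.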
Before moving on to the algorithm, it is important to understand two key terms:
Support: The rule X → Y holds with support sup in T (the transaction data set) if sup% of the transactions in T contain X ∪ Y.
sup = Pr(X ∪ Y) = count(X ∪ Y) / total number of transactions
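A minimal Python sketch of the support formula, assuming each transaction is represented as a `frozenset` of item names. The toy data set `T` and the helper names `count` and `support` are hypothetical, chosen only for illustration.

```python
from typing import FrozenSet, List

# Hypothetical toy transaction data set T: each transaction is a set of items.
T: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "eggs"}),
]

def count(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    # Number of transactions that contain every item of `itemset`.
    return sum(1 for t in transactions if itemset <= t)

def support(x: FrozenSet[str], y: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    # sup = count(X ∪ Y) / total number of transactions
    return count(x | y, transactions) / len(transactions)

print(support(frozenset({"bread"}), frozenset({"milk"}), T))  # 0.5
```

Here {bread, milk} appears in 2 of the 4 transactions, so the rule bread → milk has a support of 0.5 (50%).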
Confidence: The rule X → Y holds in T with confidence conf if conf% of the transactions that contain X also contain Y.
conf = Pr(Y | X) = count(X ∪ Y) / count(X)
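A minimal Python sketch of the confidence formula, assuming transactions are `frozenset`s of item names. The toy data set `T` and the names `count` and `confidence` are hypothetical. It also shows that confidence is not symmetric: X → Y and Y → X generally have different values.

```python
from typing import FrozenSet, List

# Hypothetical toy transaction data set T.
T: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "eggs"}),
]

def count(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    # Number of transactions that contain every item of `itemset`.
    return sum(1 for t in transactions if itemset <= t)

def confidence(x: FrozenSet[str], y: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    # conf = count(X ∪ Y) / count(X); undefined if X never occurs.
    cx = count(x, transactions)
    if cx == 0:
        raise ValueError("X does not occur in any transaction")
    return count(x | y, transactions) / cx

print(confidence(frozenset({"butter"}), frozenset({"bread"}), T))  # 1.0
print(confidence(frozenset({"bread"}), frozenset({"butter"}), T))  # 0.6666666666666666
```

Every transaction with butter also contains bread, but only 2 of the 3 transactions with bread contain butter.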
- First, we find the single items (1-itemsets) that meet the minimum support.
- Then, we combine these frequent single items with one another to form candidate 2-itemsets and keep only those that meet the minimum support. Items that are not frequent on their own can be skipped, because any itemset containing an infrequent item is itself infrequent.
- Next, we generate all possible rules from these frequent 2-itemsets and keep the rules that meet the minimum confidence.
- We then repeat the previous two steps for 3-itemsets (built from the frequent 2-itemsets), then 4-itemsets, and so on, until no new frequent itemsets are found.
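The steps above can be sketched in Python as follows. This is a minimal, unoptimized illustration under stated assumptions, not a reference implementation: `apriori`, `min_sup` (a fraction of all transactions), `min_conf`, and the toy data are hypothetical, and support is recounted by scanning every transaction, which a practical implementation would avoid.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float, float]  # (X, Y, support, confidence)

def apriori(transactions: List[Itemset], min_sup: float, min_conf: float) -> Tuple[Dict[Itemset, int], List[Rule]]:
    n = len(transactions)

    def count(itemset: Itemset) -> int:
        # Number of transactions containing every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    counts: Dict[Itemset, int] = {}  # counts of all frequent itemsets found so far
    rules: List[Rule] = []

    # Step 1: frequent single items.
    level: Set[Itemset] = set()
    for item in {i for t in transactions for i in t}:
        s = frozenset([item])
        c = count(s)
        if c / n >= min_sup:
            counts[s] = c
            level.add(s)

    k = 2
    while level:
        # Step 2: candidate k-itemsets from the frequent (k-1)-itemsets,
        # pruned so that every (k-1)-subset is frequent.
        prev = list(level)
        candidates = {a | b for i, a in enumerate(prev) for b in prev[i + 1:] if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = set()
        for c in candidates:
            cnt = count(c)
            if cnt / n >= min_sup:
                counts[c] = cnt
                level.add(c)

        # Step 3: rules X -> Y where X ∪ Y is a frequent k-itemset.
        for itemset in level:
            for r in range(1, k):
                for lhs in combinations(sorted(itemset), r):
                    x = frozenset(lhs)
                    conf = counts[itemset] / counts[x]  # X is frequent, so it was counted earlier
                    if conf >= min_conf:
                        rules.append((x, itemset - x, counts[itemset] / n, conf))
        k += 1  # Step 4: move on to the next level
    return counts, rules

# Hypothetical toy transaction data set.
T = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "eggs"}),
]
frequent, rules = apriori(T, min_sup=0.5, min_conf=0.6)
for x, y, sup, conf in rules:
    print(sorted(x), "->", sorted(y), f"sup={sup:.2f} conf={conf:.2f}")
```

On this toy data, with min_sup = 0.5 and min_conf = 0.6, the sketch should keep the frequent itemsets {bread}, {milk}, {butter}, {bread, milk}, and {bread, butter}, and report four rules, among them butter → bread with confidence 1.0. The candidate {bread, butter, milk} is pruned because {butter, milk} appears in only one transaction.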
The algorithm will become clear once you work through the following question yourself: