How to generate all frequent 1-itemsets using R

r
data_mining

#1

I plan to use minsup = 0 (minimum support).
I want to generate all frequent itemsets and display their confidence values.
Since minsup = 0, I need the confidence between every pair of items in the basket.

Example: I have 3 items (A, B, C) in the market basket, with N transactions.
I need the confidence between (A, B), (B, C), and (C, A).
How can this be achieved for m items in R or MATLAB?


#2

See this:
http://datamining.togaware.com/survivor/Basket_Analysis.html
and
www.cs.uic.edu/~liub/...fall-06/CS583-association-sequential-patterns.ppt

The first link shows how to do it in a GUI, with a very nice example. Hope it helps.

If you need more information on the package (to tweak the code by hand), see the following.

Michael Hahsler et al. have authored and maintain two very useful R packages for association rule mining: arules and arulesViz. Hahsler has also provided two very good articles detailing how to use these packages: Introduction to arules and Visualizing Association Rules.

library(arules)
data("Adult")
## Mine association rules.
rules <- apriori(Adult,
                 parameter = list(supp = 0.5, conf = 0.9,
                                  target = "rules"))
summary(rules)

apriori() calls the C implementation of the Apriori algorithm by Christian Borgelt for mining frequent itemsets, rules or hyperedges.

Note: Apriori only creates rules with one item in the RHS (Consequent)!

Note: The default value in APparameter for minlen is 1. This means that rules with only one item (i.e., an empty antecedent/LHS) like {} => {beer} will be created. These rules mean that no matter what other items are involved the item in the RHS will appear with the probability given by the rule’s confidence (which equals the support). If you want to avoid these rules then use the argument parameter=list(minlen=2).
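
For the pairwise confidences asked about in the question, one possible approach is to combine minlen = 2 with maxlen = 2, so that every rule has exactly one item on each side. The sketch below is only an illustration: the baskets and item names are made up, and the support threshold is set to "at least one transaction" rather than exactly 0 (a pair that never co-occurs has confidence 0 anyway). Some arules versions print a notice that mining stopped at maxlen, which is expected here.

```r
library(arules)

# Toy baskets; the item names A, B, C are illustrative only.
baskets <- list(c("A", "B"), c("A", "C"), c("A", "B", "C"), c("B"))
trans <- as(baskets, "transactions")

# minlen = maxlen = 2 keeps only rules of the form {x} => {y}.
# supp = 1 / length(trans) means "the pair occurs in at least one transaction".
pair_rules <- apriori(trans,
                      parameter = list(supp = 1 / length(trans), conf = 0,
                                       minlen = 2, maxlen = 2,
                                       target = "rules"),
                      control = list(verbose = FALSE))

# Show the rules with their support and confidence, highest confidence first.
inspect(sort(pair_rules, by = "confidence"))
```

Because confidence is directional, both {A} => {B} and {B} => {A} appear as separate rules.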

Read:

http://www.inside-r.org/packages/cran/arules/docs/apriori

But since you need to generate all frequent itemsets, see this:

Here we can look at the frequent itemsets, using the eclat algorithm rather than the apriori algorithm.

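library(arules)
data("Adult")

# Plot the items whose support is at least 0.1.
itemFrequencyPlot(Adult, support = 0.1, cex.names = 0.8)

# Mine frequent itemsets. The snippet originally passed an undefined `trans`;
# the Adult data set is assumed here.
fsets <- eclat(Adult, parameter = list(support = 0.05), control = list(verbose = FALSE))

# Support of the single-item sets, named by item ID.
singleItems <- fsets[size(items(fsets)) == 1]
singleSupport <- quality(singleItems)$support
names(singleSupport) <- unlist(LIST(items(singleItems), decode = FALSE))
head(singleSupport, n = 5)

# All-confidence: itemset support divided by the largest single-item support in the set.
itemsetList <- LIST(items(fsets), decode = FALSE)
allConfidence <- quality(fsets)$support /
  sapply(itemsetList, function(x) max(singleSupport[as.character(x)]))

quality(fsets) <- cbind(quality(fsets), allConfidence)
summary(fsets)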
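
To make the arithmetic concrete, here is a minimal base-R sketch (no arules needed) that computes the confidence between every pair of items from a 0/1 incidence matrix, plus the all-confidence of one pair. The matrix and item names are invented for illustration; it assumes conf(i => j) = supp({i, j}) / supp(i).

```r
# Toy incidence matrix: rows are transactions, columns are items (illustrative data).
m <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 1,
              0, 1, 0),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C")))

n <- nrow(m)
item_support <- colMeans(m)        # supp(A), supp(B), supp(C)
pair_support <- crossprod(m) / n   # entry [i, j] is supp({i, j})

# conf(i => j) = supp({i, j}) / supp(i); the vector recycles down the rows,
# so row i is divided by supp(i).
pair_confidence <- pair_support / item_support
diag(pair_confidence) <- NA        # an item paired with itself is not a rule

print(round(pair_confidence, 3))

# All-confidence of {A, B}: pair support over the larger single-item support.
all_conf_AB <- pair_support["A", "B"] / max(item_support[c("A", "B")])
print(all_conf_AB)
```

Row i, column j of pair_confidence holds conf(i => j), so the matrix is generally not symmetric. The same approach works for m items, since crossprod scales with the number of columns.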

Using these approaches, a researcher can narrow down the association rules and determine what leads to frequent items.

Hope it helps.


#3

Thanks @ajay_ohri.
You have mentioned a way to create rules with one item in the RHS (consequent).

Is there a method to restrict the rules to only one item in the LHS (antecedent) as well?


#4

Hi sowmiyanm,

Find out what factors influenced an event ‘X’

For example, to find out what customers had purchased before buying ‘whole milk’. This will help you understand the patterns that led to the purchase of ‘whole milk’.

library(arules)
data("Groceries")
rules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.08), appearance = list(default = "lhs", rhs = "whole milk"), control = list(verbose = FALSE)) # get rules that lead to buying 'whole milk'

Find out what events were influenced by a given event

In this case: customers who bought ‘whole milk’ also bought other items. In the rule, ‘whole milk’ is on the LHS (left-hand side).
rules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.15, minlen = 2), appearance = list(default = "rhs", lhs = "whole milk"), control = list(verbose = FALSE)) # those who bought 'whole milk' also bought...
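
On the question of allowing only one item in the LHS for any item: since maxlen counts the items on both sides of a rule, setting minlen = 2 and maxlen = 2 should leave only rules with a single antecedent and a single consequent. This is a sketch rather than a tested recipe; the name rules_1to1 is made up here, and the thresholds are simply reused from the first example.

```r
library(arules)
data("Groceries")

# minlen = maxlen = 2: each rule has exactly two items in total,
# i.e. one in the LHS and one in the RHS.
rules_1to1 <- apriori(Groceries,
                      parameter = list(supp = 0.001, conf = 0.08,
                                       minlen = 2, maxlen = 2),
                      control = list(verbose = FALSE))

# Check the antecedent sizes, then look at the strongest rules.
print(table(size(lhs(rules_1to1))))
inspect(head(sort(rules_1to1, by = "confidence"), n = 5))
```

An appearance argument (as in the examples above) can be added if the single LHS or RHS item should also be fixed to a particular product.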

Hope this helps

Regards,
tony