Show the implementation of the Association Algorithm.
The Microsoft
Association algorithm is an association algorithm provided by Analysis Services
that is useful for recommendation engines. A recommendation engine recommends
products to customers based on items they have already bought, or in which they
have indicated an interest. The Microsoft Association algorithm is also useful
for market basket analysis. For an example of a market basket analysis,
The Microsoft Association algorithm is based on the Apriori algorithm. The Apriori
algorithm does not analyze patterns, but rather generates and then counts
candidate item sets. An item can represent an event, a product, or the value of
an attribute, depending on the type of data that is being analyzed.
In the most
common type of association model, Boolean variables, representing a Yes/No or
Missing/Existing value, are assigned to each attribute, such as a product or
event name. A market basket analysis is an example of an association rules
model that uses Boolean variables to represent the presence or absence of
particular products in a customer's shopping basket.
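As a rough illustration of the Boolean representation described above, the following Python sketch turns a few baskets into Yes/No rows, one column per product. The basket contents and the helper name to_boolean_rows are made up for this example and are not part of Analysis Services.

```python
# Hypothetical transactions: each basket is the set of products it contains.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
]

def to_boolean_rows(baskets):
    """Return (columns, rows) with one Boolean per product per basket."""
    # Every product seen in any basket becomes one column.
    columns = sorted(set().union(*baskets))
    # True means the product is present in the basket, False means absent.
    rows = [[item in basket for item in columns] for basket in baskets]
    return columns, rows

columns, rows = to_boolean_rows(baskets)
print(columns)  # ['bread', 'butter', 'milk']
print(rows[0])  # [True, False, True]
```

Each row is one shopping basket, and each True/False value records the presence or absence of the product named by the column.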
For each item
set, the algorithm then creates scores that represent support and confidence.
These scores can be used to rank and derive interesting rules from the item
sets.
Association
models can also be created for numerical attributes. If the attributes are
continuous, the numbers can be discretized, or grouped in buckets. The
discretized values can then be handled either as Booleans or as attribute-value
pairs.
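A minimal sketch of the bucketing idea, assuming simple equal-width buckets. The discretization method that Analysis Services actually applies may differ, and the ages and the function name discretize are hypothetical.

```python
def discretize(values, n_buckets):
    """Map each number to an equal-width bucket index in 0..n_buckets-1."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are equal (width would be 0).
    width = (hi - lo) / n_buckets or 1.0
    # The maximum value lands exactly on the upper edge, so clamp it
    # into the last bucket.
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]

ages = [18, 25, 37, 52, 64]  # hypothetical continuous attribute
buckets = discretize(ages, 3)

# Handle each discretized value as an attribute-value pair (an "item").
items = [("age", b) for b in buckets]
print(buckets)  # [0, 0, 1, 2, 2]
```

Once discretized, a pair such as ("age", 1) can be treated like any other item in an item set.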
The general
form of an association rule is X => Y, where X and Y are two disjoint item
sets. The "support" of an item set is the number of transactions that
contain all the items of that item set, whereas the support of an association
rule is the number of transactions that contain all items of both X and Y. The
"confidence" of an association rule is the ratio between its support
and the support of X.
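The two definitions above can be written down directly. This is only a sketch on made-up transactions; the function names support and confidence are chosen for the example.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)

def confidence(x, y, transactions):
    """Support of the rule X => Y divided by the support of X."""
    x, y = frozenset(x), frozenset(y)
    if x & y:
        raise ValueError("X and Y must be disjoint item sets")
    support_x = support(x, transactions)
    # The support of the rule counts transactions containing X and Y together.
    return support(x | y, transactions) / support_x if support_x else 0.0

# Hypothetical transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]

print(support({"bread"}, transactions))               # 3
print(support({"bread", "milk"}, transactions))       # 2
print(confidence({"bread"}, {"milk"}, transactions))  # 2/3, about 0.667
```

Here bread appears in three transactions and bread together with milk in two, so the rule bread => milk has support 2 and confidence 2/3.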
A given
association rule X => Y is considered significant and useful if it has high
support and confidence values. The user specifies threshold values for
support and confidence, so that different degrees of significance can be
observed based on these threshold values.
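To show how the thresholds change which rules count as significant, here is a small sketch that filters a list of already-scored rules. The Rule type, the filter_rules helper, and the scores are assumptions made for the example.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    antecedent: frozenset  # X
    consequent: frozenset  # Y
    support: int           # transactions containing both X and Y
    confidence: float      # support of the rule / support of X

def filter_rules(rules, min_support, min_confidence):
    """Keep only the rules that meet both user-specified thresholds."""
    return [r for r in rules
            if r.support >= min_support and r.confidence >= min_confidence]

# Hypothetical, precomputed scores.
rules = [
    Rule(frozenset({"bread"}), frozenset({"milk"}), 2, 2 / 3),
    Rule(frozenset({"butter"}), frozenset({"bread"}), 2, 1.0),
    Rule(frozenset({"milk"}), frozenset({"butter"}), 1, 1 / 3),
]

strict = filter_rules(rules, min_support=2, min_confidence=0.9)
loose = filter_rules(rules, min_support=2, min_confidence=0.5)
print(len(strict), len(loose))  # 1 2
```

Raising the confidence threshold from 0.5 to 0.9 drops bread => milk and leaves only butter => bread.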
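To limit the number of candidates, Apriori relies on the property that "any
subset of a large item set must itself be a large item
set", where a large item set is one whose support meets the threshold. This
process of retaining only the necessary item sets is called
"pruning". Apriori uses a "bottom up" approach, where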
frequent subsets are extended one item at a time (a step known as candidate
generation), and groups of candidates are tested against the data. The
algorithm terminates when no further successful extensions are found.
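The bottom-up procedure described above can be sketched in a few lines of Python. This is a textbook-style Apriori written for illustration, not the code that Analysis Services runs; the function name apriori and the sample transactions are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {item set: support} for every item set meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: extend frequent (k-1)-item sets to k items.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Test the surviving candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1  # the loop stops once no extension succeeds
    return frequent

# Hypothetical transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=2)
print(frequent[frozenset({"bread", "milk"})])  # 2
```

With a minimum support of 2, the pair butter and milk is dropped at the second level, so the only three-item candidate is pruned without being counted and the loop ends.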