Show the implementation of Association Algorithm.





The Microsoft Association algorithm is an association algorithm provided by Analysis Services that is useful for recommendation engines. A recommendation engine recommends products to customers based on items they have already bought, or in which they have indicated an interest. The Microsoft Association algorithm is also useful for market basket analysis. For an example of a market basket analysis,

The Apriori algorithm does not analyze patterns, but rather generates and then counts candidate item sets. An item can represent an event, a product, or the value of an attribute, depending on the type of data that is being analyzed.
In the most common type of association model, Boolean variables, representing a Yes/No or Missing/Existing value, are assigned to each attribute, such as a product or event name. A market basket analysis is an example of an association rules model that uses Boolean variables to represent the presence or absence of particular products in a customer's shopping basket.
For each item set, the algorithm then creates scores that represent support and confidence. These scores can be used to rank and derive interesting rules from the item sets.
Association models can also be created for numerical attributes. If the attributes are continuous, the numbers can be discretized, or grouped in buckets. The discretized values can then be handled either as Booleans or as attribute-value pairs.
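As a rough illustration of the bucketing step, the sketch below maps a continuous attribute to a bucket label and stores it as an attribute-value pair. The cut points, bucket names, and the `discretize` helper are assumptions made up for this example, not part of the Microsoft algorithm.

```python
from bisect import bisect_right

def discretize(value, edges, labels):
    """Map a continuous value to a bucket label.

    edges holds ascending cut points; labels has len(edges) + 1 entries.
    A value equal to a cut point falls into the bucket above it.
    """
    return labels[bisect_right(edges, value)]

# Hypothetical customer records with one continuous attribute.
customers = [{"age": 23}, {"age": 41}, {"age": 67}]
edges = [30, 60]                         # assumed cut points
labels = ["young", "middle", "senior"]   # assumed bucket names

# Each record becomes a set of (attribute, value) items.
transactions = [
    {("age", discretize(c["age"], edges, labels))} for c in customers
]
```

Each discretized item can then be counted like any other item in a transaction.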

The general form of an association rule is X => Y, where X and Y are two disjoint item sets. The "support" of an item set is the number of transactions that contain all the items of that item set, whereas the support of an association rule X => Y is the number of transactions that contain all items of both X and Y. The "confidence" of an association rule is the ratio between its support and the support of X.
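The two definitions above can be written directly as a minimal sketch. The function names and the sample baskets are hypothetical and serve only to make the arithmetic concrete.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)

def confidence(x, y, transactions):
    """Support of the rule X => Y divided by the support of X."""
    support_x = support(x, transactions)
    if support_x == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(set(x) | set(y), transactions) / support_x

# Hypothetical shopping baskets.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

bread_support = support({"bread"}, transactions)               # 3 baskets
rule_support = support({"bread", "milk"}, transactions)        # 2 baskets
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 2 / 3
```

Here bread appears in three baskets and bread together with milk in two, so the rule bread => milk has support 2 and confidence 2/3.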

A given association rule X => Y is considered significant and useful if it has high support and confidence values. The user specifies threshold values for support and confidence, so that different degrees of significance can be observed by varying these thresholds.
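To show how the thresholds change what counts as significant, here is a small sketch that filters a list of already-scored rules. The rules, their scores, and the `significant` helper are invented for illustration.

```python
# Hypothetical rules with precomputed scores:
# (X, Y, support count, confidence).
rules = [
    ({"bread"}, {"milk"}, 2, 2 / 3),
    ({"butter"}, {"bread"}, 2, 1.0),
    ({"milk"}, {"butter"}, 1, 1 / 3),
]

def significant(rules, min_support, min_confidence):
    """Keep only the rules that meet both user-chosen thresholds."""
    return [
        r for r in rules
        if r[2] >= min_support and r[3] >= min_confidence
    ]

# Loose thresholds keep every rule; strict ones keep only the strongest.
loose = significant(rules, min_support=1, min_confidence=0.3)
strict = significant(rules, min_support=2, min_confidence=0.8)
```

With the loose thresholds all three rules pass, while the strict thresholds leave only butter => bread.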
Apriori relies on the property that "any subset of a large item set must itself be a large item set", where a large item set is one whose support meets the threshold. This process of retaining necessary item sets only is called "pruning". Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.
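The bottom-up procedure described above can be sketched in plain Python as follows. This is a simplified illustration of Apriori-style item set mining, not the code used inside Analysis Services; the `apriori` function name, the sample baskets, and the threshold are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {item set: support count} for every large item set."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep those meeting the threshold.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    large = dict(current)

    k = 2
    while current:
        # Candidate generation: join large (k-1)-item sets into k-item sets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Pruning: every (k-1)-subset of a candidate must be large.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Test the surviving candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        large.update(current)
        k += 1  # stops once no candidate reaches the threshold

    return large

# Hypothetical shopping baskets and threshold.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
large = apriori(baskets, min_support=2)
```

With these baskets, the pair of milk and butter occurs only once and is dropped at level 2, so the three-item candidate is pruned without being counted and the loop ends.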
