Show the implementation of Clustering Algorithm.
The Microsoft Clustering algorithm is a segmentation algorithm provided by Analysis Services.
The algorithm uses iterative techniques to group cases in a dataset into
clusters that contain similar characteristics. These groupings are useful for
exploring data, identifying anomalies in the data, and creating predictions.
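
As a rough illustration of what "iterative techniques" can look like, here is a minimal sketch of a Lloyd-style k-means loop in plain Python. It is not the Analysis Services implementation. The function name `kmeans`, the sample points (commute distance in km, age), and the choice of starting centroids are all assumptions made for this example.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Minimal Lloyd-style k-means on 2-D points (illustrative only)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical cases: (commute distance in km, age)
points = [(2.0, 25.0), (3.0, 30.0), (2.5, 28.0),
          (30.0, 45.0), (35.0, 50.0), (32.0, 48.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Each pass reassigns cases to the closest cluster and then recomputes the cluster centers; the loop stops once the centers no longer move. Real implementations typically add smarter initialization and scaling of the input columns, which this sketch omits.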
Clustering models identify relationships in a dataset that you might not logically derive
through casual observation. For example, you can logically discern that people
who commute to their jobs by bicycle do not typically live a long distance from
where they work. The algorithm, however, can find other characteristics about
bicycle commuters that are not as obvious. In the following diagram, cluster A
represents data about people who tend to drive to work, while cluster B
represents data about people who tend to ride bicycles to work.
The clustering algorithm differs from other data mining algorithms, such as the Microsoft
Decision Trees algorithm, in that you do not have to designate a predictable
column to be able to build a clustering model. The clustering algorithm trains
the model strictly from the relationships that exist in the data and from the
clusters that the algorithm identifies.
The Microsoft Clustering algorithm provides two methods for creating clusters and assigning
data points to the clusters. The first, the K-means algorithm, is a hard
clustering method. This means that a data point can belong to only one cluster,
and that a single probability is calculated for the membership of each data
point in that cluster. The second method, the Expectation Maximization (EM)
method, is a soft clustering method. This means that a data point always
belongs to multiple clusters, and that a probability is calculated for each
combination of data point and cluster.
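
The difference between hard and soft assignment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Analysis Services code: clusters are modeled as one-dimensional Gaussians, the means, standard deviations, and weights are made-up values, and `soft_assign` shows only the expectation (E) step of EM rather than the full fitting loop. The names `hard_assign` and `soft_assign` are hypothetical.

```python
import math

def hard_assign(x, means):
    """K-means style: return the index of the single nearest cluster mean."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))

def soft_assign(x, means, stds, weights):
    """EM style (E-step only): return P(cluster k | x) for every cluster,
    assuming each cluster is a 1-D Gaussian with the given mean, std, and weight."""
    def pdf(value, mu, sigma):
        z = (value - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

    joint = [w * pdf(x, mu, s) for mu, s, w in zip(means, stds, weights)]
    total = sum(joint)
    return [j / total for j in joint]

# Assumed cluster parameters for commute distance in km (made-up values).
means, stds, weights = [3.0, 30.0], [2.0, 8.0], [0.5, 0.5]
x = 8.0

cluster = hard_assign(x, means)                 # exactly one cluster index
probs = soft_assign(x, means, stds, weights)    # one probability per cluster
print(cluster)
print(probs)
```

The hard assignment returns a single cluster for the data point, while the soft assignment returns a probability for every cluster, and those probabilities sum to 1.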