Show the implementation of Naïve Bays algorithm.


Show the implementation of Naïve Bays algorithm.

The Microsoft Naive Bays algorithm is a classification algorithm based on Bays’ theorems, and provided by Microsoft SQL Server Analysis Services for use in predictive modeling. The word naïve in the name Naïve Bays derives from the fact that the algorithm uses Bayesian techniques but does not take into account dependencies that may exist. For more information about Bayesian methods, see Microsoft Research Community.
This algorithm is less computationally intense than other Microsoft algorithms, and therefore is useful for quickly generating mining models to discover relationships between input columns and predictable columns. You can use this algorithm to do initial exploration of data, and then later you can apply the results to create additional mining models with other algorithms that are more computationally intense and more accurate.


The Microsoft Naive Bays algorithm calculates the probability of every state of each input column, given each possible state of the predictable column.

Here, the Microsoft Naive Bays Viewer lists each input column in the dataset, and shows how the states of each column are distributed, given each state of the predictable column.
You would use this view of the model to identify the input columns that are important for differentiating between states of the predictable column.

Data Required for Naive Bays Models
When you prepare data for use in training a Naive Bays model, you should understand the requirements for the algorithm, including how much data is needed, and how the data is used.
The requirements for a Naive Bays model are as follows:
A single key column   Each model must contain one numeric or text column that uniquely identifies each record. Compound keys are not allowed.
Input columns   In a Naive Bays model, all columns must be either discrete or discretized columns. For information about discretizing columns, see Discretization Methods (Data Mining).
For a Naive Bays model, it is also important to ensure that the input attributes are independent of each other. This is particularly important when you use the model for prediction.
The reason is that, if you use two columns of data that are already closely related, the effect would be to multiply the influence of those columns, which can obscure other factors that influence the outcome.
Conversely, the ability of the algorithm to identify correlations among variables is useful when you are exploring a model or dataset, to identify relationships among inputs.
At least one predictable column    The predictable attribute must contain discrete or discretized values.
The values of the predictable column can be treated as inputs. This practice can be useful when you are exploring a new dataset, to find relationships among the columns.

Comments