10/01/2010

Customizing a Data Mining Model (Analysis Services - Data Mining)

After you have selected an algorithm that meets your business needs, you can customize the mining model in the following ways to potentially improve results.

Use different columns of data in the model, or change the usage or content types of the columns.
Create filters on the mining model to restrict the data used in training the model.
Set algorithm parameters to control thresholds, tree splits, and other conditions.
Change the default algorithm that is used to analyze data or make predictions.

Changing Data Used by the Model

The decisions that you make about which columns of data to use in the model, and how to use and process that data, can greatly affect the results of analysis. The following topics provide information to help you understand these choices.

Mining Models (Analysis Services - Data Mining)

Provides an overview of the architecture of a mining model, including the underlying mining structure and the choice of mining columns.

Creating Filters for Mining Models (Analysis Services - Data Mining)

Explains how you can create filters that apply to a mining model, in order to create models based on a subset of the mining structure data.

Feature Selection in Data Mining

Explains how Analysis Services uses a process called feature selection to select only the most useful attributes for addition to a model. Reducing the number of columns and attributes can improve performance and the quality of the model. The feature selection methods that are available differ depending on the algorithm that you choose.

If you use the Data Mining Wizard, you can also have Analysis Services automatically select the data that is most useful for building a particular model.
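As a rough illustration of a model filter, the following Python sketch composes a DMX statement that adds a filtered model to an existing mining structure. The structure, model, and column names are hypothetical, and the helper `add_filtered_model` is not part of any Microsoft API. It only formats text and does not connect to Analysis Services.

```python
def add_filtered_model(structure, model, columns, algorithm, filter_expr):
    """Build a DMX statement that adds a filtered model to a mining structure.

    All object names passed in are hypothetical examples; the function only
    formats text and never contacts a server.
    """
    column_list = ",\n    ".join(columns)
    return (
        f"ALTER MINING STRUCTURE [{structure}]\n"
        f"ADD MINING MODEL [{model}]\n"
        f"(\n    {column_list}\n)\n"
        f"USING {algorithm}\n"
        f"WITH FILTER({filter_expr})"
    )


statement = add_filtered_model(
    structure="Customers",                      # hypothetical structure name
    model="Customers Over 30",                  # hypothetical model name
    columns=["[Customer Key]", "[Age]", "[Bike Buyer] PREDICT"],
    algorithm="Microsoft_Decision_Trees",
    filter_expr="[Age] > 30",
)
print(statement)
```

Because the filter is attached to the model rather than the structure, several models built on the same structure can each be trained on a different subset of the data.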

Customizing Algorithm Settings

The choice of algorithm determines what kind of results you will get. For general information about how a specific algorithm works, or the business scenarios where you would benefit from using a particular algorithm, see Data Mining Algorithms (Analysis Services - Data Mining).

The data mining algorithms provided in Analysis Services are also extensively customizable. You can control the behavior of the algorithm and how it processes data by setting algorithm parameters. The following topics provide detailed information about the parameters that each algorithm supports.

Microsoft Decision Trees Algorithm Technical Reference

Microsoft Clustering Algorithm Technical Reference

Microsoft Naive Bayes Algorithm Technical Reference

Microsoft Association Algorithm Technical Reference

Microsoft Sequence Clustering Algorithm Technical Reference

Microsoft Neural Network Algorithm Technical Reference

Microsoft Logistic Regression Algorithm Technical Reference

Microsoft Linear Regression Algorithm Technical Reference

Microsoft Time Series Algorithm Technical Reference

The topic for each algorithm type also lists the prediction functions that can be used with models based on that algorithm.
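Algorithm parameters are typically supplied in the USING clause of a DMX model definition. The following Python sketch shows one way such a clause could be assembled. The helper `using_clause` and the parameter values are assumptions chosen for illustration, not recommended settings.

```python
def using_clause(algorithm, parameters=None):
    """Format the USING clause of a DMX model definition.

    `parameters` maps parameter names to values; when it is empty or None,
    the algorithm runs with its default settings.
    """
    if not parameters:
        return f"USING {algorithm}"
    settings = ", ".join(f"{name} = {value}" for name, value in parameters.items())
    return f"USING {algorithm} ({settings})"


# Hypothetical values, chosen only to show the formatting.
clause = using_clause("Microsoft_Clustering", {"CLUSTER_COUNT": 5, "CLUSTER_SEED": 42})
print(clause)
```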

List of Algorithm Parameters

Each algorithm supports parameters that you can use to customize the behavior of the algorithm and fine-tune the results of your model. For a description of how to use each parameter, see the following topics:

Parameter name	Applies to
AUTO_DETECT_PERIODICITY	Microsoft Time Series Algorithm Technical Reference
CLUSTER_COUNT	Microsoft Clustering Algorithm Technical Reference; Microsoft Sequence Clustering Algorithm Technical Reference
CLUSTER_SEED	Microsoft Clustering Algorithm Technical Reference
CLUSTERING_METHOD	Microsoft Clustering Algorithm Technical Reference
COMPLEXITY_PENALTY	Microsoft Decision Trees Algorithm Technical Reference; Microsoft Time Series Algorithm Technical Reference
FORCE_REGRESSOR	Microsoft Decision Trees Algorithm Technical Reference; Microsoft Linear Regression Algorithm Technical Reference
FORECAST_METHOD	Microsoft Time Series Algorithm Technical Reference
HIDDEN_NODE_RATIO	Microsoft Neural Network Algorithm Technical Reference
HISTORIC_MODEL_COUNT	Microsoft Time Series Algorithm Technical Reference
HISTORICAL_MODEL_GAP	Microsoft Time Series Algorithm Technical Reference
HOLDOUT_PERCENTAGE	Microsoft Logistic Regression Algorithm Technical Reference; Microsoft Neural Network Algorithm Technical Reference. Note: This parameter is different from the holdout percentage value that applies to a mining structure.
HOLDOUT_SEED	Microsoft Logistic Regression Algorithm Technical Reference; Microsoft Neural Network Algorithm Technical Reference. Note: This parameter is different from the holdout seed value that applies to a mining structure.
INSTABILITY_SENSITIVITY	Microsoft Time Series Algorithm Technical Reference
MAXIMUM_INPUT_ATTRIBUTES	Microsoft Clustering Algorithm Technical Reference; Microsoft Decision Trees Algorithm Technical Reference; Microsoft Linear Regression Algorithm Technical Reference; Microsoft Naive Bayes Algorithm Technical Reference; Microsoft Neural Network Algorithm Technical Reference; Microsoft Logistic Regression Algorithm Technical Reference
MAXIMUM_ITEMSET_COUNT	Microsoft Association Algorithm Technical Reference
MAXIMUM_ITEMSET_SIZE	Microsoft Association Algorithm Technical Reference
MAXIMUM_OUTPUT_ATTRIBUTES	Microsoft Decision Trees Algorithm Technical Reference; Microsoft Linear Regression Algorithm Technical Reference; Microsoft Logistic Regression Algorithm Technical Reference; Microsoft Naive Bayes Algorithm Technical Reference; Microsoft Neural Network Algorithm Technical Reference
MAXIMUM_SEQUENCE_STATES	Microsoft Sequence Clustering Algorithm Technical Reference
MAXIMUM_SERIES_VALUE	Microsoft Time Series Algorithm Technical Reference
MAXIMUM_STATES	Microsoft Clustering Algorithm Technical Reference; Microsoft Neural Network Algorithm Technical Reference; Microsoft Sequence Clustering Algorithm Technical Reference
MAXIMUM_SUPPORT	Microsoft Association Algorithm Technical Reference
MINIMUM_IMPORTANCE	Microsoft Association Algorithm Technical Reference
MINIMUM_ITEMSET_SIZE	Microsoft Association Algorithm Technical Reference
MINIMUM_DEPENDENCY_PROBABILITY	Microsoft Naive Bayes Algorithm Technical Reference
MINIMUM_PROBABILITY	Microsoft Association Algorithm Technical Reference
MINIMUM_SERIES_VALUE	Microsoft Time Series Algorithm Technical Reference
MINIMUM_SUPPORT	Microsoft Association Algorithm Technical Reference; Microsoft Clustering Algorithm Technical Reference; Microsoft Decision Trees Algorithm Technical Reference; Microsoft Sequence Clustering Algorithm Technical Reference; Microsoft Time Series Algorithm Technical Reference
MISSING_VALUE_SUBSTITUTION	Microsoft Time Series Algorithm Technical Reference
MODELLING_CARDINALITY	Microsoft Clustering Algorithm Technical Reference
PERIODICITY_HINT	Microsoft Time Series Algorithm Technical Reference
PREDICTION_SMOOTHING	Microsoft Time Series Algorithm Technical Reference
SAMPLE_SIZE	Microsoft Clustering Algorithm Technical Reference; Microsoft Logistic Regression Algorithm Technical Reference; Microsoft Neural Network Algorithm Technical Reference
SCORE_METHOD	Microsoft Decision Trees Algorithm Technical Reference
SPLIT_METHOD	Microsoft Decision Trees Algorithm Technical Reference
STOPPING_TOLERANCE	Microsoft Clustering Algorithm Technical Reference
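A table like the one above can also serve as a quick sanity check before you process a model. The following Python sketch copies a few rows into a dictionary and reports parameters that are not listed for a given algorithm. The dictionary is deliberately incomplete, and the function name `unsupported_parameters` is an assumption made for this example.

```python
# A few rows copied from the table above; extend the mapping as needed.
PARAMETER_ALGORITHMS = {
    "CLUSTER_COUNT": {"Clustering", "Sequence Clustering"},
    "COMPLEXITY_PENALTY": {"Decision Trees", "Time Series"},
    "HIDDEN_NODE_RATIO": {"Neural Network"},
    "MINIMUM_PROBABILITY": {"Association"},
    "PERIODICITY_HINT": {"Time Series"},
}


def unsupported_parameters(algorithm, parameters):
    """Return the parameter names not listed for `algorithm`, sorted by name.

    Parameters missing from PARAMETER_ALGORITHMS are also reported, because
    this sketch cannot tell whether they apply.
    """
    return sorted(
        name
        for name in parameters
        if algorithm not in PARAMETER_ALGORITHMS.get(name, set())
    )


problems = unsupported_parameters(
    "Time Series",
    ["COMPLEXITY_PENALTY", "PERIODICITY_HINT", "CLUSTER_COUNT"],
)
print(problems)
```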

Additional Requirements

Choosing and preparing data is an important part of the data mining process. For example, the algorithms that Microsoft provides do not allow duplicate keys. The type of data that is required for each model differs depending on the algorithm. For more information, see the Requirements section of the following topics:

Microsoft Decision Trees Algorithm
Microsoft Clustering Algorithm
Microsoft Naive Bayes Algorithm
Microsoft Association Algorithm
Microsoft Sequence Clustering Algorithm
Microsoft Time Series Algorithm
Microsoft Neural Network Algorithm
Microsoft Logistic Regression Algorithm
Microsoft Linear Regression Algorithm
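Because the algorithms do not allow duplicate keys, it can help to check the source data before you train a model. The following Python sketch finds key values that occur more than once in a list of rows. The column name `CustomerKey` and the sample rows are invented for illustration.

```python
from collections import Counter


def duplicate_keys(rows, key_column):
    """Return the key values that appear more than once, in sorted order."""
    counts = Counter(row[key_column] for row in rows)
    return sorted(key for key, count in counts.items() if count > 1)


# Invented sample data: key 1 appears twice.
rows = [
    {"CustomerKey": 1, "Age": 34},
    {"CustomerKey": 2, "Age": 51},
    {"CustomerKey": 1, "Age": 35},
]
duplicates = duplicate_keys(rows, "CustomerKey")
print(duplicates)
```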

Customizing Results by using Queries and Prediction Functions

After the model has been built and processed, you can view the information by using one of the viewers specific to each model type. Alternatively, you can write custom queries by using Data Mining Extensions (DMX) to obtain more advanced or detailed information about the patterns found in the data.

For information about how to create queries that return model content, see Querying Data Mining Models (Analysis Services - Data Mining).

You can use functions to extend the results that a mining model returns. Some functions return statistics, such as the probability of an outcome or other scores. Individual algorithms also support their own functions. For example, if a mining model uses clustering, you can use special functions to find information about the clusters, whereas a model based on the Time Series algorithm provides a different set of functions for making predictions and querying model content. For more information, see the technical reference topic for each algorithm.

For examples of how to query a mining model and how to use the prediction functions that are designed for specific model types, see Querying Data Mining Models (Analysis Services - Data Mining).

For a list of prediction functions that are supported for all algorithm types, see Mapping Functions to Query Types (DMX).

Assessing Changes in a Model

When you experiment with different models to solve a business problem, or build variations on a model, you need to measure the accuracy of each model and also evaluate how well each model answers the business problem. For general information about evaluating data mining models, see Validating Data Mining Models (Analysis Services - Data Mining). For more information about how to chart the accuracy of different mining models, see Tools for Charting Model Accuracy (Analysis Services - Data Mining).
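As a simplified illustration of comparing model variations, the following Python sketch computes the fraction of correct predictions for two models against the same set of known outcomes. This is plain classification accuracy, not the calculation behind the Analysis Services accuracy charts, and the model names and data are invented.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the actual outcomes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Invented outcomes for five test cases and two hypothetical models.
actual = [1, 0, 1, 1, 0]
predictions = {
    "Decision Trees": [1, 0, 1, 0, 0],
    "Naive Bayes": [1, 1, 0, 1, 0],
}

scores = {name: accuracy(values, actual) for name, values in predictions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

A higher score on test data is only one input to the decision. The model also has to answer the business problem it was built for.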
