Customizing a Data Mining Model (Analysis Services - Data Mining)
After you have selected an algorithm that meets your business needs, you can customize the mining model in the following ways to potentially improve results.
Use different columns of data in the model, or change the usage or content types of the columns.
Create filters on the mining model to restrict the data used in training the model.
Set algorithm parameters to control thresholds, tree splits, and other conditions.
Change the default algorithm that is used to analyze data or make predictions.
Changing Data Used by the Model
The decisions that you make about which columns of data to use in the model, and how to use and process that data, can greatly affect the results of analysis. The following topics provide information to help you understand these choices.
Mining Models (Analysis Services - Data Mining)
Provides an overview of the architecture of a mining model, including the underlying mining structure and the choice of mining columns.
Creating Filters for Mining Models (Analysis Services - Data Mining)
Explains how you can create filters that apply to a mining model, in order to create models based on a subset of the mining structure data.
Feature Selection in Data Mining
Explains how Analysis Services uses a process called feature selection to select only the most useful attributes for addition to a model. Reducing the number of columns and attributes can improve performance and the quality of the model. The feature selection methods that are available differ depending on the algorithm that you choose.
If you use the Data Mining Wizard, you can also have Analysis Services automatically select the data that is most useful for building a particular model.
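As an illustration of the filter option described above, the following Python sketch assembles a DMX statement that adds a filtered model to an existing mining structure. The structure, model, and column names are hypothetical, and the helper function is not part of any Analysis Services library; it only builds the statement text, which you would then run against your own server.

```python
def add_filtered_model(structure, model, columns, algorithm, filter_expr):
    """Return DMX text that adds a filtered mining model to a structure."""
    column_list = ",\n    ".join(columns)
    return (
        f"ALTER MINING STRUCTURE [{structure}]\n"
        f"ADD MINING MODEL [{model}]\n"
        f"(\n    {column_list}\n)\n"
        f"USING {algorithm}\n"
        f"WITH FILTER({filter_expr})"
    )


# Hypothetical names: replace them with objects from your own database.
statement = add_filtered_model(
    structure="Customers",
    model="Customers Over 30",
    columns=["[Customer Key]", "[Age]", "[Bike Buyer] PREDICT"],
    algorithm="Microsoft_Decision_Trees",
    filter_expr="[Age] > 30",
)
print(statement)
```

Because the filter belongs to the model rather than to the structure, several models with different filters can share the same mining structure.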
Customizing Algorithm Settings
The choice of algorithm determines what kind of results you will get. For general information about how a specific algorithm works, or the business scenarios where you would benefit from using a particular algorithm, see Data Mining Algorithms (Analysis Services - Data Mining).
The data mining algorithms provided in Analysis Services are also extensively customizable. You can control the behavior of the algorithm and how it processes data by setting algorithm parameters. The following topics provide detailed information about the parameters that each algorithm supports.
Microsoft Decision Trees Algorithm Technical Reference
Microsoft Clustering Algorithm Technical Reference
Microsoft Naive Bayes Algorithm Technical Reference
Microsoft Association Algorithm Technical Reference
Microsoft Sequence Clustering Algorithm Technical Reference
Microsoft Neural Network Algorithm Technical Reference
Microsoft Logistic Regression Algorithm Technical Reference
Microsoft Linear Regression Algorithm Technical Reference
Microsoft Time Series Algorithm Technical Reference
The topic for each algorithm type also lists the prediction functions that can be used with models based on that algorithm.
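Algorithm parameters are typically supplied in the USING clause of the DMX statement that defines a model. The following Python sketch formats a set of parameter values into such a clause. The helper functions are hypothetical and the parameter values are arbitrary examples; check the technical reference topic for each algorithm for the supported parameters and their valid ranges.

```python
def format_value(value):
    """Quote string values for DMX; leave numeric values as they are."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def using_clause(algorithm, parameters=None):
    """Return a DMX USING clause, with a parameter list when one is given."""
    if not parameters:
        return f"USING {algorithm}"
    settings = ", ".join(
        f"{name} = {format_value(value)}" for name, value in parameters.items()
    )
    return f"USING {algorithm} ({settings})"


# Example values only; they are not recommended settings.
clause = using_clause(
    "Microsoft_Decision_Trees",
    {"COMPLEXITY_PENALTY": 0.5, "MINIMUM_SUPPORT": 10},
)
print(clause)
```

Parameters that are omitted from the clause keep the default values defined by the algorithm.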
List of Algorithm Parameters
Each algorithm supports parameters that you can use to customize the behavior of the algorithm and fine-tune the results of your model. For a description of how to use each parameter, see the following topics:
Property name | Applies to |
---|---|
AUTO_DETECT_PERIODICITY | |
CLUSTER_COUNT | |
CLUSTER_SEED | |
CLUSTERING_METHOD | |
COMPLEXITY_PENALTY | |
FORCED_REGRESSOR | |
FORECAST_METHOD | |
HIDDEN_NODE_RATIO | |
HISTORIC_MODEL_COUNT | |
HISTORICAL_MODEL_GAP | |
HOLDOUT_PERCENTAGE | Microsoft Logistic Regression Algorithm Technical Reference, Microsoft Neural Network Algorithm Technical Reference. Note: This parameter is different from the holdout percentage value that applies to a mining structure. |
HOLDOUT_SEED | Microsoft Logistic Regression Algorithm Technical Reference, Microsoft Neural Network Algorithm Technical Reference. Note: This parameter is different from the holdout seed value that applies to a mining structure. |
INSTABILITY_SENSITIVITY | |
MAXIMUM_INPUT_ATTRIBUTES | Microsoft Clustering Algorithm Technical Reference, Microsoft Decision Trees Algorithm Technical Reference, Microsoft Linear Regression Algorithm Technical Reference, Microsoft Naive Bayes Algorithm Technical Reference |
MAXIMUM_ITEMSET_COUNT | |
MAXIMUM_ITEMSET_SIZE | |
MAXIMUM_OUTPUT_ATTRIBUTES | Microsoft Decision Trees Algorithm Technical Reference, Microsoft Linear Regression Algorithm Technical Reference, Microsoft Logistic Regression Algorithm Technical Reference |
MAXIMUM_SEQUENCE_STATES | |
MAXIMUM_SERIES_VALUE | |
MAXIMUM_STATES | Microsoft Clustering Algorithm Technical Reference |
MAXIMUM_SUPPORT | |
MINIMUM_IMPORTANCE | |
MINIMUM_ITEMSET_SIZE | |
MINIMUM_DEPENDENCY_PROBABILITY | |
MINIMUM_PROBABILITY | |
MINIMUM_SERIES_VALUE | |
MINIMUM_SUPPORT | Microsoft Association Algorithm Technical Reference, Microsoft Clustering Algorithm Technical Reference, Microsoft Decision Trees Algorithm Technical Reference |
MISSING_VALUE_SUBSTITUTION | |
MODELLING_CARDINALITY | |
PERIODICITY_HINT | |
PREDICTION_SMOOTHING | |
SAMPLE_SIZE | Microsoft Clustering Algorithm Technical Reference |
SCORE_METHOD | |
SPLIT_METHOD | |
STOPPING_TOLERANCE | |
Additional Requirements
Choosing and preparing data is an important part of the data mining process. For example, the algorithms that Microsoft provides do not allow duplicate keys. The type of data that is required for each model differs depending on the algorithm. For more information, see the Requirements section of the topic for each algorithm.
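Because duplicate keys are not allowed, it can help to check the case data before you process a model. The following Python sketch shows one way to do that, assuming the cases have already been loaded as a list of dictionaries; the column name CustomerKey and the sample rows are hypothetical.

```python
from collections import Counter


def find_duplicate_keys(rows, key):
    """Return the key values that appear in more than one row, sorted."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, count in counts.items() if count > 1)


# Hypothetical case data; CustomerKey stands in for the key column.
cases = [
    {"CustomerKey": 1, "Age": 34},
    {"CustomerKey": 2, "Age": 51},
    {"CustomerKey": 2, "Age": 52},
]
duplicates = find_duplicate_keys(cases, "CustomerKey")
print(duplicates)  # [2]
```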
Customizing Results by using Queries and Prediction Functions
After the model has been built and processed, you can view the information by using one of the viewers specific to each model type. Alternatively, you can write custom queries by using Data Mining Extensions (DMX) to obtain more advanced or detailed information about the patterns found in the data.
For information about how to create queries that return model content, see Querying Data Mining Models (Analysis Services - Data Mining).
You can use functions to extend the results that a mining model returns. Some functions also return statistics, such as the probability of an outcome or other scores. Individual algorithms support additional functions. For example, if a mining model uses clustering, you can use special functions to find information about the clusters, whereas a model based on the Time Series algorithm has a different set of functions for making predictions and querying model content. For more information, see the technical reference topic for each algorithm.
For examples of how to query a mining model and how to use the prediction functions that are designed for specific model types, see Querying Data Mining Models (Analysis Services - Data Mining).
For a list of prediction functions that are supported for all algorithm types, see Mapping Functions to Query Types (DMX).
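As a rough illustration of a custom query, the following Python sketch assembles a DMX singleton prediction query that returns both a predicted value and its probability by using the Predict and PredictProbability functions. The model name, target column, and input values are hypothetical, and the helper functions only build the query text.

```python
def dmx_literal(value):
    """Quote string values for DMX; leave numeric values as they are."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def singleton_prediction(model, target, case):
    """Return a DMX singleton query that predicts `target` for one case."""
    inputs = ", ".join(
        f"{dmx_literal(value)} AS [{name}]" for name, value in case.items()
    )
    return (
        "SELECT\n"
        f"    Predict([{target}]) AS [Predicted],\n"
        f"    PredictProbability([{target}]) AS [Probability]\n"
        f"FROM [{model}]\n"
        "NATURAL PREDICTION JOIN\n"
        f"(SELECT {inputs}) AS t"
    )


# Hypothetical model and columns.
query = singleton_prediction(
    model="Customers Over 30",
    target="Bike Buyer",
    case={"Age": 35, "Commute Distance": "5-10 Miles"},
)
print(query)
```

A NATURAL PREDICTION JOIN maps input columns to model columns by name, so the aliases in the input row must match the column names in the model.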
Assessing Changes in a Model
When you experiment with different models to solve a business problem, or build variations on a model, you need to measure the accuracy of each model and also evaluate how well each model answers the business problem. For general information about evaluating data mining models, see Validating Data Mining Models (Analysis Services - Data Mining). For more information about how to chart the accuracy of different mining models, see Tools for Charting Model Accuracy (Analysis Services - Data Mining).
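To make the idea of comparing model variations concrete, the following Python sketch computes simple classification accuracy for two models on the same test cases. It is only an illustration with invented values; it is not how the Analysis Services charting tools calculate accuracy, and the model names are hypothetical.

```python
def accuracy(actual, predicted):
    """Return the fraction of predictions that match the actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Invented outcomes for five test cases and two hypothetical models.
actual = [1, 0, 1, 1, 0]
predictions = {
    "Model A": [1, 0, 0, 1, 0],
    "Model B": [1, 1, 0, 1, 0],
}

scores = {name: accuracy(actual, predicted) for name, predicted in predictions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Accuracy alone does not show whether a model answers the business problem, so treat a score like this as one input to the evaluation rather than the deciding factor.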