Logical Architecture (Analysis Services - Data Mining)
Data mining is a process that involves the interaction of multiple components. You access sources of data in a SQL Server database or any other data source to use for training, testing, or prediction. You define data mining structures and models by using Business Intelligence Development Studio or Visual Studio 2005. You can manage data mining objects and create predictions and queries by using SQL Server Management Studio. When the solution is complete, you deploy it to an instance of Analysis Services.
Data Mining Solutions and Projects
To create a data mining project, you must define data sources, data mining structures, and data mining models. The data mining project that you create can also contain test sets for validating your model. After you have deployed a project to the server, you can continue developing and testing new models in the original solution.
Data Mining Source Data
You do not need to use a cube or other specialized data source to perform data mining. Data mining can be done quickly and easily on relational data tables, or any other data source that has been defined as an Analysis Services data source view. The data that you use in data mining is not stored in the data mining solution; only the bindings are stored. The data might reside in a database created in a previous version of SQL Server, a CRM system, or even a flat file. An Analysis Services data source view enables you to combine various data sources by specifying joins between multiple tables. You can also add tables that have a many-to-one relationship to create nested table columns.
When you need to use the data in the data mining solution, Analysis Services reads the data from the source and generates a cache of aggregates and other information that is used. You can keep the cached information and use it to create new data mining models, or you can delete it to save storage space.
You can continually update your data mining solution with new data, or, if you find a model that works well, you can deploy the model as is and never add new data to the model.
SQL Server 2008 Analysis Services also provides the ability to separate your data into training and testing data sets, so that you can test your mining models on a representative, randomly selected set of data.
For more information about creating and using Analysis Services data sources and data source views, see Defining Data Sources (Analysis Services).
Data Mining Structures
A data mining structure is a logical data structure that defines the data domain from which mining models are built. A single mining structure can support multiple mining models that share the same domain. The data mining structure can also be partitioned into a training and test set, by specifying a percentage or amount of data as a HOLDOUT. This partitioning can be done automatically when you define the data mining structure
For more information, see Mining Structures (Analysis Services - Data Mining).
A data mining structure can contain nested tables. A nested table provides additional detail about the case that is modeled in the primary data table. For more information, see Nested Tables (Analysis Services - Data Mining)
Data Mining Models
A data mining model represents a combination of data, a data mining algorithm, and a collection of parameter and filter settings that affect the data used and how the data is processed. For more information, see Mining Models (Analysis Services - Data Mining).
You define a data mining model by using the Data Mining Extensions (DMX) language, or by using the Data Mining Wizard in BI Development Studio. For more information about how to use the Data Mining Wizard, see Data Mining Wizard (Analysis Services - Data Mining). For more information about how to use DMX, see Data Mining Extensions (DMX) Reference.
After you define the structure of the mining model, you process it, populating the empty structure with the patterns that describe the model. This is known as training the model. Patterns are found by passing the original data through a mathematical algorithm. You can use parameters to adjust each algorithm. For more information about how to select a data mining algorithm, see Data Mining Algorithms (Analysis Services - Data Mining). For more information about how to set parameters on individual data mining algorithms to fine-tune the results of a model, see Customizing a Data Mining Model (Analysis Services - Data Mining).
As you build new data mining models, you iteratively test them by making predictions, and then make changes to improve the results. Changes might include adding more data, or changing the parameters of the model to achieve a better fit with the data. For information about testing the accuracy of predictions, see Validating Data Mining Models (Analysis Services - Data Mining).
Deployment
The ultimate goal of data mining development is to create a model that can be used by end users and analysts for making predictions and performing in-depth analysis. Therefore, when you are satisfied with the results that a model generates, you deploy the model to a production environment. In the production environment, the mining models might serve various purposes, depending on your needs. The following list provides some examples of tasks that you can perform by using a data mining model:
Use the models to create predictions, which you can then use to make business decisions. SQL Server provides the DMX language that you can use to create prediction queries, and Prediction Query Builder to help you build the queries.
Embed data mining functionality directly into an application. You can include Analysis Management Objects (AMO) or an assembly that contains a set of objects that your application can use to create, alter, process, and delete mining structures and mining models. Alternatively, you can send XML for Analysis (XMLA) messages directly to an instance of Analysis Services.
Use Integration Services to create a package in which a mining model is used to intelligently separate incoming data into multiple tables. For example, if a database is continually updated with potential customers, you could use a mining model together with Integration Services to split the incoming data into customers who are likely to purchase a product and customers who are likely to not purchase a product.
Create a report that lets users directly query against an existing mining model. Users might want to create different predictions, or analysts might want direct access to the mining model content, so that they can explore interesting patterns in the data.
Updating the model is part of the deployment strategy. As more data comes into the organization, you must reprocess the models, thereby improving their effectiveness. For more information, see Deployment (Analysis Services - Data Mining) and Creating DMX Prediction Queries