Creating New Data Warehouse Filegroups

For the latest version of Commerce Server 2007 Help, see the Microsoft Web site.

You can use SQL Server filegroups to improve the performance of the Data Warehouse by distributing data across multiple disks and by using parallel threads for query processing. Additionally, filegroups can make managing the Data Warehouse databases easier.

Overview

SQL Server has a primary filegroup that contains the primary data file together with the system tables. You can create user-defined filegroups that contain data files that are grouped together for allocation and administrative purposes.

You can create multiple data files on separate disks and create a user-defined filegroup to contain the files. If you create filegroups, try to have one file per physical disk. You can use filegroups to organize your data files, for example to separate files that are heavily queried from those that are heavily modified.

SQL Server designates one filegroup as the default filegroup. The default filegroup is set to the primary filegroup at the time of database creation, unless you specify otherwise. The default filegroup contains the pages for all tables and indexes that do not have a filegroup specified when they are created.

If the default filegroup is left as the primary filegroup, you must size the primary filegroup appropriately or set it to automatically grow so that you do not run out of space. The primary filegroup must be large enough to hold all system tables and any tables and indexes not allocated to a user-defined filegroup.

If the primary filegroup runs out of space, you will be unable to add any information to the system tables. However, if a user-defined filegroup runs out of space, only the user files that are specifically allocated to that filegroup are affected.

User-defined filegroups can improve performance by balancing data load across multiple disks and by using parallel threads to improve data access. When you create a table, you can assign it to a user-defined filegroup. As data is written to the filegroup, each file is filled in parallel.

Each file is physically put on a disk or a set of disks. SQL Server maintains a file map that associates each database object together with its location on the disk. For example, if one file is created on a filegroup that spans four disks, one file map points to the location of data on all four physical disks. If two files are created on a filegroup that spans four disks, two file maps (one map for each file) point to the location of the data on all four physical disks.

Filegroups use parallel threads to improve data access. When a table is accessed sequentially, the system creates a separate thread for each file in parallel. When the system performs a table scan for a table in a filegroup with four files, it uses four separate threads to read the data in parallel. Generally, if you use multiple files on separate disks it improves performance. Too many files in a filegroup can cause too many parallel threads and create bottlenecks.

In addition to using filegroups to balance data loads for performance, you can use filegroups to simplify maintenance.

To use filegroups to simplify maintenance, consider the following:

  • Back up or restore individual files or filegroups instead of backing up or restoring the whole database. Backing up files or filegroups might be necessary on large databases to have an effective backup and restore strategy.

  • Group tables and indexes with similar maintenance requirements into the same filegroups. For example, you might perform maintenance on some objects more frequently than on other objects. By creating two filegroups and assigning tables to them, you can run daily maintenance tasks against the tables in a daily group, and weekly maintenance tasks against the tables in a weekly group. This limits disk contention between the two filegroups.

  • Assign an individual high-maintenance table to its own filegroup. A table that has frequent updates might have to be backed up and restored separately from the database as a whole.

    Note

    A maximum of 256 filegroups can be created for each database. Filegroups can contain only data files. Transaction logs cannot be part of a filegroup. Filegroups cannot be created independent of database files. A filegroup is an administrative mechanism for grouping files in a database.

Best practices for creating filegroups

Before you create filegroups, you must understand your database structure, data, transactions, and queries thoroughly to determine the best way to add tables and indexes on specific filegroups.

When you create filegroups, you should consider the following:

  • Use maintenance requirements instead of performance considerations to determine the number of filegroups.

    Frequently, the striping capabilities of RAID provide much of the same performance gain that you might achieve with filegroups, without the added administrative burden of defining and managing them.

  • Change the default filegroup if you use filegroups.

    If you create multiple filegroups for your database, you should assign one as the default. This prevents unexpected table growth from constraining the system tables in the primary filegroup.

  • Be aware that filegroups do not provide fault tolerance.

    To include fault tolerance, you can mirror each disk by using RAID 1. However, this is an expensive option.

  • Combine filegroups with hardware-based RAID to optimize your databases.

    You can combine filegroups with hardware-based RAID solutions. First, set up hardware striping, and then use filegroups to spread data across multiple hardware stripe sets. A separate thread used for each file provides parallel data access, and the load is spread among multiple disks to reduce contention.

See Also

Other Resources

Extending the Data Warehouse