Guidelines for Optimizing Bulk Import

This topic provides guidelines for optimizing performance in several bulk-import scenarios:

  • Importing data from a single client (or stream) into an empty table.

  • Importing data from a single client (or stream) into a partially populated, non-empty table.

    Note

    Importing data into a non-empty table is known as an incremental bulk import. The key question for an incremental bulk import is whether indexes should be dropped beforehand.

  • Importing data in parallel with table-level locking from multiple clients (or streams).

  • Copying data between Microsoft SQL Server instances.

This topic also provides a summary of table locking and logging during bulk import operations.

Importing Data from a Single Client (or Stream) into an Empty Table

When you import data into an empty table from a single client (or stream), Microsoft recommends that you do the following:

  • Specify the TABLOCK qualifier (which is available as a hint or option on all three bulk-import methods). Using TABLOCK causes a table-level lock to be taken for the duration of the bulk operation and eliminates the overhead of locking individual rows. For more information, see Controlling Locking Behavior for Bulk Import.

  • Minimize logging. For more information, see Prerequisites for Minimal Logging in Bulk Import.

  • Handle indexes as follows.

    When you are using bcp, BULK INSERT, or INSERT ... SELECT * FROM OPENROWSET(BULK...), if the table is empty and has a clustered index and the data in the data file is ordered to match the clustered index key columns, additionally do the following:

      • Bulk import the data with the clustered index already in place.

      • Specify the ORDER hint so that SQL Server knows the data file is already sorted on the clustered index key columns (as shown in the sketch after this list).

    For an empty table, this approach is significantly faster than creating the clustered index after importing the data, because it eliminates the sorting step.

    Note

    If a non-empty table has indexes, bulk imports are fully logged, even under the bulk-logged recovery model. To decide whether to remove the indexes, consider whether the benefit of bulk importing into a table with no indexes is greater than the cost of dropping and re-creating them.

    If you bulk import data into an empty table with indexes and you specify the batch size, the table becomes non-empty after the first batch. Starting with the second batch, the data is fully logged. For empty indexed tables, consider performing the bulk import in a single batch.

    Note

    When a batch size is not specified, the SQL Server query optimizer assumes by default that the batch is the size of the entire data file. To improve performance, you can use the ROWS_PER_BATCH or KILOBYTES_PER_BATCH qualifier as a hint to the optimizer about the approximate number of rows in the data file. For more information, see Managing Bulk Copy Batch Sizes.

    Generally, it is faster to bulk import into a table that has no indexes than into a table with indexes. Therefore, if an empty table has indexes and the data is not sorted on the clustered index key columns, drop all indexes, import the data, and then re-create the indexes afterward.
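
For example, the following minimal Transact-SQL sketch combines these recommendations for an empty table whose clustered index key is OrderID and whose native-format data file is already sorted on that key. The database, table, column, and file names are hypothetical:

    -- Empty table with its clustered index already in place; the data
    -- file is pre-sorted on the clustered index key (OrderID).
    BULK INSERT dbo.SalesOrders
    FROM 'C:\ImportFiles\orders.dat'
    WITH (DATAFILETYPE = 'native',
          TABLOCK,                    -- table-level lock instead of per-row locks
          ORDER (OrderID ASC),        -- file matches the clustered key, so the sort step is skipped
          ROWS_PER_BATCH = 1000000);  -- hint to the optimizer: approximate rows in the file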

Importing Data from a Single Client (or Stream) into a Non-Empty Table

Importing data into a table that already contains data (a non-empty table) is known as an incremental bulk import. The key question for an incremental bulk import is whether the indexes should be dropped beforehand: you can either keep the indexes, or drop them and re-create them afterward.

When you import data from a single client (or stream) into a non-empty table, whether to keep the indexes depends on the amount of new data imported relative to the amount of existing data in the table:

  • If you are importing a small amount of new data relative to the amount of existing data, dropping and rebuilding the indexes may be counterproductive. The time required to rebuild the indexes is likely to be longer than the time saved during the bulk operation.

  • In contrast, if you are importing a relatively large amount of new data, dropping the indexes on the table before performing the bulk operation can increase performance, without substantially increasing the time required for indexing.

The following table lists the minimum amount of new data, as a proportion of the total data in the table, that should be present before you drop the indexes. The amount differs depending on the types and combinations of indexes. If the new data exceeds the suggested percentage for a given type or grouping of indexes, consider dropping the indexes before the bulk operation and re-creating them afterward. These numbers are sensitive to the data patterns of the existing data and of the data to be loaded, so they serve only as a general guideline.

Indexes                                  Relative amount of new data
Clustered index only                     30 percent
Clustered and one nonclustered index     25 percent
Clustered and two nonclustered indexes   25 percent
Single nonclustered index only           100 percent
Two nonclustered indexes                 60 percent
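
For example, when the new data is large relative to the existing data, a drop-and-re-create pattern along the following lines can pay off. This is a minimal Transact-SQL sketch; the table, index, and file names are hypothetical:

    -- Drop the nonclustered index before the incremental bulk import.
    DROP INDEX IX_SalesOrders_OrderDate ON dbo.SalesOrders;

    -- Bulk import the new data. With no indexes left on the heap,
    -- TABLOCK takes a BU lock, and the load is bulk logged under the
    -- bulk-logged or simple recovery model.
    BULK INSERT dbo.SalesOrders
    FROM 'C:\ImportFiles\neworders.dat'
    WITH (DATAFILETYPE = 'native', TABLOCK);

    -- Re-create the index after the import completes.
    CREATE NONCLUSTERED INDEX IX_SalesOrders_OrderDate
        ON dbo.SalesOrders (OrderDate);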

Importing Data in Parallel with Table-Level Locking from Multiple Clients (or Streams)

If SQL Server is running on a computer that has more than one processor and the data to be bulk imported into a table can be divided into separate data files, you can improve performance by importing the data into the table from multiple clients in parallel. When bulk importing from multiple clients into one table, each client must have its own input data file.

When importing data into a table from multiple clients, consider the following:

  • Multiple bulk-import streams can potentially block each other.

    To prevent this, SQL Server provides a special internal lock, called a bulk-update (BU) lock. To obtain a BU lock, specify the TABLOCK option on each bulk-import stream; the streams can then load into the same table concurrently without blocking one another, which avoids table-access conflicts among the clients. However, a BU lock is available only on a table (whether empty or non-empty) that has no indexes. If you specify TABLOCK on a table with indexes, parallel bulk import is not possible. For more information, see Controlling Locking Behavior for Bulk Import.

    If the table has any indexes, you can take advantage of BU locking by dropping all of the indexes before bulk importing the data. You can then bulk import the data in parallel using TABLOCK and re-create the index or indexes afterward (see the sketch after this list). Note also that if a non-empty table has indexes, bulk imports are fully logged, even under the bulk-logged recovery model. To decide whether to remove the indexes, consider whether the benefit of bulk importing into a table with no indexes is greater than the cost of dropping and re-creating them.

    Note

    If you drop secondary indexes, consider re-creating them in parallel by creating each secondary index from a separate client.

    To avoid dropping and re-creating indexes, you can perform a parallel import without specifying the TABLOCK hint. In this case, however, the multiple bulk-import streams can potentially block each other, and the bulk-logging optimizations are unavailable. To minimize blocking, you can specify a smaller batch size and use the ORDER hint to eliminate the sorting step during the bulk import operation.

  • The data must be divided into multiple input files, one file per client. To use the CPU most efficiently, the data files should be similar in size.
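
As a minimal sketch of the parallel pattern, assume an index-free heap and two pre-split native-format files; the table and file names are hypothetical. Each client runs its own BULK INSERT from its own session:

    -- Client 1 (its own session, its own input file):
    BULK INSERT dbo.SalesOrders
    FROM 'C:\ImportFiles\orders_part1.dat'
    WITH (DATAFILETYPE = 'native', TABLOCK);

    -- Client 2 (run concurrently from a separate session):
    BULK INSERT dbo.SalesOrders
    FROM 'C:\ImportFiles\orders_part2.dat'
    WITH (DATAFILETYPE = 'native', TABLOCK);

    -- Because the table has no indexes, each TABLOCK request takes a
    -- BU lock, so the two streams load in parallel without blocking
    -- each other.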

For more information, see Importing Data in Parallel with Table Level Locking.

Table Locking and Logging During Bulk Import

The following table summarizes the type of lock that a bulk-import operation takes, based on the table schema, whether the table is empty, and whether TABLOCK is specified. It also shows the type of logging that occurs when the database uses the bulk-logged or simple recovery model.

Note

After the first successful batch, the table is no longer empty.

Bulk-import target table            Is table empty?   Is TABLOCK set?   Locks    Logging under bulk-logged and simple recovery models
Heap                                Yes               Yes               BU-Tab   Bulk logging
Heap                                Yes               No                IX-Tab   Full logging
Heap                                No                Yes               BU-Tab   Bulk logging
Heap                                No                No                IX-Tab   Full logging
Heap with one nonclustered index    Yes               Yes               SCH-M    Bulk logging
Heap with one nonclustered index    Yes               No                IX-Tab   Full logging
Heap with one nonclustered index    No                Yes               SCH-M    Data: bulk logging; index: full logging
Heap with one nonclustered index    No                No                IX-Tab   Full logging
Clustered index                     Yes               Yes               SCH-M    Bulk logging
Clustered index                     Yes               No                IX-Tab   Full logging
Clustered index                     No                Yes               X-Tab    Full logging
Clustered index                     No                No                IX-Tab   Full logging
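
If you want to confirm which lock a bulk-import session actually holds, you can query the sys.dm_tran_locks dynamic management view from a second session while the load is running. This is a minimal sketch; the database and table names are hypothetical:

    -- Run from a second session while the bulk import is in progress.
    -- A BU-Tab lock appears as request_mode 'BU' on the OBJECT resource.
    SELECT resource_type, request_mode, request_status
    FROM sys.dm_tran_locks
    WHERE resource_type = 'OBJECT'
      AND resource_database_id = DB_ID('MyDatabase')
      AND resource_associated_entity_id = OBJECT_ID('dbo.SalesOrders');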

Copying Data Between SQL Server Instances

To bulk-copy data from one instance of SQL Server to another, use bcp to export the table data into a data file. Then use one of the bulk import methods to import the data from the file to a table. Perform both the bulk export and bulk import operations using either native or Unicode native format.

If the source table has a clustered index or if you intend to bulk import the data into a table with a clustered index:

  1. Bulk export the data from the source table by using bcp with the queryout option and a SELECT statement that includes an appropriate ORDER BY clause, so that the utility creates an ordered data file. For more information, see bcp Utility.

  2. When you bulk import the data into SQL Server, use the ORDER qualifier, which is supported only by bcp and BULK INSERT. For more information, see Controlling the Sort Order When Bulk Importing Data.
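
The following minimal sketch illustrates both steps for a hypothetical table dbo.SalesOrders that is clustered on OrderID; the server, database, and file names are placeholders:

    -- Step 1 (run at a command prompt against the source instance):
    -- bcp "SELECT * FROM MyDatabase.dbo.SalesOrders ORDER BY OrderID" queryout orders.dat -n -T -S SourceServer

    -- Step 2 (run on the destination instance): import the ordered,
    -- native-format file. The ORDER hint tells SQL Server that the
    -- file is already sorted on the clustered index key, so the sort
    -- step is skipped.
    BULK INSERT dbo.SalesOrders
    FROM 'C:\ImportFiles\orders.dat'
    WITH (DATAFILETYPE = 'native',
          ORDER (OrderID ASC),
          TABLOCK);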

For more information, see Copying Data Between Servers.