Managing Batches for Bulk Import
This section discusses managing how batch size is determined for a bulk import operation. By default, all the rows in a data file are imported as a single batch of unknown size in a single transaction. In this case, if the import operation fails before completion, the entire transaction is rolled back, and no data is added to the destination table. The failed operation must then be restarted at the beginning of the data file.
Importing a large data file as a single batch can be problematic, so bcp and BULK INSERT let you import data in a series of batches, each of which is smaller than the data file. Each batch is imported and logged in a separate transaction, and after a transaction is committed, the rows imported by that batch are permanently part of the destination table. If the operation fails, only rows imported from the current batch are rolled back, and you can resume importing data starting at the beginning of the failed batch rather than at the beginning of the data file.
Note
For information about how batches work, see Batches.
Alternatively, if you choose not to limit the batch size, you can improve performance by estimating the size of the data file in your command. Your estimate is used by the query processor when creating the query plan for the operation.
Note
When you specify batch size or data file size, accuracy is not critical.
The following table summarizes the qualifiers that support these alternatives.
| Command | Batch Size | Rows Sent per Batch | Kilobytes Sent per Batch |
|---|---|---|---|
| bcp¹ | -b batch_size | -h "ROWS_PER_BATCH = bb" | -h "KILOBYTES_PER_BATCH = cc" |
| BULK INSERT² | BATCHSIZE = batch_size | ROWS_PER_BATCH = rows_per_batch | KILOBYTES_PER_BATCH = kilobytes_per_batch. Note: Using this option together with BATCHSIZE generates an error. |
| INSERT ... SELECT * FROM OPENROWSET (BULK...) | —³ | ROWS_PER_BATCH = rows_per_batch | —³ |
¹ In a bcp command, do not use the -b switch together with the ROWS_PER_BATCH or KILOBYTES_PER_BATCH hint. Combining them causes an error.
² In a BULK INSERT statement, if you use BATCHSIZE together with ROWS_PER_BATCH, BATCHSIZE takes precedence.
³ OPENROWSET has no BATCHSIZE or KILOBYTES_PER_BATCH option.
The following sections describe the use of these qualifiers.
Specifying the Approximate Batch Size
If you import a very large number of rows, dividing the data into batches can offer advantages. After each batch completes, its transaction is logged. If, for any reason, a bulk-import operation terminates before completion, only the current transaction (batch) is rolled back.
Note
Bulk logging includes a copy of the imported data in the transaction log. This can cause the log to grow quickly, but after each batch, you can back up the log to reclaim log space.
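For example, the following minimal sketch backs up the log between batches to reclaim space; the database name and backup path are placeholders, not values from this topic.

```sql
-- Back up the transaction log after one or more batches have committed
-- so that the committed log space can be reused.
-- MyDatabase and the backup path are placeholders.
BACKUP LOG MyDatabase
   TO DISK = 'C:\Backups\MyDatabase_log.trn';
```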
To import the data file in a series of batches of approximately a specified size, use one of the following qualifiers (see the examples after this list):
For bcp: -b
For BULK INSERT: BATCHSIZE
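For example, the following sketches show each qualifier with a batch size of 5000 rows. The database, table, and data-file names (MyDatabase.dbo.MyTable, C:\Data\import.dat) and the batch size are placeholders; in the bcp command, -c specifies character format and -T a trusted connection, and you would adjust these switches for your environment.

```
bcp MyDatabase.dbo.MyTable in C:\Data\import.dat -c -T -b 5000
```

```sql
-- BULK INSERT: BATCHSIZE sets the number of rows per batch.
-- Table and file names are placeholders.
BULK INSERT MyDatabase.dbo.MyTable
   FROM 'C:\Data\import.dat'
   WITH (DATAFILETYPE = 'char', BATCHSIZE = 5000);
```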
Each batch of rows is inserted as a separate transaction. If, for any reason, the bulk-import operation terminates before completion, only the current transaction is rolled back. For instance, if a data file has 1000 rows and a batch size of 100 is used, Microsoft SQL Server logs the operation as 10 separate transactions; each transaction inserts 100 rows into the destination table. If the bulk-import operation terminates while loading row 750, only the 49 rows already loaded in the current batch (rows 701 through 749) are removed as SQL Server rolls back the current transaction. The destination table still contains the first 700 rows, which were committed in the first seven batches.
SQL Server automatically optimizes the load according to the batch size value, which may result in better performance. In general, make the batch size as large as practical; typically, the larger the batch size, the better the performance of the bulk-import operation. A few exceptions exist, however. If one or more indexes exist on the target table, a larger batch size might put pressure on memory because of sorting. Also, during a parallel load performed without the TABLOCK option, a larger batch size might lead to more blocking.
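As a hedged illustration of the parallel-load point above, the following sketch requests a bulk-update table lock during the load; the table and file names and the batch size are placeholders.

```sql
-- TABLOCK requests a bulk-update table lock for the duration of the load,
-- which can reduce blocking when multiple clients load the same table.
-- Names and values below are placeholders.
BULK INSERT MyDatabase.dbo.MyTable
   FROM 'C:\Data\import_part1.dat'
   WITH (TABLOCK, BATCHSIZE = 50000);
```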
Note
Batch sizes are not applicable when you bulk export data from an instance of SQL Server to a data file.
Specifying the Approximate Size of a Data File
If you do not specify a batch size for the bulk-import operation, you can indicate the approximate size of the data file to allow the query processor to effectively allocate resources in the query plan. To indicate the approximate size of the data file, either estimate the number of rows or the number of kilobytes of data, as follows:
Estimating rows per batch
To estimate the number of rows, use the ROWS_PER_BATCH hint or option. If you specify a value > 0, the query processor uses the value of ROWS_PER_BATCH as a hint for allocating resources in the query plan. This value should be of the same order as the actual number of rows.
Although all rows from the data file are copied into an instance of SQL Server in one batch, bcp displays the message "1000 rows sent to SQL Server" after every 1000 rows. This message is for information only and occurs regardless of the batch size.
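For example, the following sketches pass an estimated total of 100000 rows; the estimate only needs to be of the right order of magnitude. The table, data-file, and format-file names are placeholders.

```
bcp MyDatabase.dbo.MyTable in C:\Data\import.dat -c -T -h "ROWS_PER_BATCH = 100000"
```

```sql
-- BULK INSERT with a row-count estimate; names are placeholders.
BULK INSERT MyDatabase.dbo.MyTable
   FROM 'C:\Data\import.dat'
   WITH (ROWS_PER_BATCH = 100000);

-- OPENROWSET (BULK ...) accepts the same estimate as a bulk option.
-- The format file describes the layout of the data file; its name is a placeholder.
INSERT INTO MyDatabase.dbo.MyTable
SELECT *
FROM OPENROWSET(
        BULK 'C:\Data\import.dat',
        FORMATFILE = 'C:\Data\import.fmt',
        ROWS_PER_BATCH = 100000) AS src;
```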
Estimating kilobytes per batch
To estimate the size of the data file in kilobytes, use the KILOBYTES_PER_BATCH hint or option. SQL Server optimizes the bulk-import operation according to the specified value.
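For example, for a data file of roughly 200 MB (about 200000 KB), the following sketches pass that estimate; the names and the value are placeholders, and in BULK INSERT this hint should not be combined with BATCHSIZE.

```
bcp MyDatabase.dbo.MyTable in C:\Data\import.dat -c -T -h "KILOBYTES_PER_BATCH = 200000"
```

```sql
-- BULK INSERT with a data-size estimate in kilobytes; names are placeholders.
BULK INSERT MyDatabase.dbo.MyTable
   FROM 'C:\Data\import.dat'
   WITH (KILOBYTES_PER_BATCH = 200000);
```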
Note
When you bulk import a large data file without specifying the batch size or any minimal-logging optimizations, the transaction log might fill before the bulk import operation is complete. To avoid this situation, enlarge the transaction log or allow it to grow automatically.
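For example, the following minimal sketch allows the log to grow automatically; the database name and the logical log-file name are placeholders that you would replace with your own.

```sql
-- Let the transaction log grow automatically in 512 MB increments
-- with no upper limit. Names are placeholders.
ALTER DATABASE MyDatabase
   MODIFY FILE (NAME = MyDatabase_log, FILEGROWTH = 512MB, MAXSIZE = UNLIMITED);
```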