Copying Data Between Different Collations
Data can be moved between Unicode columns seamlessly and conversion is unnecessary. Data moved between columns containing non-Unicode character data, however, must be converted from the source code page to the target code page. A code page is an ordered set of characters of a given script in which a numeric index, or code point value, is associated with each character. Code pages support character sets and keyboard layouts used by different Microsoft Windows locales. For a list of the supported code pages, see Collation and Code Page Architecture.
By default, character data is converted in the following manner:
Operation | Character data converted to: |
---|---|
Export |
OEM code page characters |
Import |
ANSI/Microsoft Windows code page characters |
Conversion between OEM and ANSI code pages causes the loss of extended or double-byte character set (DBCS) characters. To avoid such conversions, you can specify a code page or, for Microsoft SQL Server 7.0 or later, a collation.
Note
For information about transferring a database from one collation to another collation, see this Microsoft Web site.
Beginning with SQL Server 7.0, the physical storage of character strings is controlled by collations. Microsoft SQL Server 2005 supports setting collations at four levels: the server instance, a database, a table column, and an expression. A SQL Server collation specifies three properties:
The code page used to store non-Unicode character data.
Note
Microsoft SQL Server 6.5 and earlier used code pages but did not store them in collations.
The sort order to use for Unicode data types (nchar, nvarchar, and ntext). A sort order defines the sequence in which characters are sorted, and the way characters are evaluated in comparison operations.
The sort order to use for non-Unicode character data types (char, varchar, and text).
For an introduction to code pages and collations, see Collation Options and International Support. For in-depth information, see Working with Collations.
Using a Column-Level Collation
In SQL Server 7.0 and later, you can control the conversion of data by specifying the collation in which the data is stored in the field of the target data file, for a bulk-export, or the column of the target table, for a bulk-import operation. Any required translations between the data file collation and the collations of the table columns in the database are performed internally by the import or export operation.
Note
To control the conversion of data from one code page to another in Microsoft SQL Server 6.0 or Microsoft SQL Server 6.5, you need to specify a code page by using a code-page option (bcp-C switch or the Transact-SQL CODEPAGE option) in the command. For more information, see Using a Code Page, later in this topic. For SQL Server 7.0 and later, specifying a code page is supported, but it is preferable to specify the collation in a format file.
Collations always apply to SQLCHAR columns on both bulk-import and bulk-export operations. Column collation specifications are always ignored for columns that do not have either SQLCHAR or SQLNCHAR specified as their host data type. Collations are used to determine the sort order for both SQLCHAR and SQLNCHAR columns during bulk-import operations for which the columns are referenced in the ORDER hint.
To specify the collation, you must use a format file. Both types of format files support specifying column-level collations. For information on using collations in non-XML format files, see the following section, Specifying Column-Level Collations in Non-XML Format Files. For information on using collations in XML format files, see Understanding XML Format Files.
Specifying Column-Level Collations in Non-XML Format Files
The final column in a non-XML format file (the column in ordinal position 8) contains a collation specification that defines how the data for that column is stored in the data file. The collation-column options are as follows:
Option
Description
name
Specifies the name of the collation used to store the data in the data file. For a list of the SQL collation names, see SQL Collation Name (Transact-SQL).
RAW
Specifies that the data is stored in the code page that is specified in a code-page option in the command or the bcp_control BCPFILECP hint. If none of these is specified, the collation of the data file is that of the OEM code page of the client computer.
Note:
For more information about the code-page options, see "Using a Code Page," later in this topic. For more information, see about the BCPFILECP hint, see bcp_control.
""
Has the same meaning as RAW.
Examples
The following example shows a non-XML format file with column collations specified for fields 2 and 3. The collation information is displayed in bold.
9.0
4
1 SQLCHAR 0 7 "\t" 1 DepartmentID ""
2 SQLCHAR 0 100 "\t" 2 Name SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "\t" 3 GroupName SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 24 "\r\n" 4 ModifiedDate ""
Note
The preceding format file is for the HumanResources.Department table in the AdventureWorks sample database, which has four columns: DepartmentID, Name, GroupName, and ModifiedDate. For more information about the table, see Department Table (AdventureWorks).
For more information on non-XML format files, see Understanding Non-XML Format Files.
Using a Code Page
To control the conversion of data from one code page to another in SQL Server 6.0 or SQL Server 6.5, you need to specify a code page in the command.
Note
To control the conversion of data in SQL Server 7.0 and later, Microsoft recommends that you use a format file to specify column-level collations for a bulk-export or bulk-import operation. For more information, see. "Using a Column-level Collation," earlier section.
The syntax for specifying a code page is as follows:
- To specify a code page in a bcp command, use the -C switch:
-C { ACP | OEM | RAW | code_page } - To specify a code page in a BULK INSERT or INSERT ... SELECT * FROM OPENROWSET(BULK...) statement, use the CODEPAGE option:
CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' }
Both the -C switch and the CODEPAGE option accept the following values for the code page.
Code Page Value | Description |
---|---|
ACP |
During an import operation, specifies that data fields of the char, varchar, or text data type are converted from the ANSI/Windows code page (ISO 1252) to the SQL Server code page. During an export operation, specifies that bcp converts these columns from the SQL Server code page to the ANSI/Windows code page. |
OEM (default) |
During an import operation, specifies that data fields of the char, varchar, or text data type are converted from the system OEM code page to the SQL Server code page. During an export operation, specifies that bcp converts these columns from the SQL Server code page to the system OEM code page. |
RAW |
No conversion from one code page to another occurs. This is the fastest option. |
code_page |
A specific code page number (for example, |
Examples
The following examples use the HumanResources.myTeam table in the AdventureWorks database. Before you can run the examples, you must create this table. For information about the table and how to create it, see Creating the HumanResources.myTeam Table.
Note
Before testing the following example, you should delete the existing rows from the myTeam table to avoid primary-key conflicts.
A. Using a Code Page with bcp
This example uses bcp to bulk export the data from the HumanResources.myTeam
table in the AdventureWorks
database to the myTeam850.txt
data file, use the code page 850
. At the Windows command prompt, enter:
bcp AdventureWorks.HumanResources.myTeam out myTeam850.txt -c -C850 -T
B. Using a Code Page with BULK INSERT
This example uses BULK INSERT to bulk import myTeam850.txt
, created in the preceding bcp example, into the AdventureWorks``HumanResources.myTeam
table.
From a query tool, such as SQL Server Management Studio Query Editor, execute:
USE AdventureWorks
GO
DELETE HumanResources.myTeam
GO
BULK INSERT HumanResources.myTeam
FROM 'C:\myTeam850.txt'
WITH (CODEPAGE = 850);
GO
Bulk Exporting Data Between Different Collations
In a bulk-export operation, the collation specification controls the code page that is used to store character data in the data file. It applies to:
- All columns in a character-format data file.
- Any column in a native mode file where SQLCHAR is specified as the host file data type.
- SQLCHAR characters whose values are greater than 127 or less than 32.
Collations are applied to characters whose values are 32 through 127, but all code pages map the same characters to the values from 32 to 127, so applying different collations has no effect.
The rules for determining which collation or code page is used on a bulk export operation are as follows:
If a column collation is specified either in a format file or by using the ODBC bcp_setcolfmt function (bcp_setcolfmt), the character data is stored using the ANSI code page associated with the collation.
If a column collation was not specified, but a code page was specified by either a code-page option in the command or the bcp_control BCPFILECP hint, all SQLCHAR data from columns having no column collation specification is stored using the specified code page.
Note
For information about the code-page options, see "Using a Code Page," earlier in this topic. For more information, see about the BCPFILECP hint, see bcp_control.
If you do not specify either a collation or a code page, SQLCHAR data is stored using the OEM code page of the client computer.
Note
No information about the collation/code page is stored in a data file.
Bulk Importing Data Between Different Collations
For a bulk-import operation, code-page interpretation applies to columns stored as character-format, SQLCHAR, data in a data file. In a character-format data file, all columns are stored as SQLCHAR. Because no information about the collation/code page is stored in a data file, for a bulk-import operation, you must supply the information about the collation/code page of the data fields.
The rules for determining which collation or code page is used on a bulk-import operation are as follows:
If you specify a column collation in a format file or by using the ODBC bcp_setcolfmt function (bcp_setcolfmt), the SQLCHAR data in a data file is interpreted by using the ANSI code page that is associated with the specified column collation. Ensure that your collation specifications match the collations in the data file.
If a column collation is not specified, but a code page is specified by using the code-page option in the command or the bcp_control BCPFILECP hint, the SQLCHAR data is interpreted by using the specified code page.
Note
For information about the code-page options, see Using a Code Page, earlier in this topic. For more information, see about the BCPFILECP hint, see bcp_control.
If you do not specify any collations or a code page, then data in SQLCHAR columns is interpreted by using the OEM code page of the client computer.
On a bulk-import operation, the collation specification controls the following:
- How the operation attempts to interpret the code page of SQLCHAR columns in the data file.
- How bcp or BULK INSERT applies the ORDER hint, if specified.
If you use the ORDER hint, the bulk-import operation uses collations to properly interpret the ORDER hint. This hint applies to both SQLCHAR and SQLNCHAR columns. The data in the columns referenced by the ORDER hint must be in the sequence that is defined by the collation that is mapped to those columns. For more information, see Controlling the Sort Order When Bulk Importing Data.
See Also
Concepts
Using Unicode Character Format to Import or Export Data
Using Unicode Native Format to Import or Export Data
Working with Collations
Other Resources
bcp_control
bcp Utility
bcp_setcolfmt
BULK INSERT (Transact-SQL)
Format Files for Importing or Exporting Data
International Considerations for Databases and Database Engine Applications
OPENROWSET (Transact-SQL)