Comparing and Sorting Data for a Specific Culture
Alphabetical order and conventions for sequencing items vary from culture to culture. For example, sort order can be case-sensitive or case-insensitive. It can be phonetically based or based on the appearance of the character. In East Asian languages, sorts are ordered by the stroke and radical of ideographs. Sorts can also vary depending on the fundamental order the language and culture use for the alphabet. For example, the Swedish language has an "Ä" character that it sorts after "Z" in the alphabet. The German language also has this character, but its phone book ordering sorts it like "ae", after "A" in the alphabet. A world-ready application must be able to compare and sort data on a per-culture basis to support culture-specific and language-specific sorting conventions.
Note: In some scenarios, culture-sensitive behavior is not desirable. For more information about when and how to perform culture-insensitive operations, see Culture-Insensitive String Operations.
Comparing Strings
The CompareInfo class provides a set of methods you can use to perform culture-sensitive string comparisons. The CultureInfo class has a CompareInfo property that is an instance of this class. This property defines how to compare and sort strings for a specific culture. The static String.Compare method uses the information in the CultureInfo.CompareInfo property to compare two strings. The String.Compare method returns a negative integer if the first string precedes the second string in the sort order, zero if the two strings are equal, and a positive integer if the first string follows the second string in the sort order.
The following example illustrates how two strings can be evaluated differently by the String.Compare method, depending upon the culture used to perform the comparison. First, the Thread.CurrentThread.CurrentCulture property is set to da-DK for the Danish (Denmark) culture, and the strings "Apple" and "Æble" are compared. The Danish language treats the character "Æ" as an individual letter, sorting it after "Z" in the alphabet. Therefore, the string "Æble" is greater than "Apple" for the Danish culture. Next, the property is set to en-US for the English (United States) culture, and the strings "Apple" and "Æble" are compared again. This time, the string "Æble" is determined to be less than "Apple". The English (United States) culture sorts the character "Æ" like the letter pair "AE", and "AE" precedes "Ap" in the alphabet.
Imports System
Imports System.Globalization
Imports System.Threading

Public Class TestClass
    Public Shared Sub Main()
        Dim str1 As String = "Apple"
        Dim str2 As String = "Æble"

        ' Set the CurrentCulture to Danish in Denmark.
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("da-DK")
        ' Compare the two strings.
        Dim result1 As Integer = String.Compare(str1, str2)
        Console.WriteLine("When the CurrentCulture is ""da-DK""," + _
                          ControlChars.NewLine + " the result of comparing {0}" + _
                          " with {1} is: {2}", str1, str2, result1)

        ' Set the CurrentCulture to English in the U.S.
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US")
        ' Compare the two strings.
        Dim result2 As Integer = String.Compare(str1, str2)
        Console.WriteLine("When the CurrentCulture is ""en-US""," + _
                          ControlChars.NewLine + " the result of comparing {0}" + _
                          " with {1} is: {2}", str1, str2, result2)
    End Sub
End Class
' The example displays the following output:
'    When the CurrentCulture is "da-DK",
'     the result of comparing Apple with Æble is: -1
'    When the CurrentCulture is "en-US",
'     the result of comparing Apple with Æble is: 1

using System;
using System.Globalization;
using System.Threading;

public class CompareStringSample
{
    public static void Main()
    {
        string str1 = "Apple";
        string str2 = "Æble";

        // Set the CurrentCulture to Danish in Denmark.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
        // Compare the two strings.
        int result1 = String.Compare(str1, str2);
        Console.WriteLine("When the CurrentCulture is \"da-DK\",\n" +
                          " the result of comparing {0} with {1} is: {2}",
                          str1, str2, result1);

        // Set the CurrentCulture to English in the U.S.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
        // Compare the two strings.
        int result2 = String.Compare(str1, str2);
        Console.WriteLine("When the CurrentCulture is \"en-US\",\n" +
                          " the result of comparing {0} with {1} is: {2}",
                          str1, str2, result2);
    }
}
// The example displays the following output:
//    When the CurrentCulture is "da-DK",
//     the result of comparing Apple with Æble is: -1
//    When the CurrentCulture is "en-US",
//     the result of comparing Apple with Æble is: 1
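A comparison can also name its culture explicitly instead of changing the current thread's culture. The following is a minimal sketch, assuming the String.Compare(String, String, CultureInfo, CompareOptions) and CompareInfo.Compare(String, String, CompareOptions) overloads; the class name ExplicitCultureCompare is hypothetical. It prints only the sign of each result, because the exact integer returned can vary with the .NET version and platform.

```csharp
using System;
using System.Globalization;

// Hypothetical sample class; not part of the original examples.
public class ExplicitCultureCompare
{
    public static void Main()
    {
        string str1 = "Apple";
        string str2 = "Æble";

        foreach (string name in new[] { "da-DK", "en-US" })
        {
            CultureInfo culture = CultureInfo.CreateSpecificCulture(name);

            // Pass the culture directly instead of setting CurrentCulture.
            int viaString = String.Compare(str1, str2, culture, CompareOptions.None);

            // The same comparison through the culture's CompareInfo object.
            int viaCompareInfo = culture.CompareInfo.Compare(str1, str2, CompareOptions.None);

            // Only the sign of the result is meaningful.
            Console.WriteLine("{0}: String.Compare sign = {1}, CompareInfo.Compare sign = {2}",
                              name, Math.Sign(viaString), Math.Sign(viaCompareInfo));
        }
    }
}
```

Passing the culture as an argument keeps the comparison independent of whatever culture the calling thread happens to use, which can make library code easier to reason about.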
For more information on comparing strings, see Comparing Strings.
Using Alternate Sort Orders
Some cultures support more than one sort order. For example, the culture Chinese (PRC), with the name zh-CN, supports a sort by pronunciation (default) and a sort by stroke count. When your application creates a CultureInfo object using a culture name, for example, zh-CN, the default sort order is used. To specify the alternate sort order, the application should create a CultureInfo object using the identifier for the alternate sort order. Then, the application should obtain a CompareInfo object from the CultureInfo.CompareInfo property to use in string comparisons. Alternatively, your application can create a CompareInfo object directly by calling the static CompareInfo.GetCompareInfo(Int32) method and specifying the identifier for the alternate sort order.
The following table lists the cultures that support alternate sort orders and the identifiers for the default and alternate sort orders.
Culture name | Culture | Default sort name and identifier | Alternate sort name and identifier |
---|---|---|---|
es-ES | Spanish (Spain) | International: 0x00000C0A | Traditional: 0x0000040A |
zh-TW | Chinese (Taiwan) | Stroke Count: 0x00000404 | Bopomofo: 0x00030404 |
zh-CN | Chinese (PRC) | Pronunciation: 0x00000804 | Stroke Count: 0x00020804 |
zh-HK | Chinese (Hong Kong SAR) | Stroke Count: 0x00000c04 | Stroke Count: 0x00020c04 |
zh-SG | Chinese (Singapore) | Pronunciation: 0x00001004 | Stroke Count: 0x00021004 |
zh-MO | Chinese (Macao SAR) | Pronunciation: 0x00001404 | Stroke Count: 0x00021404 |
ja-JP | Japanese (Japan) | Default: 0x00000411 | Unicode: 0x00010411 |
ko-KR | Korean (Korea) | Default: 0x00000412 | Korean Xwansung - Unicode: 0x00010412 |
de-DE | German (Germany) | Dictionary: 0x00000407 | Phone Book Sort DIN: 0x00010407 |
hu-HU | Hungarian (Hungary) | Default: 0x0000040e | Technical Sort: 0x0001040e |
ka-GE | Georgian (Georgia) | Traditional: 0x00000437 | Modern Sort: 0x00010437 |
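The two ways of obtaining a CompareInfo object for an alternate sort order, described before the table, can be sketched as follows. This is an illustration under assumptions: it uses the German (Germany) phone book identifier 0x00010407 from the table, the class name and sample strings are hypothetical, and the alternate sort identifiers might not be available on every runtime or operating system (in which case a CultureNotFoundException or ArgumentException can be thrown). Results are printed as signs only and no particular output is claimed.

```csharp
using System;
using System.Globalization;

// Hypothetical sample class; not part of the original examples.
public class AlternateSortSketch
{
    public static void Main()
    {
        // Identifier taken from the table: German (Germany), Phone Book Sort DIN.
        const int GermanPhoneBook = 0x00010407;

        // Default (dictionary) sort order, created from the culture name.
        CompareInfo dictionary = new CultureInfo("de-DE").CompareInfo;

        // Alternate sort order obtained through a CultureInfo object.
        CompareInfo phoneBook = new CultureInfo(GermanPhoneBook).CompareInfo;

        // Alternate sort order obtained directly from the identifier.
        CompareInfo phoneBookDirect = CompareInfo.GetCompareInfo(GermanPhoneBook);

        // Hypothetical sample data.
        string a = "Müller";
        string b = "Mueller";

        Console.WriteLine("Dictionary:          {0}", Math.Sign(dictionary.Compare(a, b)));
        Console.WriteLine("Phone book:          {0}", Math.Sign(phoneBook.Compare(a, b)));
        Console.WriteLine("Phone book (direct): {0}", Math.Sign(phoneBookDirect.Compare(a, b)));
    }
}
```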
Searching Strings
Your application can use the overloaded CompareInfo.IndexOf method to retrieve the zero-based index of a character or substring within a specified string. The method returns -1 if the character or substring is not found in the specified string. When searching for a specified character using CompareInfo.IndexOf, the application should take into account that the method overloads that accept a CompareOptions parameter can perform the comparison differently from the method overloads that do not accept this parameter. The method overloads that search for a character (type Char) and do not take a CompareOptions parameter perform a culture-sensitive search. Thus, if a Unicode value represents a precomposed character, such as the ligature "Æ" (\u00C6), it might be considered equivalent to any occurrence of its components in the correct sequence, such as "AE" (\u0041\u0045), depending on the culture. To perform an ordinal (culture-insensitive) search, in which one character is considered equivalent to another only if their Unicode values are the same, the application should use one of the CompareInfo.IndexOf overloads that take a CompareOptions parameter and set the parameter to the Ordinal value.
Your applications can also use overloads of the String.IndexOf method that search for a character to perform an ordinal (culture-insensitive) search. Note that the overloads of this method that search for a string perform a culture-sensitive search.
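The difference between the character and string overloads of String.IndexOf can be sketched as follows. This is a minimal illustration, assuming the String.IndexOf(Char) and String.IndexOf(String, StringComparison) overloads; the class name is hypothetical. The string searches pass a StringComparison value explicitly so that the kind of search is visible at the call site. The culture-sensitive result is not shown because it depends on the culture and on the runtime's collation data.

```csharp
using System;
using System.Globalization;
using System.Threading;

// Hypothetical sample class; not part of the original examples.
public class StringIndexOfSketch
{
    public static void Main()
    {
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US");
        string text = "aeble";

        // Character overload: ordinal search. U+00E6 does not occur in
        // "aeble", so an ordinal search cannot find it.
        int byChar = text.IndexOf('æ');

        // String overload with an explicit culture-sensitive comparison.
        int byStringCulture = text.IndexOf("æ", StringComparison.CurrentCulture);

        // String overload with an explicit ordinal comparison.
        int byStringOrdinal = text.IndexOf("æ", StringComparison.Ordinal);

        Console.WriteLine("IndexOf(char):                    {0}", byChar);
        Console.WriteLine("IndexOf(string, CurrentCulture):  {0}", byStringCulture);
        Console.WriteLine("IndexOf(string, Ordinal):         {0}", byStringOrdinal);
    }
}
```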
The following example illustrates the difference in the results returned by the IndexOf method depending on culture. CultureInfo objects are created for da-DK, the Danish (Denmark) culture, and for en-US, the English (United States) culture. For each culture, overloads of the CompareInfo.IndexOf method are used to search for the character "æ" in the strings "æble" and "aeble". Note that, for da-DK, the CompareInfo.IndexOf overload that takes a CompareOptions parameter set to Ordinal and the overload that does not take this parameter return the same results. The character "æ" is considered equivalent only to the Unicode code value \u00E6.
using System;
using System.Globalization;
using System.Threading;

public class Example
{
    public static void Main()
    {
        string str1 = "æble";
        string str2 = "aeble";
        char find = 'æ';

        // Create CultureInfo objects representing the Danish (Denmark)
        // and English (United States) cultures.
        CultureInfo[] cultures = { CultureInfo.CreateSpecificCulture("da-DK"),
                                   CultureInfo.CreateSpecificCulture("en-US") };

        foreach (var ci in cultures)
        {
            Thread.CurrentThread.CurrentCulture = ci;

            int result1 = ci.CompareInfo.IndexOf(str1, find);
            int result2 = ci.CompareInfo.IndexOf(str2, find);
            int result3 = ci.CompareInfo.IndexOf(str1, find,
                                                 CompareOptions.Ordinal);
            int result4 = ci.CompareInfo.IndexOf(str2, find,
                                                 CompareOptions.Ordinal);

            Console.WriteLine("\nThe current culture is {0}",
                              CultureInfo.CurrentCulture.Name);
            Console.WriteLine("\n CompareInfo.IndexOf(string, char) method:");
            Console.WriteLine(" Position of {0} in the string {1}: {2}",
                              find, str1, result1);
            Console.WriteLine("\n CompareInfo.IndexOf(string, char) method:");
            Console.WriteLine(" Position of {0} in the string {1}: {2}",
                              find, str2, result2);
            Console.WriteLine("\n CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method");
            Console.WriteLine(" Position of {0} in the string {1}: {2}",
                              find, str1, result3);
            Console.WriteLine("\n CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method");
            Console.WriteLine(" Position of {0} in the string {1}: {2}",
                              find, str2, result4);
            Console.WriteLine();
        }
    }
}
// The example displays the following output:
// The current culture is da-DK
//
// CompareInfo.IndexOf(string, char) method:
// Position of æ in the string æble: 0
//
// CompareInfo.IndexOf(string, char) method:
// Position of æ in the string aeble: -1
//
// CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method
// Position of æ in the string æble: 0
//
// CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method
// Position of æ in the string aeble: -1
//
//
// The current culture is en-US
//
// CompareInfo.IndexOf(string, char) method:
// Position of æ in the string æble: 0
//
// CompareInfo.IndexOf(string, char) method:
// Position of æ in the string aeble: 0
//
// CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method
// Position of æ in the string æble: 0
//
// CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) method
// Position of æ in the string aeble: -1
For da-DK, the culture-sensitive and ordinal searches return the same results. For en-US, the CompareInfo.IndexOf overload with the CompareOptions parameter set to Ordinal and the overload without this parameter return different results for the string "aeble". The culture-sensitive comparison evaluates the character "æ" as equivalent to its components "ae" and returns 0. The ordinal (culture-insensitive) comparison does not treat the character "æ" as equivalent to "ae" because their Unicode code values do not match, so it returns -1.
Sorting Strings
The Array class provides an overloaded Sort method that allows your application to sort arrays based on the CultureInfo.CurrentCulture property. In the following example, an array of three strings is created. First, the Thread.CurrentThread.CurrentCulture property is set to en-US, and the Array.Sort method is called. The resulting sort order is based on sorting conventions for the English (United States) culture. Next, the property is set to da-DK, and the Array.Sort method is called again. Notice how the resulting sort order differs from the en-US results because the sorting conventions for da-DK are used.
Imports System
Imports System.Threading
Imports System.Globalization

Public Class TextToFile
    Public Shared Sub Main()
        Dim stringArray() As String = { "Apple", "Æble", "Zebra" }

        ' Display the values of the array.
        Console.WriteLine("The array initially contains the following strings:")
        PrintIndexAndValues(stringArray)

        ' Set the current culture to "en-US".
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US")
        ' Sort the values of the array.
        Array.Sort(stringArray)

        ' Display the values of the array.
        Console.WriteLine("After sorting for the ""en-US"" culture:")
        PrintIndexAndValues(stringArray)

        ' Set the current culture to "da-DK".
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("da-DK")
        ' Sort the values of the array.
        Array.Sort(stringArray)

        ' Display the values of the array.
        Console.WriteLine("After sorting for the ""da-DK"" culture:")
        PrintIndexAndValues(stringArray)
    End Sub

    ' Write each value on its own line, followed by a blank line.
    Public Shared Sub PrintIndexAndValues(values As String())
        For Each value In values
            Console.WriteLine(value)
        Next
        Console.WriteLine()
    End Sub
End Class
' The example displays the following output:
'    The array initially contains the following strings:
'    Apple
'    Æble
'    Zebra
'
'    After sorting for the "en-US" culture:
'    Æble
'    Apple
'    Zebra
'
'    After sorting for the "da-DK" culture:
'    Apple
'    Zebra
'    Æble

using System;
using System.Threading;
using System.Globalization;

public class ArraySort
{
    public static void Main()
    {
        string[] stringArray = { "Apple", "Æble", "Zebra" };

        // Display the values of the array.
        Console.WriteLine("The array initially contains the following strings:");
        PrintIndexAndValues(stringArray);

        // Set the current culture to "en-US".
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US");
        // Sort the values of the array.
        Array.Sort(stringArray);

        // Display the values of the array.
        Console.WriteLine("After sorting for the \"en-US\" culture:");
        PrintIndexAndValues(stringArray);

        // Set the current culture to "da-DK".
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("da-DK");
        // Sort the values of the array.
        Array.Sort(stringArray);

        // Display the values of the array.
        Console.WriteLine("After sorting for the \"da-DK\" culture:");
        PrintIndexAndValues(stringArray);
    }

    // Write each value on its own line, followed by a blank line.
    public static void PrintIndexAndValues(string[] values)
    {
        foreach (var value in values)
            Console.WriteLine(value);
        Console.WriteLine();
    }
}
// The example displays the following output:
//    The array initially contains the following strings:
//    Apple
//    Æble
//    Zebra
//
//    After sorting for the "en-US" culture:
//    Æble
//    Apple
//    Zebra
//
//    After sorting for the "da-DK" culture:
//    Apple
//    Zebra
//    Æble
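An array can also be sorted for a specific culture without changing the current thread's culture, by passing a culture-aware comparer to Array.Sort. The following is a minimal sketch, assuming the StringComparer.Create(CultureInfo, Boolean) method and the Array.Sort overload that accepts a comparer; the class name is hypothetical, and each culture sorts its own copy of the array so the input order is the same both times.

```csharp
using System;
using System.Globalization;

// Hypothetical sample class; not part of the original examples.
public class ExplicitCultureSort
{
    public static void Main()
    {
        string[] words = { "Apple", "Æble", "Zebra" };

        foreach (string name in new[] { "en-US", "da-DK" })
        {
            CultureInfo culture = CultureInfo.CreateSpecificCulture(name);

            // Work on a copy so that each culture starts from the same order.
            string[] copy = (string[])words.Clone();

            // Create a case-sensitive comparer that uses the culture's rules.
            StringComparer comparer = StringComparer.Create(culture, false);
            Array.Sort(copy, comparer);

            Console.WriteLine("{0}: {1}", name, String.Join(", ", copy));
        }
    }
}
```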
Using Sort Keys
Sort keys are used to support culturally sensitive sorts. Based on the Unicode Standard, each character in a string is given several categories of sort weights, including alphabetic, case, and diacritic weights. A sort key serves as the repository of these weights for a particular string. For example, a sort key might contain a string of alphabetic weights, followed by a string of case weights, and so on. For additional information on sort key concepts, see The Unicode Standard at the Unicode home page.
In the .NET Framework, the SortKey class maps strings to their sort keys, and vice versa. Your applications can use the CompareInfo.GetSortKey method to create a sort key for a string that you specify. The resulting sort key for a specified string is a sequence of bytes that can differ depending upon the CurrentCulture and the CompareOptions value specified. For example, if the application specifies the value IgnoreCase when creating a sort key, a string comparison operation using the sort key ignores case.
After creating a sort key for a string, the application can pass it as a parameter to methods provided by the SortKey class. The Compare method allows comparison of sort keys. Because this method performs a simple byte-by-byte comparison, using it is much faster than using String.Compare. Applications that are sorting-intensive can improve performance by generating and storing sort keys for all the strings that are used. When a sort or comparison operation is required, the application can use the sort keys instead of the strings.
The following code example creates sort keys for two strings when the CurrentCulture is set to da-DK. It compares the two sort keys using the SortKey.Compare method and displays the results. The method returns a negative integer if the first sort key is less than the second, zero (0) if the two sort keys are equal, and a positive integer if the first sort key is greater than the second. Next, the CurrentCulture property is set to en-US and sort keys are created for the same strings. The sort keys for the strings are compared and the results are displayed. Notice that the results differ based on the culture used to create the sort keys. Although the results of the following code example are identical to the results of comparing these strings in the Comparing Strings example earlier in this topic, comparing existing sort keys with the SortKey.Compare method is faster than comparing the strings with the String.Compare method.
Imports System
Imports System.Threading
Imports System.Globalization

Public Class SortKeySample
    Public Shared Sub Main()
        Dim str1 As String = "Apple"
        Dim str2 As String = "Æble"

        ' Set the CurrentCulture to "da-DK".
        Dim dk As CultureInfo = CultureInfo.CreateSpecificCulture("da-DK")
        Thread.CurrentThread.CurrentCulture = dk

        ' Create a culturally sensitive sort key for str1.
        Dim sc1 As SortKey = dk.CompareInfo.GetSortKey(str1)
        ' Create a culturally sensitive sort key for str2.
        Dim sc2 As SortKey = dk.CompareInfo.GetSortKey(str2)

        ' Compare the two sort keys and display the results.
        Dim result1 As Integer = SortKey.Compare(sc1, sc2)
        Console.WriteLine("Current culture: {0}", _
                          CultureInfo.CurrentCulture.Name)
        Console.WriteLine("Result of comparing {0} with {1}: {2}", _
                          str1, str2, result1)
        Console.WriteLine()

        ' Set the CurrentCulture to "en-US".
        Dim enus As CultureInfo = CultureInfo.CreateSpecificCulture("en-US")
        Thread.CurrentThread.CurrentCulture = enus

        ' Create a culturally sensitive sort key for str1.
        Dim sc3 As SortKey = enus.CompareInfo.GetSortKey(str1)
        ' Create a culturally sensitive sort key for str2.
        Dim sc4 As SortKey = enus.CompareInfo.GetSortKey(str2)

        ' Compare the two sort keys and display the results.
        Dim result2 As Integer = SortKey.Compare(sc3, sc4)
        Console.WriteLine("Current culture: {0}", _
                          CultureInfo.CurrentCulture.Name)
        Console.WriteLine("Result of comparing {0} with {1}: {2}", _
                          str1, str2, result2)
    End Sub
End Class
' The example displays the following output:
'    Current culture: da-DK
'    Result of comparing Apple with Æble: -1
'
'    Current culture: en-US
'    Result of comparing Apple with Æble: 1

using System;
using System.Threading;
using System.Globalization;

public class SortKeySample
{
    public static void Main(String[] args)
    {
        String str1 = "Apple";
        String str2 = "Æble";

        // Set the current culture to "da-DK".
        CultureInfo dk = CultureInfo.CreateSpecificCulture("da-DK");
        Thread.CurrentThread.CurrentCulture = dk;

        // Create a culturally sensitive sort key for str1.
        SortKey sc1 = dk.CompareInfo.GetSortKey(str1);
        // Create a culturally sensitive sort key for str2.
        SortKey sc2 = dk.CompareInfo.GetSortKey(str2);

        // Compare the two sort keys and display the results.
        int result1 = SortKey.Compare(sc1, sc2);
        Console.WriteLine("Current culture: {0}",
                          CultureInfo.CurrentCulture.Name);
        Console.WriteLine("Result of comparing {0} with {1}: {2}\n",
                          str1, str2, result1);

        // Set the current culture to "en-US".
        CultureInfo enus = CultureInfo.CreateSpecificCulture("en-US");
        Thread.CurrentThread.CurrentCulture = enus;

        // Create a culturally sensitive sort key for str1.
        SortKey sc3 = enus.CompareInfo.GetSortKey(str1);
        // Create a culturally sensitive sort key for str2.
        SortKey sc4 = enus.CompareInfo.GetSortKey(str2);

        // Compare the two sort keys and display the results.
        int result2 = SortKey.Compare(sc3, sc4);
        Console.WriteLine("Current culture: {0}",
                          CultureInfo.CurrentCulture.Name);
        Console.WriteLine("Result of comparing {0} with {1}: {2}\n",
                          str1, str2, result2);
    }
}
// The example displays the following output:
//    Current culture: da-DK
//    Result of comparing Apple with Æble: -1
//
//    Current culture: en-US
//    Result of comparing Apple with Æble: 1
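The advice to generate sort keys once and reuse them for sorting-intensive work can be sketched as follows. This is an illustration under assumptions, not a measured optimization: the class name and sample data are hypothetical, the keys are created with CompareOptions.IgnoreCase to show that the option affects the keys, and Comparer<SortKey>.Create (available in .NET Framework 4.5 and later) wraps SortKey.Compare so that Array.Sort can order the keys and move the strings along with them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical sample class; not part of the original examples.
public class SortKeyCacheSketch
{
    public static void Main()
    {
        // Hypothetical sample data.
        string[] words = { "Zebra", "apple", "Æble", "Apple" };
        CompareInfo compareInfo = CultureInfo.CreateSpecificCulture("da-DK").CompareInfo;

        // Generate each sort key once, up front. IgnoreCase makes "apple"
        // and "Apple" produce keys that compare as equal.
        SortKey[] keys = new SortKey[words.Length];
        for (int i = 0; i < words.Length; i++)
        {
            keys[i] = compareInfo.GetSortKey(words[i], CompareOptions.IgnoreCase);
        }

        // Sort the keys with SortKey.Compare; the strings in the second
        // array are reordered in step with their keys.
        Array.Sort(keys, words, Comparer<SortKey>.Create(SortKey.Compare));

        foreach (string word in words)
        {
            Console.WriteLine(word);
        }
    }
}
```

Because Array.Sort is not a stable sort, strings whose keys compare as equal (here, "apple" and "Apple") can appear in either order.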
Normalization
Your application can normalize strings to either uppercase or lowercase before sorting. Rules for string sorting and casing are language-specific. For example, even within Latin script-based languages, there are different composition and sorting rules. There are only a few languages (including English) for which the sort order matches the order of the code points, for example, A [65] comes before B [66].
Your applications should not rely on code points to perform accurate sorting and string comparisons. In addition, the .NET Framework does not enforce or guarantee a specific form of normalization. You are responsible for performing the appropriate normalization in the applications that you develop.
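As one illustration of performing normalization yourself, the following minimal sketch uses the String.Normalize method to bring two representations of the same text to Unicode normalization form C before an ordinal comparison. The class name and sample strings are hypothetical, and form C is only an assumption for this sketch; the appropriate form depends on the application.

```csharp
using System;
using System.Text;

// Hypothetical sample class; not part of the original examples.
public class NormalizationSketch
{
    public static void Main()
    {
        string precomposed = "\u00E9";   // "é" as a single code point
        string decomposed = "e\u0301";   // "e" followed by a combining acute accent

        // An ordinal comparison sees different code points.
        bool equalBefore = String.Equals(precomposed, decomposed,
                                         StringComparison.Ordinal);

        // Normalize both strings to the same form, then compare again.
        string left = precomposed.Normalize(NormalizationForm.FormC);
        string right = decomposed.Normalize(NormalizationForm.FormC);
        bool equalAfter = String.Equals(left, right, StringComparison.Ordinal);

        Console.WriteLine("Equal before normalization: {0}", equalBefore);  // False
        Console.WriteLine("Equal after normalization:  {0}", equalAfter);   // True
    }
}
```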
For more information on string normalization, see Normalization and Sorting.
See Also
Concepts
Culture-Insensitive String Operations