MultiByteToWideChar function (stringapiset.h)

Article
02/05/2024

Maps a character string to a UTF-16 (wide character) string. The character string is not necessarily from a multibyte character set.

Caution Using the MultiByteToWideChar function incorrectly can compromise the security of your application. Calling this function can easily cause a buffer overrun because the size of the input buffer indicated by lpMultiByteStr equals the number of bytes in the string, while the size of the output buffer indicated by lpWideCharStr equals the number of characters. To avoid a buffer overrun, your application must specify a buffer size appropriate for the data type the buffer receives. For more information, see Security Considerations: International Features.

Note The ANSI code pages can be different on different computers, or can be changed for a single computer, leading to data corruption. For the most consistent results, applications should use Unicode, such as UTF-8 or UTF-16, instead of a specific code page, unless legacy standards or data formats prevent the use of Unicode. If using Unicode is not possible, applications should tag the data stream with the appropriate encoding name when protocols allow it. HTML and XML files allow tagging, but text files do not.

Syntax

int MultiByteToWideChar(
  [in]            UINT                              CodePage,
  [in]            DWORD                             dwFlags,
  [in]            _In_NLS_string_(cbMultiByte)LPCCH lpMultiByteStr,
  [in]            int                               cbMultiByte,
  [out, optional] LPWSTR                            lpWideCharStr,
  [in]            int                               cchWideChar
);

Parameters

[in] CodePage

Code page to use in performing the conversion. This parameter can be set to the value of any code page that is installed or available in the operating system. For a list of code pages, see Code Page Identifiers. Your application can also specify one of the values shown in the following table.

Value	Meaning
CP_ACP	The system default Windows ANSI code page. Note This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should use UTF-16 or UTF-8 if possible.
CP_MACCP	The current system Macintosh code page. Note This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should use UTF-16 or UTF-8 if possible. Note This value is used primarily in legacy code and should not generally be needed since modern Macintosh computers use Unicode for encoding.
CP_OEMCP	The current system OEM code page. Note This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should use UTF-16 or UTF-8 if possible.
CP_SYMBOL	Symbol code page (42).
CP_THREAD_ACP	The Windows ANSI code page for the current thread. Note This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should use UTF-16 or UTF-8 if possible.
CP_UTF7	UTF-7. Use this value only when forced by a 7-bit transport mechanism. Use of UTF-8 is preferred.
CP_UTF8	UTF-8.

[in] dwFlags

Flags indicating the conversion type. The application can specify a combination of the following values, with MB_PRECOMPOSED being the default. MB_PRECOMPOSED and MB_COMPOSITE are mutually exclusive. MB_USEGLYPHCHARS and MB_ERR_INVALID_CHARS can be set regardless of the state of the other flags.

Value	Meaning
MB_COMPOSITE	Always use decomposed characters, that is, characters in which a base character and one or more nonspacing characters each have distinct code point values. For example, Ä is represented by A + ¨: LATIN CAPITAL LETTER A (U+0041) + COMBINING DIAERESIS (U+0308). Note that this flag cannot be used with MB_PRECOMPOSED.
MB_ERR_INVALID_CHARS	Fail if an invalid input character is encountered. Starting with Windows Vista, the function does not drop illegal code points if the application does not set this flag, but instead replaces illegal sequences with U+FFFD (encoded as appropriate for the specified codepage). Windows 2000 with SP4 and later, Windows XP: If this flag is not set, the function silently drops illegal code points. A call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION.

Value

Meaning

MB_COMPOSITE

Always use decomposed characters, that is, characters in which a base character and one or more nonspacing characters each have distinct code point values. For example, Ä is represented by A + ¨: LATIN CAPITAL LETTER A (U+0041) + COMBINING DIAERESIS (U+0308). Note that this flag cannot be used with MB_PRECOMPOSED.

MB_ERR_INVALID_CHARS

Fail if an invalid input character is encountered.

Starting with Windows Vista, the function does not drop illegal code points if the application does not set this flag, but instead replaces illegal sequences with U+FFFD (encoded as appropriate for the specified codepage).

Windows 2000 with SP4 and later, Windows XP: If this flag is not set, the function silently drops illegal code points. A call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION.

MB_PRECOMPOSED

Default; do not use with MB_COMPOSITE. Always use precomposed characters, that is, characters having a single character value for a base or nonspacing character combination. For example, in the character è, the e is the base character and the accent grave mark is the nonspacing character. If a single Unicode code point is defined for a character, the application should use it instead of a separate base character and a nonspacing character. For example, Ä is represented by the single Unicode code point LATIN CAPITAL LETTER A WITH DIAERESIS (U+00C4).

MB_USEGLYPHCHARS

Use glyph characters instead of control characters.

For the code pages listed below, dwFlags must be set to 0. Otherwise, the function fails with ERROR_INVALID_FLAGS.

50220
50221
50222
50225
50227
50229
57002 through 57011
65000 (UTF-7)
42 (Symbol)

Note

For UTF-8 or code page 54936 (GB18030, starting with Windows Vista), dwFlags must be set to either 0 or MB_ERR_INVALID_CHARS. Otherwise, the function fails with ERROR_INVALID_FLAGS.

[in] lpMultiByteStr

Pointer to the character string to convert.

[in] cbMultiByte

Size, in bytes, of the string indicated by the lpMultiByteStr parameter. Alternatively, this parameter can be set to -1 if the string is null-terminated. Note that, if cbMultiByte is 0, the function fails.

If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting Unicode string has a terminating null character, and the length returned by the function includes this character.

If this parameter is set to a positive integer, the function processes exactly the specified number of bytes. If the provided size does not include a terminating null character, the resulting Unicode string is not null-terminated, and the returned length does not include this character.

[out, optional] lpWideCharStr

Pointer to a buffer that receives the converted string.

[in] cchWideChar

Size, in characters, of the buffer indicated by lpWideCharStr. If this value is 0, the function returns the required buffer size, in characters, including any terminating null character, and makes no use of the lpWideCharStr buffer.

Return value

Returns the number of characters written to the buffer indicated by lpWideCharStr if successful. If the function succeeds and cchWideChar is 0, the return value is the required size, in characters, for the buffer indicated by lpWideCharStr. Also see dwFlags for info about how the MB_ERR_INVALID_CHARS flag affects the return value when invalid sequences are input.

The function returns 0 if it does not succeed. To get extended error information, the application can call GetLastError, which can return one of the following error codes:

ERROR_INSUFFICIENT_BUFFER: A supplied buffer size was not large enough, or it was incorrectly set to NULL.
ERROR_INVALID_FLAGS: The values supplied for flags were not valid.
ERROR_INVALID_PARAMETER: Any of the parameter values was invalid.
ERROR_NO_UNICODE_TRANSLATION: Invalid Unicode was found in a string.

Remarks

The default behavior of this function is to translate to a precomposed form of the input character string. If a precomposed form does not exist, the function attempts to translate to a composite form.

The use of the MB_PRECOMPOSED flag has very little effect on most code pages because most input data is composed already. Consider calling NormalizeString after converting with MultiByteToWideChar. NormalizeString provides more accurate, standard, and consistent data, and can also be faster. Note that for the NORM_FORM enumeration being passed to NormalizeString, NormalizationC corresponds to MB_PRECOMPOSED and NormalizationD corresponds to MB_COMPOSITE.

As mentioned in the caution above, the output buffer can easily be overrun if this function is not first called with cchWideChar set to 0 in order to obtain the required size. If the MB_COMPOSITE flag is used, the output can be three or more characters long for each input character.

The lpMultiByteStr and lpWideCharStr pointers must not be the same. If they are the same, the function fails, and GetLastError returns the value ERROR_INVALID_PARAMETER.

MultiByteToWideChar does not null-terminate an output string if the input string length is explicitly specified without a terminating null character. To null-terminate an output string for this function, the application should pass in -1 or explicitly count the terminating null character for the input string.

The function fails if MB_ERR_INVALID_CHARS is set and an invalid character is encountered in the source string. An invalid character is one of the following:

A character that is not the default character in the source string, but translates to the default character when MB_ERR_INVALID_CHARS is not set.
For DBCS strings, a character that has a lead byte but no valid trail byte.

Starting with Windows Vista, this function fully conforms with the Unicode 4.1 specification for UTF-8 and UTF-16. The function used on earlier operating systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems. However, code that uses this function on valid UTF-8 strings will behave the same way as on earlier Windows operating systems.

Windows XP: To prevent the security problem of the non-shortest-form versions of UTF-8 characters, MultiByteToWideChar deletes these characters.

Starting with Windows 8: MultiByteToWideChar is declared in Stringapiset.h. Before Windows 8, it was declared in Winnls.h.

Code example

catch (std::exception e)
{
    // Save in-memory logging buffer to a log file on error.

    ::std::wstring wideWhat;
    if (e.what() != nullptr)
    {
        int convertResult = MultiByteToWideChar(CP_UTF8, 0, e.what(), (int)strlen(e.what()), NULL, 0);
        if (convertResult <= 0)
        {
            wideWhat = L"Exception occurred: Failure to convert its message text using MultiByteToWideChar: convertResult=";
            wideWhat += convertResult.ToString()->Data();
            wideWhat += L"  GetLastError()=";
            wideWhat += GetLastError().ToString()->Data();
        }
        else
        {
            wideWhat.resize(convertResult + 10);
            convertResult = MultiByteToWideChar(CP_UTF8, 0, e.what(), (int)strlen(e.what()), &wideWhat[0], (int)wideWhat.size());
            if (convertResult <= 0)
            {
                wideWhat = L"Exception occurred: Failure to convert its message text using MultiByteToWideChar: convertResult=";
                wideWhat += convertResult.ToString()->Data();
                wideWhat += L"  GetLastError()=";
                wideWhat += GetLastError().ToString()->Data();
            }
            else
            {
                wideWhat.insert(0, L"Exception occurred: ");
            }
        }
    }
    else
    {
        wideWhat = L"Exception occurred: Unknown.";
    }

    Platform::String^ errorMessage = ref new Platform::String(wideWhat.c_str());
    // The session added the channel at level Warning. Log the message at
    // level Error which is above (more critical than) Warning, which
    // means it will actually get logged.
    _channel->LogMessage(errorMessage, LoggingLevel::Error);
    SaveLogInMemoryToFileAsync().then([=](StorageFile^ logFile) {
        _logFileGeneratedCount++;
        StatusChanged(this, ref new LoggingScenarioEventArgs(LoggingScenarioEventType::LogFileGenerated, logFile->Path->Data()));
    }).wait();
}

Example from Windows Universal Samples on GitHub.

Requirements

Requirement	Value
Minimum supported client	Windows 2000 Professional [desktop apps \| UWP apps]
Minimum supported server	Windows 2000 Server [desktop apps \| UWP apps]
Target Platform	Windows
Header	stringapiset.h (include Windows.h)
Library	Kernel32.lib
DLL	Kernel32.dll

Share via