Pay Close Attention - String Handling

I need to make a detour for a few moments, and discuss how to handle strings in COM code. If you are familiar with how Unicode and ANSI strings work, and know how to convert between the two, then you can skip this section. Otherwise, read on.

Whenever a COM method returns a string, that string will be in Unicode. (Well, all methods that are written to the COM spec, that is!) Unicode is a character encoding scheme, like ASCII, only all characters are 2 bytes long. If you want to get the string into a more manageable state, you should convert it to a TCHAR string.

TCHAR and the _t functions (for example, _tcscpy()) are designed to let you handle Unicode and ANSI strings with the same source code. In most cases, you'll be writing code that uses ANSI strings and the ANSI Windows APIs, so for the rest of this article, I will refer to chars instead of TCHARs, just for simplicity. You should definitely read up on the TCHAR types, though, to be aware of them in case you ever come across them in code written by others.

When you get a Unicode string back from a COM method, you can convert it to a char string in one of several ways:

  1. Call the WideCharToMultiByte() API.
  2. Call the CRT function wcstombs().
  3. Use the CString constructor or assignment operator (MFC only).
  4. Use an ATL string conversion macro.

    WideCharToMultiByte()

    You can convert a Unicode string to an ANSI string with the WideCharToMultiByte() API. This API's prototype is:

    int WideCharToMultiByte (

    UINT CodePage,

    DWORD dwFlags,

    LPCWSTR lpWideCharStr,

    int cchWideChar,

    LPSTR lpMultiByteStr,

    int cbMultiByte,

    LPCSTR lpDefaultChar,

    LPBOOL lpUsedDefaultChar );

    The parameters are:

    CodePage

    The code page to convert the Unicode characters into. You can pass CP_ACP to use the current ANSI code page. Code pages are sets of 256 characters. Characters 0-127 are always identical to the ASCII encoding. Characters 128-255 differ, and can contain graphics or letters with diacritics. Each language or region has its own code page, so it's important to use the right code page to get proper display of accented characters.

    dwFlags

    dwFlags determine how Windows deals with "composite" Unicode characters, which are a letter followed by a diacritic. An example of a composite character is è. If this character is in the code page specified in CodePage, then nothing special happens. However, if it is not in the code page, Windows has to convert it to something else.
    Passing WC_COMPOSITECHECK makes the API check for non-mapping composite characters. PassingWC_SEPCHARS makes Windows break the character into two, the letter followed by the diacritic, for example e`. Passing WC_DISCARDNS makes Windows discard the diacritics. Passing WC_DEFAULTCHAR makes Windows replace the composite characters with a "default" character, specified in the lpDefaultChar parameter. The default behavior is WC_SEPCHARS.

    lpWideCharStr

    The Unicode string to convert.

    cchWideChar

    The length of lpWideCharStr in Unicode characters. You will usually pass -1, which indicates that the string is zero-terminated.

    lpMultiByteStr

    A char buffer that will hold the converted string.

    cbMultiByte

    The size of lpMultiByteStr, in bytes.

    lpDefaultChar

    Optional - a one-character ANSI string that contains the "default" character to be inserted when dwFlagscontains WC_COMPOSITECHECK | WC_DEFAULTCHAR and a Unicode character cannot be mapped to an equivalent ANSI character. You can pass NULL to have the API use a system default character (which as of this writing is a question mark).

    lpUsedDefaultChar

    Optional - a pointer to a BOOL that will be set to indicate if the default char was ever inserted into the ANSI string. You can pass NULL if you don't care about this information.

    Whew, a lot of boring details! Like always, the docs make it seem much more complicated than it really is. Here's an example showing how to use the API:

     Collapse | Copy Code

    // Assuming we already have a Unicode string wszSomeString...

    char szANSIString [MAX_PATH];

     

    WideCharToMultiByte ( CP_ACP, // ANSI code page

    WC_COMPOSITECHECK, // Check for accented characters

    wszSomeString, // Source Unicode string

    -1, // -1 means string is zero-terminated

    szANSIString, // Destination char string

    sizeof(szANSIString), // Size of buffer

    NULL, // No default character

    NULL ); // Don't care about this flag

    After this call, szANSIString will contain the ANSI version of the Unicode string.

    wcstombs()

    The CRT function wcstombs() is a bit simpler, but it just ends up calling WideCharToMultiByte(), so in the end the results are the same. The prototype for wcstombs() is:

     Collapse | Copy Code

    size_t wcstombs (

    char* mbstr,

    const
    wchar_t* wcstr,

    size_t count );

    The parameters are:

    mbstr

    A char buffer to hold the resulting ANSI string.

    wcstr

    The Unicode string to convert.

    count

    The size of the mbstr buffer, in bytes.

    wcstombs() uses the WC_COMPOSITECHECK | WC_SEPCHARS flags in its call to WideCharToMultiByte(). To reuse the earlier example, you can convert a Unicode string with code like this:

     Collapse | Copy Code

    wcstombs ( szANSIString, wszSomeString, sizeof(szANSIString) );

    CString

    The MFC CString class contains constructors and assignment operators that accept Unicode strings, so you can letCString do the conversion work for you. For example:

     Collapse | Copy Code

    // Assuming we already have wszSomeString...

     

    CString str1 ( wszSomeString ); // Convert with a constructor.

    CString str2;

     

    str2 = wszSomeString; // Convert with an assignment operator.

    ATL macros

    ATL has a handy set of macros for converting strings. To convert a Unicode string to ANSI, use the W2A() macro (a mnemonic for "wide to ANSI"). Actually, to be more accurate, you should use OLE2A(), where the "OLE" indicates the string came from a COM or OLE source. Anyway, here's an example of how to use these macros.

     Collapse | Copy Code

    #include <atlconv.h>

     

    // Again assuming we have wszSomeString...

     

    {

    char szANSIString [MAX_PATH];

    USES_CONVERSION; // Declare local variable used by the macros.

     

    lstrcpy ( szANSIString, OLE2A(wszSomeString) );

    }

    The OLE2A() macro "returns" a pointer to the converted string, but the converted string is stored in a temporary stack variable, so we need to make our own copy of it with lstrcpy(). Other macros you should look into areW2T() (Unicode to TCHAR), and W2CT() (Unicode string to const TCHAR string).

    There is an OLE2CA() macro (Unicode string to a const char string) which we could've used in the code snippet above. OLE2CA() is actually the correct macro for that situation, since the second parameter to lstrcpy() is aconst char*, but I didn't want to throw too much at you at once.

    Sticking with Unicode

    On the other hand, you can just keep the string in Unicode if you won't be doing anything complicated with the string. If you're writing a console app, you can print Unicode strings with the std::wcout global variable, for example:

     Collapse | Copy Code

    wcout << wszSomeString;

    But keep in mind that wcout expects all strings to be in Unicode, so if you have any "normal" strings, you'll still need to output them with std::cout. If you have string literals, prefix them with L to make them Unicode, for example:

     Collapse | Copy Code

    wcout << L"The Oracle says..." << endl << wszOracleResponse;

    If you keep a string in Unicode, there are a couple of restrictions:

  • You must use the wcsXXX() string functions, such as wcslen(), on Unicode strings.
  • With very few exceptions, you cannot pass a Unicode string to a Windows API on Windows 9x. To write code that will run on 9x and NT unchanged, you'll need to use the TCHAR types, as described in MSDN.

有关String的转换的一篇好文章的更多相关文章

  1. C# Byte[] 转String 无损转换

    C# Byte[] 转String 无损转换 转载请注明出处 http://www.cnblogs.com/Huerye/ /// <summary> /// string 转成byte[ ...

  2. Allegro转换PADS终极篇(转载)

    Allegro转换PADS终极篇.....http://www.eda365.com/forum.php?mod=viewthread&tid=86947&fromuid=190625 ...

  3. C# 之 将string数组转换到int数组并获取最大最小值

    1.string 数组转换到 int 数组 " }; int[] output = Array.ConvertAll<string, int>(input, delegate(s ...

  4. 转:char*, char[] ,CString, string的转换

    转:char*, char[] ,CString, string的转换 (一) 概述 string和CString均是字符串模板类,string为标准模板类(STL)定义的字符串类,已经纳入C++标准 ...

  5. HTML5 Blob与ArrayBuffer、TypeArray和字符串String之间转换

    1.将String字符串转换成Blob对象 //将字符串 转换成 Blob 对象 var blob = new Blob(["Hello World!"], { type: 'te ...

  6. 【python】bytearray和string之间转换,用在需要处理二进制文件和数据流上

    最近在用python搞串口工具,串口的数据流基本读写都要靠bytearray,而我们从pyqt的串口得到的数据都是string格式,那么我们就必须考虑到如何对这两种数据进行转换了,才能正确的对数据收发 ...

  7. Java - byte[] 和 String互相转换

    通过用例学习Java中的byte数组和String互相转换,这种转换可能在很多情况需要,比如IO操作,生成加密hash码等等. 除非觉得必要,否则不要将它们互相转换,他们分别代表了不同的数据,专门服务 ...

  8. CDuiString和String的转换

    很多时候 难免用到CDuiString和string的转换. 我们应该注意到,CDuiString类有个方法: LPCTSTR GetData() const; 可以通过这个方法,把CDuiStrin ...

  9. [转] HTML5 Blob与ArrayBuffer、TypeArray和字符串String之间转换

    1.将String字符串转换成Blob对象 //将字符串 转换成 Blob 对象 var blob = new Blob(["Hello World!"], { type: 'te ...

随机推荐

  1. 直接插入排序(高级版)之C++实现

    直接插入排序(高级版)之C++实现 一.源代码:InsertSortHigh.cpp /*直接插入排序思想: 假设待排序的记录存放在数组R[1..n]中.初始时,R[1]自成1个有序区,无序区为R[2 ...

  2. 1316 文化之旅 2012年NOIP全国联赛普及组

      题目描述 Description 有一位使者要游历各国,他每到一个国家,都能学到一种文化,但他不愿意学习任何一种文化超过一次(即如果他学习了某种文化,则他就不能到达其他有这种文化的国家).不同的国 ...

  3. [BZOJ4567][SCOI2016]背单词(Trie+贪心)

    1.题意表述十分难以理解,简单说就是:有n个单词,确定一个背的顺序,使总代价最小. 2.因为第(1)种情况的代价是n*n,这个代价比任何一种不出现第(1)种情况的方案都要大,所以最后肯定不会出现“背某 ...

  4. HDU 3487 Play with Chain (splay tree)

    Play with Chain Time Limit: 6000/2000 MS (Java/Others)    Memory Limit: 65536/32768 K (Java/Others)T ...

  5. 支付宝支付-常用支付API详解(查询、退款、提现等)-转

    所有的接口支持沙盒环境的测试 1.前言 前面几篇文件详细介绍了 支付宝提现.扫码支付.条码支付.Wap支付.App支付 支付宝支付-提现到个人支付宝 支付宝支付-扫码支付 支付宝支付-刷卡支付(条码支 ...

  6. LeetCode152:Maximum Product Subarray

    Find the contiguous subarray within an array (containing at least one number) which has the largest ...

  7. C#中数据库连接的几种方式

    [转载]原文出处http://blog.163.com/ny_lonely/blog/static/188924273201161112931892/   1.配置文件链接. 利用VS.NET开发平台 ...

  8. 关于deselectRowAtIndexPath

    有没有遇到过,导航+UITableView,在push,back回来之后,当前cell仍然是选中的状态.当然,解决办法简单,添加一句[tableView deselectRowAtIndexPath: ...

  9. C#编程(六)------------枚举

    原文链接:http://blog.csdn.net/shanyongxu/article/details/46423255 枚举 定义枚举用到的关键字:enum public enum TimeOfD ...

  10. 【docker】docker启动、重启、关闭命令,附带:docker启动容器报错:docker: Error response from daemon: driver failed programming external connectivity on endpoint es2-node

    在关闭并放置centos 的防火墙重启之后[操作:https://www.cnblogs.com/sxdcgaq8080/p/10032829.html] 启动docker容器就发现开始报错: [ro ...