An Excellent Discussion of Unicode Strings in COM

I need to make a detour for a few moments, and discuss how to handle strings in COM code. If you are familiar with how Unicode and ANSI strings work, and know how to convert between the two, then you can skip this section. Otherwise, read on.

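Whenever a COM method returns a string, that string will be in Unicode. (Well, all methods that are written to the COM spec, that is!) Unicode is a character encoding scheme, like ASCII. COM strings on Windows use its UTF-16 form, in which each code unit is 2 bytes long. Most characters take one unit, and characters outside the Basic Multilingual Plane take two (a surrogate pair). If the rest of your code works with char-based strings, you should convert the string to a TCHAR string.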
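Strictly speaking, the 2 bytes are per UTF-16 code unit, not per character. The portable C++ sketch below makes that concrete. As an assumption, it uses char16_t to stand in for OLECHAR/wchar_t, because wchar_t is 16 bits on Windows but usually 32 bits elsewhere. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// char16_t models a Windows OLECHAR/wchar_t: one 16-bit UTF-16 code unit.
std::size_t utf16_bytes(const std::u16string& s) {
    return s.size() * sizeof(char16_t);  // 2 bytes per code unit
}

// Counts characters (code points): every code unit except the trailing
// (low) surrogate of a pair, which belongs to the lead surrogate before it.
std::size_t utf16_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t unit : s)
        if (unit < 0xDC00 || unit > 0xDFFF) ++n;
    return n;
}
```

For example, u"\u00E8" (è) is one unit and 2 bytes. u"\U0001F600" (a character outside the Basic Multilingual Plane) is two units and 4 bytes, yet still only one code point.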

TCHAR and the _t functions (for example, _tcscpy()) are designed to let you handle Unicode and ANSI strings with the same source code. In most cases, you'll be writing code that uses ANSI strings and the ANSI Windows APIs, so for the rest of this article, I will refer to chars instead of TCHARs, just for simplicity. You should definitely read up on the TCHAR types, though, to be aware of them in case you ever come across them in code written by others.
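A simplified model of the TCHAR mechanism may help when you meet it in other people's code. The names below (MY_UNICODE, my_tchar, MY_T, my_tcslen, my_tcscpy) are hypothetical stand-ins for _UNICODE, TCHAR, _T(), _tcslen() and _tcscpy(). The real <tchar.h> defines many more mappings and also handles _MBCS.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for <tchar.h>. Defining MY_UNICODE before this point
// switches every my_tchar-based line from char to wchar_t.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(x) L##x
#define my_tcslen std::wcslen
#define my_tcscpy std::wcscpy
#else
typedef char my_tchar;
#define MY_T(x) x
#define my_tcslen std::strlen
#define my_tcscpy std::strcpy
#endif

// The source text of this function is identical in both builds.
std::size_t copy_greeting(my_tchar* dst) {
    my_tcscpy(dst, MY_T("hello"));
    return my_tcslen(dst);
}
```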

When you get a Unicode string back from a COM method, you can convert it to a char string in one of several ways:

Call the WideCharToMultiByte() API.
Call the CRT function wcstombs().
Use the CString constructor or assignment operator (MFC only).
Use an ATL string conversion macro.

WideCharToMultiByte()

You can convert a Unicode string to an ANSI string with the WideCharToMultiByte() API. This API's prototype is:

int WideCharToMultiByte (
    UINT     CodePage,
    DWORD    dwFlags,
    LPCWSTR  lpWideCharStr,
    int      cchWideChar,
    LPSTR    lpMultiByteStr,
    int      cbMultiByte,
    LPCSTR   lpDefaultChar,
    LPBOOL   lpUsedDefaultChar );

The parameters are:

CodePage

The code page to convert the Unicode characters into. You can pass CP_ACP to use the current ANSI code page. Single-byte code pages, such as 1252 (Western European), are sets of 256 characters: characters 0-127 are identical to the ASCII encoding, while characters 128-255 differ and can contain graphics or letters with diacritics. Some East Asian code pages, such as 932 (Japanese) and 936 (Simplified Chinese), use one or two bytes per character instead. Each language or region has its own code page, so it's important to use the right code page to get proper display of accented characters.

dwFlags

dwFlags determines how Windows deals with "composite" Unicode characters, which are a base letter followed by a separate nonspacing diacritic. For example, è can be written as e followed by a combining grave accent. Passing WC_COMPOSITECHECK makes the API try to map each such pair to a single precomposed character in the code page specified in CodePage. If that works, nothing special happens. However, if the code page has no such character, Windows has to convert it to something else, and one more flag chooses how. Passing WC_SEPCHARS makes Windows break the character into two, the letter followed by the diacritic, for example e`. Passing WC_DISCARDNS makes Windows discard the diacritic. Passing WC_DEFAULTCHAR makes Windows replace the composite character with a "default" character, specified in the lpDefaultChar parameter. If you pass WC_COMPOSITECHECK without one of these three flags, the default behavior is WC_SEPCHARS.

lpWideCharStr

The Unicode string to convert.

cchWideChar

The length of lpWideCharStr in Unicode characters. You will usually pass -1, which indicates that the string is zero-terminated.

lpMultiByteStr

A char buffer that will hold the converted string.

cbMultiByte

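The size of lpMultiByteStr, in bytes. If you pass 0, the function converts nothing and instead returns the buffer size the result requires.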

lpDefaultChar

Optional - a one-character ANSI string that contains the "default" character to insert when a Unicode character cannot be represented in the target code page. You can pass NULL to have the API use a system default character (which as of this writing is a question mark).

lpUsedDefaultChar

Optional - a pointer to a BOOL that will be set to indicate if the default char was ever inserted into the ANSI string. You can pass NULL if you don't care about this information.
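To see why the CodePage choice matters, here is a deliberately partial, hypothetical decoder for two single-byte Windows code pages, 1252 (Western European) and 1251 (Cyrillic). It knows only the ASCII range and bytes 0xC0-0xFF. That is enough to show the same byte turning into different letters. A real program would let Windows do this work (for example with MultiByteToWideChar()).

```cpp
// Hypothetical sketch: decodes one byte from code page 1252 or 1251 into a
// Unicode code point. Only bytes 0x00-0x7F and 0xC0-0xFF are covered.
char32_t decode_byte(unsigned char b, int codePage) {
    if (b < 0x80)
        return b;                        // 0-127: plain ASCII in both code pages
    if (b >= 0xC0) {
        if (codePage == 1252)
            return b;                    // 0xC0-0xFF equal U+00C0-U+00FF
        if (codePage == 1251)
            return 0x0410 + (b - 0xC0);  // 0xC0-0xFF are U+0410-U+044F (Cyrillic letters)
    }
    return U'?';                         // outside this sketch's tables
}
```

The byte 0xE8 decodes to U+00E8 (è) in code page 1252 but to U+0438 (и) in code page 1251, while 'A' is the same in both.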

Whew, a lot of boring details! Like always, the docs make it seem much more complicated than it really is. Here's an example showing how to use the API:

// Assuming we already have a Unicode string wszSomeString...
char szANSIString [MAX_PATH];

WideCharToMultiByte ( CP_ACP,                // ANSI code page
                      WC_COMPOSITECHECK,     // Check for accented characters
                      wszSomeString,         // Source Unicode string
                      -1,                    // -1 means string is zero-terminated
                      szANSIString,          // Destination char string
                      sizeof(szANSIString),  // Size of buffer
                      NULL,                  // No default character
                      NULL );                // Don't care about this flag

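After this call, szANSIString will contain the ANSI version of the Unicode string. The API returns the number of bytes written, or 0 if the conversion fails (for example, because the buffer is too small), so real code should check the return value.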
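When you don't know the string's length in advance, a common approach is to call the API twice. The first call passes 0 for cbMultiByte to learn the required size, and the second converts into a buffer of that size. The helper below is a sketch of that pattern. The name to_narrow() is hypothetical. On non-Windows compilers it falls back to the standard wcstombs() size query, an assumption made only so the sketch compiles anywhere.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts a zero-terminated wide string to a narrow
// std::string, sizing the buffer first. Returns "" on failure.
std::string to_narrow(const wchar_t* src) {
#ifdef _WIN32
    // Pass 1: cbMultiByte == 0 returns the required size in bytes, including
    // the terminating null (because cchWideChar is -1).
    int needed = WideCharToMultiByte(CP_ACP, 0, src, -1, NULL, 0, NULL, NULL);
    if (needed <= 0)
        return std::string();
    std::vector<char> buf(needed);
    // Pass 2: convert into a buffer of exactly that size.
    if (WideCharToMultiByte(CP_ACP, 0, src, -1, buf.data(), needed, NULL, NULL) == 0)
        return std::string();
    return std::string(buf.data(), needed - 1);  // drop the terminator
#else
    // Non-Windows fallback: a NULL destination makes wcstombs() return the
    // converted length (without terminator) under the current C locale.
    std::size_t needed = std::wcstombs(nullptr, src, 0);
    if (needed == static_cast<std::size_t>(-1))
        return std::string();
    std::vector<char> buf(needed + 1);
    std::wcstombs(buf.data(), src, buf.size());
    return std::string(buf.data(), needed);
#endif
}
```

Usage: std::string s = to_narrow(wszSomeString); gives you a correctly sized copy with no MAX_PATH limit.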

wcstombs()

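The CRT function wcstombs() is a bit simpler, but it just ends up calling WideCharToMultiByte(), so in the end the results are much the same. One difference is that wcstombs() converts according to the C runtime's current locale (set with setlocale()), which is not necessarily the ANSI code page. The prototype for wcstombs() is: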

size_t wcstombs (
    char* mbstr,
    const wchar_t* wcstr,
    size_t count );

The parameters are:

mbstr

A char buffer to hold the resulting ANSI string.

wcstr

The Unicode string to convert.

count

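The size of the mbstr buffer, in bytes, that is, the maximum number of bytes wcstombs() will write. If the converted string fills the whole buffer, no terminating null is written. wcstombs() returns the number of bytes written (not counting the null), or (size_t)-1 if a character cannot be converted.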

wcstombs() uses the WC_COMPOSITECHECK | WC_SEPCHARS flags in its call to WideCharToMultiByte(). To reuse the earlier example, you can convert a Unicode string with code like this:

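size_t cb = wcstombs ( szANSIString, wszSomeString, sizeof(szANSIString) - 1 );
szANSIString[cb == (size_t)-1 ? 0 : cb] = '\0'; // terminate even a full buffer; -1 means error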
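If you also want to know whether the string was cut short, you can first ask wcstombs() for the full converted length by passing a NULL destination, which is standard C behavior. The wrapper below is a hypothetical sketch of that idea for fixed-size buffers like szANSIString.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: converts src into dst (dstSize bytes), always
// null-terminates, and returns true only if the whole string fit.
bool wcstombs_fixed(char* dst, std::size_t dstSize, const wchar_t* src) {
    if (dstSize == 0)
        return false;
    dst[0] = '\0';
    // A NULL destination makes wcstombs() return the full converted length.
    std::size_t needed = std::wcstombs(nullptr, src, 0);
    if (needed == static_cast<std::size_t>(-1))
        return false;                    // a character has no mapping in this locale
    // Leave room for the terminator, then write it ourselves.
    std::size_t written = std::wcstombs(dst, src, dstSize - 1);
    if (written == static_cast<std::size_t>(-1)) {
        dst[0] = '\0';
        return false;
    }
    dst[written] = '\0';
    return needed < dstSize;             // false means the result was truncated
}
```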

CString

The MFC CString class contains constructors and assignment operators that accept Unicode strings, so in an ANSI build (where CString holds char strings) you can let CString do the conversion work for you. For example:

// Assuming we already have wszSomeString...
CString str1 ( wszSomeString ); // Convert with a constructor.
CString str2;
str2 = wszSomeString;           // Convert with an assignment operator.

ATL macros

ATL has a handy set of macros for converting strings. To convert a Unicode string to ANSI, use the W2A() macro (a mnemonic for "wide to ANSI"). Actually, to be more accurate, you should use OLE2A(), where the "OLE" indicates the string came from a COM or OLE source. Anyway, here's an example of how to use these macros.

#include <atlconv.h>

// Again assuming we have wszSomeString...
{
    char szANSIString [MAX_PATH];
    USES_CONVERSION; // Declare local variable used by the macros.

    // lstrcpyA() is the explicit ANSI version, so this also compiles in a UNICODE build.
    lstrcpyA ( szANSIString, OLE2A(wszSomeString) );
}

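The OLE2A() macro "returns" a pointer to the converted string, but the converted string lives in a buffer allocated on the stack, which is released when the enclosing function returns, so we make our own copy of it with lstrcpy(). For the same reason, avoid using these macros inside loops, since every use allocates more stack space. Other macros you should look into are W2T() (Unicode to TCHAR), and W2CT() (Unicode string to const TCHAR string).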

There is an OLE2CA() macro (Unicode string to a const char string) which we could've used in the code snippet above. OLE2CA() is actually the correct macro for that situation, since the second parameter to lstrcpy() is a const char*, but I didn't want to throw too much at you at once.
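Because the macros' buffers are tied to the enclosing function's stack, newer versions of ATL also offer conversion classes such as CW2A that own their buffer. The portable class below is a hypothetical stand-in that only mimics that idea. It uses the standard wcstombs() rather than ATL's implementation. The converted text lives exactly as long as the object and uses heap memory instead of the stack, so it is safe inside loops.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical RAII converter, loosely modeled on the idea behind ATL's CW2A:
// the object owns the converted buffer, so the char* stays valid for the
// object's lifetime rather than until the function returns.
class WideToNarrow {
public:
    explicit WideToNarrow(const wchar_t* src) {
        std::size_t needed = std::wcstombs(nullptr, src, 0);
        if (needed == static_cast<std::size_t>(-1))
            needed = 0;                  // unconvertible input: use an empty string
        buf_.assign(needed + 1, '\0');   // heap buffer, including the terminator
        if (needed > 0)
            std::wcstombs(buf_.data(), src, buf_.size());
    }
    operator const char*() const { return buf_.data(); }

private:
    std::vector<char> buf_;
};
```

Usage: WideToNarrow ansi(wszSomeString); then pass ansi wherever a const char* is expected, for as long as ansi is in scope.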
