对std::string和std::wstring区别的解释,807个赞同,有例子
807down vote
|
|
|
@Sorin Sbarnea: UTF-8 could take 1-6 bytes, but apparently the standard limits it to 1-4. See en.wikipedia.org/wiki/UTF8#Description for more information. – paercebal Jan 13 '10 at 13:10
|
||
|
While this examples produces different results on Linux and Windows the C++ program contains implementation-defined behavior as to whether
olè is encoded as UTF-8 or not. Further more, the reason you cannot natively stream wchar_t * to std::cout is because the types are incompatible resulting in an ill-formed program and it has nothing to do with the use of encodings. It's worth pointing out that whether you use std::string or std::wstring depends on your own encoding preference rather than the platform, especially if you want your code to be portable. – John Leidegren Aug 9 '12 at 9:37 |
||
|
@paercebal Whatever the platform supports is entirely arbitrary and besides the point. If you store all strings internally as UTF-8 on Windows you'll have to convert them to either ANSI or UTF-16 and call the corresponding Win32 function but if you know your UTF-8 strings are just plain ASCII strings you don't have to do anything. The platform doesn't dictate how you use strings as much as the circumstances. – John Leidegren Aug 9 '12 at 16:35
|
||
|
Windows actually uses UTF-16 and have been for quite some time, older versions of Windows did use UCS-2 but this is not the case any longer. My only issue here is the conclusion that
std::wstring should be used on Windows because it's a better fit for the Unicode Windows API which I think is fallacious. If your only concern was calling into the Unicode Windows API and not marshalling strings then sure but I don't buy this as the general case. – John Leidegren Aug 9 '12 at 18:15 |
||
|
@ John Leidegren :
If your only concern was calling into the Unicode Windows API and not marshalling strings then sure : Then, we agree. I'm coding in C++, not JavaScript. Avoiding useless marshalling or any other potentially costly processing at runtime when it can be done at compile time is at the heart of that language. Coding against WinAPI and using std::string is just an unjustified wasting runtime resources. You find it fallacious, and it's Ok, as it is your viewpoint. My own is that I won't write code with pessimization on Windows just because it looks better from the Linux side. – paercebal Aug 9 '12 at 19:48 |
So, every reader here now should have a clear understanding about the facts, the situation. If not, then you must read paercebal's outstandingly comprehensive answer [btw: thanks!]. My pragmatical conclusion is shockingly simple: all that C++ (and STL) "character encoding" stuff is substantially broken and useless. Blame it on Microsoft or not, that will not help anyway. My solution, after in-depth investigation, much frustration and the consequential experiences is the following:
The conversions are straightforward, google should help here ... That's it. Use UTF8String wherever memory is precious and for all UTF-8 I/O. Use UCS2String wherever the string must be parsed and/or manipulated. You can convert between those two representations any time. Alternatives & Improvements
ICU or other unicode libraries? |
|||||||||||||||||||||
|
I recommend avoiding My view is summarized in http://utf8everywhere.org of which I am a co-author. Unless your application is API-call-centric, e.g. mainly UI application, the suggestion is to store Unicode strings in std::string and encoded in UTF-8, performing conversion near API calls. The benefits outlined in the article outweigh the apparent annoyance of conversion, especially in complex applications. This is doubly so for multi-platform and library development. And now, answering your questions:
|
||||
|
|||||||||||||||||||||
|
I frequently use std::string to hold utf-8 characters without any problems at all. I heartily recommend doing this when interfacing with API's which use utf-8 as the native string type as well. For example, I use utf-8 when interfacing my code with the Tcl interpreter. The major caveat is the length of the std::string, is no longer the number of characters in the string.
|
|||||||||||||||||||||
|
|
|||||||||||||
|
Applications that are not satisfied with only 256 different characters have the options of either using wide characters (more than 8 bits) or a variable-length encoding (a multibyte encoding in C++ terminology) such as UTF-8. Wide characters generally require more space than a variable-length encoding, but are faster to process. Multi-language applications that process large amounts of text usually use wide characters when processing the text, but convert it to UTF-8 when storing it to disk. The only difference between a Practically every compiler uses a character set whose first 128 characters correspond with ASCII. This is also the case with compilers that use UTF-8 encoding. The important thing to be aware of when using strings in UTF-8 or some other variable-length encoding, is that the indices and lengths are measured in bytes, not characters. The data type of a wstring is If you don't need multi-language support, you might be fine with using only regular strings. On the other hand, if you're writing a graphical application, it is often the case that the API supports only wide characters. Then you probably want to use the same wide characters when processing the text. Keep in mind that UTF-16 is a variable-length encoding, meaning that you cannot assume |
|||||||||||||||||
|
1) As mentioned by Greg, wstring is helpful for internationalization, that's when you will be releasing your product in languages other than english 4) Check this out for wide character http://en.wikipedia.org/wiki/Wide_character |
|||
|
|||||||||||||||||||||
|
A good question! I think DATA ENCODING (sometime CHARSET also involved) is a MEMORY EXPRESSION MECHANISM in order to save data to file or transfer data via network, so I answer this question as: 1.When should I use std::wstring over std::string? If the programming platform or API function is a single-byte one, and we want to process or parse some unicode datas, e.g read from Windows' .REG file or network 2-byte stream, we should declare std::wstring variable to easy process them. e.g.: wstring ws=L"中国a"(6 octets memory: 0x4E2D 0x56FD 0x0061), we can use ws[0] to get character '中' and ws[1] to get character '国' and ws[2] to get character 'a', etc. 2.Can std::string hold the entire ASCII character set, including the special characters? Yes. But notice: American ASCII, means each 0x00~0xFF octet stand for one character ,including printable text such as "123abc&*_&" and you said special one, mostly print it as a '.' avoid confusing editors or terminals. And some other countries extend their own "ASCII" charset ,e.g. Chinese, use 2 octets to stand for one character. 3.Is std::wstring supported by all popular C++ compilers? Maybe, or mostly. I have used: VC++6 and GCC 3.3, YES 4.What is exactly a "wide character"? wide character mostly indicate using 2 octets or 4 octets to hold all countries's characters. 2 octets UCS2 is a representative sample, and further e.g. English 'a', its memory is 2 octet of 0x0061(vs in ASCII 'a's memory is 1 octet 0x61) |
https://stackoverflow.com/questions/402283/stdwstring-vs-stdstring
对std::string和std::wstring区别的解释,807个赞同,有例子的更多相关文章
- C++ MFC std::string转为 std::wstring
std::string转为 std::wstring std::wstring UTF8_To_UTF16(const std::string& source) { unsigned long ...
- 如何使用 window api 转换字符集?(std::string与std::wstring的相互转换)
//宽字符转多字节 std::string W2A(const std::wstring& utf8) { int buffSize = WideCharToMultiByte(CP_ACP, ...
- std::string与std::wstring互相转换
作者:zzandyc来源:CSDN原文:https ://blog.csdn.net/zzandyc/article/details/77540056 版权声明:本文为博主原创文章,转载请附上博文链接 ...
- std::u32string conversion to/from std::string and std::u16string
I need to convert between UTF-8, UTF-16 and UTF-32 for different API's/modules and since I know have ...
- std::string, std::wstring, wchar_t*, Platform::String^ 之间的相互转换
最近做WinRT的项目,涉及到Platform::String^ 和 std::string之间的转换,总结一下: (1)先给出源代码: std::wstring stows(std::string ...
- std::wstring std::string w2m m2w
static std::wstring m2w(std::string ch, unsigned int CodePage = CP_ACP) { if (ch.empty())return L&qu ...
- C++ std::unordered_map使用std::string和char *作key对比
最近在给自己的服务器框架加上统计信息,其中一项就是统计创建的对象数,以及当前还存在的对象数,那么自然以对象名字作key.但写着写着,忽然纠结是用std::string还是const char *作ke ...
- std::string 用法总结
标准C++中的string类的用法总结 相信使用过MFC编程的朋友对CString这个类的印象应该非常深刻吧?的确,MFC中的CString类使用起来真的非常的方便好用.但是如果离开了MFC框架,还有 ...
- could not deduce template argument for 'const std::_Tree<_Traits> &' from 'const std::string'
VS2008, 写一个简单的demo的时候出现了这个: 1>------ Build started: Project: GetExportTable, Configuration: Relea ...
随机推荐
- https://github.com/mvf/svn_wfx
https://github.com/mvf/svn_wfx 2003.net对应的vc是7.0版本.需要更高的. 在哪里可以下载呢 https://www.tjupt.org/没有校外种子 Proj ...
- 中小研发团队架构实践之RabbitMQ快速入门及应用
原文:中小研发团队架构实践之RabbitMQ快速入门及应用 使用过分布式中间件的人都知道,程序员使用起来并不复杂,常用的客户端API就那么几个,比我们日常编写程序时用到的API要少得多.但是分布式中间 ...
- asm 的hello world 2011.04.28
这几天一直在弄一个嵌入式的程序,搭环境,熟悉库函数,熟悉汇编,乱成一锅粥,到现在还是没有什么系统性的收获. 或许下周弄出来吧,(一定得弄出来,不然老大该跟我急了……). 今天,熟悉汇编,好歹用汇编写出 ...
- js进阶 11-17 juqery如何查找一个元素的同级元素
js进阶 11-17 juqery如何查找一个元素的同级元素 一.总结 一句话总结:三个方法,向前(prev()),向后(next())和兄弟(siblings()),而前面两个每个都对应三个,pre ...
- printk()函数的总结
我们在使用printk()函数中使用日志级别为的是使编程人员在编程过程中自定义地进行信息的输出,更加容易地掌握系统当前的状况.对程序的调试起到了很重要的作用.(下文中的日志级别和控制台日志控制级别是一 ...
- 11G、12C安装结束需要做的一些操作
修改spfile参数:修改前,先备份 create pfile from spfile; alter system set memory_target=0 scope=spfile;alter sys ...
- css hover控制其他元素
<html> <body> <style> #a:hover {color : #FFFF00;} #a:hover > #b:first-child{col ...
- 用户之间imp的问题
今天同事说申请了一个从生产导出的dump文件,须要导入測试库进行測试. 之前做的基本都是本库导出,本库导入的操作,比如:imp test/***@test tables=tbl_fuel file=H ...
- php课程 6-23 mb_substr字符串截取怎么用
php课程 6-23 mb_substr字符串截取怎么用 一.总结 一句话总结: 1.mb_substr字符串截取怎么用? 参数为:起始位置,个数 $str='我是小金,我是中国人!'; echo & ...
- Tor (洋葱头)torbrowser
Tor是什么 Tor是互联网上用于保护您隐私最有力的工具之一,但是时至今日仍有许多人往往认为Tor是一个终端加密工具.事实上,Tor是用来匿名浏览网页和邮件发送(并非是邮件内容加密)的.今天,我们要讨 ...