std::u32string conversion to/from std::string and std::u16string
I need to convert between UTF-8, UTF-16 and UTF-32 for different APIs/modules, and since I now have the option to use C++11, I am looking at the new string types.
It looks like I can use string, u16string and u32string for UTF-8, UTF-16 and UTF-32. I also found codecvt_utf8 and codecvt_utf16, which look able to convert between char or char16_t and char32_t, and what looks like a higher-level wstring_convert, but that only appears to work with bytes/std::string, and there is not a great deal of documentation.
Am I meant to use a wstring_convert somehow for the UTF-16 ↔ UTF-32 and UTF-8 ↔ UTF-32 cases? I only really found examples for UTF-8 to UTF-16, which I am not even sure will be correct on Linux, where wchar_t is normally considered UTF-32... Or should I do something more complex with those codecvt things directly?
Or is this just still not really in a usable state, and should I stick with my own existing small routines using 8-, 16- and 32-bit unsigned integers?
---------------------------------------------------------------------------------------------------------------------------------------------------------------
answer:
If you read the documentation at CppReference.com for wstring_convert, codecvt_utf8, codecvt_utf16, and codecvt_utf8_utf16, the pages include a table that tells you exactly what you can use for the various UTF conversions.
And yes, you would use std::wstring_convert to facilitate the conversion between the various UTFs. Despite its name, it is not limited to just std::wstring; it actually operates with any std::basic_string type (which std::string, std::wstring, and std::uXXstring are all based on).
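As a quick illustration of that point, here is a minimal sketch that instantiates std::wstring_convert with char32_t and checks at compile time that its "wide" string type is std::u32string. The alias Utf8Utf32 and the helpers decode_utf8 and encode_utf8 are names made up for this sketch, not part of the standard library.

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

// Hypothetical alias for this sketch: a UTF-8 <-> UTF-32 converter.
using Utf8Utf32 = std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>;

// The second template argument (Elem) selects the wide string type,
// so nothing here involves wchar_t or std::wstring.
static_assert(std::is_same<Utf8Utf32::wide_string, std::u32string>::value,
              "wide_string follows the Elem parameter");
static_assert(std::is_same<Utf8Utf32::byte_string, std::string>::value,
              "byte_string is std::string");

// UTF-8 bytes -> UTF-32 code points.
std::u32string decode_utf8(const std::string& bytes) {
    Utf8Utf32 conv;
    return conv.from_bytes(bytes);
}

// UTF-32 code points -> UTF-8 bytes.
std::string encode_utf8(const std::u32string& text) {
    Utf8Utf32 conv;
    return conv.to_bytes(text);
}
```

Both helpers throw std::range_error on malformed input, because the converter is default-constructed without error strings.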
The wstring_convert page describes it as follows: "Class template std::wstring_convert performs conversions between byte string std::string and wide string std::basic_string<Elem>, using an individual code conversion facet Codecvt. std::wstring_convert assumes ownership of the conversion facet, and cannot use a facet managed by a locale. The standard facets suitable for use with std::wstring_convert are std::codecvt_utf8 for UTF-8/UCS2 and UTF-8/UCS4 conversions and std::codecvt_utf8_utf16 for UTF-8/UTF-16 conversions."
For example:
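// Note: <codecvt> and std::wstring_convert are deprecated since C++17.
#include <codecvt>  // std::codecvt_utf8, std::codecvt_utf16, std::codecvt_utf8_utf16
#include <locale>   // std::wstring_convert
#include <string>

typedef std::string u8string;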

// UTF-16 -> UTF-8
u8string To_UTF8(const std::u16string &s)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(s);
}

// UTF-32 -> UTF-8
u8string To_UTF8(const std::u32string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);
}

// UTF-8 -> UTF-16
std::u16string To_UTF16(const u8string &s)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(s);
}
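
// UTF-32 -> UTF-16
std::u16string To_UTF16(const std::u32string &s)
{
    // codecvt_utf16 emits UTF-16 as a byte stream, big-endian by default.
    std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
    std::string bytes = conv.to_bytes(s);
    // Assemble code units from byte pairs instead of reinterpret_cast, so the
    // result does not depend on host endianness or buffer alignment.
    std::u16string out;
    out.reserve(bytes.size() / 2);
    for (std::string::size_type i = 0; i + 1 < bytes.size(); i += 2)
        out.push_back(static_cast<char16_t>(
            (static_cast<unsigned char>(bytes[i]) << 8) |
             static_cast<unsigned char>(bytes[i + 1])));
    return out;
}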

// UTF-8 -> UTF-32
std::u32string To_UTF32(const u8string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(s);
}
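
// UTF-16 -> UTF-32
std::u32string To_UTF32(const std::u16string &s)
{
    // Serialize the code units as big-endian bytes, the byte order that
    // codecvt_utf16 expects by default, rather than reinterpreting the
    // buffer in host byte order.
    std::string bytes;
    bytes.reserve(s.size() * 2);
    for (char16_t c : s) {
        bytes.push_back(static_cast<char>(c >> 8));
        bytes.push_back(static_cast<char>(c & 0xFF));
    }
    std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
    return conv.from_bytes(bytes);
}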
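
One detail worth checking in the UTF-16 ↔ UTF-32 functions: std::codecvt_utf16 works on a byte stream, and that stream is big-endian unless std::little_endian is passed as the facet's Mode argument. Casting those bytes to or from char16_t therefore only lines up when the host byte order matches. The sketch below shows the two byte orders and one portable way to rebuild code units from the bytes; the helper names are invented for illustration, and the behavior is what the standard specifies rather than something tested on every implementation.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: UTF-32 -> UTF-16 bytes using the facet's default
// mode (big-endian, no byte order mark).
std::string utf16_bytes_default(const std::u32string& s) {
    std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
    return conv.to_bytes(s);
}

// Hypothetical helper: the same conversion, explicitly requesting
// little-endian output through the Mode template argument.
std::string utf16_bytes_le(const std::u32string& s) {
    std::wstring_convert<
        std::codecvt_utf16<char32_t, 0x10ffff, std::little_endian>,
        char32_t> conv;
    return conv.to_bytes(s);
}

// Hypothetical helper: rebuild char16_t code units from big-endian byte
// pairs without relying on host byte order or pointer casts.
std::u16string utf16_units_from_be(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size() / 2);
    for (std::string::size_type i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned hi = static_cast<unsigned char>(bytes[i]);
        const unsigned lo = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

For U"A", the default mode yields the bytes 00 41 and the little-endian mode yields 41 00, so a reinterpret_cast to char16_t on a little-endian machine is only correct with the second form.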