A helper class for parsing XML with boost.property_tree, and a fix for the Chinese-text parsing problem (repost)

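boost.property_tree can parse both XML and JSON files; I mainly use it for XML. Internally it wraps RapidXML, which is billed as the fastest XML parser, so parsing performance is good. In practice, though, I ran into a number of rough edges: (1) asking for a node that does not exist throws an exception; (2) when reading attribute values you have to filter out the attribute and comment nodes yourself, and if you overlook this you get a baffling exception; (3) the memory model is a little odd; (4) Chinese text is not supported out of the box and comes out garbled.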

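Getting child nodes from a ptree

The interface for getting a child node is get_child(node_path), where node_path is the full path starting from the current node, with parent and child segments joined by ".", for example "root.sub.child". Note that get_child returns only the first matching child node; to get the list of child nodes, use the path "root.sub" and iterate over the result, which yields every child. If the path does not exist, get_child throws an exception. To avoid the exception, use the get_xxx_optional interfaces (get_child_optional, get_optional<T>), which return an optional and leave it to the caller to check whether a result was found.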

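    // ptree's optional interface
    auto item = root.get_child_optional("Root.Scenes");

This call returns an optional<ptree&>, and the caller still has to check whether the node exists. An optional converts to bool to indicate whether it holds a value, and the dereference operator '*' gives access to the contained object. I recommend using the optional interfaces to access XML nodes.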

    // ptree's optional interface
    auto item = root.get_child_optional("Root.Scenes");
    if (item)
        cout << "the node exists" << endl;

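The ptree memory model

A ptree maintains a list of child nodes of type pair<string, ptree>: first is the node's tag name and second is the ptree node itself. Keep this in mind when iterating over a ptree's children, because the iterator refers to the pair, not to the node.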

    for (auto& data : root)
    {
        for (auto& item : data.second) // each element is a pair<string, ptree>; keep iterating through second
        {
            cout << item.first << endl;
        }
    }

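Note that the first member of a child pair may be an attribute node ("<xmlattr>") or a comment node ("<xmlcomment>"). Only ordinary element nodes support the usual operations such as reading attribute values and child nodes.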

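Getting attribute values from a ptree

Attribute values are read with get<T>(attr_name). To read an attribute as an integer, for example, use get<int>("Id"), which returns an int. One caveat: when the pair's first is "<xmlcomment>" there are no attribute values, and the comment text can be read with data(). When first is not "<xmlattr>" (that is, the node is an ordinary element), the attribute name must be prefixed with "<xmlattr>.", so the call becomes get<int>("<xmlattr>.Id"). As you can see, reading attributes is rather tedious; the helper class introduced later simplifies it. To read a node's own value, use get_value<T>(): for the node <Field>2</Field>, get_value<string>() returns "2".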

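The Chinese parsing problem

ptree can only parse narrow-character XML files; if the file contains Unicode text such as Chinese characters, the parsed result is garbled. To parse Unicode you need wptree, whose interface takes wide characters and otherwise matches ptree. wptree alone is not enough, however: you also need a converter between wide and narrow characters. There are many implementations of such conversions, but C++11 offers a simpler, uniform way to do it.

Wide/narrow character conversion in C++11: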

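    // "CHS" is a Windows locale name. The standard std::codecvt constructor takes only a
    // reference count, so passing a locale name here relies on the MSVC library.
    std::wstring_convert<std::codecvt<wchar_t, char, std::mbstate_t>>
        conv(new std::codecvt<wchar_t, char, std::mbstate_t>("CHS"));
    // wide to narrow
    string str = conv.to_bytes(L"你好");
    // narrow to wide
    wstring wstr = conv.from_bytes(str);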
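
For comparison, here is a minimal sketch of the same round trip when the narrow string is UTF-8 rather than a locale-specific code page. It uses std::codecvt_utf8, which needs no locale name; the function names utf8_to_wide and wide_to_utf8 are hypothetical. Note that <codecvt> and std::wstring_convert are deprecated since C++17, so newer compilers may emit warnings.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Convert UTF-8 bytes to a wide string.
// Throws std::range_error if the input is not valid UTF-8.
std::wstring utf8_to_wide(const std::string& bytes)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(bytes);
}

// Convert a wide string back to UTF-8 bytes.
std::string wide_to_utf8(const std::wstring& text)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(text);
}
```

Creating the converter inside each function keeps the helpers stateless, at the cost of constructing a facet per call.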

When boost.property_tree parses an XML file that contains Chinese, the file's contents must first be converted.

The Boost approach:

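    #include <boost/program_options/detail/utf8_codecvt_facet.hpp>
    #include <boost/property_tree/ptree.hpp>
    #include <boost/property_tree/xml_parser.hpp>
    #include <fstream>
    #include <locale>
    #include <string>

    void ParseChn(const std::string& fileName)
    {
        std::wifstream f(fileName);
        std::locale utf8Locale(std::locale(), new boost::program_options::detail::utf8_codecvt_facet());
        f.imbue(utf8Locale); // decode the UTF-8 file into wide characters

        // parse with wptree
        boost::property_tree::wptree pt;
        boost::property_tree::read_xml(f, pt);
    }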

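The drawback of this approach is that it pulls in Boost's libboost_program_options library, which is more than 20 MB. That is a lot of trouble just to fix a Chinese-text problem. Fortunately C++11 offers a simpler way: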

    void Init(const wstring& fileName, wptree& ptree)
    {
        std::wifstream f(fileName); // a wide file name works with MSVC; elsewhere pass a narrow string
        std::locale utf8Locale(std::locale(), new std::codecvt_utf8<wchar_t>);
        f.imbue(utf8Locale); // decode the UTF-8 file into wide characters

        // parse with wptree
        property_tree::read_xml(f, ptree);
    }

With C++11 there is no need to bring in libboost_program_options, which keeps things simple.

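A helper class for property_tree

The helper class addresses the problems listed earlier:

(1) it solves the Chinese parsing problem with C++11; (2) it simplifies attribute access; (3) it adds some operations, such as lookup functions; (4) it avoids exceptions by returning optional<T> everywhere; (5) it hides the fiddly low-level interface behind a uniform, concise high-level one that is more convenient to use.

Here is how the helper class is implemented: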

    #include <boost/optional.hpp>
    #include <boost/property_tree/ptree.hpp>
    #include <boost/property_tree/xml_parser.hpp>
    using namespace boost;
    using namespace boost::property_tree;

    #include <codecvt>
    #include <fstream>
    #include <functional>
    #include <iostream>
    #include <locale>
    #include <map>
    #include <string>
    #include <vector>
    using namespace std;

    const wstring XMLATTR = L"<xmlattr>";
    const wstring XMLCOMMENT = L"<xmlcomment>";
    const wstring XMLATTR_DOT = L"<xmlattr>.";
    const wstring XMLCOMMENT_DOT = L"<xmlcomment>.";

    class ConfigParser
    {
    public:
        // "CHS" is a Windows locale name; constructing std::codecvt from it relies on the MSVC library.
        ConfigParser() : m_conv(new code_type("CHS"))
        {
        }

        ~ConfigParser()
        {
        }

        void Init(const wstring& fileName, wptree& ptree)
        {
            std::wifstream f(fileName); // a wide file name works with MSVC
            std::locale utf8Locale(std::locale(), new std::codecvt_utf8<wchar_t>);
            f.imbue(utf8Locale); // decode the UTF-8 file into wide characters
            wcout.imbue(std::locale("chs")); // set wcout up for Chinese output (Windows locale name)

            // parse with wptree
            property_tree::read_xml(f, ptree);
        }

        // convert a narrow string (in the "CHS" locale encoding, not UTF-8) to wstring
        std::wstring to_wstr(const std::string& str)
        {
            return m_conv.from_bytes(str);
        }

        // convert a wstring to a narrow string (in the "CHS" locale encoding)
        std::string to_str(const std::wstring& str)
        {
            return m_conv.to_bytes(str);
        }

        // get the list of child nodes under key
        auto Descendants(const wptree& root, const wstring& key)->decltype(root.get_child_optional(key))
        {
            return root.get_child_optional(key);
        }

        // get the child nodes with the given tag whose attribute equals attrVal
        template<typename T>
        vector<wptree> GetChildsByAttr(const wptree& parent, const wstring& tagName, const wstring& attrName, const T& attrVal)
        {
            vector<wptree> v;

            for (auto& child : parent)
            {
                if (child.first != tagName)
                    continue;

                auto attr = Attribute<T>(child, attrName);

                if (attr && *attr == attrVal)
                    v.push_back(child.second);
            }

            return v;
        }

        // get one attribute value of a node
        template<typename R>
        boost::optional<R> Attribute(const wptree& node, const wstring& attrName)
        {
            return node.get_optional<R>(XMLATTR_DOT + attrName);
        }

        // get one attribute value of a node; defaults to wstring
        boost::optional<wstring> Attribute(const wptree& node, const wstring& attrName)
        {
            return Attribute<wstring>(node, attrName);
        }

        // get one attribute value from a value_type (key/subtree pair)
        template<typename R>
        boost::optional<R> Attribute(const wptree::value_type& pair, const wstring& attrName)
        {
            if (pair.first == XMLATTR)
                return pair.second.get_optional<R>(attrName);
            else if (pair.first == XMLCOMMENT)
                return boost::optional<R>(); // comments carry no attributes
            else
                return pair.second.get_optional<R>(XMLATTR_DOT + attrName);
        }

        // get one attribute value from a value_type; defaults to wstring
        boost::optional<wstring> Attribute(const wptree::value_type& pair, const wstring& attrName)
        {
            return Attribute<wstring>(pair, attrName);
        }

        // build a multimap<wstring, wptree> keyed by the value of one attribute
        template<class F = std::function<bool(wstring&)>>
        multimap<wstring, wptree> MakeMapByAttr(const wptree& root, const wstring& key, const wstring& attrName, F predicate = [](wstring& str){ return true; })
        {
            multimap<wstring, wptree> resultMap;
            auto list = Descendants(root, key);
            if (!list)
                return resultMap;

            for (auto& item : *list)
            {
                auto attr = Attribute(item, attrName);
                if (attr && predicate(*attr))
                    resultMap.insert(std::make_pair(*attr, item.second));
            }

            return resultMap;
        }

    private:
        using code_type = std::codecvt<wchar_t, char, std::mbstate_t>;
        std::wstring_convert<code_type> m_conv;
    };

The test file test.xml and the test code:

    <?xml version='1.0' encoding='UTF-8'?>
    <Root Id='123456'>
        <Scenes>
            <Scene Name='测试1'>
                <DataSource>
                    <Data>
                        <Item Id='1' FileName='测试文件1' />
                    </Data>
                    <Data>
                        <Item Id='2' FileName='测试文件2' />
                        <Item Id='3' FileName='测试文件3' />
                    </Data>
                </DataSource>
            </Scene>
            <Scene Name='测试2'>
                <DataSource>
                    <Data>
                        <Item Id='4' FileName='测试文件4' />
                    </Data>
                    <Data>
                        <Item Id='5' FileName='测试文件5' />
                    </Data>
                </DataSource>
            </Scene>
        </Scenes>
    </Root>

    void Test()
    {
        wptree pt;
        ConfigParser parser;
        parser.Init(L"test.xml", pt); // parse as Unicode to solve the Chinese problem

        auto scenes = parser.Descendants(pt, L"Root.Scenes"); // returns an optional<const wptree&>
        if (!scenes)
            return;

        for (auto& scene : *scenes)
        {
            auto s = parser.Attribute(scene, L"Name"); // get the Name attribute; returns an optional<wstring>
            if (s)
            {
                wcout << *s << endl;
            }

            auto dataList = parser.Descendants(scene.second, L"DataSource"); // get the first DataSource child
            if (!dataList)
                continue;

            for (auto& data : *dataList)
            {
                for (auto& item : data.second)
                {
                    auto id = parser.Attribute<int>(item, L"Id");
                    auto fileName = parser.Attribute(item, L"FileName");

                    if (id && fileName) // check both before dereferencing
                    {
                        wcout << *id << L" " << *fileName << endl; // print id and file name
                    }
                }
            }
        }
    }

Test result: [screenshot omitted]

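As you can see, with the helper class nodes can be accessed and manipulated conveniently without touching the native interface. Users do not need to care about the internals; a uniform, concise interface is enough to work with XML files.

As an aside, combining this helper class with LINQ to Objects makes it easy to implement LINQ to XML: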

    // get the SubNode children whose ID attribute is 0x10000D and print each one's Type attribute
    from(node.Descendants("Root.SubNode")).where([](XNode& node)
    {
        auto s = node.Attribute("ID");
        return s && *s == "0x10000D";
    }).for_each([](XNode& node)
    {
        auto s = node.Attribute("Type");
        if (s)
            cout << *s << endl;
    });