Using Boost.Spirit.X3 with MSVC

Preface

“Examples of designs that meet most of the criteria for "goodness" (easy to understand, flexible, efficient) are a recursive-descent parser, which is traditional procedural code. Another example is the STL, which is a generic library of containers and algorithms depending crucially on both traditional procedural code and on parametric polymorphism.” --Bjarne Stroustrup

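Let me open with the Bjarne Stroustrup quote that the Boost documentation itself cites. If I may be so bold as to add a line of my own: Boost.Spirit is a recursive-descent parser that depends on traditional procedural code, static (parametric) polymorphism, and expression templates. Procedural code drives the control flow, static polymorphism handles pattern matching and dispatch, and expression templates manage the grammar productions; together they give Spirit its magic.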

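This article does not discuss Spirit's performance. It only introduces some basic concepts of Spirit.X3 and its simple usage, and closes with a few simple examples. The next one or two articles will cover how to extend X3. I also assume the reader has some basic compiler knowledge: lexical analysis, syntax analysis, abstract syntax trees (AST), synthesized and inherited attributes, and terminals and nonterminals.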

Terminals & Nonterminals

namespace x3 = boost::spirit::x3;

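Terminals in X3 are the primitive parsers: leaf parsers that wrap no sub-parser (a later article will walk through the Spirit source to explain this in detail). They are the smallest units a grammar production expands into. For example, x3::char_ matches a single character, x3::ascii::alpha matches a single ASCII letter, and x3::float_ matches a single-precision floating-point number; literal strings are matched character by character by x3::lit and x3::string, with no regular-expression engine involved. For details, see the X3 reference on character parsers, numeric parsers, and string parsers.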

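Nonterminals are usually built by combining terminals according to some logical relationship, and that composition is how complex grammar productions are defined. For example, x3::float_ >> x3::float_ matches "16.0 1.2" (when parsing with a whitespace skipper), where >> denotes sequence. *x3::char_ matches "asbcdf234", but it also matches "assd s s ddd": when parsing with a skipper, whitespace or anything else the skipper is defined to match (such as comments) is skipped between parsers. For details, see the X3 documentation on nonterminals.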

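We have just seen that X3 builds nonterminals from terminals and C++ operators, so what type does a nonterminal actually have? It is an expression template: the operators construct a static, tree-shaped representation of the grammar production at compile time. Expanding a production is then a top-down, depth-first traversal of that tree; on reaching a nonterminal, X3 tries to match its child parsers, recursing until it reaches terminals.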
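
To make the "static tree built by operators" idea concrete, here is a minimal sketch of the technique in plain C++, without Boost. Every name in it (namespace toy, digit_p, char_p, sequence_p, kleene_p, matches) is hypothetical and far simpler than what X3 really does; it only illustrates how overloaded operators can encode a grammar in a type and how parsing then recurses through that type.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <type_traits>

namespace toy {  // hypothetical names, not part of Boost.Spirit

struct parser_tag {};  // marks types that the operators below may combine

// Terminal: matches one decimal digit.
struct digit_p : parser_tag {
    template <typename It>
    bool parse(It& first, It last) const {
        if (first != last && std::isdigit(static_cast<unsigned char>(*first))) {
            ++first;
            return true;
        }
        return false;
    }
};

// Terminal: matches one specific character.
struct char_p : parser_tag {
    char ch;
    explicit char_p(char c) : ch(c) {}
    template <typename It>
    bool parse(It& first, It last) const {
        if (first != last && *first == ch) { ++first; return true; }
        return false;
    }
};

// Nonterminal node "L followed by R"; the tree shape is part of the type.
template <typename L, typename R>
struct sequence_p : parser_tag {
    L left;
    R right;
    sequence_p(L l, R r) : left(l), right(r) {}
    template <typename It>
    bool parse(It& first, It last) const {
        It saved = first;
        if (left.parse(first, last) && right.parse(first, last)) return true;
        first = saved;  // undo partial progress on failure
        return false;
    }
};

// Nonterminal node "zero or more S".
template <typename S>
struct kleene_p : parser_tag {
    S subject;
    explicit kleene_p(S s) : subject(s) {}
    template <typename It>
    bool parse(It& first, It last) const {
        while (subject.parse(first, last)) {}
        return true;
    }
};

template <typename T>
using enable_if_parser =
    typename std::enable_if<std::is_base_of<parser_tag, T>::value>::type;

// The operators do no parsing; they only build a bigger node type.
template <typename L, typename R,
          typename = enable_if_parser<L>, typename = enable_if_parser<R>>
sequence_p<L, R> operator>>(L l, R r) { return sequence_p<L, R>(l, r); }

template <typename S, typename = enable_if_parser<S>>
kleene_p<S> operator*(S s) { return kleene_p<S>(s); }

// True when the parser consumes the whole input.
template <typename P>
bool matches(P const& p, std::string const& input) {
    auto first = input.begin();
    return p.parse(first, input.end()) && first == input.end();
}

}  // namespace toy

// digit >> *(',' >> digit): the whole production is encoded in the type.
auto const digit_list = toy::digit_p{} >> *(toy::char_p{','} >> toy::digit_p{});

static_assert(std::is_same<decltype(digit_list),
    const toy::sequence_p<toy::digit_p,
        toy::kleene_p<toy::sequence_p<toy::char_p, toy::digit_p>>>>::value,
    "the grammar tree is visible in the type");

// Small self-check; call it from main() to exercise the sketch.
inline void check_digit_list() {
    assert(toy::matches(digit_list, "1,2,3"));
    assert(!toy::matches(digit_list, "1,,3"));
}
```

The static_assert shows the point: the expression's type is the tree, so the compiler sees the entire grammar and can inline the depth-first descent.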

Synthesized Attribute

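Whether terminal or nonterminal, a parser takes the input string and, on a successful match, always yields a value of some type. That value is the parser's synthesized attribute. For example, the synthesized attribute of x3::char_ is a char and that of x3::float_ is a float. Attributes of nonterminals are more involved; see the X3 documentation on compound attribute rules.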

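Besides synthesized attributes there are inherited attributes. An inherited attribute is likewise a value of some type, but it may come from the synthesized attribute of another node in the production. Take an XML element <Node></Node>: when parsing </Node>, the name has to match the opening tag seen earlier, which is exactly where an inherited attribute is useful. Unfortunately X3 does not implement inherited attributes yet, although boost::spirit::qi does. I am experimenting with an implementation of my own, but this article will not discuss inherited attributes further.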

Start Rule

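Before it can parse the source language, X3 needs to know the start rule of the grammar, that is, the root node of the static tree of productions. The whole analysis recurses downward from that root, and the root's synthesized attribute can be the abstract syntax tree of the source text. Notice that X3 merges lexical analysis and syntax analysis into a single pass. It is of course also possible to do only lexical analysis in a first pass, leaving the root's synthesized attribute as strings, and then run a second pass for syntax analysis.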

Simple Examples

1. Parsing "1.2 , 1.3 , 1.4 , 1.5"

#include <boost/spirit/home/x3.hpp>   // x3 core

#include <boost/fusion/adapted.hpp>   // adapt fusion.vector with std::vector

// ......

std::string source = "1.2 , 1.3 , 1.4 , 1.5";

auto itr = source.cbegin();

auto end = source.cend();

std::vector<float> result;

auto r = phrase_parse(itr, end, x3::float_ >> *(',' >> x3::float_), x3::ascii::space, result);

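x3::float_ >> *(',' >> x3::float_) means one float followed by zero or more (',' >> x3::float_) groups. When composing a production, think about the grammar first and the synthesized attribute second. So what is the synthesized attribute of this composite? ',' is a character literal, and the X3 documentation says that a literal (x3::lit) has the attribute x3::unused_type: it consumes input characters but contributes no slot to the attribute. In short, the ',' in ',' >> x3::float_ can be ignored, so that sub-expression's attribute is a float. The attribute of the whole production is therefore std::vector<float>, or any container type compatible with it.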

auto r = phrase_parse(itr, end, x3::float_ % ',', x3::ascii::space, result);

x3::float_ >> *(',' >> x3::float_) can be simplified to x3::float_ % ','.

2. Parsing "1.2, Hello World" into a user-defined synthesized attribute

struct user_defined

{

    float              value;

    std::string        name;

};

BOOST_FUSION_ADAPT_STRUCT(

    user_defined, value, name)

// .....

std::string source = "1.2, Hello World";

auto itr = source.cbegin();

auto end = source.cend();

user_defined data;

auto r = phrase_parse(itr, end, x3::float_ >> ',' >> x3::lexeme[*x3::char_], x3::ascii::space, data);

With Boost.Fusion we can adapt a struct into a tuple-like sequence. The BOOST_FUSION_ADAPT_STRUCT macro lets X3 treat struct user_defined as a Fusion sequence equivalent to boost::fusion::vector<float, std::string>.

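x3::lexeme is a directive. A directive is itself a parser and has a synthesized attribute; for lexeme it is the attribute of the parser inside the brackets, here a string. What lexeme changes is skipping: inside it, whitespace is no longer skipped during matching. With the default skipping behavior, *x3::char_ would skip the space between the words and produce "HelloWorld", which is wrong; x3::lexeme[*x3::char_] produces "Hello World".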

phrase_parse is defined in namespace boost::spirit::x3. Here it is called as an unqualified name, and argument-dependent lookup (ADL) finds the right function because the parser arguments are X3 types.
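
For readers who have not met ADL before, the lookup rule can be sketched without Boost. The namespace lib and the type float_parser below are hypothetical stand-ins for boost::spirit::x3 and its parser types; the only point is that an unqualified call is also looked up in the namespaces of its argument types.

```cpp
#include <cassert>
#include <string>

namespace lib {  // hypothetical stand-in for boost::spirit::x3

struct float_parser {};

// Declared next to float_parser, the way phrase_parse sits next to X3 parsers.
inline std::string phrase_parse(float_parser) { return "lib::phrase_parse"; }

}  // namespace lib

inline std::string call_unqualified() {
    lib::float_parser p;
    // No "lib::" prefix: because the argument has type lib::float_parser,
    // the compiler also searches namespace lib for a function named phrase_parse.
    return phrase_parse(p);
}

// Small self-check; call it from main() to exercise the sketch.
inline void check_adl() {
    assert(call_unqualified() == "lib::phrase_parse");
}
```

Writing x3::phrase_parse explicitly works just as well and makes the origin of the function obvious to the reader.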

3. Parsing a C++ identifier

The first character of a C++ identifier must be a letter or an underscore; the following characters may be letters, digits, or underscores.

auto const identifier_def = x3::lexeme[x3::char_("_a-zA-Z") >> *x3::char_("_0-9a-zA-Z")];

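The first approach is the most direct. x3::char_ matches exactly one character, and its overloaded call operator accepts a set listing every character it may match. Do not forget lexeme, so that whitespace is not skipped.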

auto const identifier_def = x3::lexeme[(x3::alpha | x3::char_('_')) >> *(x3::alnum | x3::char_('_'))];

The second approach uses X3's built-in character parsers: x3::alpha matches a letter and x3::alnum matches a letter or a digit.

auto const identifier_def = x3::lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')];

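This version looks more concise, but it is actually wrong. '_' is a character literal, and x3::lit has no synthesized attribute, so when this parser is used to parse an identifier, the underscores are missing from the result.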

auto const identifier_def = x3::raw[x3::lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')]];

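This example gives a deeper understanding of the relationship between the matched text and the synthesized attribute. The attribute of the expression inside x3::raw[...] drops the underscores, but the text it matched does not! x3::raw is a directive implemented as a unary parser; its attribute is the range of input that its subject matched, which converts to a string. It discards the attribute of the parser in its brackets and substitutes the matched text. For "_foo_1", x3::lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')] matches the text "_foo_1" while its attribute is "foo1"; identifier_def replaces that "foo1" with the matched text "_foo_1".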

4. Parsing C++ comments

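C++ has two kinds of comments, "//" and "/**/". Everything from "//" to the end of the line is a comment, and so is everything between "/*" and the next "*/".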

auto const annotation_def =

        (x3::lit("//") > x3::seek[x3::eol | x3::eoi]) |

        (x3::lit("/*") > x3::seek[x3::lit("*/")]);

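operator> and operator>> both express sequence, but the former is stricter. With operator>>, if the right-hand parser fails, the sequence simply fails to match and the caller is free to try another alternative. operator> is an expectation: once the left-hand parser has matched, the right-hand parser must match too, otherwise X3 throws x3::expectation_failure. x3::eol and x3::eoi are two auxiliary parsers that match end of line and end of input respectively. We only care about the text a comment matches, which is ignored in the real parse, not about the comment's synthesized attribute. x3::seek is another directive: it advances through the input until the parser in its brackets matches, and its attribute is that parser's attribute.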
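
The difference between a plain sequence and an expectation can be sketched in a few lines of ordinary C++. The helpers match_literal, seq_match, and expect_match are hypothetical and are not X3 API; std::runtime_error merely stands in for x3::expectation_failure. This is an illustration of the control flow only, under the assumption that a failed expectation is reported by throwing.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: consume `lit` at position `pos` if it is there.
inline bool match_literal(std::string const& in, std::size_t& pos,
                          std::string const& lit) {
    if (in.compare(pos, lit.size(), lit) == 0) {
        pos += lit.size();
        return true;
    }
    return false;
}

// Like a >> b: a failing b just means "no match"; the position is restored
// so that the caller may try another alternative.
inline bool seq_match(std::string const& in, std::size_t& pos,
                      std::string const& a, std::string const& b) {
    std::size_t saved = pos;
    if (match_literal(in, pos, a) && match_literal(in, pos, b)) return true;
    pos = saved;
    return false;
}

// Like a > b: once a has matched, b is mandatory; a missing b is an error,
// reported here with an exception instead of a quiet false.
inline bool expect_match(std::string const& in, std::size_t& pos,
                         std::string const& a, std::string const& b) {
    if (!match_literal(in, pos, a)) return false;
    if (!match_literal(in, pos, b))
        throw std::runtime_error("expected \"" + b + "\"");
    return true;
}

// Small self-check; call it from main() to exercise the sketch.
inline void check_expectation() {
    std::size_t pos = 0;
    assert(!seq_match("/*x", pos, "/*", "*/"));  // quiet failure
    assert(pos == 0);                            // input position restored
    bool thrown = false;
    try {
        expect_match("/*x", pos, "/*", "*/");
    } catch (std::runtime_error const&) {
        thrown = true;                           // loud failure
    }
    assert(thrown);
}
```

In a comment grammar this is the desired behavior: after "/*" has been seen there is no sensible alternative to try, so an unterminated comment should surface as an error rather than as a silent backtrack.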

Using X3 with MSVC

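X3 relies on modern C++ features such as expression SFINAE (the culprit in almost every case) and C++14 generic lambdas. The VS2015 compiler currently implements most of the features X3 uses, with the exception of expression SFINAE. I have only worked through the official X3 examples, and found that it is enough to rewrite the code that uses expression SFINAE in the traditional SFINAE style. Beyond that, there is a bug in the msvc 14.0 compiler when Boost.Preprocessor is used together with decltype. A side rant at Microsoft: msvc has already started implementing C++17 proposals, yet it still has not finished the C++11 standard!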

1. Modify the code in <boost/spirit/home/x3/nonterminal/detail/rule.hpp>

//template <typename ID, typename Iterator, typename Context, typename Enable = void>

    //struct has_on_error : mpl::false_ {};

    //

    //template <typename ID, typename Iterator, typename Context>

    //struct has_on_error<ID, Iterator, Context,

    //    typename disable_if_substitution_failure<

    //        decltype(

    //            std::declval<ID>().on_error(

    //                std::declval<Iterator&>()

    //              , std::declval<Iterator>()

    //              , std::declval<expectation_failure<Iterator>>()

    //              , std::declval<Context>()

    //            )

    //        )>::type

    //    >

    //  : mpl::true_

    //{};

template <typename ID, typename Iterator, typename Context>

struct has_on_error_impl

{

    template <typename U, typename = decltype(std::declval<U>().on_error(

        std::declval<Iterator&>(),

        std::declval<Iterator>(),

        std::declval<expectation_failure<Iterator>>(),

        std::declval<Context>()

        ))>

    static mpl::true_ test(int);

    template<typename> static mpl::false_ test(...);

    using type = decltype(test<ID>(0));   // pass 0 so the test(int) overload is viable

};

template <typename ID, typename Iterator, typename Context>

using has_on_error = typename has_on_error_impl<ID, Iterator, Context>::type;

//template <typename ID, typename Iterator, typename Attribute, typename Context, typename Enable = void>

//struct has_on_success : mpl::false_ {};

//

//template <typename ID, typename Iterator, typename Attribute, typename Context>

//struct has_on_success<ID, Iterator, Context, Attribute,

//    typename disable_if_substitution_failure<

//        decltype(

//            std::declval<ID>().on_success(

//                std::declval<Iterator&>()

//              , std::declval<Iterator>()

//              , std::declval<Attribute&>()

//              , std::declval<Context>()

//            )

//        )>::type

//    >

//  : mpl::true_

//{};

template <typename ID, typename Iterator, typename Attribute, typename Context>

struct has_on_success_impl

{

    template <typename U, typename = decltype(std::declval<U>().on_success(

        std::declval<Iterator&>(),

        std::declval<Iterator>(),

        std::declval<Attribute&>(),

        std::declval<Context>()

        ))>

    static mpl::true_ test(int);

    template<typename> static mpl::false_ test(...);

    using type = decltype(test<ID>(0));   // pass 0 so the test(int) overload is viable

};

template<typename ID, typename Iterator, typename Attribute, typename Context>

using has_on_success = typename has_on_success_impl<ID, Iterator, Attribute, Context>::type;

2. Modify the code in <boost/spirit/home/x3/support/utility/is_callable.hpp>

    //template <typename Sig, typename Enable = void>

    //struct is_callable_impl : mpl::false_ {};

    //template <typename F, typename... A>

    //struct is_callable_impl<F(A...), typename disable_if_substitution_failure<

    //    decltype(std::declval<F>()(std::declval<A>()...))>::type>

    //  : mpl::true_

    //{};

    template <typename Sig>

    struct is_callable_impl : mpl::false_ {};

    template <typename F, typename ... A>

    struct is_callable_impl<F(A...)>

    {

        template <typename T, typename =

            decltype(std::declval<T>()(std::declval<A>()...))>   // must depend on T for SFINAE to apply

        static mpl::true_ test(int);

        template <typename T>

        static mpl::false_ test(...);

        using type = decltype(test<F>(0));   // pass 0 so the test(int) overload is viable

    };

3. Change BOOST_SPIRIT_DEFINE in <boost/spirit/home/x3/nonterminal/rule.hpp> to the following

#define BOOST_SPIRIT_DEFINE_(r, data, rule_name)                                \

    using BOOST_PP_CAT(rule_name, _t) = decltype(rule_name);                    \

    template <typename Iterator, typename Context, typename Attribute>          \

    inline bool parse_rule(                                                     \

        BOOST_PP_CAT(rule_name, _t) rule_                                       \

      , Iterator& first, Iterator const& last                                   \

      , Context const& context, Attribute& attr)                                \

    {                                                                           \

        using boost::spirit::x3::unused;                                        \

        static auto const def_ = (rule_name = BOOST_PP_CAT(rule_name, _def));   \

        return def_.parse(first, last, context, unused, attr);                  \

    }                                                                           \

    /***/

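Changes 1 and 2 are needed because msvc does not implement expression SFINAE yet. Change 3 is needed because BOOST_SPIRIT_DEFINE appears to conflict with decltype: I wrote some test code and narrowed the problem down to using decltype(rule_name) as a function parameter type. The same code compiles fine with gcc, so msvc's decltype support is presumably still incomplete. BOOST_SPIRIT_DEFINE involves x3::rule, whose usage the next article covers in detail.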
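
The overload-based detection pattern behind changes 1 and 2 can be tried in isolation. The sketch below uses only the standard library; has_on_error_sketch, with_handler, and without_handler are hypothetical names, and the detected call on_error(int) is deliberately simpler than the X3 signature. Whether a particular msvc release accepts this form is an assumption that the sketch itself does not verify.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical trait: does T have a member callable as t.on_error(int)?
template <typename T>
struct has_on_error_sketch {
    // Preferred overload. It takes part in overload resolution only when the
    // decltype expression is well-formed for U; the expression must depend
    // on U, otherwise an invalid call is a hard error instead of SFINAE.
    template <typename U,
              typename = decltype(std::declval<U&>().on_error(std::declval<int>()))>
    static std::true_type test(int);

    // Fallback overload: always viable, but the worst match for the argument 0.
    template <typename>
    static std::false_type test(...);

    // The argument 0 matters: with no argument at all, test(int) would never
    // be viable and the trait would always report false.
    using type = decltype(test<T>(0));
    static constexpr bool value = type::value;
};

struct with_handler { void on_error(int) {} };
struct without_handler {};

static_assert(has_on_error_sketch<with_handler>::value,
              "handler should be detected");
static_assert(!has_on_error_sketch<without_handler>::value,
              "missing handler should not be detected");
```

Compared with the partial-specialization form, the result is computed by overload resolution on a member function template, which is the style older compilers tend to handle more reliably.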

Ending

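At first glance Boost.Spirit mangles C++ syntax beyond recognition, but when working with expression templates, overloading operators is the most elegant approach. The UE4 UI framework and several expression-template-based math libraries use the same technique heavily. Recursive descent: to iterate is human, to recurse divine. Static polymorphism: loose in form, unified in spirit. And expression templates serve as the skeleton that holds the other two together. However, building very complex grammar productions with expression templates puts a heavy load on the compiler, slows compilation, and can even push type identifiers past 4K in length! These problems will be discussed in a later article together with Spirit's run-time efficiency. On the whole, I still find Spirit elegant.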