variable point to code variable
expression tree data structure
lamda expression anonymous function

原文

Newcomers to LINQ often find expression trees difficult to grasp. In this post I hope to show that the subject is not quite as difficult as it might appear at first. Any reader who has an intermediate level understanding of LINQ should find the material in this post easy to grasp.

An expression tree provides a method of translating executable code into data. This can be very valuable if you want to modify or transform code before executing it. In particular, it can be useful if you want to transform C# code such as a LINQ query expression into code that operates on another process, such as a SQL database.

But I'm getting ahead of myself. By the end of this post you will find it easy to understand why it is helpful to translate code into data. First I need to provide a little background. Let's start by seeing the relatively simple syntax for creating an expression tree.

The Syntax for an Expression Tree

Consider the following very simple lambda expression:

Func<int, int, int> function = (a,b) => a + b;
This statement consists of three sections.
  1. A declaration: Func<int, int, int> function
  2. An equals operator: =
  3. A lambda expression: (a,b) => a + b;

The variable function points at raw executable code that knows how to add two numbers. The lambda expression shown in step three above is a short hand way of writing the following method:

public int function(int a, int b)
{
return a + b;
}

One can call either the method shown above, or the lambda expression like this:

int c = function(3, 5);

After the function is called, the variable c will be set equal to 3 + 5, which is 8.

The delegate type Func shown above in the declaration found in step one is declared for us in the System namespace:

public delegate TResult Func<T1, T2, TResult>(T1 arg1, T2 arg2);

This code looks complicated, but it is used here to help us declare the variable function, which is set equal to a very simple lambda expression that adds two numbers together. Even if you don't understand delegates and generic functions, it should still be clear that this is a way of declaring a variable that references executable code. In this case it points at very simple executable code.

Translating Code into Data

In the previous section, you saw how to declare a variable that points at raw executable code. Expression trees are not executable code, they are a form of data structure. So how does one translate the raw code found in an expression into an expression tree? How does one translate code into data?

LINQ provides a simple syntax for translating code into a data structure called an expression tree. The first step is to add a using statement to introduce the Linq.Expressions namespace:

using System.Linq.Expressions;

Now we can create an expression tree:

Expression<Func<int, int, int>> expression = (a,b) => a + b;

The identical lambda expression shown in the previous example is converted into an expression tree declared to be of type Expression<T>. The identifier expression is not executable code; it is a data structure called an expression tree.

The samples that ship with Visual Studio 2008 include a program called the ExpressionTreeVisualizer. It can be used to visualize an expression tree. In Figure 01 you can see a screen shot of a dialog that displays the expression tree for the simple Expression statement shown above. Notice that the lambda expression is displayed at the top of the dialog, and its constituent parts are displayed below it in a TreeView control.

Figure 01: The ExpressionTreeVisualizer sample that ships with the VS 2008 C# Samples creates a symbolic rendering of the expression tree.

Writing Code to Explore an Expression Tree

Our simple example is an Expression<TDelegate>. There are four properties in the Expression<TDelegate> class:

  • Body: Retrieve the body of the expression.

  • Parameters: Retrieve the parameters of the lambda expression.
  • NodeType: Gets the ExpressionType for some node in the tree. This is an enumeration of 45 different values, representing all the different possible types of expression nodes, such as those that return constants, those that return parameters, those that decide whether one value is less than another (<), those that decide if one is greater than another (>), those that add values together (+), etc.
  • Type: Gets the static type of the expression. In this case, the expression is of type Func<int, int, int>.

If we collapse the nodes of the tree shown in Figure 1, the four properties of the Expression<TDelegate> class are clearly visible.

Figure 02: By collapsing the nodes of the expression tree, you can easily see the four main properties of the Expression<TDelegate> class.

You can use these properties to start exploring the expression tree. For instance, you can find out the names of the parameters passed to it:

Console.WriteLine("Param 1: {0}, Param 2: {1}", expression.Parameters[0], expression.Parameters[1]);

This code prints out the values a and b:

Param 1: a, Param 2: b

It should be easy to locate these values in the ParameterExpression nodes shown in Figure 01.

The following code lets us explore the Body of the expression, which in this case is (a + b):

BinaryExpression body = (BinaryExpression)expression.Body;
ParameterExpression left = (ParameterExpression)body.Left;
ParameterExpression right = (ParameterExpression)body.Right;
Console.WriteLine(expression.Body);
Console.WriteLine(" The left part of the expression: " +
"{0}{4} The NodeType: {1}{4} The right part: {2}{4} The Type: {3}{4}",
left.Name, body.NodeType, right.Name, body.Type, Environment.NewLine);

This code yields the following output:

(a + b)
  The left part of the expression: a
  The NodeType: Add
  The right part of the expression: b
  The Type: System.Int32

Again, you should find it fairly easy to locate this information in the Body node shown at the top of Figure 01.

By exploring the expression tree, we are able to analyze the parts of the expression and discover its components. You can see that all the elements of our expression have been represented as nodes of a data structure. The expression tree is code converted into data.

Compiling an Expression: Converting Data back into Code

If we can convert code into data, then we ought to be able to convert the data back into code. Here is the simple code that asks the compiler to convert an expression tree into executable code:

int result = expression.Compile()(3, 5);
Console.WriteLine(result);

This code would print out the value 8, the same value as the simple lambda function we declared at the start of this section.

IQueryable<T> and Expression Trees

Now that you have at least an abstract conceptual understanding of an expression tree, it is time to loop back and understand the key role that it plays in LINQ, and particularly in LINQ to SQL. Take a moment to consider this canonical LINQ to SQL query expression:

var query = from c in db.Customers
where c.City == "Nantes"
select new { c.City, c.CompanyName };

As you probably know, the variable query that is returned by this LINQ expression is of type IQueryable. Here is the declaration of IQueryable:

public interface IQueryable : IEnumerable
{
Type ElementType { get; }
Expression Expression { get; }
IQueryProvider Provider { get; }
}

As you can see, IQueryable contains a property of type Expression, this is the base type from which most variants of Expression<T> are derived. It is designed to hold the expression tree associated with an instance of IQueryable. It is a data structure equivalent to the executable code found in a query expression.

Take a moment to consider the image shown in Figure 03. You will probably need to double-click on it so you can pop up a full sized version of the image. This is the visualization of the expression tree for the query expression shown at the beginning of this section of the post. The ExpressionTreeVisualizer sample was used to create this image, just as I used that sample to create the visualization of the lambda based expression tree shown in Figure 01.

Figure 03: The complex expression tree generated by the simple LINQ to SQL query expression shown above. (Click the image to see the full picture.)

Why Convert a LINQ to SQL Query Expression into an Expression Tree?

You have learned that an expression tree is a data structure that represents executable code. But so far we have not answered the central question of why one would want to make such a conversion. This is the question we asked at the beginning of this post, and it is now time to answer it.

A LINQ to SQL query is not executed inside your C# program. Instead, it is translated into SQL, sent across a wire, and executed on a database server. In other words, the following code is never actually executed inside your program:

var query = from c in db.Customers
where c.City == "Nantes"
select new { c.City, c.CompanyName };

It is first translated into the following SQL statement and then executed on a server:

SELECT [t0].[City], [t0].[CompanyName]
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[City] = @p0

The code found in a query expression has to be translated into a SQL query that can be sent to another process as a string. In this case that process happens to be a SQL server database. It is obviously going to be much easier to translate a data structure such as an expression tree into SQL than it is to translate raw IL or executable code into SQL. To exaggerate the difficulty of the problem somewhat, just imagine trying to translate a series of zeros and ones into SQL!

When it is time to translate your query expression into SQL, the expression tree representing your query is taken apart and analyzed, just as we took apart our simple lambda expression tree in the previous section. Granted, the algorithm for parsing the LINQ to SQL expression tree is much more sophisticated than the one we used, but the principle is the same. Once it has analyzed the parts of the expression tree, then LINQ mulls them over and decides the best way to write a SQL statement that will return the requested data.

Expression trees were created in order to make the task of converting code such as a query expression into a string that can be passed to some other process and executed there. It is that simple. There is no great mystery here, no magic wand that needs to be waved. One simply takes code, converts it into data, and then analyzes the data to find the constituent parts that will be translated into a string that can be passed to another process.

Because the query comes to the compiler encapsulated in such an abstract data structure, the compiler is free to interpret it in almost any way it wants. It is not forced to execute the query in a particular order, or in a particular way. Instead, it can analyze the expression tree, discover what you want done, and then decide how to do it. At least in theory, it has the freedom to consider any number of factors, such as the current network traffic, the load on the database, the current results sets it has available, etc. In practice LINQ to SQL does not consider all these factors, but it is free in theory to do pretty much what it wants. Furthermore, one could pass this expression tree to some custom code you write by hand which could analyze it and translate it into something very different from what is produced by LINQ to SQL.

IQueryable<T> and IEnumerable<T>

As you probably know, the query expressions found in LINQ to Objects return IEnumerable<T> rather than IQueryable<T>. Why does LINQ to Objects use IEnumerable<T> and LINQ to SQL use IQueryable<T>?

Here is the declaration for IEnumerable<T>:

public interface IEnumerable<T> : IEnumerable
{
IEnumerator<T> GetEnumerator();
}

As you can see, IEnumerable<T> does not contain a field of type Expression. This points to a fundamental difference between LINQ to Objects and LINQ to SQL. The latter makes heavy use of expression trees, but LINQ to Objects uses them rarely.

Why are expression trees not a standard part of LINQ to Objects? Though the answer may or may not come to you instantly, it should make sense to you once you hear it.

Consider this simple LINQ to Objects query expression:

List<int> list = new List<int>() { 1, 2, 3 };

var query = from number in list
where number < 3
select number;

This LINQ query returns the numbers from our list that are smaller than three; that is, it returns the numbers 1 and 2. Clearly it is not necessary to translate this query into a string that can be passed to another process in order to get the proper answer. Instead, one can convert the query expression directly into .NET code that can be executed. There is no need to translate it into a string or perform any other complex operation on it.

Though this is a bit of a generalization and in practice the lines may be blurred a bit in particular cases, overall the rules are fairly simple:

  • If code can be executed in process then the job can be done with the simple type called IEnumerable<T>

  • If you need to translate a query expression into a string that will be passed to another process, then use IQueryable<T> and expression trees.

Projects like LINQ to Amazon that need to translate query expressions into web service calls that execute out of process will typically make use of IQueryable<T> and expression trees. LINQ to Amazon converts its query expressions into data that a web service can pass to another process that might not even be written in C#. When translating C# code into something that can be passed to a web service the abstractions inherent in an expression tree are very useful. Code that will execute in process, however, can frequently get by without expression trees. The following query, for instance, uses IEnumerable<T> because it calls into the .NET reflection API which is hosted in process:

var query = from method in typeof(System.Linq.Enumerable).GetMethods()
orderby method.Name
group method by method.Name into g
select new { Name = g.Key, Overloads = g.Count() };

Summary

This post covered a few basic facts about expression trees. These data structures expose and delineate the parts of an expression by translating code into data. At least on the conceptual level, expression trees are quite simple to understand. They take an executable expression and capture its parts in a tree-like data structure. For instance, we examined this simple expression:

(a,b) => a + b;

By exploring the tree derived from this expression you saw the basis for the algorithm used to create the tree shown in Figure 01.

You also saw that expression trees play an especially important role in LINQ to SQL. In particular, they are the data abstraction that is used to capture the logic held in a LINQ to SQL query expression. This data is parsed and analyzed so that a SQL statement can be derived from it and sent to the server.

LINQ makes querying a first class citizen of the C# language that is both type checked and IntelliSense aware. Code of that is type checked and IntelliSense aware must use a true C# syntax that can be directly converted into executable code just as any other C# code be translated and executed. Expression trees make it relatively easy to convert that executable code into SQL statements that can be passed to a server.

Queries that return IEnumerable<T> rather than IQueryable<T> typically do not use expression trees. As a general rule, one can say that LINQ queries that execute in process do not need expression trees while code that executes out of process can make good use of the abstractions inherent in an expression tree.

Expression Tree Basics 表达式树原理的更多相关文章

  1. Python与数据结构[3] -> 树/Tree[1] -> 表达式树和查找树的 Python 实现

    表达式树和查找树的 Python 实现 目录 二叉表达式树 二叉查找树 1 二叉表达式树 表达式树是二叉树的一种应用,其树叶是常数或变量,而节点为操作符,构建表达式树的过程与后缀表达式的计算类似,只不 ...

  2. [.net 面向对象程序设计进阶] (6) Lamda表达式(二) 表达式树快速入门

    [.net 面向对象程序设计进阶] (6) Lamda表达式(二) 表达式树快速入门 本节导读: 认识表达式树(Expression Tree),学习使用Lambda创建表达式树,解析表达式树. 学习 ...

  3. C# - LINQ 表达式树

    表达式树(Expression Tree) 表达式树是不可执行的代码,它只是用于表示一种树状的数据结构,树上的每一个节点都表示为某种表达式类型,大概有25种表达式类型,它们都派生自Expression ...

  4. 反射,Expression Tree,IL Emit 属性操作对比

    .net的反射(Reflection) 是.Net中获取运行时类型信息的一种方法,通过反射编码的方式可以获得 程序集,模块,类型,元数据等信息. 反射的优点在于微软提供的API调用简单,使用方便: 表 ...

  5. 【C#表达式树 开篇】 Expression Tree - 动态语言

    .NET 3.5中新增的表达式树(Expression Tree)特性,第一次在.NET平台中引入了"逻辑即数据"的概念.也就是说,我们可以在代码里使用高级语言的形式编写一段逻辑, ...

  6. [C#] C# 知识回顾 - 表达式树 Expression Trees

    C# 知识回顾 - 表达式树 Expression Trees 目录 简介 Lambda 表达式创建表达式树 API 创建表达式树 解析表达式树 表达式树的永久性 编译表达式树 执行表达式树 修改表达 ...

  7. 不可不知的表达式树(1)Expression初探

    说起Lambda表达式,大家基本都很熟悉了,而表达式树(Expression Trees),则属于80%的工作中往往都用不到的那种技术,所以即便不是什么新技术,很多人对其理解都并不透彻.此文意图从表达 ...

  8. C# 知识回顾 - 表达式树 Expression Trees

    C# 知识回顾 - 表达式树 Expression Trees 目录 简介 Lambda 表达式创建表达式树 API 创建表达式树 解析表达式树 表达式树的永久性 编译表达式树 执行表达式树 修改表达 ...

  9. jQuery find() 搜索所有段落中的后代 C# find() 第一个匹配元素 Func 有返回值 Action是没有返回值 Predicate 只有一个参数且返回值为bool 表达式树Expression

    所有p后代span Id为 TotalProject 的 select 标签 的后代 option标签 为选中的 text using System; using System.Collections ...

随机推荐

  1. 在Ubuntu14.04系统POWER8服务器上搭建Docker Registry服务

    本文描述了如何在POWER8服务器上搭建一个本地化的Docker镜像仓库,主要涉及镜像制作,Docker Registry服务启动等.希望能够对在非X86服务器上搭建Docker仓库的同学提供参考. ...

  2. CDN(翻译)

    http://dojotoolkit.org/documentation/tutorials/1.10/cdn/index.html 从CDN上加载dojo模块时非常有用的.从CDN上加载Dojo并同 ...

  3. vue.js组件

    <!DOCTYPE html> <html> <head> <meta charset="utf-8" /> <title&g ...

  4. va_copy

    #include <stdio.h> #include <stdarg.h> void func(char* format, va_list ptr); void print( ...

  5. mysql重复记录的查询删除方法

    1.查找表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断select * from peoplewhere peopleId in (select   peopleId from  ...

  6. mybatis的动态sql及模糊查询

    1.动态sql 使用类似于jstl表达式来实现 2.模糊查找 用一个对象来封装条件 步骤: 1)新建一个条件实体 package com.hy.mybatis.entity; public class ...

  7. jenkins中submodule的使用

    尝试过各种插件配置都不行. 最后只好通过命令来更新Submodule了. 首先把Git更新到最新,为什么要更新等会儿再说. 项目里的Submodule要配置好,这是基本的. (检查是否配置好的方法: ...

  8. Starling项目启动时报错This application is not correctly embedded (wrong wmode value)

    这个其实是Flash报的“ErrorEvent:. text=Error #3702: Context3D 不可用”错误被Starling变了一下. 解决方法:找到项目文件夹中的html-templa ...

  9. EJB 教程推荐

    EJB教程 EJB概述 EJB创建应用 EJB无状态Bean EJB有状态会话Bean EJB持久性 EJB消息驱动Bean EJB注解 EJB回调 EJB定时器服务 EJB依赖注入 EJB拦截器 E ...

  10. C/C++二维数组分配内存

    //C++方式 double **Q=new double*[row];    //初始化Q矩阵 for(int i=0;i<row;++i) Q[i]=new double[POS_NUM]( ...