mongodb聚合(转)

聚合是泛指各种可以处理批量记录并返回计算结果的操作。MongoDB提供了丰富的聚合操作，用于对数据集执行计算操作。在 mongod 实例上执行聚合操作可以大大简化应用的代码，并降低对资源的消耗。

聚合有比较简单的 count 计算总数；distinct去重；group by 分组。也有比较复杂的管道聚合。下面将分别讲述。

appuser 集合 具有如下文档

{name:"人间四月",age:,"locate":" 北京"}

{name:"dolphin",age:,"locate":" 北京"}

{name:"yunsheng",age:,"locate":" 天津"}

{name:"shark",age:,"locate":" 天津"}

{name:"babywang",age:,"locate":" 四川"}

count 返回符合查询条件的文档总数

使用mongodb命令查询北京地区的注册人数

db.appuser.count({locate:"北京"})

返回结果是 []

distinct去除重复操作返回查询到的指定字段值不重复的记录

使用mongodb命令查询用户来自于哪些地区

db.appuser.distinct("locate")

返回结果是 [" 北京"," 天津","四川"]

count distinct 配合使用

使用mongodb命令查询用户来源的地区数量

db.runCommand({"distinct":"appuser","key":"locate"}).values.length

返回结果是

group 操作会把查询到的文档按照给定的字段值进行分组。分组操作会返回一个文档数组，其中的每个文档包含了一组文档的计算结果

group 命令不能在分片集合上运行。特别需要注意一点， group 操作的结果集大小不能超过16MB。

mongodb命令查询各个地区 年龄最大的用户

db.appuser.group({

key:{locate:""},

initial:{age:},

reduce:function(cur, result){

 if(cur.age>result.age)

   result.age = cur.age;

   result.name = cur.name;

}})

查询返回结果是

[ {

waitedMS:NumberLong(),

retval:[{

             locate:" 北京",

             age:22.0,

             name:"dolphin"

         },{

             locate:" 天津",

             age:23.0,

             name:"shark"

         },{

             locate:" 四川",

             age:25.0,

             name:"babywang"

         }],

count:NumberLong(),

keys:NumberLong(),

ok:1.0

}]

group $keyf 有时我们需要对分组的字段做一些处理。

mongodb命令 对名字长度分组，找出每个分组中年龄最大的用户。

db.appuser.group({

$keyf:function(doc){return {namelength:doc.name.length};},

initial:{age:},

reduce:function(cur, result){

 if(cur.age>result.age)

   result.age = cur.age;

   result.name = cur.name;

}

})

返回结果[

 {

     waitedMS:NumberLong(),

     retval:[

         {

             namelength:4.0,

             age:20.0,

             name:"人间四月"

         },

         {

             namelength:7.0,

             age:22.0,

             name:"dolphin"

         },

         {

             namelength:8.0,

             age:25.0,

             name:"babywang"

         },

         {

             namelength:5.0,

             age:23.0,

             name:"shark"

         }

     ],

     count:NumberLong(),

     keys:NumberLong(),

     ok:1.0

 }

]

group finalize 针对分组后的每一个分组结果做相应操作

mongodb命令 对名字长度分组，找出每个分组中年龄最大的用户，最后对每个人的年龄加10

db.appuser.group({

$keyf:function(doc){return {namelength:doc.name.length};},

initial:{age:},

finalize:function(doc){

   doc.age=doc.age+;

},

reduce:function(cur, result){

 if(cur.age>result.age)

     result.age = cur.age;

     result.name = cur.name;

}

})

作者：dolphin叔叔
链接：http://www.jianshu.com/p/5b32d7612d08
來源：简书

聚合管道的功能简单来说就分两种：

对文档进行“过滤”，也就是筛选出符合条件的文档;
对文档进行“变换”，也就是改变文档的输出形式。

####$project#### 1.我们有这样的数据

  {

     "_id" : 1,

     title: "abc123",

     isbn: "0001122223334",>

     author: { last: "zzz", first: "aaa" },

     copies: 5

  }

现在使用project来变换输出

 db.books.aggregate(

     [

          { $project : { title : 1 , author : 1 } }

     ]

 )

可以得

 {

    "_id" : 1,

    "title" : "abc123",

    "author" : { "last" : "zzz", "first" : "aaa" }

 }

在$project里，我们指明（筛选）了要显示的数据，title和author，_id是自带的，可以用 _id：0 来将其过滤掉

2.我们现在有基础数据

 {

     "_id" : 1,

     title: "abc123",

     isbn: "0001122223334",

     author: { last: "zzz", first: "aaa" },

     copies: 5

 }

但是我们需要变换他的输出形式，我们就可以这样

db.books.aggregate(

 [

   {

      $project: {

         title: 1,

         isbn: {

            prefix: { $substr: [ "$isbn", 0, 3 ] },

            group: { $substr: [ "$isbn", 3, 2 ] },

            publisher: { $substr: [ "$isbn", 5, 4 ] },

            title: { $substr: [ "$isbn", 9, 3 ] },

            checkDigit: { $substr: [ "$isbn", 12, 1] }

         },

         lastName: "$author.last",

         copiesSold: "$copies"

      }

   }

 ]

 )

在isbn内部的键 prefix,group,publisher,title，checkDigit,外部的lastName,copiesSold都是我们自己定义的。 $substr取字串，$isbn是字串键名，第二参数是字串起始位置，第三参数是取几个。

最后结果

 {

  "_id" : 1,

  "title" : "abc123",

  "isbn" : {

      "prefix" : "000",

      "group" : "11",

      "publisher" : "2222",

      "title" : "333",

      "checkDigit" : "4"

  },

  "lastName" : "zzz",

  "copiesSold" : 5

 }

数据源没有变，但是我们改变的数据显示的方式。

####$match#### 过滤数据，过滤完的数据，接下来用作其他用。（水龙头上的过滤器，过滤干净的水，接下来淘米，煮饭都可以，貌似扯远了。。。回！）

1.例子

  db.articles.aggregate(

     [

          { $match : { author : "dave" } }

     ]

  );

过滤条件为键 author 值为 dave

结果为

 {

  "result" : [

               {

                 "_id" : ObjectId("512bc95fe835e68f199c8686"),

                 "author": "dave",

                 "score" : 80

               },

               { "_id" : ObjectId("512bc962e835e68f199c8687"),

                  "author" : "dave",

                  "score" : 85

               }

            ],

  "ok" : 1

 }

2.再看一例

 db.articles.aggregate(

 [

               { $match : { score : { $gt : 70, $lte : 90 } } },

               { $group: { _id: null, count: { $sum: 1 } } }

 ]

 );

这次有两步：

第一步，过滤键 score 值大于70 且小于等于90 的文档，
再用group 对文档用 count 统计，统计方式 $sum 求和，步长为1。

因为group操作必须有个_id,所以给其置null。

结果为

 {

  "result" : [

               {

                 "_id" : null,

                 "count" : 3

               }

             ],

  "ok" : 1

 }

####$cond#### 判断用的，可以跟if then 语句

1.举例

{ "_id" : 1, "item" : "abc1", qty: 300 }

{ "_id" : 2, "item" : "abc2", qty: 200 }

{ "_id" : 3, "item" : "xyz1", qty: 250 }

现在我们想根据qty的值来生成新的数据（值）

db.inventory.aggregate(

[

  {

     $project:

       {

         item: 1,

         discount:

           {

             $cond: { if: { $gte: [ "$qty", 250 ] }, then: 30, else: 20 }

           }

       }

  }

 ]

 )

结果为

 { "_id" : 1, "item" : "abc1", "discount" : 30 }

 { "_id" : 2, "item" : "abc2", "discount" : 20 }

 { "_id" : 3, "item" : "xyz1", "discount" : 30 }

可以发现，discount是我们新的键，它根据cond的if判断后，分别被赋上了相应的值（then和else可以省略）

####$limit#### 限制个数

1.例子

db.article.aggregate(

    { $limit : 5 }

);

值得注意的是，当聚合操作中同时出现sort和limit， sort只会对通过limit的数据排序，内存中也仅会存储通过limit的数据。

####$skip#### 略过N个

1.例子

db.article.aggregate(

   { $skip : 5 }

);

####$unwind#### 拆解数组集合

1.例子

 {

      "_id" : 1,

      "item" : "ABC1",

      sizes: [ "S", "M", "L"]

 }

现在对sizes进行拆解

 db.inventory.aggregate(

    [

        { $unwind : "$sizes" }

    ]

 )

结果

 { "_id" : 1, "item" : "ABC1", "sizes" : "S" }

 { "_id" : 1, "item" : "ABC1", "sizes" : "M" }

 { "_id" : 1, "item" : "ABC1", "sizes" : "L" }

我们可以看到sizes里每一个数据被拆解到每一个文档里了，除了sizes 的值不同外，其他相同。

$unwind与$group组合可以实现distinct

####$group#### 先分组，再合并

1.例子

{ "_id" : { "month" : 3, "day" : 15, "year" : 2014 },

      "totalPrice" : 50, "averageQuantity" : 10, "count" : 1 }

{ "_id" : { "month" : 4, "day" : 4, "year" : 2014 },

      "totalPrice" : 200, "averageQuantity" : 15, "count" : 2 }

{ "_id" : { "month" : 3, "day" : 1, "year" : 2014 },

      "totalPrice" : 40, "averageQuantity" : 1.5, "count" : 2 }

_id 为分组依据，_id 为null，及不分组，直接合并。

合并依据：

键 totalPrice 保存键 price 和键 quantity 值的乘积的和
键averageQuantity 保存键 quantity 的值的平均值
键 count 作统计

db.sales.aggregate( [ { $group : { _id : null, totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } }, averageQuantity: { $avg: "$quantity" }, count: { $sum: 1 } } } ] )

结果

 { "_id" : null, "totalPrice" : 290, "averageQuantity" : 8.6, "count" : 5 }

2.再看一例

 { "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2,

            "date" : ISODate("2014-03-01T08:00:00Z") }

 { "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1,

            "date" : ISODate("2014-03-01T09:00:00Z") }

 { "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 10,

            "date" : ISODate("2014-03-15T09:00:00Z") }

 { "_id" : 4, "item" : "xyz", "price" : 5, "quantity" : 20,

            "date" : ISODate("2014-04-04T11:21:39.736Z") }

 { "_id" : 5, "item" : "abc", "price" : 10, "quantity" : 10,

            "date" : ISODate("2014-04-04T21:23:13.331Z") }

咱们依据键 item 分组

  db.sales.aggregate( [ { $group : { _id : "$item" } } ] )

结果

 { "_id" : "xyz" }

 { "_id" : "jkl" }

 { "_id" : "abc" }

####$sort#### 排序

1.例子

db.users.aggregate(

  [

     { $sort : { age : -1, posts: 1 } }

  ]

)

对键age 顺序排序，对键 posts 逆序排序

####$out#### 创建指定副本集合

1.例子

   { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }

   { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }

   { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }

   { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }

   { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }

对其按author分组，然后out一个新集合authors

  db.books.aggregate( [

                  { $group : { _id : "$author", books: { $push: "$title" } } },

                  { $out : "authors" }

  ] )

结果

 { "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }

 { "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }

我们现在看到的只是数据映射，不是实体文档，

但是在authors里，有映射副本存成的文档。

也就是说 $out 可以创建新的集合，存储聚合后的文档映射。

转自https://github.com/qianjiahao/MongoDB/wiki/MongoDB%E4%B9%8B%E8%81%9A%E5%90%88%E7%AE%A1%E9%81%93%E4%B8%8A