Siddhi CEP Window Mechanism
https://docs.wso2.com/display/CEP400/SiddhiQL+Guide+3.0#SiddhiQLGuide3.0-Window
https://docs.wso2.com/display/CEP400/Inbuilt+Windows#InbuiltWindows
http://wso2.com/library/articles/2013/06/understanding-siddhi-powers-wso2-cep-2x/
https://docs.wso2.com/display/CEP400/Samples+on+Processing+Events
The window mechanism is somewhat obscure and the official documentation gives few examples, so this post walks through it in detail.
Basic syntax:
from <input stream name>[<filter condition>]#window.<window name>(<parameter>, <parameter>, ... )
select <attribute name>, <attribute name>, ...
insert [current events | expired events | all events] into <output stream name>
window.length
Start with an example. It uses expired events, although in practice expired events are rarely what you want.
"define stream cseEventStream (symbol string, price float, volume long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.length(6)" +
"select symbol, price, avg(price) as ap, sum(price) as sp, count(price) as cp " +
"group by symbol " +
"insert expired events into outputStream;";
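A brief explanation:
define declares the stream and the structure of each event in it.
@info is optional and names the query.
The query reads cseEventStream, keeps events with price < 700, and places them in a window of length 6.
Once the window would hold more than 6 events, the oldest one expires, and that expired event triggers the insert.
What gets inserted is determined by the select clause.
I then feed in the following stream data: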
int i = 0;
while (i < 10) {
float p = i*10;
inputHandler.send(new Object[]{"WSO2", p, 100});
System.out.println("\"WSO2\", " + p);
inputHandler.send(new Object[] {"IBM", p, 100});
System.out.println("\"IBM\", " + p);
Thread.sleep(1000);
i++;
}
Part of the resulting output:
"WSO2", 0.0
"IBM", 0.0
"WSO2", 10.0
"IBM", 10.0
"WSO2", 20.0
"IBM", 20.0
"WSO2", 30.0
receive events: 1
Event{timestamp=1447906176329, data=[WSO2, 0.0, 15.0, 30.0, 2], isExpired=false}
"IBM", 30.0
receive events: 1
Event{timestamp=1447906176331, data=[IBM, 0.0, 15.0, 30.0, 2], isExpired=false}
"WSO2", 40.0
receive events: 1
Event{timestamp=1447906177331, data=[WSO2, 10.0, 25.0, 50.0, 2], isExpired=false}
"IBM", 40.0
receive events: 1
Event{timestamp=1447906177331, data=[IBM, 10.0, 25.0, 50.0, 2], isExpired=false}
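A few observations:
1. The window length is 6, so sending the 7th event triggers an expiry.
2. At that point outputStream receives the expired event.
3. The event carries all of its own attributes, and aggregate functions additionally give statistics over the events currently in the window.
This is the hard part to grasp: the output contains only the expired event, not every event in the window, yet aggregate functions can still compute statistics over the events in the window.
Three statistics are computed here (avg, sum, count), which makes it possible to see how avg is derived.
Take Event{timestamp=1447906176329, data=[WSO2, 0.0, 15.0, 30.0, 2], isExpired=false} as an example.
Because of the group by, only events with symbol = WSO2 are aggregated.
Sending "WSO2", 30.0 expires "WSO2", 0.0. Both of these events are excluded from the aggregation; the events that take part are:
"IBM", 0.0 "WSO2", 10.0 "IBM", 10.0 "WSO2", 20.0 "IBM", 20.0
Of these, the WSO2 events are 10.0 and 20.0, so count is 2, sum is 30, and avg = 15.
Without the group by, the result is: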
"WSO2", 0.0
"IBM", 0.0
"WSO2", 10.0
"IBM", 10.0
"WSO2", 20.0
"IBM", 20.0
"WSO2", 30.0
receive events: 1
Event{timestamp=1447913986723, data=[WSO2, 0.0, 12.0, 60.0, 5], isExpired=false}
"IBM", 30.0
receive events: 1
Event{timestamp=1447913986725, data=[IBM, 0.0, 18.0, 90.0, 5], isExpired=false}
Now the symbol is ignored and everything in the window is summed.
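The arithmetic above can be reproduced with a small plain-Java model. This is only a sketch of the behaviour observed in the output, not Siddhi's implementation: the class `LengthWindowModel`, its `Trade` holder and the `groupBySymbol` flag are hypothetical names introduced for illustration. It assumes that, when an event expires, the aggregate is computed after the expired event has left the window and before the arriving event is added.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Plain-Java model of the observed output; NOT Siddhi's implementation. */
final class LengthWindowModel {
    /** Hypothetical event holder, for illustration only. */
    private static final class Trade {
        final String symbol;
        final float price;

        Trade(String symbol, float price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    private final int length;
    private final boolean groupBySymbol;
    private final Deque<Trade> window = new ArrayDeque<>();

    LengthWindowModel(int length, boolean groupBySymbol) {
        this.length = length;
        this.groupBySymbol = groupBySymbol;
    }

    /** Feeds one event; returns the rows an "insert expired events" query would emit. */
    List<String> send(String symbol, float price) {
        List<String> rows = new ArrayList<>();
        if (window.size() == length) {
            Trade expired = window.removeFirst();
            // Aggregate over what is left: the expired event is already gone
            // and the arriving event has not been added yet.
            double sum = 0;
            long count = 0;
            for (Trade t : window) {
                if (!groupBySymbol || t.symbol.equals(expired.symbol)) {
                    sum += t.price;
                    count++;
                }
            }
            double avg = count == 0 ? 0.0 : sum / count;
            rows.add(expired.symbol + ", " + expired.price + ", " + avg + ", " + sum + ", " + count);
        }
        window.addLast(new Trade(symbol, price));
        return rows;
    }

    public static void main(String[] args) {
        LengthWindowModel model = new LengthWindowModel(6, true);
        for (int i = 0; i < 5; i++) {
            for (String symbol : new String[] {"WSO2", "IBM"}) {
                for (String row : model.send(symbol, i * 10f)) {
                    System.out.println(row);
                }
            }
        }
    }
}
```

With `groupBySymbol` set to true the first row is `WSO2, 0.0, 15.0, 30.0, 2`; with false it is `WSO2, 0.0, 12.0, 60.0, 5`, matching the two outputs shown above.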
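expired events is only one of the options; the others are current events and all events.
expired events fires when an event expires, current events fires when an event arrives, and all events fires in both cases.
Switching to all events gives the result below. In my test it behaved exactly like current events and fired only on arrival. Possibly a bug?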
"WSO2", 10.0
"IBM", 10.0
receive events: 1
Event{timestamp=1447914310502, data=[WSO2, 10.0, 5.0, 10.0, 2], isExpired=false}
receive events: 1
Event{timestamp=1447914310502, data=[IBM, 10.0, 5.0, 10.0, 2], isExpired=false}
"WSO2", 20.0
"IBM", 20.0
receive events: 1
Event{timestamp=1447914311503, data=[WSO2, 20.0, 10.0, 30.0, 3], isExpired=false}
receive events: 1
Event{timestamp=1447914311503, data=[IBM, 20.0, 10.0, 30.0, 3], isExpired=false}
"WSO2", 30.0
"IBM", 30.0
receive events: 1
Event{timestamp=1447914312503, data=[WSO2, 30.0, 20.0, 60.0, 3], isExpired=false}
receive events: 1
Event{timestamp=1447914312503, data=[IBM, 30.0, 20.0, 60.0, 3], isExpired=false}
"WSO2", 40.0
"IBM", 40.0
receive events: 1
Event{timestamp=1447914313503, data=[WSO2, 40.0, 30.0, 90.0, 3], isExpired=false}
receive events: 1
Event{timestamp=1447914313503, data=[IBM, 40.0, 30.0, 90.0, 3], isExpired=false}
window.time
This works like length, except that expiry is triggered by time.
"define stream cseEventStream (symbol string, price float, volume long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.time(2 sec)" +
"select symbol, price, avg(price) as ap, sum(price) as sp, count(price) as cp " +
"group by symbol " +
"insert expired events into outputStream;";
The result:
"WSO2", 0.0
"IBM", 0.0
"WSO2", 10.0
"IBM", 10.0
"WSO2", 20.0
"IBM", 20.0
receive events: 1
Event{timestamp=1447915287974, data=[WSO2, 0.0, 10.0, 10.0, 1], isExpired=false}
receive events: 1
Event{timestamp=1447915287977, data=[IBM, 0.0, 15.0, 30.0, 2], isExpired=false}
"WSO2", 30.0
"IBM", 30.0
receive events: 2
Event{timestamp=1447915288975, data=[WSO2, 10.0, 20.0, 20.0, 1], isExpired=false}
Event{timestamp=1447915288975, data=[IBM, 10.0, 20.0, 20.0, 1], isExpired=false}
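As the output shows, expiry here is time-based, so it is not necessarily evaluated when an event arrives; it is driven by a scheduled timer (the original post illustrates this with a diagram).
The aggregates therefore depend on how many events are in the window at the moment the timer fires, which is why the count above is sometimes 1 and sometimes 2.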
window.lengthBatch, window.timeBatch
These windows do not slide. An example:
"define stream cseEventStream (symbol string, price float, volume long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.lengthBatch(4)" +
"select symbol, price " +
"insert expired events into outputStream;";
With the same input as above, the result is:
"WSO2", 0.0
"IBM", 0.0
"WSO2", 10.0
"IBM", 10.0
"WSO2", 20.0
"IBM", 20.0
"WSO2", 30.0
"IBM", 30.0
receive events: 4
Event{timestamp=1447923776094, data=[WSO2, 0.0], isExpired=false}
Event{timestamp=1447923776094, data=[IBM, 0.0], isExpired=false}
Event{timestamp=1447923776094, data=[WSO2, 10.0], isExpired=false}
Event{timestamp=1447923776094, data=[IBM, 10.0], isExpired=false}
"WSO2", 40.0
"IBM", 40.0
"WSO2", 50.0
"IBM", 50.0
receive events: 4
Event{timestamp=1447923778094, data=[WSO2, 20.0], isExpired=false}
Event{timestamp=1447923778094, data=[IBM, 20.0], isExpired=false}
Event{timestamp=1447923778094, data=[WSO2, 30.0], isExpired=false}
Event{timestamp=1447923778094, data=[IBM, 30.0], isExpired=false}
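With lengthBatch set to 4, the first expiry fires only once the window has seen 8 events, that is, when the second batch completes.
Events expire one whole batch at a time, so each output contains 4 events and no event is repeated across outputs: the window does not slide.
Now a timeBatch example, this time with all events: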
"define stream cseEventStream (symbol string, price float, volume long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.timeBatch(3 sec)" +
"select symbol, price " +
"insert all events into outputStream;";
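The result is below. Each group of two events is followed by a 1 s sleep, so the first expiry fires after 6 groups have been sent and expires 6 events (the first batch).
Because all events is used this time, output is also triggered when events arrive, not only when they expire.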
"WSO2", 0.0
"IBM", 0.0
"WSO2", 10.0
"IBM", 10.0
"WSO2", 20.0
"IBM", 20.0
receive events: 6
Event{timestamp=1447924146613, data=[WSO2, 0.0], isExpired=false}
Event{timestamp=1447924146614, data=[IBM, 0.0], isExpired=false}
Event{timestamp=1447924147614, data=[WSO2, 10.0], isExpired=false}
Event{timestamp=1447924147614, data=[IBM, 10.0], isExpired=false}
Event{timestamp=1447924148614, data=[WSO2, 20.0], isExpired=false}
Event{timestamp=1447924148614, data=[IBM, 20.0], isExpired=false}
"WSO2", 30.0
"IBM", 30.0
"WSO2", 40.0
"IBM", 40.0
"WSO2", 50.0
"IBM", 50.0
receive events: 12
Event{timestamp=1447924152571, data=[WSO2, 0.0], isExpired=false}
Event{timestamp=1447924152571, data=[IBM, 0.0], isExpired=false}
Event{timestamp=1447924152571, data=[WSO2, 10.0], isExpired=false}
Event{timestamp=1447924152571, data=[IBM, 10.0], isExpired=false}
Event{timestamp=1447924152571, data=[WSO2, 20.0], isExpired=false}
Event{timestamp=1447924152571, data=[IBM, 20.0], isExpired=false}
Event{timestamp=1447924149614, data=[WSO2, 30.0], isExpired=false}
Event{timestamp=1447924149614, data=[IBM, 30.0], isExpired=false}
Event{timestamp=1447924150614, data=[WSO2, 40.0], isExpired=false}
Event{timestamp=1447924150614, data=[IBM, 40.0], isExpired=false}
Event{timestamp=1447924151614, data=[WSO2, 50.0], isExpired=false}
Event{timestamp=1447924151614, data=[IBM, 50.0], isExpired=false}
For this kind of scenario, though, the usual requirement is to compute statistics over each batch. For example:
"define stream cseEventStream (symbol string, price float, volume long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.lengthBatch(4) " +
"select symbol, price, avg(price) as avgPrice " +
"group by symbol " +
"insert into outputStream;";
The result:
"WSO2", 0.0
"IBM", 0.0
"WSO2", 10.0
"IBM", 10.0
receive events: 2
Event{timestamp=1447991871794, data=[WSO2, 10.0, 5.0], isExpired=false}
Event{timestamp=1447991871794, data=[IBM, 10.0, 5.0], isExpired=false}
"WSO2", 20.0
"IBM", 20.0
"WSO2", 30.0
"IBM", 30.0
receive events: 2
Event{timestamp=1447991873795, data=[WSO2, 30.0, 25.0], isExpired=false}
Event{timestamp=1447991873795, data=[IBM, 30.0, 25.0], isExpired=false}
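As the output shows, the events in a batch can be grouped and averaged.
Note: do not use expired events here, otherwise the aggregate result is always 0, because with a batch window the window is empty after each batch expires.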
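The per-batch averaging can be sketched in plain Java as follows. This is a model of the observed output under the assumption that, once a batch is full, one row per symbol is emitted carrying that symbol's last price and its average over the batch; `LengthBatchModel` is a hypothetical name, not a Siddhi class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a tumbling batch with a per-symbol average; NOT Siddhi code. */
final class LengthBatchModel {
    private final int batchSize;
    private final List<String> symbols = new ArrayList<>();
    private final List<Float> prices = new ArrayList<>();

    LengthBatchModel(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Feeds one event; returns one "symbol, lastPrice, avgPrice" row per symbol when a batch completes. */
    List<String> send(String symbol, float price) {
        symbols.add(symbol);
        prices.add(price);
        List<String> rows = new ArrayList<>();
        if (symbols.size() < batchSize) {
            return rows; // batch not complete yet, nothing is emitted
        }
        // symbol -> {last price, sum, count}; LinkedHashMap keeps first-seen order
        Map<String, double[]> stats = new LinkedHashMap<>();
        for (int i = 0; i < symbols.size(); i++) {
            double[] s = stats.computeIfAbsent(symbols.get(i), k -> new double[3]);
            s[0] = prices.get(i);
            s[1] += prices.get(i);
            s[2] += 1;
        }
        for (Map.Entry<String, double[]> e : stats.entrySet()) {
            double[] s = e.getValue();
            rows.add(e.getKey() + ", " + (float) s[0] + ", " + (s[1] / s[2]));
        }
        // The batch is flushed as a whole, so the next batch starts from an empty window.
        symbols.clear();
        prices.clear();
        return rows;
    }

    public static void main(String[] args) {
        LengthBatchModel model = new LengthBatchModel(4);
        for (int i = 0; i < 4; i++) {
            for (String symbol : new String[] {"WSO2", "IBM"}) {
                for (String row : model.send(symbol, i * 10f)) {
                    System.out.println(row);
                }
            }
        }
    }
}
```

Because the buffer is cleared after every flush, nothing is left to aggregate at expiry time, which is consistent with the warning above about expired events.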
window.externalTime
https://docs.wso2.com/display/CEP400/Sample+0114+-+Using+External+Time+Windows
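This one is quite useful: the window slides on an external timestamp, since aggregation is usually wanted by collection time rather than by arrival time.
The limitation is that the external time must be increasing, and real-world scenarios cannot always guarantee strict ordering.
An example: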
"define stream cseEventStream (symbol string, price float, time long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.externalTime(time, 3 sec) " +
"select symbol, price, time, sum(price) as ap, count(price) as cp " +
"group by symbol " +
"insert expired events into outputStream;";
The sending code:
int i = 0;
long time = 1447921187000L;
while (i < 10) {
float p = i*10;
inputHandler.send(new Object[]{"WSO2", p, time});
System.out.println("\"WSO2\", " + p + ", " + time);
inputHandler.send(new Object[] {"IBM", p, time});
System.out.println("\"IBM\", " + p + ", " + time);
Thread.sleep(1000);
i++;
time = time + 1000;
}
The goal is a sliding window driven by the external time field. The result:
"WSO2", 0.0, 1447921187000
"IBM", 0.0, 1447921187000
"WSO2", 10.0, 1447921188000
"IBM", 10.0, 1447921188000
"WSO2", 20.0, 1447921189000
"IBM", 20.0, 1447921189000
"WSO2", 30.0, 1447921190000
"IBM", 30.0, 1447921190000
receive events: 2
Event{timestamp=1447921190000, data=[WSO2, 0.0, 1447921187000, 30.0, 2], isExpired=false}
Event{timestamp=1447921190000, data=[IBM, 0.0, 1447921187000, 30.0, 2], isExpired=false}
"WSO2", 40.0, 1447921191000
"IBM", 40.0, 1447921191000
receive events: 2
Event{timestamp=1447921191000, data=[WSO2, 10.0, 1447921188000, 50.0, 2], isExpired=false}
Event{timestamp=1447921191000, data=[IBM, 10.0, 1447921188000, 50.0, 2], isExpired=false}
Based on the supplied time, receiving "WSO2", 30.0, 1447921190000 triggers the 3-second expiry.
Otherwise it behaves like an ordinary sliding window.
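A plain-Java sketch of this external-time behaviour is given below. It is a model of the output shown above, not Siddhi's implementation, and `ExternalTimeWindowModel` is a hypothetical name. The assumption is that expiry is evaluated only when an event arrives, using the timestamp that event carries, and that an event expires once the arriving timestamp is at least the window size ahead of it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Plain-Java model of a sliding window driven by event timestamps; NOT Siddhi code. */
final class ExternalTimeWindowModel {
    /** Hypothetical event holder, for illustration only. */
    private static final class Trade {
        final String symbol;
        final float price;
        final long time;

        Trade(String symbol, float price, long time) {
            this.symbol = symbol;
            this.price = price;
            this.time = time;
        }
    }

    private final long windowMillis;
    private final Deque<Trade> window = new ArrayDeque<>();

    ExternalTimeWindowModel(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Feeds one event; returns "symbol, price, time, sum, count" rows for the events it expires. */
    List<String> send(String symbol, float price, long time) {
        List<String> rows = new ArrayList<>();
        // No wall clock is involved: only the timestamp of the arriving event matters.
        while (!window.isEmpty() && time - window.peekFirst().time >= windowMillis) {
            Trade expired = window.removeFirst();
            double sum = 0;
            long count = 0;
            for (Trade t : window) {
                if (t.symbol.equals(expired.symbol)) {
                    sum += t.price;
                    count++;
                }
            }
            rows.add(expired.symbol + ", " + expired.price + ", " + expired.time + ", " + sum + ", " + count);
        }
        window.addLast(new Trade(symbol, price, time));
        return rows;
    }

    public static void main(String[] args) {
        ExternalTimeWindowModel model = new ExternalTimeWindowModel(3000L);
        long time = 1447921187000L;
        for (int i = 0; i < 5; i++, time += 1000) {
            for (String symbol : new String[] {"WSO2", "IBM"}) {
                for (String row : model.send(symbol, i * 10f, time)) {
                    System.out.println(row);
                }
            }
        }
    }
}
```

Since nothing here reads the system clock, replaying the same events always yields the same rows, which is the practical advantage of windowing on collection time.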
window.cron
https://docs.wso2.com/display/CEP400/Sample+0115+-+Quartz+scheduler+based+alerts
Scheduled output. timeBatch can achieve the same thing, but cron is more convenient.
Example:
"define stream cseEventStream (symbol string, price float, time long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.cron('*/4 * * * * ?') " +
"select symbol, time, sum(price) as ap, count(price) as cp " +
"group by symbol " +
"insert into outputStream;";
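The key is understanding cron syntax; see http://www.cnblogs.com/wangyuyu/p/4230742.html
Siddhi's cron expression has an extra leading seconds field, so the first field is seconds and */4 means fire every 4 seconds.
The result below confirms that it fires every 4 seconds: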
"WSO2", 10.0, 1447921188000
"IBM", 10.0, 1447921188000
"WSO2", 20.0, 1447921189000
"IBM", 20.0, 1447921189000
"WSO2", 30.0, 1447921190000
"IBM", 30.0, 1447921190000
"WSO2", 40.0, 1447921191000
"IBM", 40.0, 1447921191000
receive events: 2
Event{timestamp=1448006719652, data=[WSO2, 1447921191000, 100.0, 4], isExpired=false}
Event{timestamp=1448006719652, data=[IBM, 1447921191000, 100.0, 4], isExpired=false}
"WSO2", 50.0, 1447921192000
"IBM", 50.0, 1447921192000
"WSO2", 60.0, 1447921193000
"IBM", 60.0, 1447921193000
"WSO2", 70.0, 1447921194000
"IBM", 70.0, 1447921194000
"WSO2", 80.0, 1447921195000
"IBM", 80.0, 1447921195000
receive events: 2
Event{timestamp=1448006723653, data=[WSO2, 1447921195000, 260.0, 4], isExpired=false}
Event{timestamp=1448006723653, data=[IBM, 1447921195000, 260.0, 4], isExpired=false}
window.unique, window.firstUnique
These do what their names suggest. An example:
"define stream cseEventStream (symbol string, price float, time long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.unique(symbol) " +
"select symbol, price, time " +
"insert into outputStream;";
Judging by the result, it looks just like an ordinary stream passing through,
because every update to a symbol triggers an event:
"WSO2", 0.0, 1447921187000
"IBM", 0.0, 1447921187000
receive events: 2
Event{timestamp=1448009613618, data=[WSO2, 0.0, 1447921187000], isExpired=false}
Event{timestamp=1448009613620, data=[IBM, 0.0, 1447921187000], isExpired=false}
"WSO2", 10.0, 1447921188000
"IBM", 10.0, 1447921188000
receive events: 1
Event{timestamp=1448009614633, data=[WSO2, 10.0, 1447921188000], isExpired=false}
receive events: 1
Event{timestamp=1448009614633, data=[IBM, 10.0, 1447921188000], isExpired=false}
"WSO2", 20.0, 1447921189000
"IBM", 20.0, 1447921189000
receive events: 2
Event{timestamp=1448009615650, data=[WSO2, 20.0, 1447921189000], isExpired=false}
Event{timestamp=1448009615650, data=[IBM, 20.0, 1447921189000], isExpired=false}
"WSO2", 30.0, 1447921190000
receive events: 1
"IBM", 30.0, 1447921190000
Event{timestamp=1448009616650, data=[WSO2, 30.0, 1447921190000], isExpired=false}
receive events: 1
Event{timestamp=1448009616650, data=[IBM, 30.0, 1447921190000], isExpired=false}
Now firstUnique:
"define stream cseEventStream (symbol string, price float, time long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.firstUnique(symbol) " +
"select symbol, price, time " +
"insert into outputStream;";
In the result, output is triggered only the first time each symbol appears:
"WSO2", 0.0, 1447921187000
"IBM", 0.0, 1447921187000
receive events: 1
Event{timestamp=1448008769827, data=[WSO2, 0.0, 1447921187000], isExpired=false}
receive events: 1
Event{timestamp=1448008769831, data=[IBM, 0.0, 1447921187000], isExpired=false}
"WSO2", 10.0, 1447921188000
"IBM", 10.0, 1447921188000
"WSO2", 20.0, 1447921189000
"IBM", 20.0, 1447921189000
"WSO2", 30.0, 1447921190000
"IBM", 30.0, 1447921190000
"WSO2", 40.0, 1447921191000
"IBM", 40.0, 1447921191000
"WSO2", 50.0, 1447921192000
"IBM", 50.0, 1447921192000
"WSO2", 60.0, 1447921193000
"IBM", 60.0, 1447921193000
"WSO2", 70.0, 1447921194000
"IBM", 70.0, 1447921194000
"WSO2", 80.0, 1447921195000
"IBM", 80.0, 1447921195000
"WSO2", 90.0, 1447921196000
"IBM", 90.0, 1447921196000
unique is often used together with join, for example:
from SymbolStream#window.length(1) unidirectional join StockExchangeStream#window.unique("symbol")
insert into StockQuote StockExchangeStream.symbol as symbol, StockExchangeStream.price as lastTradedPrice
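The difference between the two windows can be summarised with a small plain-Java model. It is a sketch of the behaviour seen in the outputs above, not Siddhi's implementation; `UniqueWindowsModel` and its method names are hypothetical. It assumes that unique keeps only the latest event per key and hands back the one it replaces, while firstUnique lets through only the first event per key.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Plain-Java model of unique vs firstUnique keyed on symbol; NOT Siddhi code. */
final class UniqueWindowsModel {
    private final Map<String, Float> latestBySymbol = new HashMap<>();
    private final Set<String> seenSymbols = new HashSet<>();

    /**
     * unique(symbol): every arrival is emitted as a current event.
     * Returns the price it replaced (the event that leaves the window), or null.
     */
    Float sendUnique(String symbol, float price) {
        return latestBySymbol.put(symbol, price);
    }

    /** firstUnique(symbol): returns true only the first time a symbol is seen. */
    boolean sendFirstUnique(String symbol) {
        return seenSymbols.add(symbol);
    }

    /** What a join against the unique window would see for a symbol: its latest price, or null. */
    Float latest(String symbol) {
        return latestBySymbol.get(symbol);
    }

    public static void main(String[] args) {
        UniqueWindowsModel model = new UniqueWindowsModel();
        for (int i = 0; i < 3; i++) {
            for (String symbol : new String[] {"WSO2", "IBM"}) {
                Float replaced = model.sendUnique(symbol, i * 10f);
                boolean first = model.sendFirstUnique(symbol);
                System.out.println(symbol + " replaced=" + replaced + " firstUnique=" + first);
            }
        }
    }
}
```

The `latest` lookup is what makes unique a natural partner for a join: the other stream always matches against one up-to-date row per symbol.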
Output Rate Limiting
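This is covered here because it pairs well with unique.
Basic syntax:
output ({<output-type>} every (<time interval>|<event interval> events) | snapshot every <time interval>)
<output-type> is one of "first", "last" and "all"; the default is all.
With an ordinary window, firing on every event can be too frequent; rate limiting fires once per fixed number of events or per time interval instead.
This suits unique especially well: unique is normally used to track the latest state, so firing on every event is pointless and periodic output is enough.
Using the earlier example again: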
"define stream cseEventStream (symbol string, price float, time long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.unique(symbol) " +
"select symbol, price, time " +
"group by symbol " +
"output last every 5 events " +
"insert into outputStream;";
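In the result, group by symbol means each output contains one row per symbol (WSO2 and IBM),
but the event count is taken over the whole stream: output fires every 5 events in total, not every 5 WSO2 events or every 5 IBM events.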
"WSO2", 0.0, 1447921187000
"IBM", 0.0, 1447921187000
"WSO2", 10.0, 1447921188000
"IBM", 10.0, 1447921188000
"WSO2", 20.0, 1447921189000
"IBM", 20.0, 1447921189000
receive events: 2
Event{timestamp=1448010405404, data=[WSO2, 20.0, 1447921189000], isExpired=false}
Event{timestamp=1448010404405, data=[IBM, 10.0, 1447921188000], isExpired=false}
"WSO2", 30.0, 1447921190000
"IBM", 30.0, 1447921190000
"WSO2", 40.0, 1447921191000
"IBM", 40.0, 1447921191000
receive events: 2
Event{timestamp=1448010407404, data=[IBM, 40.0, 1447921191000], isExpired=false}
Event{timestamp=1448010407404, data=[WSO2, 40.0, 1447921191000], isExpired=false}
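The interaction between group by and the event counter can be modelled in a few lines of plain Java. This is a sketch of the observed output, not Siddhi's implementation; `OutputLastEveryModel` is a hypothetical name. It assumes a single counter shared by all groups and, on every Nth event, one row per group holding that group's last event since the previous output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of "output last every N events" with group by symbol; NOT Siddhi code. */
final class OutputLastEveryModel {
    private final int every;
    private int seen = 0;
    // Last price per symbol since the previous output, in first-seen order.
    private final Map<String, Float> lastBySymbol = new LinkedHashMap<>();

    OutputLastEveryModel(int every) {
        this.every = every;
    }

    /** Feeds one event; returns one "symbol, price" row per group on every Nth event overall. */
    List<String> send(String symbol, float price) {
        lastBySymbol.put(symbol, price);
        seen++; // one counter for the whole stream, not one per symbol
        List<String> rows = new ArrayList<>();
        if (seen < every) {
            return rows;
        }
        for (Map.Entry<String, Float> e : lastBySymbol.entrySet()) {
            rows.add(e.getKey() + ", " + e.getValue());
        }
        lastBySymbol.clear();
        seen = 0;
        return rows;
    }

    public static void main(String[] args) {
        OutputLastEveryModel model = new OutputLastEveryModel(5);
        for (int i = 0; i < 5; i++) {
            for (String symbol : new String[] {"WSO2", "IBM"}) {
                List<String> rows = model.send(symbol, i * 10f);
                if (!rows.isEmpty()) {
                    System.out.println(rows);
                }
            }
        }
    }
}
```

The fifth event overall is "WSO2", 20.0, so the first output pairs WSO2 at 20.0 with IBM still at 10.0, which is the slightly surprising mix seen in the output above.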
Time-based limiting works the same way:
"define stream cseEventStream (symbol string, price float, time long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.unique(symbol) " +
"select symbol, price, time " +
"group by symbol " +
"output last every 5 sec " +
"insert into outputStream;";
The result:
"WSO2", 0.0, 1447921187000
"IBM", 0.0, 1447921187000
"WSO2", 10.0, 1447921188000
"IBM", 10.0, 1447921188000
"WSO2", 20.0, 1447921189000
"IBM", 20.0, 1447921189000
"WSO2", 30.0, 1447921190000
"IBM", 30.0, 1447921190000
"WSO2", 40.0, 1447921191000
"IBM", 40.0, 1447921191000
receive events: 2
Event{timestamp=1448010645533, data=[WSO2, 40.0, 1447921191000], isExpired=false}
Event{timestamp=1448010645533, data=[IBM, 40.0, 1447921191000], isExpired=false}
"WSO2", 50.0, 1447921192000
"IBM", 50.0, 1447921192000
"WSO2", 60.0, 1447921193000
"IBM", 60.0, 1447921193000
"WSO2", 70.0, 1447921194000
"IBM", 70.0, 1447921194000
"WSO2", 80.0, 1447921195000
"IBM", 80.0, 1447921195000
"WSO2", 90.0, 1447921196000
"IBM", 90.0, 1447921196000
receive events: 2
Event{timestamp=1448010650533, data=[WSO2, 90.0, 1447921196000], isExpired=false}
Event{timestamp=1448010650533, data=[IBM, 90.0, 1447921196000], isExpired=false}
snapshot emits all current events that have arrived so far. It is rarely used directly like this, and I cannot think of a good use case for it.
Example:
"define stream cseEventStream (symbol string, price float, time long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.unique(symbol) " +
"select symbol, price, time " +
"group by symbol " +
"output snapshot every 2 sec " +
"insert into outputStream;";
The result:
"WSO2", 0.0, 1447921187000
"IBM", 0.0, 1447921187000
"WSO2", 10.0, 1447921188000
"IBM", 10.0, 1447921188000
receive events: 4
Event{timestamp=1448011434403, data=[WSO2, 0.0, 1447921187000], isExpired=false}
Event{timestamp=1448011434405, data=[IBM, 0.0, 1447921187000], isExpired=false}
Event{timestamp=1448011435405, data=[WSO2, 10.0, 1447921188000], isExpired=false}
Event{timestamp=1448011435405, data=[IBM, 10.0, 1447921188000], isExpired=false}
"WSO2", 20.0, 1447921189000
"IBM", 20.0, 1447921189000
"WSO2", 30.0, 1447921190000
"IBM", 30.0, 1447921190000
receive events: 8
Event{timestamp=1448011434403, data=[WSO2, 0.0, 1447921187000], isExpired=false}
Event{timestamp=1448011434405, data=[IBM, 0.0, 1447921187000], isExpired=false}
Event{timestamp=1448011435405, data=[WSO2, 10.0, 1447921188000], isExpired=false}
Event{timestamp=1448011435405, data=[IBM, 10.0, 1447921188000], isExpired=false}
Event{timestamp=1448011436405, data=[WSO2, 20.0, 1447921189000], isExpired=false}
Event{timestamp=1448011436405, data=[IBM, 20.0, 1447921189000], isExpired=false}
Event{timestamp=1448011437405, data=[WSO2, 30.0, 1447921190000], isExpired=false}
Event{timestamp=1448011437405, data=[IBM, 30.0, 1447921190000], isExpired=false}
window.sort
Sorts events within the window:
<event> sort(<int> windowLength, <string> attribute, <string> order, .. , <string> attributeN, <string> orderN)
order is "asc" or "desc"; the default is asc.
Example:
"define stream cseEventStream (symbol string, price float, time long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.sort(3, price, 'asc') " +
"select symbol, price, time " +
"group by symbol " +
"insert all events into outputStream;";
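The length is 3, sorted by price in ascending order. When the window would hold more than 3 events (that is, on the 4th), the events are ordered by price ascending and the one with the largest price is expired and output.
The result: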
"WSO2", 0.0, 1447921187000
"IBM", 0.0, 1447921187000
Events{ @timeStamp = 1448875633289, inEvents = [Event{timestamp=1448875633289, data=[WSO2, 0.0, 1447921187000], isExpired=false}], RemoveEvents = null }
Events{ @timeStamp = 1448875633290, inEvents = [Event{timestamp=1448875633290, data=[IBM, 0.0, 1447921187000], isExpired=false}], RemoveEvents = null }
"WSO2", 10.0, 1447921188000
"IBM", 10.0, 1447921188000
Events{ @timeStamp = 1448875634291, inEvents = [Event{timestamp=1448875634291, data=[WSO2, 10.0, 1447921188000], isExpired=false}], RemoveEvents = null }
Events{ @timeStamp = 1448875634291, inEvents = [Event{timestamp=1448875634291, data=[IBM, 10.0, 1447921188000], isExpired=false}], RemoveEvents = [Event{timestamp=1448875634291, data=[IBM, 10.0, 1447921188000], isExpired=true}] }
"WSO2", 20.0, 1447921189000
"IBM", 20.0, 1447921189000
Events{ @timeStamp = 1448875635292, inEvents = [Event{timestamp=1448875635292, data=[WSO2, 20.0, 1447921189000], isExpired=false}], RemoveEvents = [Event{timestamp=1448875635292, data=[WSO2, 20.0, 1447921189000], isExpired=true}] }
Events{ @timeStamp = 1448875635292, inEvents = [Event{timestamp=1448875635292, data=[IBM, 20.0, 1447921189000], isExpired=false}], RemoveEvents = [Event{timestamp=1448875635292, data=[IBM, 20.0, 1447921189000], isExpired=true}] }
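Once there are more than 3 events, the current event and the expired event are the same event: with asc ordering, anything larger than the 3 smallest prices is expired immediately.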
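The eviction rule can be sketched in plain Java. This is a model of the output above, not Siddhi's implementation; `SortWindowModel` is a hypothetical name. It assumes the window is kept sorted by price ascending and that, when it overflows, the last element (the largest price, and the newest among ties) is expired.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of sort(length, price, 'asc'); NOT Siddhi code. */
final class SortWindowModel {
    /** Hypothetical event holder, for illustration only. */
    private static final class Trade {
        final String symbol;
        final float price;

        Trade(String symbol, float price) {
            this.symbol = symbol;
            this.price = price;
        }

        @Override
        public String toString() {
            return symbol + ", " + price;
        }
    }

    private final int length;
    private final List<Trade> window = new ArrayList<>();

    SortWindowModel(int length) {
        this.length = length;
    }

    /** Adds one event; returns the event expired as a result, or null if none. */
    String send(String symbol, float price) {
        window.add(new Trade(symbol, price));
        // List.sort is stable, so among equal prices the newest event stays last.
        window.sort((a, b) -> Float.compare(a.price, b.price));
        if (window.size() <= length) {
            return null;
        }
        // Overflow: drop the event with the largest price.
        return window.remove(window.size() - 1).toString();
    }

    public static void main(String[] args) {
        SortWindowModel model = new SortWindowModel(3);
        for (int i = 0; i < 3; i++) {
            for (String symbol : new String[] {"WSO2", "IBM"}) {
                System.out.println(symbol + " -> expired: " + model.send(symbol, i * 10f));
            }
        }
    }
}
```

With prices that only ever increase, as in this test input, each new arrival is itself the largest and is expired at once; a lower price arriving later would instead push out an older, larger one.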
window.frequent, window.lossyFrequent
<event> frequent(<int> eventCount, <string> attribute, .. , <string> attributeN), based on Misra-Gries counting algorithm; see http://www.zhihu.com/question/23480657
For how this processor is implemented, see http://mail.wso2.org/mailarchive/dev/2015-September/055230.html
Frankly, without knowing the algorithm this is quite hard to follow.
"define stream cseEventStream (symbol string, price float, time long);" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.frequent(2, symbol) " +
"select symbol, price, time " +
"insert all events into outputStream;";
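frequent works on current events: an arriving event is emitted if it belongs to the top frequent set, and dropped otherwise.
Put simply, through current events you keep receiving the events that belong to the top frequent set, while everything else is discarded.
The input: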
int i = 0;
long time = 1447921187000L;
String str = "attributes to attributes to to events. If no no no no attributes";
String[] strs = str.split(" ");
for (String s : strs) {
    float p = i * 10;
    // each word is sent as the symbol; price and time simply increase per event
    inputHandler.send(new Object[]{s, p, time});
    System.out.println(s + ", " + p + ", " + time);
    Thread.sleep(1000);
    i++;
    time = time + 1000;
}
The result, followed by an analysis:
attributes, 0.0, 1447921187000
Events{ @timeStamp = 1448873866506, inEvents = [Event{timestamp=1448873866506, data=[attributes, 0.0, 1447921187000], isExpired=false}], RemoveEvents = null }
to, 10.0, 1447921188000
Events{ @timeStamp = 1448873867509, inEvents = [Event{timestamp=1448873867509, data=[to, 10.0, 1447921188000], isExpired=false}], RemoveEvents = null }
attributes, 20.0, 1447921189000
Events{ @timeStamp = 1448873868509, inEvents = [Event{timestamp=1448873868509, data=[attributes, 20.0, 1447921189000], isExpired=false}], RemoveEvents = null }
to, 30.0, 1447921190000
Events{ @timeStamp = 1448873869509, inEvents = [Event{timestamp=1448873869509, data=[to, 30.0, 1447921190000], isExpired=false}], RemoveEvents = null }
to, 40.0, 1447921191000
Events{ @timeStamp = 1448873870509, inEvents = [Event{timestamp=1448873870509, data=[to, 40.0, 1447921191000], isExpired=false}], RemoveEvents = null }
events., 50.0, 1447921192000
If, 60.0, 1447921193000
Events{ @timeStamp = 1448873872509, inEvents = [Event{timestamp=1448873872509, data=[If, 60.0, 1447921193000], isExpired=false}], RemoveEvents = [Event{timestamp=1448873868509, data=[attributes, 20.0, 1447921189000], isExpired=true}] }
no, 70.0, 1447921194000
Events{ @timeStamp = 1448873873509, inEvents = [Event{timestamp=1448873873509, data=[no, 70.0, 1447921194000], isExpired=false}], RemoveEvents = [Event{timestamp=1448873870509, data=[to, 40.0, 1447921191000], isExpired=true}, Event{timestamp=1448873872509, data=[If, 60.0, 1447921193000], isExpired=true}] }
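Everything is straightforward while the input is only attributes and to.
When events. arrives, attributes and to already occupy both slots, so an expiry pass is triggered: the frequency of every event in the window is decremented by 1, and events whose frequency reaches 0 are expired.
Here the frequencies of attributes and to are both still greater than 0, so nothing in the window can be expired,
and the incoming events. has to be dropped instead. That is why 'events.' never shows up in the current events:
only top frequent events are delivered.
When If arrives, another expiry pass runs and every frequency in the window is decremented by 1 again.
Now the frequency of attributes is 0, so attributes is expired and the 'If' event is placed in the window.
As a result, 'If' appears in the current events and 'attributes' appears in the expired events.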
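The walkthrough above can be turned into a small plain-Java model of the counting scheme. It is a sketch that follows the behaviour described here, not Siddhi's source code, and `FrequentWindowModel` is a hypothetical name. One assumption worth flagging: after a decrement pass frees a slot, the arriving key is inserted with count 1, which is what the output shows and differs slightly from the textbook Misra-Gries formulation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of frequent(eventCount, symbol) as described above; NOT Siddhi code. */
final class FrequentWindowModel {
    private final int capacity;
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    FrequentWindowModel(int capacity) {
        this.capacity = capacity;
    }

    /**
     * Feeds one key. Returns true if it is emitted as a current event, false if dropped.
     * Keys expired by this arrival are appended to expiredOut.
     */
    boolean send(String key, List<String> expiredOut) {
        if (counts.containsKey(key)) {
            counts.merge(key, 1, Integer::sum);
            return true;
        }
        if (counts.size() < capacity) {
            counts.put(key, 1);
            return true;
        }
        // Window full: decrement every counter and expire the ones that reach zero.
        Iterator<Map.Entry<String, Integer>> it = counts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            if (e.getValue() == 1) {
                expiredOut.add(e.getKey());
                it.remove();
            } else {
                e.setValue(e.getValue() - 1);
            }
        }
        if (counts.size() < capacity) {
            counts.put(key, 1); // a slot was freed, so the new key enters the window
            return true;
        }
        return false; // nothing expired, so the arriving event is dropped
    }

    public static void main(String[] args) {
        FrequentWindowModel model = new FrequentWindowModel(2);
        String input = "attributes to attributes to to events. If no no no no attributes";
        for (String word : input.split(" ")) {
            List<String> expired = new ArrayList<>();
            boolean emitted = model.send(word, expired);
            System.out.println(word + " emitted=" + emitted + " expired=" + expired);
        }
    }
}
```

Run on the same word sequence, the model drops `events.`, expires `attributes` when `If` arrives, and expires both `to` and `If` when the first `no` arrives, in line with the output above.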
<event> lossyFrequent(<double> supportThreshold, <double> errorBound, <string> attribute, .. , <string> attributeN), based on Lossy Counting algorithm; see http://stackoverflow.com/questions/8033012/what-is-lossy-counting
Not tested. Presumably only the algorithm that decides expiry differs, and the rest is much the same.