Apriori algorithm

These are my personal study notes on SPMF's Example 1: Mining Frequent Itemsets by Using the Apriori Algorithm.

What is Apriori?

Apriori is an algorithm for discovering frequent itemsets in transaction databases. It was proposed by Agrawal & Srikant (1994).

Input file format (one transaction per line, items separated by single spaces):

1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5

So the transactions are:

Transaction
{1, 3, 4}
{2, 3, 5}
{1, 2, 3, 5}
{2, 5}
{1, 2, 3, 5}

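In the Java implementation, the transactions can be stored in a list of integer arrays:

private List<int[]> database = null;

database = new ArrayList<int[]>();

This list stores the structure above: each int[] holds the items of one transaction.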
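As a minimal sketch of how the input lines could be turned into this structure: the class name TransactionParserSketch and the method parseDatabase are hypothetical names chosen for illustration, not part of SPMF, and the sketch assumes one transaction per line with items separated by whitespace. Reading the lines from disk is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not SPMF's own loader.
class TransactionParserSketch {

    // Converts each non-empty line ("1 3 4") into an int[] ({1, 3, 4}).
    static List<int[]> parseDatabase(List<String> lines) {
        List<int[]> database = new ArrayList<int[]>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] tokens = trimmed.split("\\s+");
            int[] transaction = new int[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                transaction[i] = Integer.parseInt(tokens[i]);
            }
            database.add(transaction);
        }
        return database;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1 3 4", "2 3 5", "1 2 3 5", "2 5", "1 2 3 5");
        List<int[]> database = parseDatabase(lines);
        System.out.println(database.size());                  // prints 5
        System.out.println(Arrays.toString(database.get(2))); // prints [1, 2, 3, 5]
    }
}
```

Keeping each transaction as a plain int[] keeps the later support-counting loop cheap, since it only needs sequential reads.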

Output (with minsup of 40%, i.e. an itemset must appear in at least 2 of the 5 transactions):

itemsets support
{1} 3
{2} 4
{3} 4
{5} 4
{1, 2} 2
{1, 3} 3
{1, 5} 2
{2, 3} 3
{2, 5} 4
{3, 5} 3
{1, 2, 3} 2
{1, 2, 5} 2
{1, 3, 5} 2
{2, 3, 5} 3
{1, 2, 3, 5} 2

Notes on some details of the Java implementation

A HashMap, Map<Integer, Integer> mapItemCount = new HashMap<Integer, Integer>();, records each item together with the number of times it occurs.

When k = 1 (k is the size of the itemset):

List<Integer> frequent1 = new ArrayList<Integer>();

For each item in the HashMap whose count meets minsup:

frequent1.add(entry.getKey());
saveItemsetToFile(entry.getKey(), entry.getValue());
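The k = 1 pass can be sketched end to end as follows. This is only an illustration under stated assumptions: the class name FrequentItemsSketch, the method findFrequent1 and the parameter minsupAbsolute are hypothetical, the minsup is passed as an absolute count, and the result is returned rather than written out with saveItemsetToFile.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the first pass; not SPMF's own code.
class FrequentItemsSketch {

    // Returns the items whose count is at least minsupAbsolute, in ascending order.
    static List<Integer> findFrequent1(List<int[]> database, int minsupAbsolute) {
        // Count how many times each item occurs across all transactions.
        Map<Integer, Integer> mapItemCount = new HashMap<Integer, Integer>();
        for (int[] transaction : database) {
            for (int item : transaction) {
                Integer count = mapItemCount.get(item);
                mapItemCount.put(item, count == null ? 1 : count + 1);
            }
        }
        // Keep only the items that meet the minimum support.
        List<Integer> frequent1 = new ArrayList<Integer>();
        for (Map.Entry<Integer, Integer> entry : mapItemCount.entrySet()) {
            if (entry.getValue() >= minsupAbsolute) {
                frequent1.add(entry.getKey());
            }
        }
        // Sort so that later candidate generation can rely on ascending order.
        Collections.sort(frequent1);
        return frequent1;
    }

    public static void main(String[] args) {
        List<int[]> database = Arrays.asList(
            new int[]{1, 3, 4}, new int[]{2, 3, 5}, new int[]{1, 2, 3, 5},
            new int[]{2, 5}, new int[]{1, 2, 3, 5});
        // 40% of 5 transactions corresponds to an absolute threshold of 2.
        System.out.println(findFrequent1(database, 2)); // prints [1, 2, 3, 5]
    }
}
```

Item 4 occurs only once, so it is dropped here and never appears in any larger candidate.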

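Next, the candidate sets are generated.

When k = 2, i.e. for itemsets such as {1, 2} and {1, 3}, the candidates are generated from frequent1 by forming every possible pair. The support of each candidate is then counted to find the itemsets of size 2 that meet minsup. (The method for counting candidate support is described further down.)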
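A minimal sketch of the k = 2 step, assuming frequent1 is sorted in ascending order. The class name CandidateSize2Sketch and the method generateCandidate2 are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of size-2 candidate generation; not SPMF's own code.
class CandidateSize2Sketch {

    // Pairs every frequent item with every larger frequent item.
    // Assumes frequent1 is sorted in ascending order.
    static List<int[]> generateCandidate2(List<Integer> frequent1) {
        List<int[]> candidates = new ArrayList<int[]>();
        for (int i = 0; i < frequent1.size(); i++) {
            for (int j = i + 1; j < frequent1.size(); j++) {
                candidates.add(new int[]{frequent1.get(i), frequent1.get(j)});
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        for (int[] candidate : generateCandidate2(Arrays.asList(1, 2, 3, 5))) {
            System.out.println(Arrays.toString(candidate));
        }
        // prints [1, 2], [1, 3], [1, 5], [2, 3], [2, 5], [3, 5], one per line
    }
}
```

Because j always starts after i, each pair is produced once and its items are already in ascending order.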

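When k = 3 or more, the List<Itemset> holding the frequent itemsets of size k-1 is passed as input to the function that generates the candidates of size k. The generation method is: "we compare items of itemset1 and itemset2. If they have all the same k-1 items and the last item of itemset1 is smaller than the last item of itemset2, we will combine them to generate a candidate". In other words, two frequent itemsets of size k-1 are joined when they agree on every item except the last. Then allSubsetsOfSizeK_1AreFrequent() checks, for each tentative candidate of size k, whether all of its subsets of size k-1 appear among the frequent itemsets of size k-1. If they all do, the tentative itemset is kept as a candidate. The support of each candidate is then counted, and the candidates that meet minsup become the frequent itemsets of size k.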
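The join and prune steps described above might look roughly like this. It is a sketch under assumptions, not SPMF's implementation: itemsets are represented as sorted int[] arrays instead of Itemset objects, the names CandidateSizeKSketch, generateCandidateSizeK, canJoin and allSubsetsFrequent are hypothetical, and the subset check uses a plain linear scan for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of size-k candidate generation; not SPMF's own code.
class CandidateSizeKSketch {

    // Joins (k-1)-itemsets that differ only in their last item, then prunes
    // every candidate that has a (k-1)-subset which is not frequent.
    // Assumes each itemset is sorted in ascending order.
    static List<int[]> generateCandidateSizeK(List<int[]> levelK_1) {
        List<int[]> candidates = new ArrayList<int[]>();
        for (int[] itemset1 : levelK_1) {
            for (int[] itemset2 : levelK_1) {
                if (!canJoin(itemset1, itemset2)) {
                    continue;
                }
                // Candidate = itemset1 followed by the last item of itemset2.
                int[] candidate = Arrays.copyOf(itemset1, itemset1.length + 1);
                candidate[itemset1.length] = itemset2[itemset2.length - 1];
                if (allSubsetsFrequent(candidate, levelK_1)) {
                    candidates.add(candidate);
                }
            }
        }
        return candidates;
    }

    // True if a and b agree on all items but the last, and last(a) < last(b).
    static boolean canJoin(int[] a, int[] b) {
        int last = a.length - 1;
        for (int i = 0; i < last; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        return a[last] < b[last];
    }

    // Checks that every subset obtained by dropping one item is in levelK_1.
    static boolean allSubsetsFrequent(int[] candidate, List<int[]> levelK_1) {
        for (int skip = 0; skip < candidate.length; skip++) {
            int[] subset = new int[candidate.length - 1];
            for (int i = 0, j = 0; i < candidate.length; i++) {
                if (i != skip) {
                    subset[j++] = candidate[i];
                }
            }
            boolean found = false;
            for (int[] frequent : levelK_1) {
                if (Arrays.equals(frequent, subset)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<int[]> frequent2 = Arrays.asList(
            new int[]{1, 2}, new int[]{1, 3}, new int[]{1, 5},
            new int[]{2, 3}, new int[]{2, 5}, new int[]{3, 5});
        for (int[] candidate : generateCandidateSizeK(frequent2)) {
            System.out.println(Arrays.toString(candidate));
        }
        // prints [1, 2, 3], [1, 2, 5], [1, 3, 5], [2, 3, 5], one per line
    }
}
```

The pruning step is what gives Apriori its name: an itemset can only be frequent if all of its subsets are frequent, so candidates failing the check never need their support counted.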

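The support of each candidate is counted as follows:

For each line (transaction) of the file (database), every candidate in candidates is tested against that transaction. The test walks through the items of the transaction and compares each one with the item at the current position (pos) of the candidate. If pos reaches candidate.itemset.length, the candidate is contained in the transaction. The same test is then run for the next candidate, and once all candidates have been tested, the process moves on to the next transaction. Note that the early exit in the code (item > candidate.itemset[pos]) relies on the items of each transaction and of each candidate being sorted in ascending order.

The core of the computation is the following code: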

for (int[] transaction : database) {
    loopCand:
    for (Itemset candidate : candidatesK) {
        // position of the next candidate item still to be matched
        int pos = 0;
        for (int item : transaction) {
            if (item == candidate.itemset[pos]) {
                pos++;
                // all items matched: the transaction contains the candidate
                if (pos == candidate.itemset.length) {
                    candidate.support++;
                    continue loopCand;
                }
            } else if (item > candidate.itemset[pos]) {
                // items are sorted, so the expected item cannot appear later
                continue loopCand;
            }
        }
    }
}
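The snippet above can be wrapped into a small runnable program to check it against the example database. The nested Itemset class here is a minimal stand-in written for this sketch (only the itemset and support fields used by the loop), and the names SupportCountingSketch and countSupport are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the counting loop; not SPMF's own classes.
class SupportCountingSketch {

    // Minimal stand-in for the Itemset class: sorted items plus a counter.
    static class Itemset {
        final int[] itemset;
        int support = 0;

        Itemset(int... items) {
            this.itemset = items;
        }
    }

    // Increments candidate.support once per transaction containing the candidate.
    // Assumes transactions and candidates are sorted in ascending order.
    static void countSupport(List<int[]> database, List<Itemset> candidatesK) {
        for (int[] transaction : database) {
            loopCand:
            for (Itemset candidate : candidatesK) {
                int pos = 0;
                for (int item : transaction) {
                    if (item == candidate.itemset[pos]) {
                        pos++;
                        if (pos == candidate.itemset.length) {
                            candidate.support++;
                            continue loopCand;
                        }
                    } else if (item > candidate.itemset[pos]) {
                        continue loopCand;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        List<int[]> database = Arrays.asList(
            new int[]{1, 3, 4}, new int[]{2, 3, 5}, new int[]{1, 2, 3, 5},
            new int[]{2, 5}, new int[]{1, 2, 3, 5});
        List<Itemset> candidates = Arrays.asList(
            new Itemset(1, 2, 3), new Itemset(1, 2, 5),
            new Itemset(1, 3, 5), new Itemset(2, 3, 5));
        countSupport(database, candidates);
        for (Itemset candidate : candidates) {
            System.out.println(Arrays.toString(candidate.itemset) + " " + candidate.support);
        }
        // prints [1, 2, 3] 2, [1, 2, 5] 2, [1, 3, 5] 2, [2, 3, 5] 3, one per line
    }
}
```

These counts agree with the size-3 rows of the output table above.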
