关联分析FPGrowth算法在JavaWeb项目中的应用

关联分析（关联挖掘）是指在交易数据、关系数据或其他信息载体中，查找存在于项目集合或对象集合之间的频繁模式、关联、相关性或因果结构。关联分析的一个典型例子是购物篮分析。通过发现顾客放入购物篮中不同商品之间的联系，分析顾客的购买习惯。比如，67%的顾客在购买尿布的同时也会购买啤酒。通过了解哪些商品频繁地被顾客同时购买，可以帮助零售商制定营销策略。分析结果可以应用于商品货架布局、货存安排以及根据购买模式对顾客进行分类。

FPGrowth算法是韩嘉炜等人在2000年提出的关联分析算法，在算法中使用了一种称为频繁模式树（Frequent Pattern Tree）的数据结构，基于上述数据结构加快整个关联规则挖掘过程。采取如下分治策略：将提供频繁项集的数据库压缩到一棵频繁模式树（FP-Tree），但仍保留项集关联信息。该算法和Apriori算法最大的不同有两点：第一，不产生候选集，第二，只需要两次遍历数据库，大大提高了效率。

一、前言

首先理解频繁项集中的以下概念：

频繁项：在多个集合中，频繁出现的元素项。

频繁项集：在一系列集合中每项都含有某些相同的元素，这些元素形成一个子集，满足一定阀值就是频繁项集。

K项集：K个频繁项组成的一个集合。

下面用一个例子（事务数据库）说明支持度与置信度，每一行为一个事务，事务由若干个互不相同的项构成，任意几个项的组合称为一个项集。

A　　E　　F　　G
A　　F　　G
A　　B　　E　　F　　G
E　　F　　G

支持度：在所有项集中出现的可能性。如项集{A,F,G}的支持数为3，支持度为3/4。支持数大于阈值minSuport的项集称为频繁项集。{F,G}的支持数为4，支持度为4/4。{A}的支持数为3，支持度为3/4。
置信度：频繁项与某项的并集的支持度与频繁项集支持度的比值。如{F,G}-->{A}的置信度则为{A,F,G}的支持数除以{F,G}的支持数，即3/4。{A}-->{F,G}的置信度则为{A,F,G}的支持数除以{A}的支持数，即3/3。

综上所述，理论上可以通过FPGrowth算法从频繁集中挖掘相关规则，再通过置信度筛选出规则用于推荐功能。在本人这个JavaWeb项目中，使用FPGrowth算法基于所有用户搜索历史记录，结合当前搜索记录推荐用户可能感兴趣的（置信度大于阈值的搜索记录）、以及其他用户搜索过的（频繁项集中非当前搜索记录）。上述仅是个人观点，如有错误之处还请不吝赐教。

二、正文

1、用户搜索记录实体类：

 package entity;
 
 /**
  * 用户搜索历史记录
  * @author: yjl
  * @date: 2018/5/24
  */
 public class TQueryHistory {
 
     private Integer id;
 
     private String userAccount;    //用户账号
 
     private String queryCorpName;  //用户搜索的企业
 
     public TQueryHistory() {
     }
 
     public TQueryHistory(String userAccount, String queryCorpName) {
         this.userAccount = userAccount;
         this.queryCorpName = queryCorpName;
     }
 
     public TQueryHistory(Integer id, String userAccount, String queryCorpName) {
         this.id = id;
         this.userAccount = userAccount;
         this.queryCorpName = queryCorpName;
     }
 
     public Integer getId() {
         return id;
     }
 
     public void setId(Integer id) {
         this.id = id;
     }
 
     public String getUserAccount() {
         return userAccount;
     }
 
     public void setUserAccount(String userAccount) {
         this.userAccount = userAccount;
     }
 
     public String getQueryCorpName() {
         return queryCorpName;
     }
 
     public void setQueryCorpName(String queryCorpName) {
         this.queryCorpName = queryCorpName;
     }
 
     @Override
     public String toString() {
         return "TQueryHistory{" +
                 "id=" + id +
                 ", userAccount='" + userAccount + '\'' +
                 ", queryCorpName='" + queryCorpName + '\'' +
                 '}';
     }
 }

2、FPGrowth挖掘相关规则前的数据准备，类似于上述的事务数据库，corpName为用户当前搜索的企业，最后得到的interestedCorpList与otherSearchCorpList集合分别表示用户感兴趣的企业、其他用户搜索过的企业，若集合数量不足可以根据企业行业等属性补充：

 //获取所有用户的搜索记录
 List<TQueryHistory> allQueryHistory = searchCorpService.getAllQueryHistory();
 
 //根据用户账号分类
 Map<String, Integer> accountMap = new HashMap();
 for(TQueryHistory tQueryHistory: allQueryHistory){
     accountMap.put(tQueryHistory.getUserAccount(),0);
 }
 
 //根据已分类账号分配
 Map<String,List<String>> newQueryHistoryMap = new HashMap<>();
 for(Map.Entry<String,Integer> entry: accountMap.entrySet()){
     String account = entry.getKey();
     List<String> accountTQueryHistoryList = new ArrayList<>();
     for(TQueryHistory tQueryHistory: allQueryHistory){
         if(tQueryHistory.getUserAccount().equals(account)){
             accountTQueryHistoryList.add(tQueryHistory.getQueryCorpName());
         }
     }
     newQueryHistoryMap.put(account,accountTQueryHistoryList);
 }
 
 //遍历Map将企业名称写入文件，并传至FPTree
 String outfile = "QueryHistory.txt";
 BufferedWriter bw = new BufferedWriter(new FileWriter(outfile));
 for(Map.Entry<String,List<String>> entry: newQueryHistoryMap.entrySet()){
     List<String> corpNameList = entry.getValue();
 
     bw.write(joinList(corpNameList));
     bw.newLine();
 }
 bw.close();
 
 //Map取值分别放入对应的集合
 Map<String, List<String>> corpMap = FPTree.introQueryHistory(outfile,corpName);
 List<String> interestedCorpList = new ArrayList<>();
 List<String> otherSearchCorpList = new ArrayList<>();
 for(Map.Entry<String,List<String>> entry: corpMap.entrySet()){
     if("interestedCorpList".equals(entry.getKey())){
         interestedCorpList = entry.getValue();
     }
     if("otherSearchCorpList".equals(entry.getKey())){
         otherSearchCorpList = entry.getValue();
     }
 }

 //设置文件写入规则
 private static String joinList(List<String> list) {
     if (list == null || list.size() == 0) {
         return "";
     }
     StringBuilder sb = new StringBuilder();
     for (String ele : list) {
         sb.append(ele);
         sb.append(",");
     }
     return sb.substring(0, sb.length() - 1);
 }

3、FPStrongAssociationRule类为强关联规则变量：

 package util;
 
 import java.util.List;
 
 public class FPStrongAssociationRule {
 
     public List<String> condition;
 
     public String result;
 
     public int support;
 
     public double confidence;
 
 }

4、FPTreeNode类为FPTree的相关变量：

 package util;
 
 import java.util.ArrayList;
 import java.util.List;
 
 public class FPTreeNode {
 
     private String name;                    //节点名称
     private int count;                      //频数
     private FPTreeNode parent;              //父节点
     private List<FPTreeNode> children;      //子节点
     private FPTreeNode nextHomonym;         //下一个节点（由表头项维护的那个链表）
     private FPTreeNode tail;                //末节点（由表头项维护的那个链表）
 
     public FPTreeNode() {
     }
 
     public FPTreeNode(String name) {
         this.name = name;
     }
 
     public String getName() {
         return this.name;
     }
 
     public void setName(String name) {
         this.name = name;
     }
 
     public int getCount() {
         return this.count;
     }
 
     public void setCount(int count) {
         this.count = count;
     }
 
     public FPTreeNode getParent() {
         return this.parent;
     }
 
     public void setParent(FPTreeNode parent) {
         this.parent = parent;
     }
 
     public List<FPTreeNode> getChildren() {
         return this.children;
     }
 
     public void setChildren(List<FPTreeNode> children) {
         this.children = children;
     }
 
     public FPTreeNode getNextHomonym() {
         return this.nextHomonym;
     }
 
     public void setNextHomonym(FPTreeNode nextHomonym) {
         this.nextHomonym = nextHomonym;
     }
 
     public FPTreeNode getTail() {
         return tail;
     }
 
     public void setTail(FPTreeNode tail) {
         this.tail = tail;
     }
 
     //添加子节点
     public void addChild(FPTreeNode child) {
         if (getChildren() == null) {
             List<FPTreeNode> list = new ArrayList<>();
             list.add(child);
             setChildren(list);
         } else {
             getChildren().add(child);
         }
     }
 
     //查询子节点
     public FPTreeNode findChild(String name) {
         List<FPTreeNode> children = getChildren();
         if (children != null) {
             for (FPTreeNode child : children) {
                 if (child.getName().equals(name)) {
                     return child;
                 }
             }
         }
         return null;
     }
 
     public void countIncrement(int n) {
         this.count += n;
     }
 
     @Override
     public String toString() {
         return name;
     }
 }

5、FPTree类为FPGrowth算法挖掘规则，introQueryHistory函数根据传入所有用户的搜索记录以及当前搜索的企业，得到用户可能感兴趣的企业以及其他用户搜索过的企业，以及限制每个集合中的企业数量：

 package util;
 
 import java.io.BufferedReader;
 import java.io.FileReader;
 import java.io.IOException;
 import java.text.DecimalFormat;
 import java.util.*;
 import java.util.Map.Entry;
 
 public class FPTree {
 
     private int minSuport;      //频繁模式的最小支持数
     private double confident;   //关联规则的最小置信度
     private int totalSize;      //事务项的总数
     private Map<List<String>, Integer> frequentMap = new HashMap<>();  //存储每个频繁项及其对应的计数
     private Set<String> decideAttr = null; //关联规则中，哪些项可作为被推导的结果，默认情况下所有项都可以作为被推导的结果
 
     public void setMinSuport(int minSuport) {
         this.minSuport = minSuport;
     }
 
     public void setConfident(double confident) {
         this.confident = confident;
     }
 
     public void setDecideAttr(Set<String> decideAttr) { this.decideAttr = decideAttr;}
 
     /**
      * 获取强关联规则
      * @return
      * @Description:
      */
     private List<FPStrongAssociationRule> getRules(List<String> list) {
         List<FPStrongAssociationRule> rect = new LinkedList<>();
         if (list.size() > 1) {
             for (int i = 0; i < list.size(); i++) {
                 String result = list.get(i);
                 if (decideAttr.contains(result)) {
                     List<String> condition = new ArrayList<>();
                     condition.addAll(list.subList(0, i));
                     condition.addAll(list.subList(i + 1, list.size()));
                     FPStrongAssociationRule rule = new FPStrongAssociationRule();
                     rule.condition = condition;
                     rule.result = result;
                     rect.add(rule);
                 }
             }
         }
         return rect;
     }
 
     /**
      * 从若干个文件中读入Transaction Record，同时把所有项设置为decideAttr
      * @return
      * @Description:
      */
     public List<List<String>> readTransRocords(String[] filenames) {
         Set<String> set = new HashSet<>();
         List<List<String>> transaction = null;
         if (filenames.length > 0) {
             transaction = new LinkedList<>();
             for (String filename : filenames) {
                 try {
                     FileReader fr = new FileReader(filename);
                     BufferedReader br = new BufferedReader(fr);
                     try {
                         String line;
                         // 一项事务占一行
                         while ((line = br.readLine()) != null) {
                             if (line.trim().length() > 0) {
                                 // 每个item之间用","分隔
                                 String[] str = line.split(",");
                                 //每一项事务中的重复项需要排重
                                 Set<String> record = new HashSet<>();
                                 for (String w : str) {
                                     record.add(w);
                                     set.add(w);
                                 }
                                 List<String> rl = new ArrayList<>();
                                 rl.addAll(record);
                                 transaction.add(rl);
                             }
                         }
                     } finally {
                         br.close();
                     }
                 } catch (IOException ex) {
                     System.out.println("Read transaction records failed." + ex.getMessage());
                     System.exit(1);
                 }
             }
         }
 
         this.setDecideAttr(set);
         return transaction;
     }
 
     /**
      * 生成一个序列的各种子序列（序列是有顺序的）
      * @param residualPath
      * @param results
      */
     private void combine(LinkedList<FPTreeNode> residualPath, List<List<FPTreeNode>> results) {
         if (residualPath.size() > 0) {
             //如果residualPath太长，则会有太多的组合，内存会被耗尽的
             FPTreeNode head = residualPath.poll();
             List<List<FPTreeNode>> newResults = new ArrayList<>();
             for (List<FPTreeNode> list : results) {
                 List<FPTreeNode> listCopy = new ArrayList<>(list);
                 newResults.add(listCopy);
             }
 
             for (List<FPTreeNode> newPath : newResults) {
                 newPath.add(head);
             }
             results.addAll(newResults);
             List<FPTreeNode> list = new ArrayList<>();
             list.add(head);
             results.add(list);
             combine(residualPath, results);
         }
     }
 
     /**
      * 判断是否为单节点
      * @param root
      */
     private boolean isSingleBranch(FPTreeNode root) {
         boolean rect = true;
         while (root.getChildren() != null) {
             if (root.getChildren().size() > 1) {
                 rect = false;
                 break;
             }
             root = root.getChildren().get(0);
         }
         return rect;
     }
 
     /**
      * 计算事务集中每一项的频数
      * @param transRecords
      * @return
      */
     private Map<String, Integer> getFrequency(List<List<String>> transRecords) {
         Map<String, Integer> rect = new HashMap<>();
         for (List<String> record : transRecords) {
             for (String item : record) {
                 Integer cnt = rect.get(item);
                 if (cnt == null) {
                     cnt = new Integer(0);
                 }
                 rect.put(item, ++cnt);
             }
         }
         return rect;
     }
 
     /**
      * 根据事务集合构建FPTree
      * @param transRecords
      * @Description:
      */
     public void buildFPTree(List<List<String>> transRecords) {
         totalSize = transRecords.size();
         //计算每项的频数
         final Map<String, Integer> freqMap = getFrequency(transRecords);
         //每条事务中的项按F1排序
         for (List<String> transRecord : transRecords) {
             Collections.sort(transRecord, (o1, o2) -> freqMap.get(o2) - freqMap.get(o1));
         }
         FPGrowth(transRecords, null);
     }
 
     /**
      * FP树递归生长，从而得到所有的频繁模式
      * @param cpb  条件模式基
      * @param postModel   后缀模式
      */
     private void FPGrowth(List<List<String>> cpb, LinkedList<String> postModel) {
         Map<String, Integer> freqMap = getFrequency(cpb);
         Map<String, FPTreeNode> headers = new HashMap<>();
         for (Entry<String, Integer> entry : freqMap.entrySet()) {
             String name = entry.getKey();
             int cnt = entry.getValue();
             //每一次递归时都有可能出现一部分模式的频数低于阈值
             if (cnt >= minSuport) {
                 FPTreeNode node = new FPTreeNode(name);
                 node.setCount(cnt);
                 headers.put(name, node);
             }
         }
 
         FPTreeNode treeRoot = buildSubTree(cpb,headers);
         //如果只剩下虚根节点，则递归结束
         if ((treeRoot.getChildren() == null) || (treeRoot.getChildren().size() == 0)) {
             return;
         }
 
         //如果树是单枝的，则直接把“路径的各种组合+后缀模式”添加到频繁模式集中。这个技巧是可选的，即跳过此步进入下一轮递归也可以得到正确的结果
         if (isSingleBranch(treeRoot)) {
             LinkedList<FPTreeNode> path = new LinkedList<>();
             FPTreeNode currNode = treeRoot;
             while (currNode.getChildren() != null) {
                 currNode = currNode.getChildren().get(0);
                 path.add(currNode);
             }
             //调用combine时path不宜过长，否则会OutOfMemory
             if (path.size() <= 20) {
                 List<List<FPTreeNode>> results = new ArrayList<>();
                 combine(path, results);
                 for (List<FPTreeNode> list : results) {
                     int cnt = 0;
                     List<String> rule = new ArrayList<>();
                     for (FPTreeNode node : list) {
                         rule.add(node.getName());
                         cnt = node.getCount();  //cnt最FPTree叶节点的计数
                     }
                     if (postModel != null) {
                         rule.addAll(postModel);
                     }
                     frequentMap.put(rule, cnt);
                 }
                 return;
             } else {
                 System.err.println("length of path is too long: " + path.size());
             }
         }
 
         for (FPTreeNode header : headers.values()) {
             List<String> rule = new ArrayList<>();
             rule.add(header.getName());
             if (postModel != null) {
                 rule.addAll(postModel);
             }
             //表头项+后缀模式  构成一条频繁模式（频繁模式内部也是按照F1排序的），频繁度为表头项的计数
             frequentMap.put(rule, header.getCount());
             //新的后缀模式：表头项+上一次的后缀模式（注意保持顺序，始终按F1的顺序排列）
             LinkedList<String> newPostPattern = new LinkedList<>();
             newPostPattern.add(header.getName());
             if (postModel != null) {
                 newPostPattern.addAll(postModel);
             }
             //新的条件模式基
             List<List<String>> newCPB;
             newCPB = new LinkedList<>();
             FPTreeNode nextNode = header;
             while ((nextNode = nextNode.getNextHomonym()) != null) {
                 int counter = nextNode.getCount();
                 //获得从虚根节点（不包括虚根节点）到当前节点（不包括当前节点）的路径，即一条条件模式基。注意保持顺序：你节点在前，子节点在后，即始终保持频率高的在前
                 LinkedList<String> path = new LinkedList<>();
                 FPTreeNode parent = nextNode;
                 while ((parent = parent.getParent()).getName() != null) {//虚根节点的name为null
                     path.push(parent.getName());//往表头插入
                 }
                 //事务要重复添加counter次
                 while (counter-- > 0) {
                     newCPB.add(path);
                 }
             }
             FPGrowth(newCPB, newPostPattern);
         }
     }
 
     /**
      * 把所有事务插入到一个FP树当中
      * @param transRecords
      * @param headers
      * @return
      */
     private FPTreeNode buildSubTree(List<List<String>> transRecords,final Map<String, FPTreeNode> headers) {
         FPTreeNode root = new FPTreeNode();//虚根节点
         for (List<String> transRecord : transRecords) {
             LinkedList<String> record = new LinkedList<>(transRecord);
             FPTreeNode subTreeRoot = root;
             FPTreeNode tmpRoot;
             if (root.getChildren() != null) {
                 //延已有的分支，令各节点计数加1
                 while (!record.isEmpty()
                         && (tmpRoot = subTreeRoot.findChild(record.peek())) != null) {
                     tmpRoot.countIncrement(1);
                     subTreeRoot = tmpRoot;
                     record.poll();
                 }
             }
             //长出新的节点
             addNodes(subTreeRoot, record, headers);
         }
         return root;
     }
 
     /**
      * 往特定的节点下插入一串后代节点，同时维护表头项到同名节点的链表指针
      * @param ancestor
      * @param record
      * @param headers
      */
     private void addNodes(FPTreeNode ancestor, LinkedList<String> record,
                           final Map<String, FPTreeNode> headers) {
         while (!record.isEmpty()) {
             String item = record.poll();
             //单个项的出现频数必须大于最小支持数，否则不允许插入FP树。达到最小支持度的项都在headers中。每一次递归根据条件模式基本建立新的FPTree时，把要把频数低于minSuport的排除在外，这也正是FPTree比穷举法快的真正原因
             if (headers.containsKey(item)) {
                 FPTreeNode leafnode = new FPTreeNode(item);
                 leafnode.setCount(1);
                 leafnode.setParent(ancestor);
                 ancestor.addChild(leafnode);
 
                 FPTreeNode header = headers.get(item);
                 FPTreeNode tail=header.getTail();
                 if(tail!=null){
                     tail.setNextHomonym(leafnode);
                 }else{
                     header.setNextHomonym(leafnode);
                 }
                 header.setTail(leafnode);
                 addNodes(leafnode, record, headers);
             }
 
         }
     }
 
     /**
      * 获取所有的强规则
      * @return
      */
     public List<FPStrongAssociationRule> getAssociateRule() {
         assert totalSize > 0;
         List<FPStrongAssociationRule> rect = new ArrayList<>();
         //遍历所有频繁模式
         for (Entry<List<String>, Integer> entry : frequentMap.entrySet()) {
             List<String> items = entry.getKey();
             int count1 = entry.getValue();
             //一条频繁模式可以生成很多关联规则
             List<FPStrongAssociationRule> rules = getRules(items);
             //计算每一条关联规则的支持度和置信度
             for (FPStrongAssociationRule rule : rules) {
                 if (frequentMap.containsKey(rule.condition)) {
                     int count2 = frequentMap.get(rule.condition);
                     double confidence = 1.0 * count1 / count2;
                     if (confidence >= this.confident) {
                         rule.support = count1;
                         rule.confidence = confidence;
                         rect.add(rule);
                     }
                 } else {
                     System.err.println(rule.condition + " is not a frequent pattern, however "
                             + items + " is a frequent pattern");
                 }
             }
         }
         return rect;
     }
 
     /**
      * 限制List集合中企业数目为5条
      */
     private static void limitFiveCorp(List<String> corpList) {
         if(corpList.size() > 5){
             Random randomId = new Random();
             //对随机的5个企业名称排成原来的默认顺序
             List<Integer> indexes = new ArrayList<>();
             while(indexes.size() < 5) {
                 int index = randomId.nextInt(corpList.size());
                 if(!indexes.contains(index)) {
                     indexes.add(index);
                 }
             }
             Collections.sort(indexes);
             //取出indexes对应的list放到newList
             List<String> tempRelationsCorpList = new ArrayList<>();
             for(int index : indexes) {
                 tempRelationsCorpList.add(corpList.get(index));
             }
             corpList.clear();
             corpList.addAll(tempRelationsCorpList);
         }
     }
 
     public static Map<String, List<String>> introQueryHistory(String outfile,String corpName) {
         FPTree fpTree = new FPTree();
 
         //设置置信度与支持数
         fpTree.setConfident(0.3);
         fpTree.setMinSuport(3);
 
         List<List<String>> trans = fpTree.readTransRocords(new String[] { outfile });
         for(int i = 1;i < trans.size() - 1;i++){
             System.out.println("第"+i+"行数据："+ trans.get(i));
         }
 
         fpTree.buildFPTree(trans);
 
         List<FPStrongAssociationRule> rules = fpTree.getAssociateRule();
         DecimalFormat dfm = new DecimalFormat("#.##");
 
         Map<String, String> interestedCorpMap = new HashMap<>();  //需要返回的关联企业(您可能感兴趣的公司)
         Map<String, String> otherSearchCorpMap = new HashMap<>(); //需要返回的关联企业(其他人还搜过的公司)
         //根据置信度查询关联企业用于返回感兴趣的公司
         for (FPStrongAssociationRule rule : rules) {
             System.out.println(rule.condition + "->" + rule.result + "\t" + dfm.format(rule.support) + "\t" + dfm.format(rule.confidence));
             List<String> corpCondition = rule.condition;
             for(int i = 0;i < corpCondition.size();i++){
                 if(corpName.equals(corpCondition.get(i))){
                     interestedCorpMap.put(rule.result,dfm.format(rule.confidence));
                 }
             }
             if(corpName.equals(rule.result)){
                 for(int i = 0;i < corpCondition.size();i++){
                     if(!corpName.equals(corpCondition.get(i))){
                         interestedCorpMap.put(corpCondition.get(i),dfm.format(rule.confidence));
                     }
                 }
             }
         }
 
         //根据多项集查询关联企业用于返回其它搜过的公司
         for (FPStrongAssociationRule rule : rules) {
             List<String> corpCondition = rule.condition;
             for (int i = 0; i < corpCondition.size(); i++) {
                 if (corpName.equals(corpCondition.get(i)) && corpCondition.size() > 1) {
                     for (int j = 0; j < corpCondition.size(); j++) {
                         if (!corpName.equals(corpCondition.get(j))) {
                             otherSearchCorpMap.put(corpCondition.get(j), "0.00");
                         }
                     }
                 }
             }
         }
 
         List<String> interestedCorpList = new ArrayList<>();
         List<String> otherSearchCorpList = new ArrayList<>();
         for(Map.Entry<String,String> entry: interestedCorpMap.entrySet()){
             interestedCorpList.add(entry.getKey());
         }
         for(Map.Entry<String,String> entry: otherSearchCorpMap.entrySet()){
             otherSearchCorpList.add(entry.getKey());
         }
 
         limitFiveCorp(interestedCorpList);
         limitFiveCorp(otherSearchCorpList);
 
         Map<String, List<String>> corpMap = new HashMap<>();
         corpMap.put("interestedCorpList",interestedCorpList);
         corpMap.put("otherSearchCorpList",otherSearchCorpList);
 
         return corpMap;
     }
 
 }

附上控制台打印部分截图：

三、总结

在上面的代码中将整个事务数据库传给FPGrowth，在实际中这是不可取的，因为内存不可能容下整个事务数据库，我们可能需要从关系数据库中一条一条地读入来建立FP-Tree。但无论如何 FP-Tree是肯定需要放在内存中的，但内存如果容不下怎么办？另外FPGrowth仍然是非常耗时的，想提高速度怎么办？解决办法：分而治之，并行计算。

在实践中，关联规则挖掘可能并不像人们期望的那么有用。一方面是因为支持度置信度框架会产生过多的规则，并不是每一个规则都是有用的。另一方面大部分的关联规则并不像“啤酒与尿布”这种经典故事这么普遍。关联规则分析是需要技巧的，有时需要用更严格的统计学知识来控制规则的增殖。

本文部分学习参考了：http://www.cnblogs.com/zhangchaoyang/articles/2198946.html

至此是关于关联分析FPGrowth算法在JavaWeb项目中的应用，上述仅是个人观点，仅供参考。

如有疏漏错误之处，还请不吝赐教！

关联分析FPGrowth算法在JavaWeb项目中的应用的更多相关文章

JavaWeb 项目中的绝对路径和相对路径以及问题的解决方式
近期在做JavaWeb项目,总是出现各种的路径错误,并且发现不同情况下 / 所代表的含义不同,导致在调试路径上浪费了大量时间. 在JavaWeb项目中尽量使用绝对路径由于使用绝对路径是绝对不会出 ...
log4j在javaWeb项目中的使用
在前边的文章中对log4j的配置文件进行了说明,今天介绍如何在普通的javaWeb项目中使用log4j. 在日常的开发过程中,日志使用的很频繁,我们可以利用日志来跟踪程序的错误,程序运行时的输出参数等 ...
Druid使用起步—在javaWeb项目中配置监控连接池
当我们在javaWEB项目中使用到druid来作为我们的连接池的时候,一定不会忘了添加监控功能.下面我们就来看一下,在一个简单的web项目中(尚未使用任何框架)我们是如果来配置我们的web.xml来完 ...
JavaWeb项目中web.xml有关servlet的基本配置
JavaWeb项目中web.xml有关servlet的基本配置: 我们注意到,tomcat下的conf中也有一个web.xml文件,没错的,所有的JavaWeb项目中web.xml都继承自服务器下的w ...
ElasticSearch搜索引擎在JavaWeb项目中的应用
近几篇ElasticSearch系列: 1.阿里云服务器Linux系统安装配置ElasticSearch搜索引擎 2.Linux系统中ElasticSearch搜索引擎安装配置Head插件 3.Ela ...
通过调用API在JavaWeb项目中实现证件识别
本文详细介绍自己如何在JavaWeb项目中通过调用API实现证件识别. 一,Face++使用简介二,两种方式(图片URL与本地上传)实现证件识别一,Face++使用简介 Face++旷视人工智能开 ...
Javaweb项目中出现java.sql.SQLException: The server time zone value '�й��׼ʱ��' is unrecognized or represents more than one time zone.异常
javaweb项目中java.sql.SQLException: The server time zone value '�й��׼ʱ��' is unrecognized or represent ...
透彻分析和解决一切javaWeb项目乱码问题
前言乱码是我们在程序开发中经常碰到且让人头疼的一件事,尤其是我们在做javaweb开发,如果我们没有清楚乱码产生的原理,碰到乱码问题了就容易摸不着头脑,无从下手. 乱码主要出现在两部分,如下: 第一 ...
使用 FP-growth 算法高效挖掘海量数据中的频繁项集
前言对于如何发现一个数据集中的频繁项集,前文讲解的经典 Apriori 算法能够做到. 然而,对于每个潜在的频繁项,它都要检索一遍数据集,这是比较低效的.在实际的大数据应用中,这么做就更不好了. 本 ...

随机推荐

HDU 2018 Multi-University Training Contest 1 Triangle Partition 【YY】
传送门:http://acm.hdu.edu.cn/showproblem.php?pid=6300 Triangle Partition Time Limit: 2000/1000 MS (Java ...
mysql使用Navicat 导出和导入数据库
系统环境: Win7 x64软件准备:Navicat Premium_11.2.7简体中文版下载网址:http://www.cr173.com/soft/419023.html 现在我就向大家介绍 m ...
init/loadView/viewDidLoad/initWithNibName/awakeFromNib/initWithCoder的用法
init/loadView/viewDidLoad/viewDidUnload 这么细节的东西想来大家都不在意,平时也不会去关系,但是在面试时却常常被提到,所以了解viewController的生命周 ...
ARM MDK 编译产生：RO、RW和ZI DATA说明
1.比如编译一个工程文件,产生如下提示信息: Program Size: Code=18938 RO-data=622 RW-data=124 ZI-data=7724 RO段.RW段和ZI段要了解 ...
Context initialization failed org.springframework.beans.factory.BeanCreationException
严重: Context initialization failed org.springframework.beans.factory.BeanCreationException: Error cre ...
关于gitbash一直报：sh: __git_ps1: command not found的解决办法
curl -o ~/.git-prompt.sh https://raw.githubusercontent.com/git/git/master/contrib/completion/git-pro ...
Mint-UI 的 DatetimePicker 日期时间插件的安装与使用
简介:Mint-UI是饿了么出品的基于vue的移动端组件库(element-ui是桌面端) 官网:http://mint-ui.github.io/docs/#/zh-cn2 项目环境:vue-cli ...
button onclick实现跳转的常用方法
1.onclick="javascript:window.location.href='aa.htm' " 2.onclick="location='URL' " ...
Elasticsearch 聚合操作
数据准备: PUT /shop { "settings": { "number_of_shards": 3, "number_of_replicas& ...
php函数strtotime结合date时间修饰语的使用
下面简单介绍在项目开发中date时间函数和strtotime所遇到的问题,以及解决办法. 原文地址:小时刻个人技术博客 > http://small.aiweimeng.top/index.ph ...

关联分析FPGrowth算法在JavaWeb项目中的应用

关联分析FPGrowth算法在JavaWeb项目中的应用的更多相关文章

随机推荐

热门专题