Decision Tree such as C4.5 is easy to parallel. Following is an example.

This is a non-parallel version:

public void learnFromDataSet(Iterable<Sample<FK, FV, Boolean>> dataset){
for(Sample sample : dataset){
model.addSample((MapBasedBinarySample<FK, FV>)sample);
}
Queue<TreeNode<FK, FV>> Q = new LinkedList<TreeNode<FK, FV>>();
TreeNode<FK, FV> root = model.selectRootTreeNode();
model.addTreeNode(root);
Q.add(root);
while (!Q.isEmpty()){
TreeNode v = Q.poll();
if(v.getDepth() >= model.getMaxDepth()){
continue;
}
FeatureSplit<FK> featureSplit = model.selectFeature(v);
if(featureSplit.getFeatureId() == null){
continue;
}
v.setFeatureSplit(featureSplit);
Pair<TreeNode<FK,FV>, TreeNode<FK, FV>> children =
model.newTreeNode(v, featureSplit);
TreeNode leftNode = children.getKey();
TreeNode rightNode = children.getValue();
if(leftNode != null
&& leftNode.getSampleSize() > model.getMinSampleSizeInNode()){
v.setLeft(leftNode);
model.addTreeNode(leftNode);
Q.add(leftNode);
}
if(rightNode != null
&& rightNode.getSampleSize() > model.getMinSampleSizeInNode()){
v.setRight(rightNode);
model.addTreeNode(rightNode);
Q.add(rightNode);
}
}
}

And this is a parallel version:

public class NodeSplitThread implements Runnable{
private TreeNode<FK, FV> node = null;
private Queue<TreeNode<FK, FV>> Q = null; public NodeSplitThread(TreeNode<FK, FV> node, Queue<TreeNode<FK, FV>> Q){
this.node = node;
this.Q = Q;
}
@Override
public void run() {
if(node.getDepth() >= model.getMaxDepth()){
return;
}
FeatureSplit<FK> featureSplit = model.selectFeature(node);
if(featureSplit.getFeatureId() == null){
return;
}
node.setFeatureSplit(featureSplit);
Pair<TreeNode<FK,FV>, TreeNode<FK, FV>> children = model.newTreeNode(node, featureSplit);
TreeNode<FK, FV> leftNode = children.getKey();
TreeNode<FK, FV> rightNode = children.getValue(); if(leftNode != null && leftNode.getSampleSize() > model.getMinSampleSizeInNode()){
node.setLeft(leftNode);
model.addTreeNode(leftNode);
Q.add(leftNode);
}
if(rightNode != null && rightNode.getSampleSize() > model.getMinSampleSizeInNode()){
node.setRight(rightNode);
model.addTreeNode(rightNode);
Q.add(rightNode);
}
}
} public List<TreeNode<FK, FV>> pollTopN(Queue<TreeNode<FK, FV>> Q, int n){
List<TreeNode<FK, FV>> ret = new ArrayList<TreeNode<FK, FV>>();
for(int i = 0; i < n; ++i){
if(Q.isEmpty()) break;
TreeNode<FK, FV> node = Q.poll();
ret.add(node);
}
return ret;
} @Override
public void learnFromDataSet(Iterable<Sample<FK, FV, Boolean>> dataset){ for(Sample sample : dataset){
model.addSample((MapBasedBinarySample<FK, FV>)sample);
}
Queue<TreeNode<FK, FV>> Q = new ConcurrentLinkedQueue<TreeNode<FK, FV>>();
TreeNode<FK, FV> root = model.selectRootTreeNode();
model.addTreeNode(root);
Q.add(root);
ExecutorService threadPool = Executors.newFixedThreadPool(10);
while (!Q.isEmpty()){
List<TreeNode<FK, FV>> nodes = pollTopN(Q, 10);
List<Future> tasks = new ArrayList<Future>(nodes.size());
for(TreeNode<FK, FV> node : nodes){
Future task = threadPool.submit(new NodeSplitThread(node, Q));
tasks.add(task);
}
for(Future task : tasks){
try {
task.get();
} catch (InterruptedException e) {
continue;
} catch (ExecutionException e) {
continue;
}
}
}
threadPool.shutdown();
try {
threadPool.awaitTermination(60, TimeUnit.SECONDS);
} catch (InterruptedException e) {
threadPool.shutdownNow();
Thread.interrupted();
}
threadPool.shutdownNow();
}

http://xlvector.net/blog/?p=896

Parallel Decision Tree的更多相关文章

  1. Spark MLlib - Decision Tree源码分析

    http://spark.apache.org/docs/latest/mllib-decision-tree.html 以决策树作为开始,因为简单,而且也比较容易用到,当前的boosting或ran ...

  2. 决策树Decision Tree 及实现

    Decision Tree 及实现 标签: 决策树熵信息增益分类有监督 2014-03-17 12:12 15010人阅读 评论(41) 收藏 举报  分类: Data Mining(25)  Pyt ...

  3. Gradient Boosting Decision Tree学习

    Gradient Boosting Decision Tree,即梯度提升树,简称GBDT,也叫GBRT(Gradient Boosting Regression Tree),也称为Multiple ...

  4. 使用Decision Tree对MNIST数据集进行实验

    使用的Decision Tree中,对MNIST中的灰度值进行了0/1处理,方便来进行分类和计算熵. 使用较少的测试数据测试了在对灰度值进行多分类的情况下,分类结果的正确率如何.实验结果如下. #Te ...

  5. Sklearn库例子1:Sklearn库中AdaBoost和Decision Tree运行结果的比较

    DisCrete Versus Real AdaBoost 关于Discrete 和Real AdaBoost 可以参考博客:http://www.cnblogs.com/jcchen1987/p/4 ...

  6. 用于分类的决策树(Decision Tree)-ID3 C4.5

    决策树(Decision Tree)是一种基本的分类与回归方法(ID3.C4.5和基于 Gini 的 CART 可用于分类,CART还可用于回归).决策树在分类过程中,表示的是基于特征对实例进行划分, ...

  7. OpenCV码源笔记——Decision Tree决策树

    来自OpenCV2.3.1 sample/c/mushroom.cpp 1.首先读入agaricus-lepiota.data的训练样本. 样本中第一项是e或p代表有毒或无毒的标志位:其他是特征,可以 ...

  8. GBDT(Gradient Boosting Decision Tree)算法&协同过滤算法

    GBDT(Gradient Boosting Decision Tree)算法参考:http://blog.csdn.net/dark_scope/article/details/24863289 理 ...

  9. Gradient Boost Decision Tree(&Treelink)

    http://www.cnblogs.com/joneswood/archive/2012/03/04/2379615.html 1.      什么是Treelink Treelink是阿里集团内部 ...

随机推荐

  1. 如何将ppt转换为高清图片?

    PPT2010版本直接提供了“另存为”图片的功能,但另存为后的图片清晰度不够,这是因为office提供的默认点每英寸点数 (dpi)为96dpi,也就是说图片的尺寸为960x720像素,通过注册表可以 ...

  2. JAVA语言基础内部测试题(50道选择题)

    JAVA语言基础内部测试题 选择题(针对以下题目,请选择最符合题目要求的答案,针对每一道题目,所有答案都选对,则该题得分,所选答案错误或不能选出所有答案,则该题不得分.)(每题2分) 没有注明选择几项 ...

  3. 超全面的JavaWeb笔记day19<Service>

    今日内容 l Service事务 l 客户关系管理系统 Service事务 在Service中使用ThreadLocal来完成事务,为将来学习Spring事务打基础! 1 DAO中的事务 在DAO中处 ...

  4. POJ 1141 Brackets Sequence(区间DP, DP打印路径)

    Description We give the following inductive definition of a “regular brackets” sequence: the empty s ...

  5. 【RF库Collections测试】Copy Dictionary

    Name: Copy DictionarySource:Collections <test library>Arguments:[ dictionary ]Returns a copy o ...

  6. [转载]SQL truncate 、delete与drop区别

    相同点: 1.truncate和不带where子句的delete.以及drop都会删除表内的数据. 2.drop.truncate都是DDL语句(数据定义语言),执行后会自动提交. 不同点: 1. t ...

  7. JAXB简单样例

    参考网页:http://www.mkyong.com/java/jaxb-hello-world-example/JAXB完整教程:https://jaxb.java.net/tutorial/1.J ...

  8. JS控制元素可见(显示)与不可见(隐藏)

    方法一: document.getElementById("id").style.visibility="hidden"; document.getElemen ...

  9. UVa 10905 - Children's Game(求多个正整数排列后,所得的新的数字的极值)

    4thIIUCInter-University Programming Contest, 2005 A Children’s Game Input: standard input Output: st ...

  10. java基础---->java多线程之Join(二)

    如果主线程想等待子线程执行完成之后再结束,就可以使用join方法了.它的使用是等待线程对象销毁.今天我们就通过实例来学习一下多线程中join方法的使用.草在结它的种子,风在摇它的叶子.我们站着,不说话 ...