Parallel Decision Tree
Decision Tree such as C4.5 is easy to parallel. Following is an example.
This is a non-parallel version:
public void learnFromDataSet(Iterable<Sample<FK, FV, Boolean>> dataset){
for(Sample sample : dataset){
model.addSample((MapBasedBinarySample<FK, FV>)sample);
}
Queue<TreeNode<FK, FV>> Q = new LinkedList<TreeNode<FK, FV>>();
TreeNode<FK, FV> root = model.selectRootTreeNode();
model.addTreeNode(root);
Q.add(root);
while (!Q.isEmpty()){
TreeNode v = Q.poll();
if(v.getDepth() >= model.getMaxDepth()){
continue;
}
FeatureSplit<FK> featureSplit = model.selectFeature(v);
if(featureSplit.getFeatureId() == null){
continue;
}
v.setFeatureSplit(featureSplit);
Pair<TreeNode<FK,FV>, TreeNode<FK, FV>> children =
model.newTreeNode(v, featureSplit);
TreeNode leftNode = children.getKey();
TreeNode rightNode = children.getValue();
if(leftNode != null
&& leftNode.getSampleSize() > model.getMinSampleSizeInNode()){
v.setLeft(leftNode);
model.addTreeNode(leftNode);
Q.add(leftNode);
}
if(rightNode != null
&& rightNode.getSampleSize() > model.getMinSampleSizeInNode()){
v.setRight(rightNode);
model.addTreeNode(rightNode);
Q.add(rightNode);
}
}
}
And this is a parallel version:
public class NodeSplitThread implements Runnable{
private TreeNode<FK, FV> node = null;
private Queue<TreeNode<FK, FV>> Q = null;
public NodeSplitThread(TreeNode<FK, FV> node, Queue<TreeNode<FK, FV>> Q){
this.node = node;
this.Q = Q;
}
@Override
public void run() {
if(node.getDepth() >= model.getMaxDepth()){
return;
}
FeatureSplit<FK> featureSplit = model.selectFeature(node);
if(featureSplit.getFeatureId() == null){
return;
}
node.setFeatureSplit(featureSplit);
Pair<TreeNode<FK,FV>, TreeNode<FK, FV>> children = model.newTreeNode(node, featureSplit);
TreeNode<FK, FV> leftNode = children.getKey();
TreeNode<FK, FV> rightNode = children.getValue();
if(leftNode != null && leftNode.getSampleSize() > model.getMinSampleSizeInNode()){
node.setLeft(leftNode);
model.addTreeNode(leftNode);
Q.add(leftNode);
}
if(rightNode != null && rightNode.getSampleSize() > model.getMinSampleSizeInNode()){
node.setRight(rightNode);
model.addTreeNode(rightNode);
Q.add(rightNode);
}
}
}
public List<TreeNode<FK, FV>> pollTopN(Queue<TreeNode<FK, FV>> Q, int n){
List<TreeNode<FK, FV>> ret = new ArrayList<TreeNode<FK, FV>>();
for(int i = 0; i < n; ++i){
if(Q.isEmpty()) break;
TreeNode<FK, FV> node = Q.poll();
ret.add(node);
}
return ret;
}
@Override
public void learnFromDataSet(Iterable<Sample<FK, FV, Boolean>> dataset){
for(Sample sample : dataset){
model.addSample((MapBasedBinarySample<FK, FV>)sample);
}
Queue<TreeNode<FK, FV>> Q = new ConcurrentLinkedQueue<TreeNode<FK, FV>>();
TreeNode<FK, FV> root = model.selectRootTreeNode();
model.addTreeNode(root);
Q.add(root);
ExecutorService threadPool = Executors.newFixedThreadPool(10);
while (!Q.isEmpty()){
List<TreeNode<FK, FV>> nodes = pollTopN(Q, 10);
List<Future> tasks = new ArrayList<Future>(nodes.size());
for(TreeNode<FK, FV> node : nodes){
Future task = threadPool.submit(new NodeSplitThread(node, Q));
tasks.add(task);
}
for(Future task : tasks){
try {
task.get();
} catch (InterruptedException e) {
continue;
} catch (ExecutionException e) {
continue;
}
}
}
threadPool.shutdown();
try {
threadPool.awaitTermination(60, TimeUnit.SECONDS);
} catch (InterruptedException e) {
threadPool.shutdownNow();
Thread.interrupted();
}
threadPool.shutdownNow();
}
http://xlvector.net/blog/?p=896
Parallel Decision Tree的更多相关文章
- Spark MLlib - Decision Tree源码分析
http://spark.apache.org/docs/latest/mllib-decision-tree.html 以决策树作为开始,因为简单,而且也比较容易用到,当前的boosting或ran ...
- 决策树Decision Tree 及实现
Decision Tree 及实现 标签: 决策树熵信息增益分类有监督 2014-03-17 12:12 15010人阅读 评论(41) 收藏 举报 分类: Data Mining(25) Pyt ...
- Gradient Boosting Decision Tree学习
Gradient Boosting Decision Tree,即梯度提升树,简称GBDT,也叫GBRT(Gradient Boosting Regression Tree),也称为Multiple ...
- 使用Decision Tree对MNIST数据集进行实验
使用的Decision Tree中,对MNIST中的灰度值进行了0/1处理,方便来进行分类和计算熵. 使用较少的测试数据测试了在对灰度值进行多分类的情况下,分类结果的正确率如何.实验结果如下. #Te ...
- Sklearn库例子1:Sklearn库中AdaBoost和Decision Tree运行结果的比较
DisCrete Versus Real AdaBoost 关于Discrete 和Real AdaBoost 可以参考博客:http://www.cnblogs.com/jcchen1987/p/4 ...
- 用于分类的决策树(Decision Tree)-ID3 C4.5
决策树(Decision Tree)是一种基本的分类与回归方法(ID3.C4.5和基于 Gini 的 CART 可用于分类,CART还可用于回归).决策树在分类过程中,表示的是基于特征对实例进行划分, ...
- OpenCV码源笔记——Decision Tree决策树
来自OpenCV2.3.1 sample/c/mushroom.cpp 1.首先读入agaricus-lepiota.data的训练样本. 样本中第一项是e或p代表有毒或无毒的标志位:其他是特征,可以 ...
- GBDT(Gradient Boosting Decision Tree)算法&协同过滤算法
GBDT(Gradient Boosting Decision Tree)算法参考:http://blog.csdn.net/dark_scope/article/details/24863289 理 ...
- Gradient Boost Decision Tree(&Treelink)
http://www.cnblogs.com/joneswood/archive/2012/03/04/2379615.html 1. 什么是Treelink Treelink是阿里集团内部 ...
随机推荐
- Android移动网络如何抓取数据包
1)下载tcpdump工具 tcpdump(dump the traffic on a network)是Linux中强大的网络数据采集分析工具之一,可以将网络中传送的数据包头完全截获下来提供分析.它 ...
- 第六篇:GPU 并行优化的几种典型策略
前言 如何对现有的程序进行并行优化,是 GPU 并行编程技术最为关注的实际问题.本文将提供几种优化的思路,为程序并行优化指明道路方向. 优化前准备 首先,要明确优化的目标 - 是要将程序提速 2 倍? ...
- php学习十一:组合
我们在类当中我往往会用到一些对象,此时的继承就无法满足我们的需求,这个时候我们需要用到组合.继承如果是is..a的关系,那么组合就是has...a的关系,直接在本类里面声明即可,不过声明的是一个对象 ...
- php学习一:语法规则
1.书写规则 在html中嵌入php的时候,需要有结束语,即<?php ...?>,在靠近结束符号的最后一个语句可以不用写分号: 但是在单独的php中,最后可以不用以?>来结尾; 2 ...
- DAG 的最短路径算法
求图中节点的单源最短路径可以使用Dijkstra,BellmanFord, SPFA算法,而对于有向无环图DAG来说,可以通过简单的动态规划来进行求解. DAG的独特之处是所有节点可以线性化( ...
- JS-【同页面多次调用】轮播特效封装-json传多个参数
看着传那么一长串的参数神烦,继续深化!——json传参: html: <div class="scrollBanner"> <ul class="ban ...
- Android 获取系统默认输入法
import android.provider.Settings; import android.text.TextUtils; 获取默认输入法包名: private String getDefaul ...
- ubuntu 创建桌面快捷方式
$sudo apt-get install gnome-panel $gnome-desktop-item-edit /home/xxx/桌面 --create-new 命令行:填入程序名称,如/u ...
- PHP内存缓存功能memcached
PHP内存缓存功能memcached: http://blog.csdn.net/fangaoxin/article/details/6223383
- ios ASIHTTPRequest类库简介和使用说明
官方网站: http://allseeing-i.com/ASIHTTPRequest/ .可以从上面下载到最新源码,以及获取到相关的资料. 使用iOS SDK中的HTTP网络请求API,相当的复杂, ...