Implementing the ID3 Algorithm in Data Mining (repost)

id3

function D = ID3(train_features, train_targets, params, region)

% Classify using Quinlan's ID3 algorithm

% Inputs:

% train_features - Training features (one column per sample)

% train_targets  - Training targets (class labels)

% params - [Number of bins for the data, Percentage of incorrectly assigned samples at a node]

% region     - Decision region vector: [-x x -y y number_of_points]

%

% Outputs

% D - Decision surface

[Ni, M]    = size(train_features);

%Get parameters

[Nbins, inc_node] = process_params(params);

inc_node    = inc_node*M/100;

%For the decision region

N           = region(5);

mx          = ones(N,1) * linspace (region(1),region(2),N);

my          = linspace (region(3),region(4),N)' * ones(1,N);

flatxy      = [mx(:), my(:)]';

%Preprocessing

[f, t, UW, m]      = PCA(train_features, train_targets, Ni, region);

train_features  = UW * (train_features - m*ones(1,M));

flatxy          = UW * (flatxy - m*ones(1,N^2));

%First, bin the data and the decision region data

[H, binned_features]= high_histogram(train_features, Nbins, region);

[H, binned_xy]      = high_histogram(flatxy, Nbins, region);

%Build the tree recursively

disp('Building tree')

tree        = make_tree(binned_features, train_targets, inc_node, Nbins);

%Make the decision region according to the tree

disp('Building decision surface using the tree')

targets = use_tree(binned_xy, 1:N^2, tree, Nbins, unique(train_targets));

D = reshape(targets,N,N);

%END

function targets = use_tree(features, indices, tree, Nbins, Uc)

%Classify recursively using a tree

targets = zeros(1, size(features,2));

if (size(features,1) == 1),

    %Only one dimension left, so work on it

    for i = 1:Nbins,

        in = indices(find(features(indices) == i));

        if ~isempty(in),

            if isfinite(tree.child(i)),

                targets(in) = tree.child(i);

            else

                %No data was found in the training set for this bin, so choose a class at random

                n           = 1 + floor(rand(1)*length(Uc));

                targets(in) = Uc(n);

            end

        end

    end

    return

end

%This is not the last level of the tree, so:

%First, find the dimension we are to work on

dim = tree.split_dim;

dims= find(~ismember(1:size(features,1), dim));

%And classify according to it

for i = 1:Nbins,

    in      = indices(find(features(dim, indices) == i));

    targets = targets + use_tree(features(dims, :), in, tree.child(i), Nbins, Uc);

end

%END use_tree 

function tree = make_tree(features, targets, inc_node, Nbins)

%Build a tree recursively

[Ni, L]     = size(features);

Uc          = unique(targets);

%When to stop: If the dimension is one or the number of examples is small

if ((Ni == 1) | (inc_node > L)),

    %Compute the children non-recursively

    for i = 1:Nbins,

        tree.split_dim  = 0;

        indices         = find(features == i);

        if ~isempty(indices),

            if (length(unique(targets(indices))) == 1),

                tree.child(i) = targets(indices(1));

            else

                H               = hist(targets(indices), Uc);

                [m, T]          = max(H);

                tree.child(i)   = Uc(T);

            end

        else

            tree.child(i)   = inf;

        end

    end

    return

end

%Compute the node's entropy (one probability per class in Uc)

for i = 1:length(Uc),

    Pnode(i) = length(find(targets == Uc(i))) / L;

end

Inode = -sum(Pnode.*log(Pnode)/log(2));

%For each dimension, compute the gain ratio (information gain divided by the split information)

delta_Ib    = zeros(1, Ni);

P           = zeros(length(Uc), Nbins);

for i = 1:Ni,

    for j = 1:length(Uc),

        for k = 1:Nbins,

            indices = find((targets == Uc(j)) & (features(i,:) == k));

            P(j,k)  = length(indices);

        end

    end

    Pk          = sum(P);

    P           = P/L;

    Pk          = Pk/sum(Pk);

    info        = sum(-P.*log(eps+P)/log(2));

    delta_Ib(i) = (Inode-sum(Pk.*info))/-sum(Pk.*log(eps+Pk)/log(2));

end

%Find the dimension maximizing delta_Ib

[m, dim] = max(delta_Ib);

%Split along the 'dim' dimension

tree.split_dim = dim;

dims           = find(~ismember(1:Ni, dim));

for i = 1:Nbins,

    indices       = find(features(dim, :) == i);

    tree.child(i) = make_tree(features(dims, indices), targets(indices), inc_node, Nbins);

end
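
The split criterion in make_tree can be easier to follow on a tiny example. The following is a minimal sketch, not the listing's exact arithmetic: it uses the textbook conditional entropy H(class | bin), whereas the listing works with joint probabilities. The toy data and the names counts, Pcond, Hcond, gain, split, gain_ratio and best_dim are assumptions made up for illustration.

```matlab
% Toy data (assumed): 2 binned features (rows) x 8 samples (columns), bins in 1..2.
features = [1 1 1 1 2 2 2 2;
            1 2 1 2 1 2 1 2];
targets  = [0 0 0 0 1 1 1 1];
Nbins    = 2;

Uc = unique(targets);
L  = numel(targets);

% Entropy of the class labels at the node, in bits.
Pnode = arrayfun(@(c) sum(targets == c), Uc) / L;
Inode = -sum(Pnode .* log2(Pnode));

gain_ratio = zeros(1, size(features, 1));
for i = 1:size(features, 1)
    % counts(j,k): number of samples of class Uc(j) in bin k of feature i.
    counts = zeros(numel(Uc), Nbins);
    for j = 1:numel(Uc)
        for k = 1:Nbins
            counts(j, k) = sum(targets == Uc(j) & features(i, :) == k);
        end
    end
    binTotals = sum(counts, 1);
    Pk    = binTotals / L;                                % P(bin k)
    Pcond = bsxfun(@rdivide, counts, max(binTotals, 1));  % P(class j | bin k)
    Hcond = -sum(Pcond .* log2(Pcond + eps), 1);          % H(class | bin k)
    gain  = Inode - sum(Pk .* Hcond);                     % information gain
    split = -sum(Pk .* log2(Pk + eps));                   % split information
    gain_ratio(i) = gain / max(split, eps);               % guard against split == 0
end

% Pick the feature with the largest gain ratio, as make_tree does with delta_Ib.
[~, best_dim] = max(gain_ratio);
```

In this toy set the first feature separates the two classes perfectly and the second carries no information, so best_dim comes out as 1. The max(split, eps) guard is an addition of this sketch; the listing divides by the split information directly, which can misbehave when every sample at a node falls into the same bin.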
