想学习一下SVM,所以找到了LIBSVM--A Library for Support Vector Machines,首先阅读了一下网站提供的A practical guide to SVM classification.

写一写个人认为主要的精华的东西。

SVMs is:a technique for data classification  

Goal is:to produce a model (based on training data) which predicts the target values of the test data given only the test data attributes.

Kernels:four basic kernels

Proposed Procedure:

1.transform data to the format of an SVM package

  first have to convert categorical attributes into numeric data.We recommend using m numbers to represent an m-category attribute and only one of the m numbers is one,and others are zeros. for example {red,green,blue} can be represented as (0,0,1),(0,1,0)and(1,0,0).

2.conduct simple scaling on the data

  Note:It's importance to use the same scaling factors for training and testing sets.

3.consider the RBF kernel K(x,y) = e-r||x-y||2

4.use cross-validation to find the best parameter C and r

  The cross-validation produce can prevent the overfitting problem.We recommend a "grid-search" on C and r using cross-validation.Various pairs of (C,r)values are tried and the one with the best cross-validation accuarcy is picked.Use a coarse grid to make a better region on the grid,a finer grid search on that region can be conducted.

  For very large data sets a feasible approach is to randomly choose a subset of the data set,conduct grid-search on them,and then do a better-region-only grid-search on the completly data set.

5.use the best parameter C and r to train the whole training set

6.Test

When to use Linear but not RBF Kernel ?

  If the number of features is large, one may not need to map data to a higher dimensional space. That is, the nonlinear mapping does not improve the performance.Using the linear kernel is good enough, and one only searches for the parameter C.

  C.1 Number of instances number of features  

    when the number of features is very large, one may not need to map the data.

  C.2 Both numbers of instances and features are large

    Such data often occur in document classication.LIBLINEAR is much faster than LIBSVM to obtain a model with comparable accuracy.LIBLINEAR is efficient for large-scale document classication.

  C.3 Number of instances number of features

    As the number of features is small, one often maps data to higher dimensional spaces(i.e., using nonlinear kernels).

Learn LIBSVM---a practical Guide to SVM classification的更多相关文章

  1. [笔记]A Practical Guide to Support Vector Classi cation

    <A Practical Guide to Support Vector Classication>是一篇libSVM使用入门教程以及一些实用技巧. 1. Basic Kernels: ( ...

  2. A Practical Guide to Support Vector Classi cation

    <A Practical Guide to Support Vector Classication>是一篇libSVM使用入门教程以及一些实用技巧. 1. Basic Kernels: ( ...

  3. A Practical Guide to Distributed Scrum - 分布式Scrum的实用指南 - 读书笔记

    最近读了这本IBM出的<A Practical Guide to Distributed Scrum>(分布式Scrum的实用指南),书中的章节结构比较清楚,是针对Scrum项目进行,一个 ...

  4. 信号处理的好书Digital Signal Processing - A Practical Guide for Engineers and Scientists

    诚心给大家推荐一本讲信号处理的好书<Digital Signal Processing - A Practical Guide for Engineers and Scientists>[ ...

  5. 【SVM】A Practical Guide to Support Vector Classi cation

    零.简介 一般认为,SVM比神经网络要简单. 优化目标:

  6. Putting Apache Kafka To Use: A Practical Guide to Building a Stream Data Platform-part 1

    转自: http://www.confluent.io/blog/stream-data-platform-1/ These days you hear a lot about "strea ...

  7. The Practical Guide to Empathy Maps: 10-Minute User Personas

    That’s where the empathy map comes in. When created correctly, empathy maps serve as the perfect lea ...

  8. Putting Apache Kafka To Use: A Practical Guide to Building a Stream Data Platform-part 2

    转自: http://confluent.io/blog/stream-data-platform-2          http://www.infoq.com/cn/news/2015/03/ap ...

  9. Parsing techniques: a practical guide下载

    轮子哥隆重推荐的书,一行代码.一句公式都没有,但是却什么都讲明白了的:<Parsing Techniques>.第一版官网免费下载,第二版多出来的东西你们用不上不用看了.全书只讲parsi ...

随机推荐

  1. C++----练习--整型赋值时的溢出

    1.如果所赋的值超出了类型的取值范围.那么只保留最低位 #include<iostream> int main() { ; //unsigned char c = 256; 有无符号都是一 ...

  2. ansilbe 入门001、ansible的介绍

    概述: ansible 作为一个配置管理工具.首先我们要“告诉”它管理的是那几台机器啊:而这个信息就在要ansible 的配置文件中体现了.默认情况下ansible的配置文件保存在 /etc/ansi ...

  3. sqlserver系统表操作

    查询表名中包含‘user’的方法Select * From sysobjects Where name like '%user%' 如果知道列名,想查找包含有该列的表名,可加上系统表syscolumn ...

  4. wordpress教程之WP_Query()类

    WP_Query的使用方法 在讲WP_Query之前我们要先区分一下两个名词: WP_Query是WordPress自带的的一个用于处理复杂请求的类(这里的请求的内容不仅包括文章,还可能是页面,用户, ...

  5. png透明图片

    2. JS处理 使用DD_belatedPNG(http://www.dillerdesign.com/experiment/DD_belatedPNG/),可以很简单的对界面上所有的透明图片进行同一 ...

  6. Windows7安装SQL Server 2008图解

    不知是什么时候,把这篇博客给删除了,今天才发现,想恢复好像又不行,所以重新发布一下吧! 这几天因为需要,一直想安装SQL Server 2008来作为Web后台的数据库进行些实验,但总是没有时间,今天 ...

  7. 转:ReportViewer控件使用方法

    a. ReportViewer关联Report1.rdlc的简单呈现b. 对带有报表参数的Report1.rdlc的呈现c. 利用程式生成的DataSet 填充报表d. 调用存储过程 生成DataSe ...

  8. 类linux系统/proc/sysrq-trigger文件功能作用

    立即重启计算机      echo "b" > /proc/sysrq-trigger 立即关闭计算机      echo "o" > /proc/ ...

  9. EasyUI DataGrid编辑单元格时使用combogrid

    仅提供一段columns配置代码供参考: conditions对象是一个已赋值的数组对象集合.下拉框数据可直接使用conditions数据,也可以通过url获取. columns : [[ { fie ...

  10. 格而知之9:一些关于GCD的笔记

    1.最近在重读当年刚开始学习多线程时的笔记,发觉其中有一些地方还是比较容易模糊,于是整理这篇笔记记录一下. 执行方式和队列 2.队列用来存放管理要执行的任务,它分为并发队列(Concurrent Di ...