@drsimonj here to introduce pipelearner – a package I’m developing to make it easy to create machine learning pipelines in R – and to spread the word in the hope that some readers may be interested in contributing or testing it.

This post will demonstrate some examples of what pipeleaner can currently do. For example, the Figure below plots the results of a model fitted to 10% to 100% (in 10% increments) of training data in 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Head to the pipelearner Github page to learn more and contact me if you have a chance to test it yourself or are interested in contributing (my contact details are at the end of this post).

Examples

Some setup

library(pipelearner)
library(tidyverse)
library(nycflights13) # Help functions
r_square <- function(model, data) {
actual <- eval(formula(model)[[2]], as.data.frame(data))
residuals <- predict(model, data) - actual
1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}
add_rsquare <- function(result_tbl) {
result_tbl %>%
mutate(rsquare_train = map2_dbl(fit, train, r_square),
rsquare_test = map2_dbl(fit, test, r_square))
} # Data set
d <- weather %>%
select(visib, humid, precip, wind_dir) %>%
drop_na() %>%
sample_n(2000) # Set theme for plots
theme_set(theme_minimal())

k-fold cross validation

results <- d %>%
pipelearner(lm, visib ~ .) %>%
learn_cvpairs(k = 10) %>%
learn() results %>%
add_rsquare() %>%
select(cv_pairs.id, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(cv_pairs.id, rsquare, color = source)) +
geom_point() +
labs(x = "Fold",
y = "R Squared")

Learning curves

results <- d %>%
pipelearner(lm, visib ~ .) %>%
learn_curves(seq(.1, 1, .1)) %>%
learn() results %>%
add_rsquare() %>%
select(train_p, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(train_p, rsquare, color = source)) +
geom_line() +
geom_point(size = 2) +
labs(x = "Proportion of training data used",
y = "R Squared")

Grid Search

results <- d %>%
pipelearner(rpart::rpart, visib ~ .,
minsplit = c(2, 50, 100),
cp = c(.005, .01, .1)) %>%
learn() results %>%
mutate(minsplit = map_dbl(params, ~ .$minsplit),
cp = map_dbl(params, ~ .$cp)) %>%
add_rsquare() %>%
select(minsplit, cp, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source),
minsplit = paste("minsplit", minsplit, sep = "\n"),
cp = paste("cp", cp, sep = "\n")) %>%
ggplot(aes(source, rsquare, fill = source)) +
geom_col() +
facet_grid(minsplit ~ cp) +
guides(fill = "none") +
labs(x = NULL, y = "R Squared")

Model comparisons

results <- d %>%
pipelearner() %>%
learn_models(
c(lm, rpart::rpart, randomForest::randomForest),
visib ~ .) %>%
learn() results %>%
add_rsquare() %>%
select(model, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(model, rsquare, fill = source)) +
geom_col(position = "dodge", size = .5) +
labs(x = NULL, y = "R Squared") +
coord_flip()

Sign off

Thanks for reading and I hope this was useful for you.

For updates of recent blog posts, follow @drsimonj on Twitter, or email me atdrsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.

转自:https://drsimonj.svbtle.com/easy-machine-learning-pipelines-with-pipelearner-intro-and-call-for-contributors

Easy machine learning pipelines with pipelearner: intro and call for contributors的更多相关文章

  1. 【机器学习Machine Learning】资料大全

    昨天总结了深度学习的资料,今天把机器学习的资料也总结一下(友情提示:有些网站需要"科学上网"^_^) 推荐几本好书: 1.Pattern Recognition and Machi ...

  2. 机器学习(Machine Learning)&深度学习(Deep Learning)资料【转】

    转自:机器学习(Machine Learning)&深度学习(Deep Learning)资料 <Brief History of Machine Learning> 介绍:这是一 ...

  3. 机器学习(Machine Learning)与深度学习(Deep Learning)资料汇总

    <Brief History of Machine Learning> 介绍:这是一篇介绍机器学习历史的文章,介绍很全面,从感知机.神经网络.决策树.SVM.Adaboost到随机森林.D ...

  4. How do I learn machine learning?

    https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644   How Can I Learn X? ...

  5. 机器学习(Machine Learning)&深度学习(Deep Learning)资料(Chapter 2)

    ##机器学习(Machine Learning)&深度学习(Deep Learning)资料(Chapter 2)---#####注:机器学习资料[篇目一](https://github.co ...

  6. 机器学习(Machine Learning)&深度学习(Deep Learning)资料(下)

    转载:http://www.jianshu.com/p/b73b6953e849 该资源的github地址:Qix <Statistical foundations of machine lea ...

  7. Intro to Machine Learning

    本节主要用于机器学习入门,介绍两个简单的分类模型: 决策树和随机森林 不涉及内部原理,仅仅介绍基础的调用方法 1. How Models Work 以简单的决策树为例 This step of cap ...

  8. Advice for applying Machine Learning

    https://jmetzen.github.io/2015-01-29/ml_advice.html Advice for applying Machine Learning This post i ...

  9. 壁虎书2 End-to-End Machine Learning Project

    the main steps: 1. look at the big picture 2. get the data 3. discover and visualize the data to gai ...

随机推荐

  1. H5学习第四周

    本周.我们结束了HTML标签和css样式部分,开始了JS的学习.JS是的内容和css,html基本上没有什么联系而且它比较需要逻辑思考能力,所以要从新开始学习. 使用js的三种方式: 1.html标签 ...

  2. VS Code 的常用快捷键

    VS Code 的常用快捷键和插件 一.vs code 的常用快捷键 1.注释: a) 单行注释:[ctrl+k,ctrl+c] 或 ctrl+/ b) 取消单行注释:[ctrl+k,ctrl+u] ...

  3. 模拟退火算法(SA)求解TSP 问题(C语言实现)

    这篇文章是之前写的智能算法(遗传算法(GA).粒子群算法(PSO))的补充.其实代码我老早之前就写完了,今天恰好重新翻到了,就拿出来给大家分享一下,也当是回顾与总结了. 首先介绍一下模拟退火算法(SA ...

  4. jmeter、java自动化学习地址

    自动化学习社区地址:http://www.hordehome.com/c/14-category jmeter学习地址(范丰平博客):http://www.cnblogs.com/fengpingfa ...

  5. Coordinator节点

    Coordinator节点 Coordinator 节点主要负责segment 的管理和分配.更具体的说,它同通过配置往historical 节点 load 或者 drop  segment .Coo ...

  6. ArrayList 进阶方法之ListIterator

    同样看的都是jdk1.8 中 ArrayList中的源码,整理测试一下而已ListIterator(int index)方法,返回指定下标(包含该下标)后的值,此时index位置的元素就是新列表迭代器 ...

  7. Python资源汇总

    Python 目录: 管理面板 算法和设计模式 反垃圾邮件 资产管理 音频 验证 构建工具 缓存 ChatOps工具 CMS 代码分析和Linter 命令行工具 兼容性 计算机视觉 并发和并行性 组态 ...

  8. 使用Blender的UV映射制作一个地球

    UV映射是一个用来2D图片纹理转换3D网格的标准技术.U和V表示平面坐标的两个轴,对应了3D空间中X.Y和Z.Blender手册是这样解释UV映射的:想象一个3D模型对象,例如一个球体,平铺到桌面上. ...

  9. VirtulBox虚拟机搭建Linux Centos系统

    简要说明 该文章目的是基于搭建hadoop的前置文章,当然也可以搭建Linux的入门文章.那我再重复一下安装准备软件. 环境准备:http://pan.baidu.com/s/1dFrHyxV  密码 ...

  10. jquery按钮绑定特殊事件

    本文主要介绍点击一个按钮处理事件的一些特殊情况和技巧. 一.第一次点击触发一个函数,之后点击都触发另一个函数 1.小白实现 2.大神实现 代码如下: <body> <button&g ...