Easy machine learning pipelines with pipelearner: intro and call for contributors
@drsimonj here to introduce pipelearner – a package I’m developing to make it easy to create machine learning pipelines in R – and to spread the word in the hope that some readers may be interested in contributing or testing it.
This post will demonstrate some examples of what pipeleaner can currently do. For example, the Figure below plots the results of a model fitted to 10% to 100% (in 10% increments) of training data in 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Head to the pipelearner Github page to learn more and contact me if you have a chance to test it yourself or are interested in contributing (my contact details are at the end of this post).
Examples
Some setup
library(pipelearner)
library(tidyverse)
library(nycflights13)
# Help functions
r_square <- function(model, data) {
actual <- eval(formula(model)[[2]], as.data.frame(data))
residuals <- predict(model, data) - actual
1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}
add_rsquare <- function(result_tbl) {
result_tbl %>%
mutate(rsquare_train = map2_dbl(fit, train, r_square),
rsquare_test = map2_dbl(fit, test, r_square))
}
# Data set
d <- weather %>%
select(visib, humid, precip, wind_dir) %>%
drop_na() %>%
sample_n(2000)
# Set theme for plots
theme_set(theme_minimal())
k-fold cross validation
results <- d %>%
pipelearner(lm, visib ~ .) %>%
learn_cvpairs(k = 10) %>%
learn()
results %>%
add_rsquare() %>%
select(cv_pairs.id, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(cv_pairs.id, rsquare, color = source)) +
geom_point() +
labs(x = "Fold",
y = "R Squared")

Learning curves
results <- d %>%
pipelearner(lm, visib ~ .) %>%
learn_curves(seq(.1, 1, .1)) %>%
learn()
results %>%
add_rsquare() %>%
select(train_p, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(train_p, rsquare, color = source)) +
geom_line() +
geom_point(size = 2) +
labs(x = "Proportion of training data used",
y = "R Squared")

Grid Search
results <- d %>%
pipelearner(rpart::rpart, visib ~ .,
minsplit = c(2, 50, 100),
cp = c(.005, .01, .1)) %>%
learn()
results %>%
mutate(minsplit = map_dbl(params, ~ .$minsplit),
cp = map_dbl(params, ~ .$cp)) %>%
add_rsquare() %>%
select(minsplit, cp, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source),
minsplit = paste("minsplit", minsplit, sep = "\n"),
cp = paste("cp", cp, sep = "\n")) %>%
ggplot(aes(source, rsquare, fill = source)) +
geom_col() +
facet_grid(minsplit ~ cp) +
guides(fill = "none") +
labs(x = NULL, y = "R Squared")

Model comparisons
results <- d %>%
pipelearner() %>%
learn_models(
c(lm, rpart::rpart, randomForest::randomForest),
visib ~ .) %>%
learn()
results %>%
add_rsquare() %>%
select(model, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(model, rsquare, fill = source)) +
geom_col(position = "dodge", size = .5) +
labs(x = NULL, y = "R Squared") +
coord_flip()

Sign off
Thanks for reading and I hope this was useful for you.
For updates of recent blog posts, follow @drsimonj on Twitter, or email me atdrsimonjackson@gmail.com to get in touch.
If you’d like the code that produced this blog, check out the blogR GitHub repository.
转自:https://drsimonj.svbtle.com/easy-machine-learning-pipelines-with-pipelearner-intro-and-call-for-contributors
Easy machine learning pipelines with pipelearner: intro and call for contributors的更多相关文章
- 【机器学习Machine Learning】资料大全
昨天总结了深度学习的资料,今天把机器学习的资料也总结一下(友情提示:有些网站需要"科学上网"^_^) 推荐几本好书: 1.Pattern Recognition and Machi ...
- 机器学习(Machine Learning)&深度学习(Deep Learning)资料【转】
转自:机器学习(Machine Learning)&深度学习(Deep Learning)资料 <Brief History of Machine Learning> 介绍:这是一 ...
- 机器学习(Machine Learning)与深度学习(Deep Learning)资料汇总
<Brief History of Machine Learning> 介绍:这是一篇介绍机器学习历史的文章,介绍很全面,从感知机.神经网络.决策树.SVM.Adaboost到随机森林.D ...
- How do I learn machine learning?
https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644 How Can I Learn X? ...
- 机器学习(Machine Learning)&深度学习(Deep Learning)资料(Chapter 2)
##机器学习(Machine Learning)&深度学习(Deep Learning)资料(Chapter 2)---#####注:机器学习资料[篇目一](https://github.co ...
- 机器学习(Machine Learning)&深度学习(Deep Learning)资料(下)
转载:http://www.jianshu.com/p/b73b6953e849 该资源的github地址:Qix <Statistical foundations of machine lea ...
- Intro to Machine Learning
本节主要用于机器学习入门,介绍两个简单的分类模型: 决策树和随机森林 不涉及内部原理,仅仅介绍基础的调用方法 1. How Models Work 以简单的决策树为例 This step of cap ...
- Advice for applying Machine Learning
https://jmetzen.github.io/2015-01-29/ml_advice.html Advice for applying Machine Learning This post i ...
- 壁虎书2 End-to-End Machine Learning Project
the main steps: 1. look at the big picture 2. get the data 3. discover and visualize the data to gai ...
随机推荐
- asp.net core源码飘香:Options组件
简介: Options组件是一个小组件,但用的地方很多.它本质是将一个POCO类注册到容器中(主要在Startup中作为其他组件的配置功能提供),后续使用的时候就可以通过比如构造函数注入等获取到POC ...
- BS结构中,web如何将数据进行DES加密并写道IC卡中
在IC卡应用系统中,一般都要对IC卡数据进行DES加密,以保证数据的安全.友我科技RFID读写器云服务2.0充分考虑了这个需求,只需要软件工程师简单的配置即可实现数据的加解密并且写到数据块中.如下图所 ...
- 跟着刚哥梳理java知识点——深入理解String类(九)
一.String类 想要了解一个类,最好的办法就是看这个类的实现源代码,来看一下String类的源码: public final class String implements java.io.Ser ...
- DOM 待编辑
<!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <title> ...
- TCP/IP笔记(六)TCP与UDP
终于来到了传输层,这个面试问的最多了,内容比较多,要分两篇来总结,这是第一篇
- Css清除浮动最优方式之一
---恢复内容开始--- .container:before, .container:after { display: table; content: " "; } .contai ...
- Spring Cloud搭建微服务架构----前言
前言 微服务并不神秘,只是在互联网技术发展过程中的一个产物,整个架构系统随着客户端的多样性,服务越来越多,devops的发展而产生的架构变种. 许多公司,通过采用微处理结构模式解决单体应用的问题,分解 ...
- React Starter Kit 中文文档
最近没事又翻译了个玩意. Github上的一个Star 非常高的 React 样板程序. 由Node.js,Express,GraphQL和React构建,可选加入Redux等,并可以包含Webpac ...
- hyper-v使用wifi链接网络
公司了给本屌一个thinkpad笔记本,10G内存.想不出拿来干什么...装了一个win8.1_64位,cf,qq,hyper-v. 昨天第一次玩hyper-v新建了的时候选择“第二代”坑爹就开始了, ...
- Python全栈之路-Day31
1 反射 反射的精髓是通过字符串去获取对象属性的值 1.1 基于类和对象反射的属性 #!/usr/bin/env python # __Author__: "wanyongzhen" ...