In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a series of steps in data preparation. Scikit-Learn provides the Pipeline class to help with such sequences of transformations.

The Pipeline constructor takes a list of name/estimator pairs defining a sequence of steps. All but the last estimator must be transformers (i.e., they must have a fit_transform() method). The names can be anything you like.

When you call the pipeline’s fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the parameter to the next call, until it reaches the final estimator, for which it just calls the fit() method.

The pipeline exposes the same methods as the final estimator. In this example, the last estimator is a StandardScaler, which is a transformer, so the pipeline has a transform() method that applies all the transforms to the data in sequence (it also has a fit_transform method that we could have used instead of calling fit() and then transform()).

 from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler num_pipeline = Pipeline([
('imputer', SimpleImputer(strategy="median")),
('attribs_adder', CombinedAttributesAdder()),
('std_scaler', StandardScaler()),
]) try:
from sklearn.compose import ColumnTransformer
except ImportError:
from future_encoders import ColumnTransformer # Scikit-Learn < 0.20 num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"] full_pipeline = ColumnTransformer([
("num", num_pipeline, num_attribs),
("cat", OneHotEncoder(), cat_attribs),
]) housing_prepared = full_pipeline.fit_transform(housing)

[Machine Learning with Python] Data Preparation through Transformation Pipeline的更多相关文章

  1. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  2. [Machine Learning with Python] Data Visualization by Matplotlib Library

    Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...

  3. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  4. Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)

    Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...

  5. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  6. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  7. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  8. In machine learning, is more data always better than better algorithms?

    In machine learning, is more data always better than better algorithms? No. There are times when mor ...

  9. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

随机推荐

  1. Sicily 8843 Ranking and Friendship

    http://soj.me/8843 题意:几个人想做好朋友,朋友之间相差位置小于等于k,且长度相同分析:排序,将长度相同的放在一起.若长度相同,第i个人能放进去的条件是位置相差下雨等于k.      ...

  2. 组装需要的json数据格式

    在实际项目中有时候会遇到一些有特殊要求的控件,比如easyui-combogrid,加载的并不是常见的json格式,这里我遇到过需要加载类似省市县这种三级数据格式.最后也是从别人的博客中学到的如何组装 ...

  3. oracle JOB 查询 添加 修改 删除

    -------------查询JOB----------------- select job, what, next_date, next_sec, sysdate, failures, broken ...

  4. StartWith 测试

    var clientConfiguration = GetConfiguration("couchbase.json"); ClusterHelper.Initialize(cli ...

  5. 踩坑 PHP Fatal Error Failed opening required File

    使用 require 引用文件时,报错如下: require 'https://dev.ryan.com/test.php'; [Sat Mar 19 23:10:50 2011] [warn] mo ...

  6. Java开发微信公众号(二)---开启开发者模式,接入微信公众平台开发

    接入微信公众平台开发,开发者需要按照如下步骤完成: 1.填写服务器配置 2.验证服务器地址的有效性 3.依据接口文档实现业务逻辑 资料准备: 1.一个可以访问的外网,即80的访问端口,因为微信公众号接 ...

  7. Eclipse 4.6(最新版本) js文件不能F3

    解决办法........我是没找到解决4.6版本的办法!不过可以换一个版本!猜想是因为 最新版本强制要求使用jdk1.8所导致的~!  换了一个4.5版本就一切Ok 换上主题一样漂亮护眼

  8. BZOJ 1050: [HAOI2006]旅行comf(枚举+并查集)

    [HAOI2006]旅行comf Description 给你一个无向图,N(N<=500)个顶点, M(M<=5000)条边,每条边有一个权值Vi(Vi<30000).给你两个顶点 ...

  9. Socket通信——服务器和客户端相互通信

    所谓socket通常也称作"套接字",用于描述IP地址和端口,是一个通信链的句柄.应用程序通常通过"套接字"向网络发出请求或者应答网络请求.  Socket和S ...

  10. NetScaler 12.1 Deploy Package

    NetScaler 12.1 Deploy Package NS_VPX_Deploy_Package 百度网盘共享地址https://pan.baidu.com/s/1OT0Hxuz6ZBLwwM5 ...