In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a series of steps in data preparation. Scikit-Learn provides the Pipeline class to help with such sequences of transformations.

The Pipeline constructor takes a list of name/estimator pairs defining a sequence of steps. All but the last estimator must be transformers (i.e., they must have a fit_transform() method). The names can be anything you like.

When you call the pipeline’s fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the parameter to the next call, until it reaches the final estimator, for which it just calls the fit() method.

The pipeline exposes the same methods as the final estimator. In this example, the last estimator is a StandardScaler, which is a transformer, so the pipeline has a transform() method that applies all the transforms to the data in sequence (it also has a fit_transform method that we could have used instead of calling fit() and then transform()).

 from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler num_pipeline = Pipeline([
('imputer', SimpleImputer(strategy="median")),
('attribs_adder', CombinedAttributesAdder()),
('std_scaler', StandardScaler()),
]) try:
from sklearn.compose import ColumnTransformer
except ImportError:
from future_encoders import ColumnTransformer # Scikit-Learn < 0.20 num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"] full_pipeline = ColumnTransformer([
("num", num_pipeline, num_attribs),
("cat", OneHotEncoder(), cat_attribs),
]) housing_prepared = full_pipeline.fit_transform(housing)

[Machine Learning with Python] Data Preparation through Transformation Pipeline的更多相关文章

  1. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  2. [Machine Learning with Python] Data Visualization by Matplotlib Library

    Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...

  3. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  4. Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)

    Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...

  5. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  6. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  7. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  8. In machine learning, is more data always better than better algorithms?

    In machine learning, is more data always better than better algorithms? No. There are times when mor ...

  9. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

随机推荐

  1. 2018年湘潭大学程序设计竞赛 E 吃货

    题目描述 作为一个标准的吃货,mostshy又打算去联建商业街觅食了.混迹于商业街已久,mostshy已经知道了商业街的所有美食与其价格,而且他给每种美食都赋予了一个美味度,美味度越高表示他越喜爱这种 ...

  2. Session只读的影响

    .net中使用Session必须实现IRequiresSessionState接口,不过还有个只读的接口IReadOnlySessionState, 若是实现只读的接口,那么在该页面(如一般处理程序) ...

  3. tomcat8+idea远程调试

    window下 setenv.bat增加 set JPDA_OPTS=-Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n lin ...

  4. C语言用一维数组打印杨辉三角(原:无意中想到)

    本贴地址 ] = { }; a[] = , a[] = ; int i, j,m; ; i <= ; i++) //2-11 输出10行 { ; j > ; j--) //关键在这句,倒着 ...

  5. complex类的定义、实现

    复数类complex的定义.实现(求模.复数加法) #include <iostream> #include <cmath> using namespace std; clas ...

  6. 一道题目关于Java类加载

    public class B { public static B t1 = new B(); public static B t2 = new B(); { System.out.println(&q ...

  7. couchbase map reduce

    map function(){emit(null,2);} reduce function(key, values, rereduce){ var response = {"a": ...

  8. 利用python多线程模块实现模拟接口并发

    import requestsimport jsonimport threadingimport timeimport uuid class postrequests(): def __init__( ...

  9. 并发编程——多进程——multiprocessing开启进程的方式及其属性(3)

    开启进程的两种方式——Process 方式一:函数方法 from multiprocessing import Process import time def task(name): print('% ...

  10. Map-Reduce基础

    1.设置文件读入分隔符 默认按行读入; 按句子读入 : conf1.set("textinputformat.record.delimiter", "."); ...