In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a series of steps in data preparation. Scikit-Learn provides the Pipeline class to help with such sequences of transformations.

The Pipeline constructor takes a list of name/estimator pairs defining a sequence of steps. All but the last estimator must be transformers (i.e., they must have a fit_transform() method). The names can be anything you like.

When you call the pipeline’s fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the parameter to the next call, until it reaches the final estimator, for which it just calls the fit() method.

The pipeline exposes the same methods as the final estimator. In this example, the last estimator is a StandardScaler, which is a transformer, so the pipeline has a transform() method that applies all the transforms to the data in sequence (it also has a fit_transform method that we could have used instead of calling fit() and then transform()).

  1. from sklearn.pipeline import Pipeline
  2. from sklearn.preprocessing import StandardScaler
  3.  
  4. num_pipeline = Pipeline([
  5. ('imputer', SimpleImputer(strategy="median")),
  6. ('attribs_adder', CombinedAttributesAdder()),
  7. ('std_scaler', StandardScaler()),
  8. ])
  9.  
  10. try:
  11. from sklearn.compose import ColumnTransformer
  12. except ImportError:
  13. from future_encoders import ColumnTransformer # Scikit-Learn < 0.20
  14.  
  15. num_attribs = list(housing_num)
  16. cat_attribs = ["ocean_proximity"]
  17.  
  18. full_pipeline = ColumnTransformer([
  19. ("num", num_pipeline, num_attribs),
  20. ("cat", OneHotEncoder(), cat_attribs),
  21. ])
  22.  
  23. housing_prepared = full_pipeline.fit_transform(housing)

[Machine Learning with Python] Data Preparation through Transformation Pipeline的更多相关文章

  1. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  2. [Machine Learning with Python] Data Visualization by Matplotlib Library

    Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...

  3. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  4. Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)

    Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...

  5. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  6. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  7. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  8. In machine learning, is more data always better than better algorithms?

    In machine learning, is more data always better than better algorithms? No. There are times when mor ...

  9. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

随机推荐

  1. Kubernetes添加带Quota限额的CephFS StorageClass

    1. 在Ceph上为Kubernetes创建一个文件系统 # ceph osd pool create cephfs_data # ceph osd pool create cephfs_metada ...

  2. WPF触控程序开发(四)——MultiTouchVista_-_second_release_-_refresh_2的救赎

    起源 Multitouch是一款可用于Win7模拟触摸屏幕的开源软件(关于它的使用介绍),最后一次更新是在11年5月份,我是13年初开始用的,当时开发了一款类似IPhone相册的图片展示触控程序,就是 ...

  3. 南邮部分wp

    MYSQL 打开robots.txt 鍒お寮€蹇冿紝flag涓嶅湪杩欙紝杩欎釜鏂囦欢鐨勭敤閫斾綘鐪嬪畬浜嗭紵 鍦–TF姣旇禌涓紝杩欎釜鏂囦欢寰€寰€瀛樻斁鐫€鎻愮ず淇℃伅 这一看乱码,放到新建tx ...

  4. Python+Selenium练习篇之10-刷新当前页面

    本文介绍如何调用webdriver中刷新页面的方法. 相关脚本代码如下: # coding=utf-8import timefrom selenium import webdriver driver ...

  5. dib build ipa image Injection password

    针对dib制作的deploy image,注入密码有两种方式: devuser/dynamic-login .对应 dib 添加密码,是通过 dynamic-login element 来完成的. 首 ...

  6. Vue在tradingView遇到的问题

    K线图刷新或重新加载时闪白 首先需要了解的是,闪白是 iframe的机制 所以只要解决掉iframe就可以了 首先找到 charting_library.min.js 搜索 找到配置项 style=& ...

  7. mybatis maven 代码生成器(mysql)

    pom.xml <?xml version="1.0" encoding="UTF-8"?> <project xmlns="htt ...

  8. 在iBatis中操作Blob数据类型

    这里的Blob数据类型指的是保存了文本的blob数据类型 直接读取blob类型存储的文本,可能会出现乱码,所以需要读取完后进行手动转码 这里使用ibatis作为持久层 SELECT urlconten ...

  9. 【bzoj2789】[Poi2012]Letters 树状数组求逆序对

    题目描述 给出两个长度相同且由大写英文字母组成的字符串A.B,保证A和B中每种字母出现的次数相同. 现在每次可以交换A中相邻两个字符,求最少需要交换多少次可以使得A变成B. 输入 第一行一个正整数n ...

  10. ZOJ 3781 Paint the Grid Reloaded(BFS+缩点思想)

    Paint the Grid Reloaded Time Limit: 2 Seconds      Memory Limit: 65536 KB Leo has a grid with N rows ...