[Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a series of steps in data preparation. Scikit-Learn provides the Pipeline class to help with such sequences of transformations.
The Pipeline constructor takes a list of name/estimator pairs defining a sequence of steps. All but the last estimator must be transformers (i.e., they must have a fit_transform() method). The names can be anything you like.
When you call the pipeline’s fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the parameter to the next call, until it reaches the final estimator, for which it just calls the fit() method.
The pipeline exposes the same methods as the final estimator. In this example, the last estimator is a StandardScaler, which is a transformer, so the pipeline has a transform() method that applies all the transforms to the data in sequence (it also has a fit_transform method that we could have used instead of calling fit() and then transform()).
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler num_pipeline = Pipeline([
('imputer', SimpleImputer(strategy="median")),
('attribs_adder', CombinedAttributesAdder()),
('std_scaler', StandardScaler()),
]) try:
from sklearn.compose import ColumnTransformer
except ImportError:
from future_encoders import ColumnTransformer # Scikit-Learn < 0.20 num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"] full_pipeline = ColumnTransformer([
("num", num_pipeline, num_attribs),
("cat", OneHotEncoder(), cat_attribs),
]) housing_prepared = full_pipeline.fit_transform(housing)
[Machine Learning with Python] Data Preparation through Transformation Pipeline的更多相关文章
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- [Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...
- Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
- 《Learning scikit-learn Machine Learning in Python》chapter1
前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- In machine learning, is more data always better than better algorithms?
In machine learning, is more data always better than better algorithms? No. There are times when mor ...
- Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
随机推荐
- [Poj1273]Drainage Ditches(网络流)
Description 给图,求最大流 最大流模板题,这里用dinic Code #include <cstdio> #include <cstring> #include & ...
- android静默安装和智能安装(转)
Android 静默安装和智能安装的实现方法 http://blog.csdn.net/fuchaosz/article/details/51852442 Android静默安装实现方案,仿360手机 ...
- Compoer介绍
Compoer介绍 Composer 是 PHP 的一个依赖管理工具.它允许你申明项目所依赖的代码库,它会在你的项目中为你安装他们. 安装Composer Composer.phar 是 Compos ...
- 【Minimum Path Sum】cpp
题目: Given a m x n grid filled with non-negative numbers, find a path from top left to bottom right w ...
- Docker与CTF
Docker与CTF 主要是用来搭建环境,漏洞环境,CTF比赛题目复现. docker你可以把它理解为一个vmware. iamges:vmware需要的iso镜像 container:vmware运 ...
- Spring boot 整合jsp、thymeleaf、freemarker
1.创建spring boot 项目 2.pom文件配置如下: <dependencies> <dependency> <groupId>org.springfra ...
- 使用hibernate建立mysql连接以及生成映射类和配置文件*.cfg.xml
建立数据库连接 找到window—open perspective—myeclipse database explore空白出右键new注意 driver template 和driver class ...
- [python][django学习篇][4]django完成数据库代码翻译:迁移数据库(migration)
上一篇我们已经完成数据库的设计,但是仅仅是python语言,并没有真正创建了数据库表.翻译成数据库语言,真正创建数据库表由django manage.py来实现,这一过程专业术语:迁移数据库 切换到m ...
- java基础-流
大致列一下这个周末需要学习的内容 1 容器 2 线程 3 流 (本节内容) 一. 流 按方向-------------输入流输出流 按处理数据单位-----字节流字符流 按功能------------ ...
- 微信小程序--问题汇总及详解之picker 增、删
<block wx:for="{{salesList}}" wx:for-index="index" wx:key="id" wx:f ...