[Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a series of steps in data preparation. Scikit-Learn provides the Pipeline class to help with such sequences of transformations.
The Pipeline constructor takes a list of name/estimator pairs defining a sequence of steps. All but the last estimator must be transformers (i.e., they must have a fit_transform() method). The names can be anything you like.
When you call the pipeline’s fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the parameter to the next call, until it reaches the final estimator, for which it just calls the fit() method.
The pipeline exposes the same methods as the final estimator. In this example, the last estimator is a StandardScaler, which is a transformer, so the pipeline has a transform() method that applies all the transforms to the data in sequence (it also has a fit_transform method that we could have used instead of calling fit() and then transform()).
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler num_pipeline = Pipeline([
('imputer', SimpleImputer(strategy="median")),
('attribs_adder', CombinedAttributesAdder()),
('std_scaler', StandardScaler()),
]) try:
from sklearn.compose import ColumnTransformer
except ImportError:
from future_encoders import ColumnTransformer # Scikit-Learn < 0.20 num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"] full_pipeline = ColumnTransformer([
("num", num_pipeline, num_attribs),
("cat", OneHotEncoder(), cat_attribs),
]) housing_prepared = full_pipeline.fit_transform(housing)
[Machine Learning with Python] Data Preparation through Transformation Pipeline的更多相关文章
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- [Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...
- Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
- 《Learning scikit-learn Machine Learning in Python》chapter1
前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- In machine learning, is more data always better than better algorithms?
In machine learning, is more data always better than better algorithms? No. There are times when mor ...
- Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
随机推荐
- 【高精度】模板 (C++)
//n为长度 1.高精加 复杂度:O(n) #include<iostream> #include<cstring> #include<algorithm> usi ...
- 策略模式—Java实现(转)
1. 现实需求 客户有了新的需求,这时我们直接新增策略即可,改很少的代码.基本符合我们面向对象原则中的开闭原则(对扩展开放,对修改关系),实现了高内聚低耦合. 2. 策略模式定义 策略模式,又叫算法簇 ...
- time模块和datetime模块详解
一.time模块 time模块中时间表现的格式主要有三种: a.timestamp时间戳,时间戳表示的是从1970年1月1日00:00:00开始按秒计算的偏移量 b.struct_time时间元组,共 ...
- python 二——函数、装饰器、生成器、面向对象编程(初级)
本节内容 1.函数 2.装饰器 3.生成器 4.类 一.函数 函数式:将某功能代码封装到函数中,日后便无需重复编写,仅调用函数即可 面向对象:对函数进行分类和封装,让开发“更快更好更强...” 函数式 ...
- 62、在app遇到全局异常时避免直接退出,如何让app接管异常处理?
1.创建一个类为CrashHandler import android.content.Context; import android.os.Looper; import android.util.L ...
- 【Climbing Stairs】cpp
题目: You are climbing a stair case. It takes n steps to reach to the top. Each time you can either cl ...
- csu-2018年11月月赛Round2-div2题解
csu-2018年11月月赛Round2-div2题解 A(2193):昆虫繁殖 Description 科学家在热带森林中发现了一种特殊的昆虫,这种昆虫的繁殖能力很强.每对成虫过x个月产y对卵,每对 ...
- Form 组件动态绑定数据
1.Form 组件的作用: a.对用户提交的数据进行验证(form表单/ajax) b.保留用户上次输入的信息 c.可以生成html标签(input表单类的标签) 2..由于form组件中每个字段都是 ...
- Leetcode 546.移除盒子
移除盒子 给出一些不同颜色的盒子,盒子的颜色由数字表示,即不同的数字表示不同的颜色.你将经过若干轮操作去去掉盒子,直到所有的盒子都去掉为止.每一轮你可以移除具有相同颜色的连续 k 个盒子(k > ...
- linux基础(基本命令)
Linux学习 1.Linux安装.配置 Linux的操作背景介绍 Linux操作系统 开源.自由且开发源代码的类Unix操作系统 厂商较多 著名的有R ...