[Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter's magic command %matplotlib inline. This tells Jupyter to set up Matplotlib so that it uses Jupyter's own backend and renders figures inline in the notebook.
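The magic command only exists in Jupyter. In a plain Python script (for example one run on a headless server), a minimal sketch, assuming you only need image files and no window, is to select Matplotlib's non-interactive "Agg" backend with matplotlib.use() before importing pyplot. The plotted values and the file name backend_demo.png are arbitrary illustrations.

```python
import matplotlib

# Select the non-interactive Agg backend: it renders to image files only.
# Call this before importing pyplot so no GUI backend is set up first.
matplotlib.use("Agg")

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], marker="o")
ax.set_title("Rendered with the Agg backend")
fig.savefig("backend_demo.png")  # hypothetical output file name
plt.close(fig)  # free the figure, since no window will ever display it

print(matplotlib.get_backend())
```

Inside Jupyter you would not call matplotlib.use("Agg"), because it would replace the inline backend and figures would no longer appear in the notebook.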
Scatter Plot
housing.plot(kind="scatter", x="longitude", y="latitude")
Setting the alpha parameter (marker transparency) makes it easier to see where the density of points is high:
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)
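Because the housing DataFrame is not reproduced in this article, the sketch below uses a hypothetical synthetic DataFrame with the same longitude and latitude column names: a dense cluster of points plus uniform background noise. Drawing it twice side by side shows why alpha helps: with alpha=0.1 each marker is 90% transparent, so only regions where many markers overlap appear dark.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Hypothetical synthetic data: 2,000 points in a tight cluster plus 2,000 spread uniformly.
cluster = rng.normal(loc=[-118.3, 34.1], scale=0.3, size=(2000, 2))
background = rng.uniform(low=[-124.0, 32.5], high=[-114.0, 42.0], size=(2000, 2))
points = pd.DataFrame(np.vstack([cluster, background]), columns=["longitude", "latitude"])

fig, (ax_opaque, ax_faint) = plt.subplots(1, 2, figsize=(10, 4))

# Opaque markers: the cluster and the background blur into one solid mass.
points.plot(kind="scatter", x="longitude", y="latitude", ax=ax_opaque, title="alpha=1.0")

# Faint markers: overlapping points stack up, so dense regions stand out.
points.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1, ax=ax_faint, title="alpha=0.1")
```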
The plot can convey more information through colors, sizes, shapes, and so on. Here we use a predefined color map (option cmap) called jet. As an example, we plot house prices at different locations: the size (marker area) of each circle represents the district's population (option s), and the color represents the price (option c).
%matplotlib inline
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
s=housing["population"]/100, label="population", figsize=(10,7),
c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
sharex=False)
plt.legend()
plt.savefig("housing_prices_scatterplot.png")  # save_fig() is not a Matplotlib function; plt.savefig writes the figure to disk
Note that the argument sharex=False fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).
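If the pandas display bug gets in the way, the same bubble plot can be drawn with Matplotlib's Axes.scatter directly. This is a sketch on hypothetical synthetic districts (the column names only mirror the housing data). Note that Matplotlib interprets s as the marker area in points squared, so population/100 scales the area of each circle, not its radius.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 500

# Hypothetical synthetic districts; column names mirror the housing data.
districts = pd.DataFrame({
    "longitude": rng.uniform(-124.0, -114.0, n),
    "latitude": rng.uniform(32.5, 42.0, n),
    "population": rng.integers(100, 5000, n),
    "median_house_value": rng.uniform(50_000, 500_000, n),
})

fig, ax = plt.subplots(figsize=(10, 7))
sc = ax.scatter(
    districts["longitude"],
    districts["latitude"],
    s=districts["population"] / 100,    # marker AREA in points^2, not radius
    c=districts["median_house_value"],  # values mapped to colors through cmap
    cmap="jet",
    alpha=0.4,
    label="population",
)
fig.colorbar(sc, ax=ax, label="median_house_value")  # adds a second Axes for the color scale
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
ax.legend()
```

Working on an explicit Axes also makes it obvious which object owns the legend and which owns the colorbar.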
Scatter Matrix
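from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

attributes = ["median_house_value", "median_income", "total_rooms",
              "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
plt.savefig("scatter_matrix_plot.png")  # save_fig() is not a Matplotlib function; plt.savefig writes the figure to disk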
Histogram
A histogram shows how many instances fall into each value range, which makes it a useful way to study the distribution of numeric attributes.
%matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
plt.savefig("attribute_histogram_plots.png")  # save_fig() is not a Matplotlib function; plt.savefig writes the figure to disk
plt.show()
For a single attribute, you can use the following statement:
housing["median_income"].hist()
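To see what bins=50 actually does, the sketch below counts the values with numpy.histogram, which splits the range [min, max] into 50 equal-width bins and counts how many values fall into each, and checks that the bars drawn by Series.hist have exactly those heights. The lognormal Series is hypothetical data standing in for median_income.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Hypothetical right-skewed attribute standing in for median_income.
income = pd.Series(rng.lognormal(mean=1.2, sigma=0.5, size=10_000), name="median_income")

# np.histogram does the counting that hist() draws: 50 equal-width bins
# spanning [min, max], plus the number of values that fall in each bin.
counts, edges = np.histogram(income, bins=50)

fig, ax = plt.subplots()
income.hist(bins=50, ax=ax)  # one bar (Rectangle patch) per bin
ax.set_xlabel("median_income")
ax.set_ylabel("number of districts")

# Heights of the drawn bars, in bin order.
bar_heights = np.array([patch.get_height() for patch in ax.patches])
```

More bins reveal finer structure (such as caps or spikes at particular values) at the cost of noisier bars; fewer bins give a smoother but coarser picture.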
Correlation Plot
We can compute the standard correlation coefficient (Pearson's r) between every pair of attributes with the corr() method, and then sort the coefficients for one attribute with sort_values():
corr_matrix = housing.corr(numeric_only=True)  # skip text columns; pandas >= 2.0 raises an error on them otherwise
corr_matrix["median_house_value"].sort_values(ascending=False)
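The standard correlation coefficient computed by corr() is r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), which ranges from −1 to 1. A minimal sketch on hypothetical synthetic data (house value built to rise with income, age unrelated, plus a text column) computes r by hand and compares it with corr(); numeric_only=True is passed so the text column is skipped.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 1_000

# Hypothetical synthetic data: house value rises with income, age is unrelated noise.
median_income = rng.uniform(0.5, 15.0, n)
df = pd.DataFrame({
    "median_income": median_income,
    "housing_median_age": rng.uniform(1, 52, n),
    "median_house_value": 40_000 * median_income + rng.normal(0, 50_000, n),
    "ocean_proximity": rng.choice(["INLAND", "NEAR BAY"], n),  # text column
})

def pearson_r(x, y):
    """Pearson correlation: sum of centred cross-products over the product of spreads."""
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

corr_matrix = df.corr(numeric_only=True)  # the text column is left out
ranking = corr_matrix["median_house_value"].sort_values(ascending=False)

manual = pearson_r(df["median_income"], df["median_house_value"])
print(ranking)
```

Keep in mind that r only measures linear relationships: a coefficient near 0 does not rule out a strong nonlinear dependence.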
Alternatively, the scatter_matrix function plots every numerical attribute against every other numerical attribute. Plotting an attribute against itself would only give a straight line, so the diagonal displays a histogram of each attribute instead.
from pandas.plotting import scatter_matrix  # formerly pandas.tools.plotting, which newer pandas no longer provides
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
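The function returns a 2-D NumPy array of Axes, one panel per pair of attributes. The sketch below runs it on a hypothetical synthetic DataFrame that borrows three of the attribute names used above; diagonal="hist" is the default (pandas also accepts "kde", which additionally requires SciPy).

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(3)
n = 300

# Hypothetical synthetic attributes (names borrowed from the housing data).
income = rng.uniform(0.5, 15.0, n)
sample = pd.DataFrame({
    "median_house_value": 40_000 * income + rng.normal(0, 50_000, n),
    "median_income": income,
    "housing_median_age": rng.uniform(1, 52, n),
})

# Returns an n_attributes x n_attributes array of Axes.
# Off-diagonal panel (i, j) plots column j on x against column i on y;
# diagonal="hist" draws each attribute's histogram on the diagonal.
axes = scatter_matrix(sample, figsize=(8, 8), diagonal="hist", alpha=0.3)

print(axes.shape)
```

With four attributes, as in the housing example, the grid has 16 panels; with all numerical housing columns it would quickly become too crowded to read, which is why only a few promising attributes are selected.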