Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter’s magic command %matplotlib inline. This tells Jupyter to set up Matplotlib so it uses Jupyter’s own backend.
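Outside of Jupyter, the backend can also be chosen in code. The sketch below is only an illustration, assuming a plain Python script with no display available; it selects the non-interactive "Agg" backend (our choice for the example, not something the article requires) and then asks Matplotlib which backend is active.

```python
import matplotlib

# Pick a backend explicitly. "Agg" renders to image files and needs no display.
# In a Jupyter notebook, the magic command `%matplotlib inline` plays this role.
matplotlib.use("Agg")

import matplotlib.pyplot as plt  # imported after the backend is selected

# Ask Matplotlib which backend is currently active.
current_backend = matplotlib.get_backend()
print(current_backend)
```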

Scatter Plot

housing.plot(kind="scatter", x="longitude", y="latitude")

You can set the parameter alpha to study the density of points:

housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)
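
To try the effect of alpha without the housing data, here is a minimal sketch on synthetic points. The DataFrame `points` and its random coordinates are made up for illustration; only the column names mirror the article.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere

import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data: 2000 points clustered around one spot.
rng = np.random.default_rng(42)
points = pd.DataFrame({
    "longitude": rng.normal(-120.0, 1.0, 2000),
    "latitude": rng.normal(36.0, 1.0, 2000),
})

# With alpha=0.1 each marker is 90% transparent, so areas where many
# markers overlap appear darker than sparse areas.
ax = points.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)
```

Dense regions show up as dark patches, while isolated points stay faint.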

A plot can convey more information through different colors, sizes, shapes, etc. Here we use a predefined color map (option cmap) called jet. As an example, we plot house prices by location: the radius of each circle represents the district’s population (option s), and the color represents the price (option c).

%matplotlib inline
import matplotlib.pyplot as plt
# s: marker size from population; c: marker color from median house value
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
             s=housing["population"]/100, label="population", figsize=(10,7),
             c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
             sharex=False)
plt.legend()
# save_fig is a helper that must be defined elsewhere in the notebook
save_fig("housing_prices_scatterplot")

Note that the argument sharex=False fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).
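
The snippets in this article call save_fig, which is not defined here. One possible version is sketched below; the directory name `images`, the parameter names, and the defaults are all assumptions made for illustration, not the author's actual helper.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

IMAGES_DIR = "images"  # hypothetical output directory


def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    """Save the current Matplotlib figure as IMAGES_DIR/<fig_id>.<fig_extension>."""
    os.makedirs(IMAGES_DIR, exist_ok=True)
    path = os.path.join(IMAGES_DIR, fig_id + "." + fig_extension)
    if tight_layout:
        plt.tight_layout()  # trim excess whitespace around the axes
    plt.savefig(path, format=fig_extension, dpi=resolution)
    return path


# Usage: draw something on the current figure, then save it.
plt.plot([0, 1], [0, 1])
saved_path = save_fig("example_plot")
```

Returning the path is a small convenience so that callers can check where the file was written.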

Scatter Matrix

from pandas.plotting import scatter_matrix

attributes = ["median_house_value", "median_income", "total_rooms",
              "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
save_fig("scatter_matrix_plot")

Histogram

A histogram is a useful way to study the distribution of numeric attributes.

%matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()

For a single attribute, you can use the following statement:

housing["median_income"].hist()
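
To see what the bins parameter does, the sketch below draws a histogram of a synthetic Series (the values are random and made up for illustration) and inspects the bars that Matplotlib created.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

import numpy as np
import pandas as pd

# Synthetic stand-in for one numeric attribute.
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(3.5, 1.5, 1000), name="median_income")

fig, ax = plt.subplots()
values.hist(bins=50, ax=ax)  # split the value range into 50 equal-width bins

# Each bin is drawn as one rectangle; its height is the number of values in it.
n_bars = len(ax.patches)
total_count = sum(bar.get_height() for bar in ax.patches)
```

More bins give a finer view of the distribution, at the cost of noisier bar heights.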

Correlation Plot

We can calculate the correlation coefficient between each pair of attributes using the corr() method, then rank the attributes by their correlation with the target using sort_values():

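# numeric_only=True (pandas 1.5+) skips non-numeric columns, which newer pandas no longer drops silently
corr_matrix = housing.corr(numeric_only=True)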
corr_matrix["median_house_value"].sort_values(ascending=False)
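
The same two calls can be tried on a tiny made-up table. In the sketch below the numbers are invented, and the `distance` column is a hypothetical attribute added only to show a negative correlation.

```python
import pandas as pd

# Toy data: house value rises with income and falls with distance.
toy = pd.DataFrame({
    "median_income": [1.0, 2.0, 3.0, 4.0, 5.0],
    "median_house_value": [100, 210, 290, 405, 500],
    "distance": [5.0, 4.0, 3.0, 2.0, 1.0],  # hypothetical attribute
})

# Pearson correlation between every pair of columns.
corr_matrix = toy.corr()

# Rank all attributes by their correlation with the target, strongest positive first.
ranked = corr_matrix["median_house_value"].sort_values(ascending=False)
print(ranked)
```

The target always correlates perfectly with itself, so it comes first; attributes with a strong negative correlation end up at the bottom of the list.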

We can also use the scatter_matrix function, which plots every numerical attribute against every other numerical attribute. The diagonal displays a histogram of each attribute.

from pandas.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
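
The layout of the result can be checked on synthetic data. The sketch below uses a made-up three-column DataFrame (random values, with column names borrowed from the article) and looks at the grid of axes that scatter_matrix returns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data: house value depends roughly linearly on income, age is unrelated.
rng = np.random.default_rng(0)
income = rng.uniform(1, 10, 200)
toy = pd.DataFrame({
    "median_income": income,
    "median_house_value": 40000 * income + rng.normal(0, 20000, 200),
    "housing_median_age": rng.integers(1, 52, 200).astype(float),
})

# One subplot per pair of columns; the diagonal holds each column's histogram.
axes = scatter_matrix(toy, figsize=(9, 6))
print(axes.shape)
```

With n numeric columns the function returns an n-by-n array of axes, so the figure grows quickly; this is why the article restricts the plot to a few promising attributes.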
