Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter’s magic command %matplotlib inline. This tells Jupyter to set up Matplotlib so it uses Jupyter’s own backend.

Scatter Plot

housing.plot(kind="scatter", x="longitude", y="latitude")

You can set the parameter alpha to study the density of points:

housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)

The plot can convey more information by setting different colors, sizes, shapes, etc. Here we will use a predefined color map (option cmap) called jet. As an example, we plot the house prices in different locations and let the radius of each circle represents the district’s population (option s), and the color represents the price (option c).

 %matplotlib inline
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
s=housing["population"]/100, label="population", figsize=(10,7),
c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
sharex=False)
plt.legend()
save_fig("housing_prices_scatterplot")

Note that the argument sharex=False fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).

Scatter Matrix

 from pandas.plotting import scatter_matrix

 attributes = ["median_house_value", "median_income", "total_rooms",
"housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
save_fig("scatter_matrix_plot")

Histogram

Histogram is a useful method to study the distribution of numeric attributes.

 %matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()

For single attribute, you can use the following statement:

housing["median_income"].hist()

Correlation Plot

We can calculate the correlation coefficients between each pair of attributes using corr() method and look at the value by sort_values():

corr_matrix = housing.corr()
corr_matrix["median_house_value"].sort_values(ascending=False)

Also, we can use scatter_matrix function, which plots every numerical attribute against every other numerical attribute. The diagonal displays the histogram of each attribute.

from pandas.tools.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))

[Machine Learning with Python] Data Visualization by Matplotlib Library的更多相关文章

  1. [Machine Learning with Python] Data Preparation through Transformation Pipeline

    In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...

  2. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  3. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  4. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  5. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  6. Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)

    Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...

  7. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  8. In machine learning, is more data always better than better algorithms?

    In machine learning, is more data always better than better algorithms? No. There are times when mor ...

  9. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

随机推荐

  1. python爬虫基础11-selenium大全5/8-动作链

    Selenium笔记(5)动作链 本文集链接:https://www.jianshu.com/nb/25338984 简介 一般来说我们与页面的交互可以使用Webelement的方法来进行点击等操作. ...

  2. guava笔记

    ​guava是在原先google-collection 的基础上发展过来的,是一个比较优秀的外部开源包,最近项目中使用的比较多,列举一些点.刚刚接触就被guava吸引了... ​    ​这个是gua ...

  3. Divisibility by 25 CodeForces - 988E

    You are given an integer nn from 11 to 10181018 without leading zeroes. In one move you can swap any ...

  4. php三种方式操作mysql数据库

    php可以通过三种方式操作数据库,分别用mysql扩展库,mysqli扩展库,和mysqli的预处理模式分别举案例加以说明 1.通过mysql方式操作数据库 工具类核心代码: <?php cla ...

  5. HDU4010 Query on The Trees (LCT动态树)

    Query on The Trees Time Limit: 10000/5000 MS (Java/Others)    Memory Limit: 65768/65768 K (Java/Othe ...

  6. HDU 3727 Jewel 主席树

    题意: 一开始有一个空序列,然后有下面四种操作: Insert x在序列尾部加入一个值为\(x\)的元素,而且保证序列中每个元素都互不相同. Query_1 s t k查询区间\([s,t]\)中第\ ...

  7. EXCEL宏做数据拆分

    用处:将大容量的EXCEL工作簿分解成若干个小的工作簿 Sub aa() For i = 1 To 8 Set nb = Workbooks.Add nb.SaveAs Filename:=ThisW ...

  8. linux下java命令行引用jar包

     一般情况下: 如果java 文件和jar 包在同一目录 poi-3.0-alpha3-20061212.jar testTwo.java 编译: javac -cp poi-3.0-alpha3-2 ...

  9. Java中接口的作用

    转载于:https://www.zhihu.com/question/20111251 困惑:例如我定义了一个接口,但是我在继承这个接口的类中还要写接口的实现方法,那我不如直接就在这个类中写实现方法岂 ...

  10. HDU——2054A==B?

    A == B ? Time Limit: 1000/1000 MS (Java/Others)    Memory Limit: 32768/32768 K (Java/Others) Total S ...