[Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter’s magic command %matplotlib inline
. This tells Jupyter to set up Matplotlib so it uses Jupyter’s own backend.
Scatter Plot
housing.plot(kind="scatter", x="longitude", y="latitude")
You can set the parameter alpha
to study the density of points:
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)
The plot can convey more information by setting different colors, sizes, shapes, etc. Here we will use a predefined color map (option cmap
) called jet
. As an example, we plot the house prices in different locations and let the radius of each circle represents the district’s population (option s
), and the color represents the price (option c
).
%matplotlib inline
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
s=housing["population"]/100, label="population", figsize=(10,7),
c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
sharex=False)
plt.legend()
save_fig("housing_prices_scatterplot")
Note that the argument sharex=False
fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).
Scatter Matrix
from pandas.plotting import scatter_matrix attributes = ["median_house_value", "median_income", "total_rooms",
"housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
save_fig("scatter_matrix_plot")
Histogram
Histogram is a useful method to study the distribution of numeric attributes.
%matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()
For single attribute, you can use the following statement:
housing["median_income"].hist()
Correlation Plot
We can calculate the correlation coefficients between each pair of attributes using corr()
method and look at the value by sort_values()
:
corr_matrix = housing.corr()
corr_matrix["median_house_value"].sort_values(ascending=False)
Also, we can use scatter_matrix
function, which plots every numerical attribute against every other numerical attribute. The diagonal displays the histogram of each attribute.
from pandas.tools.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
[Machine Learning with Python] Data Visualization by Matplotlib Library的更多相关文章
- [Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
- Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
- 《Learning scikit-learn Machine Learning in Python》chapter1
前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- In machine learning, is more data always better than better algorithms?
In machine learning, is more data always better than better algorithms? No. There are times when mor ...
- Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
随机推荐
- python爬虫基础11-selenium大全5/8-动作链
Selenium笔记(5)动作链 本文集链接:https://www.jianshu.com/nb/25338984 简介 一般来说我们与页面的交互可以使用Webelement的方法来进行点击等操作. ...
- guava笔记
guava是在原先google-collection 的基础上发展过来的,是一个比较优秀的外部开源包,最近项目中使用的比较多,列举一些点.刚刚接触就被guava吸引了... 这个是gua ...
- Divisibility by 25 CodeForces - 988E
You are given an integer nn from 11 to 10181018 without leading zeroes. In one move you can swap any ...
- php三种方式操作mysql数据库
php可以通过三种方式操作数据库,分别用mysql扩展库,mysqli扩展库,和mysqli的预处理模式分别举案例加以说明 1.通过mysql方式操作数据库 工具类核心代码: <?php cla ...
- HDU4010 Query on The Trees (LCT动态树)
Query on The Trees Time Limit: 10000/5000 MS (Java/Others) Memory Limit: 65768/65768 K (Java/Othe ...
- HDU 3727 Jewel 主席树
题意: 一开始有一个空序列,然后有下面四种操作: Insert x在序列尾部加入一个值为\(x\)的元素,而且保证序列中每个元素都互不相同. Query_1 s t k查询区间\([s,t]\)中第\ ...
- EXCEL宏做数据拆分
用处:将大容量的EXCEL工作簿分解成若干个小的工作簿 Sub aa() For i = 1 To 8 Set nb = Workbooks.Add nb.SaveAs Filename:=ThisW ...
- linux下java命令行引用jar包
一般情况下: 如果java 文件和jar 包在同一目录 poi-3.0-alpha3-20061212.jar testTwo.java 编译: javac -cp poi-3.0-alpha3-2 ...
- Java中接口的作用
转载于:https://www.zhihu.com/question/20111251 困惑:例如我定义了一个接口,但是我在继承这个接口的类中还要写接口的实现方法,那我不如直接就在这个类中写实现方法岂 ...
- HDU——2054A==B?
A == B ? Time Limit: 1000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others) Total S ...