Here I list some useful functions in Python to get familiar with your data. As an example, we load a dataset named housing which is a DataFrame object. Usually, the first thing to do is get top five rows the dataset by head() function:

housing = load_housing_data()
housing.head()

The info() method is useful to get a quick description of the data, in particular the total number of rows, and each attribute’s type and number of non-null values.

housing.info()

The describe() function will return statistics including count, mean, median, std, min, max and quantiles of each feature.

housing.describe()

For categorical varibles, we usually hope to see the labels and the count for each label. value_counts() function works here:

housing["ocean_proximity"].value_counts()

That’s it. I’ll update more functions if I meet in further study.

[Machine Learning with Python] Familiar with Your Data的更多相关文章

  1. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  2. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  3. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  4. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  5. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

  6. [Machine Learning with Python] My First Data Preprocessing Pipeline with Titanic Dataset

    The Dataset was acquired from https://www.kaggle.com/c/titanic For data preprocessing, I firstly def ...

  7. [Machine Learning with Python] Data Preparation through Transformation Pipeline

    In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...

  8. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  9. [Machine Learning with Python] How to get your data?

    Using Pandas Library The simplest way is to read data from .csv files and store it as a data frame o ...

随机推荐

  1. python中字符串的一些用法

    一.字符串的拼接:      a=‘123’      b=‘abc’       d=‘hello world’ 1.print(a+b) 2.print(a,b) 3. c=‘ ’.join((a ...

  2. leetcode-10-basic

    35. Search Insert Position Given a sorted array and a target value, return the index if the target i ...

  3. LeetCode(224) Basic Calculator

    题目 Implement a basic calculator to evaluate a simple expression string. The expression string may co ...

  4. scanf(),gets(),getchar()

    scanf()与gets()区别: scanf( )函数和gets( )函数都可用于输入字符串,但在功能上有区别.若想从键盘上输入字符串"hi hello",则应该使用gets() ...

  5. Mysql显示某个数据库的所有表

    显示表名: show tables; //先用use进入要查看表的库 mysql> use mysql; Database changed mysql> show tables; +--- ...

  6. Java中的数据类型和引用

    JAVA数据类型分primitive数据类型和引用数据类型. Java中的primitive数据类型分为四类八种.primitive也不知道怎么翻译比较贴切, 暂且叫他基本数据类型吧, 其实直接从英文 ...

  7. HDU 5044 Tree LCA

    题意: 给出一棵\(n(1 \leq n \leq 10^5)\)个节点的树,每条边和每个点都有一个权值,初始所有权值为0. 有两种操作: \(ADD1 \, u \, v \, k\):将路径\(u ...

  8. 【Shell】使用shell打印菜单,一键安装Web应用

    问题描述: [解答] [root@A04-Test- scripts]# more menu.sh #!/bin/bash echo "1.[install lamp]" echo ...

  9. 多实例设置本地IP访问sql server 数据库

    我们本地有时候有多个数据库版本(^_^..别说了都是泪),都是为了兼容不同版本的数据而安装的! 最近我们需要用IP来访问,就有了这一段折腾的历程. 上图片为我安装的三个不同的版本,一个为sql ser ...

  10. Win7通知区域的图标怎么去除?

    由于本人有洁癖,最近在用win7的时候,很收不了已经卸载了的一些软件,在win7右下角的通知区域图标中还留有痕迹,于是上网查找了下解决方案. 用以下方法完美解决问题. 这里依然是以注册表的修改方法为主 ...