numpy and scipy official documentation, pandas official website, matplotlib official documentation

I. Data structures

II. Data processing

1. Data acquisition (basic information about the data in an Excel file)

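#coding=utf-8
import pandas as pd
import numpy as np

excel_data = pd.read_excel("test.xlsx")   #reading .xlsx files needs an Excel engine such as openpyxl installed

print(excel_data.shape)      #(number of rows, number of columns)
print(excel_data.index)      #the row index (labels of all rows)
print(excel_data.columns)    #all column names
excel_data.info()            #summary of column names, non-null counts and dtypes; info must be called, and it prints by itself
print(excel_data.dtypes)     #data type of each column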

Output:

'''
   name  age       time adress  home
0   cat  2.0 1900-01-01      a   NaN
1   dog  3.0 1900-01-02      b   NaN
2   pig  4.0 1900-01-03      c   NaN
3  bird  5.0        NaT      d   NaN
4   NaN  6.0 1900-01-02      e   NaN
5   pig  7.0 1900-01-03    NaN   NaN
6  bird  NaN        NaT    NaN   NaN
'''
excel_data

'''
(7, 5)
'''
excel_data.shape

'''
RangeIndex(start=0, stop=7, step=1)
'''
excel_data.index

'''
Index([u'name', u'age', u'time', u'adress', u'home'], dtype='object')
'''
excel_data.columns

'''
<bound method DataFrame.info of    name  age       time adress  home
0   cat  2.0 1900-01-01      a   NaN
1   dog  3.0 1900-01-02      b   NaN
2   pig  4.0 1900-01-03      c   NaN
3  bird  5.0        NaT      d   NaN
4   NaN  6.0 1900-01-02      e   NaN
5   pig  7.0 1900-01-03    NaN   NaN
6  bird  NaN        NaT    NaN   NaN>
'''
excel_data.info (the original run printed it without parentheses, which only shows the bound method and the frame; excel_data.info() prints the actual summary of columns, non-null counts and dtypes)

'''
name              object
age              float64
time      datetime64[ns]
adress            object
home             float64
dtype: object
'''
excel_data.dtypes
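To try these attributes without test.xlsx, the sketch below rebuilds the same seven rows in memory with pd.DataFrame. This is an assumption for illustration: the values are copied from the printed table above, so the column names and dtypes match that run, but nothing is read from Excel. It also shows the difference between excel_data.info and excel_data.info().

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory stand-in for test.xlsx; values copied from the output table above
excel_data = pd.DataFrame({
    "name": ["cat", "dog", "pig", "bird", np.nan, "pig", "bird"],
    "age": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, np.nan],
    "time": pd.to_datetime(["1900-01-01", "1900-01-02", "1900-01-03", None,
                            "1900-01-02", "1900-01-03", None]),
    "adress": ["a", "b", "c", "d", "e", np.nan, np.nan],
    "home": np.nan,  # a scalar is broadcast: an all-NaN float64 column
})

print(excel_data.shape)    # (rows, columns)
print(excel_data.index)    # RangeIndex, since no index column was given
print(excel_data.columns)  # column labels
print(excel_data.dtypes)   # one dtype per column

# Without parentheses you only get the bound method object (what the original run printed);
# calling it prints the summary and returns None
print(excel_data.info)
excel_data.info()
```

In a real script the DataFrame would come from pd.read_excel instead; every attribute used here works the same way on either.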

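Help on function read_excel in module pandas.io.excel (this is help() output from a pandas 0.23-era release; parameters such as sheetname, parse_cols, skip_footer, squeeze and convert_float have since been deprecated or removed, so check the documentation for the installed version):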

read_excel(*args, **kwargs)

    Read an Excel table into a pandas DataFrame

    Parameters

    ----------

    io : string, path object (pathlib.Path or py._path.local.LocalPath),

        file-like object, pandas ExcelFile, or xlrd workbook.

        The string could be a URL. Valid URL schemes include http, ftp, s3,

        and file. For file URLs, a host is expected. For instance, a local

        file could be file://localhost/path/to/workbook.xlsx

    sheet_name : string, int, mixed list of strings/ints, or None, default 0

        Strings are used for sheet names, Integers are used in zero-indexed

        sheet positions.

        Lists of strings/integers are used to request multiple sheets.

        Specify None to get all sheets.

        str|int -> DataFrame is returned.

        list|None -> Dict of DataFrames is returned, with keys representing

        sheets.

        Available Cases

        * Defaults to 0 -> 1st sheet as a DataFrame

        * 1 -> 2nd sheet as a DataFrame

        * "Sheet1" -> 1st sheet as a DataFrame

        * [0,1,"Sheet5"] -> 1st, 2nd & 5th sheet as a dictionary of DataFrames

        * None -> All sheets as a dictionary of DataFrames

    sheetname : string, int, mixed list of strings/ints, or None, default 0

        .. deprecated:: 0.21.0

           Use `sheet_name` instead

    header : int, list of ints, default 0

        Row (0-indexed) to use for the column labels of the parsed

        DataFrame. If a list of integers is passed those row positions will

        be combined into a ``MultiIndex``. Use None if there is no header.

    names : array-like, default None

        List of column names to use. If file contains no header row,

        then you should explicitly pass header=None

    index_col : int, list of ints, default None

        Column (0-indexed) to use as the row labels of the DataFrame.

        Pass None if there is no such column.  If a list is passed,

        those columns will be combined into a ``MultiIndex``.  If a

        subset of data is selected with ``usecols``, index_col

        is based on the subset.

    parse_cols : int or list, default None

        .. deprecated:: 0.21.0

           Pass in `usecols` instead.

    usecols : int or list, default None

        * If None then parse all columns,

        * If int then indicates last column to be parsed

        * If list of ints then indicates list of column numbers to be parsed

        * If string then indicates comma separated list of Excel column letters and

          column ranges (e.g. "A:E" or "A,C,E:F").  Ranges are inclusive of

          both sides.

    squeeze : boolean, default False

        If the parsed data only contains one column then return a Series

    dtype : Type name or dict of column -> type, default None

        Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}

        Use `object` to preserve data as stored in Excel and not interpret dtype.

        If converters are specified, they will be applied INSTEAD

        of dtype conversion.

        .. versionadded:: 0.20.0

    engine: string, default None

        If io is not a buffer or path, this must be set to identify io.

        Acceptable values are None or xlrd

    converters : dict, default None

        Dict of functions for converting values in certain columns. Keys can

        either be integers or column labels, values are functions that take one

        input argument, the Excel cell content, and return the transformed

        content.

    true_values : list, default None

        Values to consider as True

        .. versionadded:: 0.19.0

    false_values : list, default None

        Values to consider as False

        .. versionadded:: 0.19.0

    skiprows : list-like

        Rows to skip at the beginning (0-indexed)

    nrows : int, default None

        Number of rows to parse

        .. versionadded:: 0.23.0

    na_values : scalar, str, list-like, or dict, default None

        Additional strings to recognize as NA/NaN. If dict passed, specific

        per-column NA values. By default the following values are interpreted

        as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan',

        '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan',

        'null'.

    keep_default_na : bool, default True

        If na_values are specified and keep_default_na is False the default NaN

        values are overridden, otherwise they're appended to.

    verbose : boolean, default False

        Indicate number of NA values placed in non-numeric columns

    thousands : str, default None

        Thousands separator for parsing string columns to numeric.  Note that

        this parameter is only necessary for columns stored as TEXT in Excel,

        any numeric columns will automatically be parsed, regardless of display

        format.

    comment : str, default None

        Comments out remainder of line. Pass a character or characters to this

        argument to indicate comments in the input file. Any data between the

        comment string and the end of the current line is ignored.

    skip_footer : int, default 0

        .. deprecated:: 0.23.0

           Pass in `skipfooter` instead.

    skipfooter : int, default 0

        Rows at the end to skip (0-indexed)

    convert_float : boolean, default True

        convert integral floats to int (i.e., 1.0 --> 1). If False, all numeric

        data will be read in as floats: Excel stores all numbers as floats

        internally

    Returns

    -------

    parsed : DataFrame or Dict of DataFrames

        DataFrame from the passed in Excel file.  See notes in sheet_name

        argument for more information on when a Dict of Dataframes is returned.

read_excel parameters explained
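Several of the parameters above (skiprows, header, usecols, index_col, na_values, nrows) are shared with pd.read_csv. The sketch below therefore swaps in read_csv on a hypothetical in-memory CSV (io.StringIO) so that it runs without an Excel file or an Excel engine such as openpyxl. With a real workbook the same keywords would be passed to pd.read_excel("test.xlsx", ...), together with Excel-only ones such as sheet_name, and usecols there also accepts column letters like "A:E".

```python
import io

import pandas as pd

# Hypothetical sheet content: a title row, a header row, then data rows
raw = io.StringIO(
    "report title,,,\n"
    "name,age,adress,note\n"
    "cat,2,a,x\n"
    "dog,N/A,b,y\n"
    "pig,4,missing,z\n"
)

sheet = pd.read_csv(
    raw,
    skiprows=1,                         # skip the title row at the top
    header=0,                           # the first remaining row holds the column labels
    usecols=["name", "age", "adress"],  # parse only these columns ("note" is ignored)
    index_col="name",                   # use the name column as the row labels
    na_values=["missing"],              # extra string to treat as NaN ("N/A" is a default NA string)
    nrows=3,                            # parse at most 3 data rows
)
print(sheet)
```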

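Selecting rows

excel_data.head(5)          #first 5 rows

excel_data.tail(5)          #last 5 rows

excel_data.loc[0]           #row whose index label is 0 (the first row under the default RangeIndex)

excel_data.loc[2:4]         #rows labelled 2 through 4; a .loc slice includes its end label, so this is the 3rd to the 5th row

excel_data.loc[[2,5,6]]     #rows labelled 2, 5 and 6; the labels must be wrapped in a list, and every label must exist (pandas 1.0+ raises KeyError for missing ones)

excel_data.iloc[0]          #first row, selected by position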
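The comments above hinge on one rule: .loc works with index labels and includes the end of a slice, while .iloc works with positions and excludes it. A small sketch on a hypothetical in-memory copy of the table makes the difference visible, including what happens once the rows are reordered (sort_values is used here only to scramble the order):

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the table from test.xlsx
excel_data = pd.DataFrame({
    "name": ["cat", "dog", "pig", "bird", np.nan, "pig", "bird"],
    "age": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, np.nan],
    "adress": ["a", "b", "c", "d", "e", np.nan, np.nan],
})

first_two = excel_data.head(2)
last_two = excel_data.tail(2)
by_label = excel_data.loc[2:4]      # label slice: end label 4 is INCLUDED -> labels 2, 3, 4
by_position = excel_data.iloc[2:4]  # position slice: end position 4 is EXCLUDED -> positions 2, 3
picked = excel_data.loc[[2, 5, 6]]  # list of labels; all of them must exist

# After reordering, labels and positions no longer coincide
by_age = excel_data.sort_values("age", ascending=False)  # NaN ages sort last by default
print(by_age.loc[0, "name"])   # label 0 is still the cat row
print(by_age.iloc[0]["name"])  # position 0 is now the oldest animal (pig, age 7.0)
```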

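Selecting columns

excel_data["name"]              #the "name" column (returned as a Series)

excel_data[["name","age"]]      #the name and age columns (returned as a DataFrame)

excel_data["name"].unique()     #all unique values of the name column; a missing entry shows up as nan (the author saw a 0 instead because missing values had been filled with 0 beforehand)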
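A quick check of what each column selection returns, on a hypothetical in-memory copy of the table: single brackets give a Series, a list of names gives a DataFrame, and unique() keeps the missing value as nan. nunique() is added here (it is not in the list above) to count distinct names while ignoring NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the table from test.xlsx
excel_data = pd.DataFrame({
    "name": ["cat", "dog", "pig", "bird", np.nan, "pig", "bird"],
    "age": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, np.nan],
})

name_col = excel_data["name"]              # single label -> Series
two_cols = excel_data[["name", "age"]]     # list of labels -> DataFrame
uniques = excel_data["name"].unique()      # order of first appearance; NaN is kept
n_distinct = excel_data["name"].nunique()  # distinct values, NaN excluded by default

print(type(name_col).__name__, type(two_cols).__name__)
print(uniques, n_distinct)
```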

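Selecting specific rows and columns

excel_data.head(5)["name"]            #name column of the first 5 rows

excel_data.head(5)["name"][0]         #element with label 0 in that column

excel_data.at[1,"age"]                #single value at row label 1 (the second row), column "age"

excel_data.loc[0,"name"]              #row label 0, column "name"; one .loc call instead of the chained excel_data.loc[0]["name"]

excel_data.loc[:,"age"]               #the age column; ":" selects all rows, and the comma separates the row selector from the column selector

excel_data.loc[:,["age","time"]]      #age and time columns for all rows

excel_data.loc[1,["age","time"]]      #age and time values of the second row

excel_data.iloc[0:2,0:2]              #first two rows and first two columns (position slices exclude the end)

excel_data.iloc[[1,2,4],[0,2]]        #rows at positions 1, 2 and 4, columns at positions 0 and 2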
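The combined row/column selectors can be checked the same way. The sketch below uses a hypothetical in-memory copy of the table and adds two things that are not in the list above: iat (the position-based counterpart of at) and an assignment through a single .loc call.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the table from test.xlsx
excel_data = pd.DataFrame({
    "name": ["cat", "dog", "pig", "bird", np.nan, "pig", "bird"],
    "age": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, np.nan],
    "time": pd.to_datetime(["1900-01-01", "1900-01-02", "1900-01-03", None,
                            "1900-01-02", "1900-01-03", None]),
})

age_row1 = excel_data.at[1, "age"]          # scalar by label
age_row1_pos = excel_data.iat[1, 1]         # the same cell by position (row 1, column 1)
name_row0 = excel_data.loc[0, "name"]       # one .loc call, no chained indexing
pair = excel_data.loc[1, ["age", "time"]]   # one row, two columns -> Series
corner = excel_data.iloc[0:2, 0:2]          # first two rows x first two columns
picked = excel_data.iloc[[1, 2, 4], [0, 2]] # explicit row and column positions

# Writing through a single .loc call updates the frame itself; chained assignment
# such as excel_data.loc[6]["age"] = 8.0 may only modify a temporary copy
excel_data.loc[6, "age"] = 8.0
```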

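Detecting missing values

excel_data.notnull()      #True wherever excel_data has a value

excel_data.isnull()       #pandas' missing-value check (alias isna()): returns booleans, True where a value is missing, False otherwise. It works on the whole DataFrame or on a single column, e.g. excel_data["name"].isnull()

Selecting row and column data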
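Counting missing values per column is the usual next step after isnull(). The sketch below, again on a hypothetical in-memory copy of the table, sums the boolean mask (True counts as 1) and uses a column mask to filter rows:

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the table from test.xlsx
excel_data = pd.DataFrame({
    "name": ["cat", "dog", "pig", "bird", np.nan, "pig", "bird"],
    "age": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, np.nan],
    "time": pd.to_datetime(["1900-01-01", "1900-01-02", "1900-01-03", None,
                            "1900-01-02", "1900-01-03", None]),
    "adress": ["a", "b", "c", "d", "e", np.nan, np.nan],
    "home": np.nan,
})

missing = excel_data.isnull()          # same shape, True where a value is missing (NaN or NaT)
missing_per_column = missing.sum()     # number of missing values in each column
no_name = excel_data[excel_data["name"].isnull()]  # rows whose name is missing

print(missing_per_column)
print(no_name)
```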

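2. Data cleaning and transformation

1) Adding

2) Deleting

a. Drop invalid rows and columns (rows or columns that are completely blank and therefore carry no information)

b. Drop specified rows and columns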

Help on method drop in module pandas.core.frame:

drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise') method of pandas.core.frame.DataFrame instance

    Drop specified labels from rows or columns.

    Remove rows or columns by specifying label names and corresponding

    axis, or by specifying directly index or column names. When using a

    multi-index, labels on different levels can be removed by specifying

    the level.

    Parameters

    ----------

    labels : single label or list-like

        Index or column labels to drop.

    axis : {0 or 'index', 1 or 'columns'}, default 0

        Whether to drop labels from the index (0 or 'index') or

        columns (1 or 'columns').

    index, columns : single label or list-like

        Alternative to specifying axis (``labels, axis=1``

        is equivalent to ``columns=labels``).

        .. versionadded:: 0.21.0

    level : int or level name, optional

        For MultiIndex, level from which the labels will be removed.

    inplace : bool, default False

        If True, do operation inplace and return None.

    errors : {'ignore', 'raise'}, default 'raise'

        If 'ignore', suppress error and only existing labels are

        dropped.

excel_data.drop
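A minimal sketch of drop on a hypothetical in-memory copy of the table, covering the labels/axis form, the index=/columns= keywords described in the help text, and errors='ignore' ("no_such_column" is a deliberately made-up label that does not exist):

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the table from test.xlsx
excel_data = pd.DataFrame({
    "name": ["cat", "dog", "pig", "bird", np.nan, "pig", "bird"],
    "age": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, np.nan],
    "time": pd.to_datetime(["1900-01-01", "1900-01-02", "1900-01-03", None,
                            "1900-01-02", "1900-01-03", None]),
    "adress": ["a", "b", "c", "d", "e", np.nan, np.nan],
    "home": np.nan,
})

without_home = excel_data.drop(columns=["home"])  # keyword form
same_thing = excel_data.drop("home", axis=1)      # equivalent labels + axis form
without_rows = excel_data.drop(index=[0, 6])      # drop rows by index label
tolerant = excel_data.drop(columns=["home", "no_such_column"], errors="ignore")  # skip unknown labels

# drop returns a new DataFrame; excel_data itself is unchanged unless inplace=True
print(excel_data.shape, without_home.shape)
```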

Help on method dropna in module pandas.core.frame:

dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False) method of pandas.core.frame.DataFrame instance

    Remove missing values.

    See the :ref:`User Guide <missing_data>` for more on which values are

    considered missing, and how to work with missing data.

    Parameters

    ----------

    axis : {0 or 'index', 1 or 'columns'}, default 0

        Determine if rows or columns which contain missing values are

        removed.

        * 0, or 'index' : Drop rows which contain missing values.

        * 1, or 'columns' : Drop columns which contain missing value.

        .. deprecated:: 0.23.0: Pass tuple or list to drop on multiple

        axes.

    how : {'any', 'all'}, default 'any'

        Determine if row or column is removed from DataFrame, when we have

        at least one NA or all NA.

        * 'any' : If any NA values are present, drop that row or column.

        * 'all' : If all values are NA, drop that row or column.

    thresh : int, optional

        Require that many non-NA values.

    subset : array-like, optional

        Labels along other axis to consider, e.g. if you are dropping rows

        these would be a list of columns to include.

    inplace : bool, default False

        If True, do operation inplace and return None.

excel_data.dropna
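For case a above (blank rows and columns), the parameters that matter are axis, how, thresh and subset. On a hypothetical in-memory copy of the table, the all-NaN home column makes the default how='any' drop every row, which is why removing fully empty columns first matters:

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the table from test.xlsx
excel_data = pd.DataFrame({
    "name": ["cat", "dog", "pig", "bird", np.nan, "pig", "bird"],
    "age": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, np.nan],
    "time": pd.to_datetime(["1900-01-01", "1900-01-02", "1900-01-03", None,
                            "1900-01-02", "1900-01-03", None]),
    "adress": ["a", "b", "c", "d", "e", np.nan, np.nan],
    "home": np.nan,
})

any_missing = excel_data.dropna()                          # how="any": every row has a NaN in 'home'
blank_cols_dropped = excel_data.dropna(axis=1, how="all")  # removes only fully empty columns ('home')
blank_rows_dropped = excel_data.dropna(how="all")          # removes only fully empty rows (none here)
complete_rows = blank_cols_dropped.dropna()                # rows with no gaps in the remaining columns
at_least_3 = excel_data.dropna(thresh=3)                   # keep rows with at least 3 non-missing values
named = excel_data.dropna(subset=["name"])                 # only the name column is checked
```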

3) Modifying

Help on method fillna in module pandas.core.frame:

fillna(self, value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs) method of pandas.core.frame.DataFrame instance

    Fill NA/NaN values using the specified method

    Parameters

    ----------

    value : scalar, dict, Series, or DataFrame

        Value to use to fill holes (e.g. 0), alternately a

        dict/Series/DataFrame of values specifying which value to use for

        each index (for a Series) or column (for a DataFrame). (values not

        in the dict/Series/DataFrame will not be filled). This value cannot

        be a list.

    method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None

        Method to use for filling holes in reindexed Series

        pad / ffill: propagate last valid observation forward to next valid

        backfill / bfill: use NEXT valid observation to fill gap

    axis : {0 or 'index', 1 or 'columns'}

    inplace : boolean, default False

        If True, fill in place. Note: this will modify any

        other views on this object, (e.g. a no-copy slice for a column in a

        DataFrame).

    limit : int, default None

        If method is specified, this is the maximum number of consecutive

        NaN values to forward/backward fill. In other words, if there is

        a gap with more than this number of consecutive NaNs, it will only

        be partially filled. If method is not specified, this is the

        maximum number of entries along the entire axis where NaNs will be

        filled. Must be greater than 0 if not None.

    downcast : dict, default is None

        a dict of item->dtype of what to downcast if possible,

        or the string 'infer' which will try to downcast to an appropriate

        equal type (e.g. float64 to int64 if possible)

excel_data.fillna
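A sketch of three common fill patterns on a hypothetical in-memory copy of the table: a scalar, a per-column dict, and forward fill. It uses Series.ffill() rather than fillna(method="ffill"), because the method argument of fillna is deprecated in recent pandas (2.1 and later); the fill values "unknown" and "n/a" are arbitrary placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the table from test.xlsx
excel_data = pd.DataFrame({
    "name": ["cat", "dog", "pig", "bird", np.nan, "pig", "bird"],
    "age": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, np.nan],
    "adress": ["a", "b", "c", "d", "e", np.nan, np.nan],
    "home": np.nan,
})

age_zero = excel_data["age"].fillna(0)  # scalar fill
per_column = excel_data.fillna({
    "name": "unknown",
    "age": excel_data["age"].mean(),    # mean skips NaN: (2+3+4+5+6+7)/6 = 4.5
    "adress": "n/a",
})                                      # 'home' is not in the dict, so it stays NaN
carried = excel_data["adress"].ffill()               # carry the last valid value forward
carried_once = excel_data["adress"].ffill(limit=1)   # fill at most 1 consecutive NaN
```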

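excel_data.reindex(columns=["name", "age", "weight"])  #conform to a new set of labels; labels that did not exist ("weight") are filled with NaN

excel_data.rename(columns={"adress": "address"})       #rename row or column labels

excel_data.replace("pig", "hog")                       #replace values

excel_data["age"].fillna(0).astype(int)                #cast to another dtype; fill NaN first, because an integer column cannot hold NaN

excel_data.duplicated()                                #boolean Series, True for rows that repeat an earlier row

excel_data["name"].unique()                            #unique values; unique() is a Series method, a DataFrame has no unique()

excel_data.drop_duplicates()                           #remove duplicate rows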
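duplicated() and drop_duplicates() become more useful with the subset and keep parameters. A sketch on a hypothetical in-memory copy of the table, where no whole row repeats but several names do, plus the fill-then-cast pattern for astype:

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the table from test.xlsx
excel_data = pd.DataFrame({
    "name": ["cat", "dog", "pig", "bird", np.nan, "pig", "bird"],
    "age": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, np.nan],
    "adress": ["a", "b", "c", "d", "e", np.nan, np.nan],
})

row_dups = excel_data.duplicated()                                    # whole rows compared; none repeat
name_dups = excel_data.duplicated(subset=["name"])                    # keep="first": later repeats are True
name_dups_last = excel_data.duplicated(subset=["name"], keep="last")  # earlier repeats are True
first_kept = excel_data.drop_duplicates(subset=["name"])              # first row per name survives
none_kept = excel_data.drop_duplicates(subset=["name"], keep=False)   # every repeated name is removed
ages_int = excel_data["age"].fillna(0).astype("int64")                # NaN must be filled before the int cast
```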
