http://blog.csdn.net/pipisorry/article/details/52209377

pandas时序数据文件读取

dateparse = lambda dates: pd.datetime.strptime(dates, '%Y-%m')
data = pd.read_csv('AirPassengers.csv', parse_dates='Month', index_col='Month',date_parser=dateparse)
print data.head()

read_csv时序参数

parse_dates:这是指定含有时间数据信息的列。正如上面所说的，列的名称为“月份”。
index_col:使用pandas 的时间序列数据背后的关键思想是：目录成为描述时间数据信息的变量。所以该参数告诉pandas使用“月份”的列作为索引。
date_parser：指定将输入的字符串转换为可变的时间数据。Pandas默认的数据读取格式是‘YYYY-MM-DD HH:MM:SS’？如需要读取的数据没有默认的格式，就要人工定义。这和dataparse的功能部分相似，这里的定义可以为这一目的服务。The default uses dateutil.parser.parser to do the conversion.

[pandas.read_csv]

[python模块：时间处理模块]

时间序列分析和处理Time Series

pandas has simple, powerful, and efficient functionality for performingresampling operations during frequency conversion (e.g., converting secondlydata into 5-minutely data). This is extremely common in, but not limited to,financial applications.

时序数据生成和表示

c = pandas.Timestamp('2012-01-01 00:00:08')

In [103]: rng = pd.date_range('1/1/2012', periods=100, freq='S')

In [104]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)

In [105]: ts.resample('5Min', how='sum')
Out[105]:
2012-01-01    25083
Freq: 5T, dtype: int32

Time zone representation

In [106]: rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')

In [107]: ts = pd.Series(np.random.randn(len(rng)), rng)

In [108]: ts
Out[108]:
2012-03-06    0.464000
2012-03-07    0.227371
2012-03-08   -0.496922
2012-03-09    0.306389
2012-03-10   -2.290613
Freq: D, dtype: float64

In [109]: ts_utc = ts.tz_localize('UTC')

In [110]: ts_utc
Out[110]:
2012-03-06 00:00:00+00:00    0.464000
2012-03-07 00:00:00+00:00    0.227371
2012-03-08 00:00:00+00:00   -0.496922
2012-03-09 00:00:00+00:00    0.306389
2012-03-10 00:00:00+00:00   -2.290613
Freq: D, dtype: float64

时序转换

Convert to another time zone

In [111]: ts_utc.tz_convert('US/Eastern')
Out[111]:
2012-03-05 19:00:00-05:00    0.464000
2012-03-06 19:00:00-05:00    0.227371
2012-03-07 19:00:00-05:00   -0.496922
2012-03-08 19:00:00-05:00    0.306389
2012-03-09 19:00:00-05:00   -2.290613
Freq: D, dtype: float64

Converting between time span representations

In [112]: rng = pd.date_range('1/1/2012', periods=5, freq='M')
In [113]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
In [114]: ts
Out[114]:
2012-01-31   -1.134623
2012-02-29   -1.561819
2012-03-31   -0.260838
2012-04-30    0.281957
2012-05-31    1.523962
Freq: M, dtype: float64

In [115]: ps = ts.to_period()
In [116]: ps
Out[116]:
2012-01   -1.134623
2012-02   -1.561819
2012-03   -0.260838
2012-04    0.281957
2012-05    1.523962
Freq: M, dtype: float64

In [117]: ps.to_timestamp()
Out[117]:
2012-01-01   -1.134623
2012-02-01   -1.561819
2012-03-01   -0.260838
2012-04-01    0.281957
2012-05-01    1.523962
Freq: MS, dtype: float64

Converting between period and timestamp enables some convenient arithmeticfunctions to be used. In the following example, we convert a quarterlyfrequency with year ending in November to 9am of the end of the month followingthe quarter end:

In [118]: prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
In [119]: ts = pd.Series(np.random.randn(len(prng)), prng)
In [120]: ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
In [121]: ts.head()
Out[121]:
1990-03-01 09:00   -0.902937
1990-06-01 09:00    0.068159
1990-09-01 09:00   -0.057873
1990-12-01 09:00   -0.368204
1991-03-01 09:00   -1.144073
Freq: H, dtype: float64

[pandas-docs/stable/timeseries]

[pandas cookbook Timeseries]

皮皮blog

pandas时序类型

pandas 的 TimeStamp

pandas 最基本的时间日期对象是一个从 Series 派生出来的子类 TimeStamp，这个对象与 datetime 对象保有高度兼容性，可通过 pd.to_datetime() 函数转换。（一般是从 datetime 转换为 Timestamp）

lang:python
>>> pd.to_datetime(now)
Timestamp('2014-06-17 15:56:19.313193', tz=None)
>>> pd.to_datetime(np.nan)
NaT

pandas 的时间序列

pandas 最基本的时间序列类型就是以时间戳（TimeStamp）为 index 元素的 Series 类型。

lang:python
>>> dates = [datetime(2011,1,1),datetime(2011,1,2),datetime(2011,1,3)]
>>> ts = Series(np.random.randn(3),index=dates)
>>> ts
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
dtype: float64
>>> type(ts)
<class 'pandas.core.series.Series'>
>>> ts.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01, ..., 2011-01-03]
Length: 3, Freq: None, Timezone: None
>>> ts.index[0]
Timestamp('2011-01-01 00:00:00', tz=None)

时间序列之间的算术运算会自动按时间对齐。

索引、选取、子集构造

时间序列只是 index 比较特殊的 Series ，因此一般的索引操作对时间序列依然有效。其特别之处在于对时间序列索引的操作优化。如使用各种字符串进行索引：

lang:python
>>> ts['20110101']
0.36228897878097266
>>> ts['2011-01-01']
0.36228897878097266
>>> ts['01/01/2011']
0.36228897878097266

对于较长的序列，还可以只传入 “年” 或 “年月” 选取切片：

lang:python
>>> ts
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
2012-12-25    0.111869
dtype: float64
>>> ts['2012']
2012-12-25    0.111869
dtype: float64
>>> ts['2011-1-2':'2012-12']
2011-01-02    0.586695
2011-01-03   -0.154522
2012-12-25    0.111869
dtype: float64

除了这种字符串切片方式外，还有一种实例方法可用：ts.truncate(after='2011-01-03')。

值得注意的是，切片时使用的字符串时间戳并不必存在于 index 之中，如 ts.truncate(before='3055') 也是合法的。

Time/Date Components

There are several time/date properties that one can access from Timestamp or a collection of timestamps like a DateTimeIndex.

Property	Description
year	The year of the datetime
month	The month of the datetime
day	The days of the datetime
hour	The hour of the datetime
minute	The minutes of the datetime
second	The seconds of the datetime
microsecond	The microseconds of the datetime
nanosecond	The nanoseconds of the datetime
date	Returns datetime.date
time	Returns datetime.time
dayofyear	The ordinal day of year
weekofyear	The week ordinal of the year
week	The week ordinal of the year
dayofweek	The numer of the day of the week with Monday=0, Sunday=6
weekday	The number of the day of the week with Monday=0, Sunday=6
weekday_name	The name of the day in a week (ex: Friday)
quarter	Quarter of the date: Jan=Mar = 1, Apr-Jun = 2, etc.
days_in_month	The number of days in the month of the datetime
is_month_start	Logical indicating if first day of month (defined by frequency)
is_month_end	Logical indicating if last day of month (defined by frequency)
is_quarter_start	Logical indicating if first day of quarter (defined by frequency)
is_quarter_end	Logical indicating if last day of quarter (defined by frequency)
is_year_start	Logical indicating if first day of year (defined by frequency)
is_year_end	Logical indicating if last day of year (defined by frequency)

Furthermore, if you have a Series with datetimelike values, then you can access these properties via the .dt accessor, see the docs.

[Time/Date Components¶]

日期的范围、频率以及移动

pandas 中的时间序列一般被默认为不规则的，即没有固定的频率。但出于分析的需要，我们可以通过插值的方式将序列转换为具有固定频率的格式。一种快捷方式是使用 .resample(rule) 方法：

lang:python
>>> ts
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
2011-01-06    0.222958
dtype: float64
>>> ts.resample('D')
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
2011-01-04         NaN
2011-01-05         NaN
2011-01-06    0.222958
Freq: D, dtype: float64

生成日期范围

pd.date_range() 可用于生成指定长度的 DatetimeIndex。参数可以是起始结束日期，或单给一个日期，加一个时间段参数。日期是包含的。

lang:python
>>> pd.date_range('20100101','20100110')
<class 'pandas.tseries.index.DatetimeIndex'>
[2010-01-01, ..., 2010-01-10]
Length: 10, Freq: D, Timezone: None
>>> pd.date_range(start='20100101',periods=10)
<class 'pandas.tseries.index.DatetimeIndex'>
[2010-01-01, ..., 2010-01-10]
Length: 10, Freq: D, Timezone: None
>>> pd.date_range(end='20100110',periods=10)
<class 'pandas.tseries.index.DatetimeIndex'>
[2010-01-01, ..., 2010-01-10]
Length: 10, Freq: D, Timezone: None

默认情况下，date_range 会按天计算时间点。这可以通过 freq 参数进行更改，如 “BM” 代表 bussiness end of month。

lang:python
>>> pd.date_range('20100101','20100601',freq='BM')
<class 'pandas.tseries.index.DatetimeIndex'>
[2010-01-29, ..., 2010-05-31]
Length: 5, Freq: BM, Timezone: None

频率和日期偏移量

pandas 中的频率是由一个基础频率和一个乘数组成的。基础频率通常以一个字符串别名表示，如上例中的 “BM”。对于每个基础频率，都有一个被称为日期偏移量（date offset）的对象与之对应。可以通过实例化日期偏移量来创建某种频率：

lang:python
>>> Hour()
<Hour>
>>> Hour(2)
<2 * Hours>
>>> Hour(1) + Minute(30)
<90 * Minutes>

但一般来说不必这么麻烦，使用前面提过的字符串别名来创建频率就可以了：

lang:python
>>> pd.date_range('00:00','12:00',freq='1h20min')
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-06-17 00:00:00, ..., 2014-06-17 12:00:00]
Length: 10, Freq: 80T, Timezone: None

可用的别名，可以通过 help() 或文档来查询，这里就不写了。

移动（超前和滞后）数据

移动（shifting）指的是沿着时间轴将数据前移或后移。Series 和 DataFrame 都有一个 .shift() 方法用于执行单纯的移动操作，index 维持不变：

lang:python
>>> ts
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
2011-01-06    0.222958
dtype: float64
>>> ts.shift(2)
2011-01-01         NaN
2011-01-02         NaN
2011-01-03    0.362289
2011-01-06    0.586695
dtype: float64
>>> ts.shift(-2)
2011-01-01   -0.154522
2011-01-02    0.222958
2011-01-03         NaN
2011-01-06         NaN
dtype: float64

上例中因为移动操作产生了 NA 值，另一种移动方法是移动 index，而保持数据不变。这种移动方法需要额外提供一个 freq 参数来指定移动的频率：

lang:python
>>> ts.shift(2,freq='D')
2011-01-03    0.362289
2011-01-04    0.586695
2011-01-05   -0.154522
2011-01-08    0.222958
dtype: float64
>>> ts.shift(2,freq='3D')
2011-01-07    0.362289
2011-01-08    0.586695
2011-01-09   -0.154522
2011-01-12    0.222958
dtype: float64

时期及其算术运算

本节使用的时期（period）概念不同于前面的时间戳（timestamp），指的是一个时间段。但在使用上并没有太多不同，pd.Period 类的构造函数仍需要一个时间戳，以及一个 freq 参数。freq 用于指明该 period 的长度，时间戳则说明该 period 在公园时间轴上的位置。

lang:python
>>> p = pd.Period(2010,freq='M')
>>> p
Period('2010-01', 'M')
>>> p + 2
Period('2010-03', 'M')

上例中我给 period 的构造器传了一个 “年” 单位的时间戳和一个 “Month” 的 freq，pandas 便自动把 2010 解释为了 2010-01。

period_range 函数可用于创建规则的时间范围：

lang:python
>>> pd.period_range('2010-01','2010-05',freq='M')
<class 'pandas.tseries.period.PeriodIndex'>
freq: M
[2010-01, ..., 2010-05]
length: 5

PeriodIndex 类保存了一组 period，它可以在任何 pandas 数据结构中被用作轴索引：

lang:python
>>> Series(np.random.randn(5),index=pd.period_range('201001','201005',freq='M'))
2010-01    0.755961
2010-02   -1.074492
2010-03   -0.379719
2010-04    0.153662
2010-05   -0.291157
Freq: M, dtype: float64

时期的频率转换

Period 和 PeriodIndex 对象都可以通过其 .asfreq(freq, method=None, how=None) 方法被转换成别的频率。

lang:python
>>> p = pd.Period('2007',freq='A-DEC')
>>> p.asfreq('M',how='start')
Period('2007-01', 'M')
>>> p.asfreq('M',how='end')
Period('2007-12', 'M')
>>> ts = Series(np.random.randn(1),index=[p])
>>> ts
2007   -0.112347
Freq: A-DEC, dtype: float64
>>> ts.asfreq('M',how='start')
2007-01   -0.112347
Freq: M, dtype: float64

时间戳与时期间相互转换

以时间戳和以时期为 index 的 Series 和 DataFrame 都有一对 .to_period() 和 to_timestamp(how='start') 方法用于互相转换 index 的类型。因为从 period 到 timestamp 的转换涉及到一个取端值的问题，所以需要一个额外的 how 参数，默认为 'start'：

lang:python
>>> ts = Series(np.random.randn(5),index=pd.period_range('201001','201005',freq='M'))
>>> ts
2010-01   -0.312160
2010-02    0.962652
2010-03   -0.959478
2010-04    1.240236
2010-05   -0.916218
Freq: M, dtype: float64
>>> ts.to_timestamp()
2010-01-01   -0.312160
2010-02-01    0.962652
2010-03-01   -0.959478
2010-04-01    1.240236
2010-05-01   -0.916218
Freq: MS, dtype: float64
>>> ts.to_timestamp(how='end')
2010-01-31   -0.312160
2010-02-28    0.962652
2010-03-31   -0.959478
2010-04-30    1.240236
2010-05-31   -0.916218
Freq: M, dtype: float64
>>> ts.to_timestamp().to_period()
2010-01-01 00:00:00.000   -0.312160
2010-02-01 00:00:00.000    0.962652
2010-03-01 00:00:00.000   -0.959478
2010-04-01 00:00:00.000    1.240236
2010-05-01 00:00:00.000   -0.916218
Freq: L, dtype: float64
>>> ts.to_timestamp().to_period('M')
2010-01   -0.312160
2010-02    0.962652
2010-03   -0.959478
2010-04    1.240236
2010-05   -0.916218
Freq: M, dtype: float64

重采样及频率转换

重采样（resampling）指的是将时间序列从一个频率转换到另一个频率的过程。pandas 对象都含有一个.resample(freq, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0) 方法用于实现这个过程。

本篇最前面曾用 resample 规整化过时间序列。当时进行的是插值操作，因为原索引的频率与给出的 freq 参数相同。resample 方法更多的应用场合是 freq 发生改变的时候，这时操作就分为升采样（upsampling）和降采样（downsampling）两种。具体的区别都体现在参数里。

lang:python
>>> ts
2010-01   -0.312160
2010-02    0.962652
2010-03   -0.959478
2010-04    1.240236
2010-05   -0.916218
Freq: M, dtype: float64
>>> ts.resample('D',fill_method='ffill')#升采样
2010-01-01   -0.31216
2010-01-02   -0.31216
2010-01-03   -0.31216
2010-01-04   -0.31216
2010-01-05   -0.31216
2010-01-06   -0.31216
2010-01-07   -0.31216
2010-01-08   -0.31216
2010-01-09   -0.31216
2010-01-10   -0.31216
2010-01-11   -0.31216
2010-01-12   -0.31216
2010-01-13   -0.31216
2010-01-14   -0.31216
2010-01-15   -0.31216
...
2010-05-17   -0.916218
2010-05-18   -0.916218
2010-05-19   -0.916218
2010-05-20   -0.916218
2010-05-21   -0.916218
2010-05-22   -0.916218
2010-05-23   -0.916218
2010-05-24   -0.916218
2010-05-25   -0.916218
2010-05-26   -0.916218
2010-05-27   -0.916218
2010-05-28   -0.916218
2010-05-29   -0.916218
2010-05-30   -0.916218
2010-05-31   -0.916218
Freq: D, Length: 151
>>> ts.resample('A-JAN',how='sum')#降采样
2010   -0.312160
2011    0.327191
Freq: A-JAN, dtype: float64

[pandas 时间序列操作]

from: http://blog.csdn.net/pipisorry/article/details/52209377

ref: [时间序列预测全攻略（附带Python代码）]

[Complete guide to create a Time Series Forecast (with Codes in Python)]

pandas小记：pandas时间序列分析和处理Timeseries的更多相关文章

pandas时间序列分析和处理Timeseries
pandas最基本的时间序列类型就是以时间戳(TimeStamp)为index元素的Series类型. 生成日期范围: pd.date_range()可用于生成指定长度的DatetimeIndex.参 ...
pandas小记：pandas基本设置
http://blog.csdn.net/pipisorry/article/details/49519545 ): print(df) Note: 试了好久终于找到了这种设置方法! 它是这样实现的 ...
python时间序列分析
题记:毕业一年多天天coding,好久没写paper了.在这动荡的日子里,也希望写点东西让自己静一静.恰好前段时间用python做了一点时间序列方面的东西,有一丁点心得体会想和大家 ...
基于 Keras 的 LSTM 时间序列分析——以苹果股价预测为例
简介时间序列简单的说就是各时间点上形成的数值序列,时间序列分析就是通过观察历史数据预测未来的值.预测未来股价走势是一个再好不过的例子了.在本文中,我们将看到如何在递归神经网络的帮助下执行时间序列分析 ...
[python] 时间序列分析之ARIMA
1 时间序列与时间序列分析在生产和科学研究中,对某一个或者一组变量进行观察测量,将在一系列时刻所得到的离散数字组成的序列集合,称之为时间序列. 时间序列分析是根据系统观察得到的时间序列数据, ...
R时间序列分析实例
一.作业要求自选时间序列完成时间序列的建模过程,要求序列的长度>=100. 报告要求以下几部分内容: 数据的描述:数据来源.期间.数据的定义.数据长度. 作时间序列图并进行简单评价. 进行时间 ...
SPSS时间序列分析
时间序列分析必须建立在预处理的基础上…… 今天看了一条新闻体会到了网络日志的重要性…… 指数平滑法(Exponential Smoothing,ES)是布朗(Robert G..Brown)所提出,布 ...
时间序列分析算法【R详解】
简介在商业应用中,时间是最重要的因素,能够提升成功率.然而绝大多数公司很难跟上时间的脚步.但是随着技术的发展,出现了很多有效的方法,能够让我们预测未来.不要担心,本文并不会讨论时间机器,讨论的都是很 ...
小记Java时间工具类
小记Java时间工具类废话不多说,这里主要记录以下几个工具两个时间只差(Data) 获取时间的格式格式化时间返回String 两个时间只差(String) 获取两个时间之间的日期.月份.年份 ...

随机推荐

LeetCode Binary Search Summary 二分搜索法小结
二分查找法作为一种常见的查找方法,将原本是线性时间提升到了对数时间范围,大大缩短了搜索时间,具有很大的应用场景,而在LeetCode中,要运用二分搜索法来解的题目也有很多,但是实际上二分查找法的查找目 ...
IDEA2017.3.4破解方式
下载jetbrainsCrack-2.7-release-str.jar包下载地址: https://files.cnblogs.com/files/xifenglou/JetBrains.zip ...
IIS&ASP.NET 站点IP跳转到域名
前言:先到微软的 https://www.iis.net/downloads/microsoft/url-rewrite 下载URL Rewrite 目标:输入ip跳转到域名所在的网站比如58的1 ...
ios开发-MapKit（地图框架）使用简介
我们使用app的时候,很多软件都自带了地图功能.我们可以看到自己的位置,看到周围商场等信息.我们也可以导航,划线等. 其实苹果的MapKit使用起来还是很简单的.这里简单的介绍一下. 0.使用前准备 ...
06_Linux目录文件操作命令3查找命令_我的Linux之路
上几节已经大致跟大家说了在Linux端文件目录操作的一些命令这篇随笔,我们继续来学习对文件目录的操作命令对文件或目录进行查找的命令 find 指定目录下查找文件 find(选项)(参数) find ...
[SDOI2008]Sandy的卡片
题目描述 Sandy和Sue的热衷于收集干脆面中的卡片. 然而,Sue收集卡片是因为卡片上漂亮的人物形象,而Sandy则是为了积攒卡片兑换超炫的人物模型. 每一张卡片都由一些数字进行标记,第i张卡片的 ...
模板Link Cut Tree （动态树）
题目描述给定N个点以及每个点的权值,要你处理接下来的M个操作.操作有4种.操作从0到3编号.点从1到N编号. 0:后接两个整数(x,y),代表询问从x到y的路径上的点的权值的xor和.保证x到y是联 ...
LOJ #6031 字符串
Description Solution 当 \(k\) 值较小时,发现询问串比较多,串长比较小然后对 \(Q\) 个询问区间离线跑莫队,一次考虑每一个区间的贡献假设一个区间 \([i,j]\) ...
Educational Codeforces Round 17F Tree nesting
来自FallDream的博客,未经允许,请勿转载, 谢谢. 给你两棵树,一棵比较大(n<=1000),一棵比较小(m<=12) 问第一棵树中有多少个连通子树和第二棵同构. 答案取膜1e9+ ...
例10-5 uva12716
题意:gcd(a,b) = a^b,( 1≤ a , b ≤ n) 思路: ① a^b = c, 所以 a^c = b,而且c是a的约数,枚举a,c,再gcd判断 ② 打表可知 a-b = c,而且a ...

pandas小记：pandas时间序列分析和处理Timeseries

其它时间序列处理相关的包