import numpy as np
import pandas as pd

Introduction

Time series data is an important form of data in many different fields, such as finance, economics, ecology, neuroscience, and physics. Anything that is observed or measured at many points in time forms a time series.

Many time series are fixed frequency, which is to say that data points occur at regular intervals according to some rule, such as every 15 seconds, every 5 minutes, or once per month.
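As a minimal sketch of what "fixed frequency" looks like in pandas, the following builds two regular sequences of timestamps with pd.date_range. The start dates and lengths are arbitrary values chosen for illustration.

```python
import pandas as pd

# Six timestamps spaced exactly 5 minutes apart
every_5_min = pd.date_range("2011-01-01 00:00", periods=6, freq="5min")

# Four timestamps at the start of consecutive months ("MS" = month start)
monthly = pd.date_range("2011-01-01", periods=4, freq="MS")

print(every_5_min[1] - every_5_min[0])  # spacing between neighbouring points
print(monthly)
```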

Time series can also be irregular, without a fixed unit of time or offset between units. How you mark and refer to time series data depends on the application, and you may have one of the following:

  • Timestamps, specific instants in time
  • Fixed periods, such as the month January 2007 or the full year 2010
  • Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals.
  • Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being placed in the oven)
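The first three kinds of label in the list above each have a counterpart among pandas scalar types. The sketch below is only an illustration; the variable names and dates are made up for the example.

```python
import pandas as pd

# A timestamp: one specific instant
stamp = pd.Timestamp("2011-01-03 12:00")

# A fixed period: the whole month of January 2007
january_2007 = pd.Period("2007-01", freq="M")

# An interval: defined by a start and an end timestamp
first_week = pd.Interval(pd.Timestamp("2011-01-01"), pd.Timestamp("2011-01-08"))

print(january_2007.start_time, january_2007.end_time)
print(stamp in first_week)  # membership test against the interval
```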

In this chapter, I am mainly concerned with time series in the first three categories, though many of the techniques can be applied to experimental time series where the index may be an integer or floating-point number indicating elapsed time from the start of the experiment. The simplest and most widely used kind of time series are those indexed by timestamp.

pandas also supports indexes based on timedeltas, which can be a useful way of representing experiment or elapsed time. We do not explore timedelta indexes in this book, but you can learn more in the pandas documentation.
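For a quick impression of a timedelta-based index, here is a small sketch that follows the cookie example. The diameter values are invented purely for illustration, and the names elapsed and diameter are not from the text.

```python
import pandas as pd

# Elapsed time since the start of the experiment, in seconds
elapsed = pd.to_timedelta([0, 1, 2, 3], unit="s")

# Hypothetical measurements (made-up numbers), indexed by elapsed time
diameter = pd.Series([5.0, 5.2, 5.5, 5.9], index=elapsed)

# Look up the measurement taken 2 seconds after the start
print(diameter.loc[pd.Timedelta(seconds=2)])
```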

pandas provides many built-in time series tools and data algorithms. You can efficiently work with very large time series and easily slice and dice, aggregate, and resample irregular- and fixed-frequency time series. Some of these tools are especially useful for financial and economics applications, but you could certainly use them to analyze server log data, too.
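To make "slice" and "resample" concrete, here is a minimal sketch on a tiny daily series. The data is just the integers 0 through 5, chosen so the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

# A daily series with values 0..5
index = pd.date_range("2011-01-01", periods=6, freq="D")
ts = pd.Series(np.arange(6), index=index)

# Slice by date strings; both endpoints are included
subset = ts["2011-01-02":"2011-01-04"]

# Resample to 2-day bins and sum the values in each bin
two_day_totals = ts.resample("2D").sum()

print(subset)
print(two_day_totals)
```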

The Python standard library includes data types for date and time data, as well as calendar-related functionality. The datetime, time, and calendar modules are the main places to start. The datetime.datetime type, or simply datetime, is widely used:

from datetime import datetime
now = datetime.now()

now
datetime.datetime(2019, 4, 27, 15, 3, 14, 103616)
now.year, now.month, now.day, now.hour, now.minute
(2019, 4, 27, 15, 3)

datetime stores both the date and time down to the microsecond. timedelta represents the temporal difference between two datetime objects:

# Note: timedelta makes date arithmetic very convenient
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

delta
datetime.timedelta(926, 56700)
delta.days, delta.seconds
(926, 56700)
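Note that the seconds attribute holds only the part of the difference left over after whole days are removed. If the full span in seconds is needed, timedelta.total_seconds() returns it, as this small sketch shows:

```python
from datetime import datetime

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# days and seconds are separate components of the same difference
print(delta.days, delta.seconds)

# total_seconds() folds the days into a single number of seconds
print(delta.total_seconds())
```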

You can add (or subtract) a timedelta or a multiple thereof to a datetime object to yield a new shifted object:

from datetime import timedelta
start = datetime(2011, 1, 7)

# Add 12 days
start + timedelta(12)
datetime.datetime(2011, 1, 19, 0, 0)
# Subtract 24 days
start - 2*timedelta(12)
datetime.datetime(2010, 12, 14, 0, 0)

Table 11-1 summarizes the data types in the datetime module. While this chapter is mainly concerned with the data types in pandas and high-level time series manipulation, you may encounter the datetime-based types in many other places in Python in the wild.

Table 11-1. Types in the datetime module

Type       Description
date       Stores a calendar date (year, month, day) using the Gregorian calendar
time       Stores the time of day as hours, minutes, seconds, and microseconds
datetime   Stores both date and time
timedelta  Represents the difference between two datetime values (as days, seconds, and microseconds)
tzinfo     Base type for storing time zone information
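The date, time, and datetime types in the table fit together in a simple way: a date and a time can be combined into a datetime and split apart again. A minimal sketch, with arbitrary example values:

```python
from datetime import date, time, datetime

d = date(2011, 1, 7)   # calendar date only
t = time(8, 15)        # time of day only

# Combine the two into a full datetime
dt = datetime.combine(d, t)
print(dt)

# Split a datetime back into its date and time parts
print(dt.date(), dt.time())
```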

Converting Between String and Datetime

You can format datetime objects and pandas Timestamp objects, which I'll introduce later, as strings using str or the strftime method, passing a format specification:

stamp = datetime(2011, 1, 3)

stamp
datetime.datetime(2011, 1, 3, 0, 0)
str(stamp)
'2011-01-03 00:00:00'
stamp.strftime('%Y-%m-%d')  # four-digit year
'2011-01-03'
stamp.strftime('%y-%m-%d')  # two-digit year
'11-01-03'

See Table 11-2 for a complete list of the format codes.

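Table 11-2. Datetime format codes

Code  Description
%Y    Four-digit year
%y    Two-digit year
%m    Two-digit month [01, 12]
%d    Two-digit day [01, 31]
%H    Hour (24-hour clock) [00, 23]
%I    Hour (12-hour clock) [01, 12]
%M    Two-digit minute [00, 59]
%S    Second [00, 61] (seconds 60 and 61 account for leap seconds)
%w    Weekday as an integer [0 (Sunday), 6]
%U    Week number of the year [00, 53]; Sunday is considered the first day of the week, and days before the first Sunday of the year are "week 0"
%W    Week number of the year [00, 53]; Monday is considered the first day of the week, and days before the first Monday of the year are "week 0"
%z    UTC time zone offset as +HHMM or -HHMM; empty if time zone naive
%F    Shortcut for %Y-%m-%d (e.g., 2012-04-18)
%D    Shortcut for %m/%d/%y (e.g., 04/18/12)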
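A few of the codes from the table can be tried on a single value. This is a small sketch with an arbitrary example timestamp; it spells out %m/%d/%y rather than using the %D shortcut, since shortcut codes depend on the platform's C library.

```python
from datetime import datetime

stamp = datetime(2012, 4, 18, 16, 20, 57)

full = stamp.strftime("%Y-%m-%d %H:%M:%S")  # date plus 24-hour time
hour_12 = stamp.strftime("%I")               # hour on a 12-hour clock
weekday = stamp.strftime("%w")               # 0 is Sunday, so Wednesday is 3
short = stamp.strftime("%m/%d/%y")           # month/day/two-digit year

print(full, hour_12, weekday, short)
```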

You can use these same format codes to convert strings to dates using datetime.strptime:

value = "2011-01-03"
datetime.strptime(value, '%Y-%m-%d')
datetime.datetime(2011, 1, 3, 0, 0)
datestrs = ['7/6/2011', '8/6/2011']

[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]
[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
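When a string does not match the format, datetime.strptime raises a ValueError rather than guessing. One way to handle that is sketched below; try_parse is a hypothetical helper written for this example, not part of the standard library.

```python
from datetime import datetime

def try_parse(value, fmt):
    """Return a datetime if value matches fmt, otherwise None (hypothetical helper)."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None

good = try_parse("2011-01-03", "%Y-%m-%d")
bad = try_parse("03/01/2011", "%Y-%m-%d")  # wrong layout for this format

print(good, bad)
```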

datetime.strptime is a good way to parse a date with a known format. However, it can be a bit annoying to have to write a format spec each time, especially for common date formats. In this case, you can use the parser.parse method in the third-party dateutil package (this is installed automatically when you install pandas).

from dateutil.parser import parse
parse("2011-01-03")
datetime.datetime(2011, 1, 3, 0, 0)
parse("2011/01/03")
datetime.datetime(2011, 1, 3, 0, 0)

dateutil is capable of parsing most human-intelligible date representations:

parse('Jan 31, 1997, 10:45 PM')
datetime.datetime(1997, 1, 31, 22, 45)

In international locales, day appearing before month is very common, so you can pass dayfirst=True to indicate this:

parse('6/12/2011', dayfirst=True)
datetime.datetime(2011, 12, 6, 0, 0)
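The effect of dayfirst is easiest to see by parsing the same ambiguous string both ways. A minimal sketch, assuming dateutil is installed:

```python
from datetime import datetime
from dateutil.parser import parse

# Without dayfirst, dateutil reads the first number as the month
month_first = parse("6/12/2011")

# With dayfirst=True, the first number is read as the day
day_first = parse("6/12/2011", dayfirst=True)

print(month_first, day_first)
```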

pandas is generally oriented toward working with arrays of dates, whether used as an axis index or a column in a DataFrame. The to_datetime method parses many different kinds of date representations. Standard date formats like ISO 8601 can be parsed very quickly:

datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

pd.to_datetime(datestrs)
DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)
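For strings that are not in a standard layout, to_datetime also accepts a format argument that uses the same codes as strptime. A short sketch, reusing the month/day/year strings from the earlier strptime example:

```python
import pandas as pd

datestrs = ["7/6/2011", "8/6/2011"]

# Tell pandas explicitly that the strings are month/day/year
idx = pd.to_datetime(datestrs, format="%m/%d/%Y")
print(idx)
```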

It also handles values that should be considered missing (None, empty string, etc.):

idx = pd.to_datetime(datestrs + [None])

idx
DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)
idx[2]
NaT
pd.isnull(idx)
array([False, False,  True])

NaT (Not a Time) is pandas's null value for timestamp data.
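NaT can also stand in for values that fail to parse. As a sketch, passing errors="coerce" to to_datetime turns unparseable entries into NaT instead of raising; the input list here is made up for illustration.

```python
import pandas as pd

raw = ["2011-07-06", "not a date", None]

# Unparseable strings become NaT rather than raising an error
idx = pd.to_datetime(raw, errors="coerce")

print(idx.isna())    # which entries are missing
print(idx.dropna())  # keep only the valid timestamps
```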

dateutil.parser is a useful but imperfect tool. Notably, it will recognize some strings as dates that you might prefer it didn't; for example, '42' will be parsed as the year 2042 with today's calendar date.

datetime objects also have a number of locale-specific formatting options for systems in other countries or languages. For example, the abbreviated month names will be different on German or French systems compared with English systems. See Table 11-3 for a listing.

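  • %a Abbreviated weekday name
  • %A Full weekday name
  • %b Abbreviated month name
  • %B Full month name
  • %c Full date and time (e.g., 'Tue 01 May 2012 04:20:57 PM')
  • %p Locale equivalent of AM or PM
  • %x Locale-appropriate formatted date (e.g., in the United States, May 1, 2012 yields '05/01/2012')
  • %X Locale-appropriate time (e.g., '04:24:12 PM')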
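The locale-specific codes are used with strftime like any other code. The sketch below prints a few of them for an arbitrary timestamp; no expected output is shown because the exact strings depend on the locale the process is running under.

```python
from datetime import datetime

stamp = datetime(2012, 5, 1, 16, 20, 57)

weekday_name = stamp.strftime("%A")  # full weekday name in the active locale
month_name = stamp.strftime("%B")    # full month name in the active locale
local_date = stamp.strftime("%x")    # locale-appropriate date

print(weekday_name, month_name, local_date)
```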
