Analysis of Political Contribution Data for the 2012 US Presidential Candidates

Import the packages

import numpy as np
import pandas as pd
from pandas import Series,DataFrame

For convenience, define lookup tables for the month abbreviations, the candidates of interest, and each candidate's party

months = {'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
          'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12}
of_interest = ['Obama, Barack', 'Romney, Mitt', 'Santorum, Rick',
               'Paul, Ron', 'Gingrich, Newt']
parties = {
    'Bachmann, Michelle': 'Republican',
    'Romney, Mitt': 'Republican',
    'Obama, Barack': 'Democrat',
    "Roemer, Charles E. 'Buddy' III": 'Reform',
    'Pawlenty, Timothy': 'Republican',
    'Johnson, Gary Earl': 'Libertarian',
    'Paul, Ron': 'Republican',
    'Santorum, Rick': 'Republican',
    'Cain, Herman': 'Republican',
    'Gingrich, Newt': 'Republican',
    'McCotter, Thaddeus G': 'Republican',
    'Huntsman, Jon': 'Republican',
    'Perry, Rick': 'Republican'
}
df = pd.read_csv('./data/usa_election.txt')
df.head()
C:\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2728: DtypeWarning: Columns (6) have mixed types. Specify dtype option on import or set low_memory=False.
interactivity=interactivity, compiler=compiler, result=result)

| | cmte_id | cand_id | cand_nm | contbr_nm | contbr_city | contbr_st | contbr_zip | contbr_employer | contbr_occupation | contb_receipt_amt | contb_receipt_dt | receipt_desc | memo_cd | memo_text | form_tp | file_num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C00410118 | P20002978 | Bachmann, Michelle | HARVEY, WILLIAM | MOBILE | AL | 3.6601e+08 | RETIRED | RETIRED | 250.0 | 20-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
| 1 | C00410118 | P20002978 | Bachmann, Michelle | HARVEY, WILLIAM | MOBILE | AL | 3.6601e+08 | RETIRED | RETIRED | 50.0 | 23-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
| 2 | C00410118 | P20002978 | Bachmann, Michelle | SMITH, LANIER | LANETT | AL | 3.68633e+08 | INFORMATION REQUESTED | INFORMATION REQUESTED | 250.0 | 05-JUL-11 | NaN | NaN | NaN | SA17A | 749073 |
| 3 | C00410118 | P20002978 | Bachmann, Michelle | BLEVINS, DARONDA | PIGGOTT | AR | 7.24548e+08 | NONE | RETIRED | 250.0 | 01-AUG-11 | NaN | NaN | NaN | SA17A | 749073 |
| 4 | C00410118 | P20002978 | Bachmann, Michelle | WARDENBURG, HAROLD | HOT SPRINGS NATION | AR | 7.19016e+08 | NONE | RETIRED | 300.0 | 20-JUN-11 | NaN | NaN | NaN | SA17A | 736166 |
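When this file is read with no options, pandas warns that column 6 (counting from 0) has mixed types. That column is contbr_zip, and in the head output its values have already become floats such as 3.6601e+08. The warning itself suggests `low_memory=False`. Another option is to pin the column to strings with `read_csv`'s `dtype` parameter. The sketch below is only an illustration on a made-up two-row CSV held in memory, not the real file. It shows how the default inference turns digit-only ZIP codes into integers and drops a leading zero, while `dtype={'contbr_zip': str}` keeps each value exactly as written:

```python
import io

import pandas as pd

# Hypothetical two-row sample (not the real data file); the second ZIP has a leading zero.
csv_text = '''cand_nm,contbr_zip,contb_receipt_amt
"Bachmann, Michelle",366010030,250.0
"Obama, Barack",02138,100.0
'''

# Default inference turns an all-digit column into integers, so the leading zero is lost.
default = pd.read_csv(io.StringIO(csv_text))
# Pinning the column to str keeps every ZIP exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'contbr_zip': str})

print(default['contbr_zip'].tolist())  # [366010030, 2138]
print(as_text['contbr_zip'].tolist())  # ['366010030', '02138']
```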
# Add a new column, party, holding each candidate's party
df['party'] = df['cand_nm'].map(parties)
df.head()

| | cmte_id | cand_id | cand_nm | contbr_nm | contbr_city | contbr_st | contbr_zip | contbr_employer | contbr_occupation | contb_receipt_amt | contb_receipt_dt | receipt_desc | memo_cd | memo_text | form_tp | file_num | party |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C00410118 | P20002978 | Bachmann, Michelle | HARVEY, WILLIAM | MOBILE | AL | 3.6601e+08 | RETIRED | RETIRED | 250.0 | 20-JUN-11 | NaN | NaN | NaN | SA17A | 736166 | Republican |
| 1 | C00410118 | P20002978 | Bachmann, Michelle | HARVEY, WILLIAM | MOBILE | AL | 3.6601e+08 | RETIRED | RETIRED | 50.0 | 23-JUN-11 | NaN | NaN | NaN | SA17A | 736166 | Republican |
| 2 | C00410118 | P20002978 | Bachmann, Michelle | SMITH, LANIER | LANETT | AL | 3.68633e+08 | INFORMATION REQUESTED | INFORMATION REQUESTED | 250.0 | 05-JUL-11 | NaN | NaN | NaN | SA17A | 749073 | Republican |
| 3 | C00410118 | P20002978 | Bachmann, Michelle | BLEVINS, DARONDA | PIGGOTT | AR | 7.24548e+08 | NONE | RETIRED | 250.0 | 01-AUG-11 | NaN | NaN | NaN | SA17A | 749073 | Republican |
| 4 | C00410118 | P20002978 | Bachmann, Michelle | WARDENBURG, HAROLD | HOT SPRINGS NATION | AR | 7.19016e+08 | NONE | RETIRED | 300.0 | 20-JUN-11 | NaN | NaN | NaN | SA17A | 736166 | Republican |
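`Series.map` with a dict looks up each value as a key. Any value missing from the dict becomes NaN instead of raising an error. If a candidate name were absent from `parties`, its rows would therefore end up with a missing party, so it is worth counting NaNs after the mapping. Here is a small sketch using a subset of the dict above plus one made-up name ('Doe, Jane') that is deliberately left out of the dict:

```python
import pandas as pd

# Subset of the parties dict; 'Doe, Jane' is a hypothetical name missing from it.
parties = {'Obama, Barack': 'Democrat', 'Romney, Mitt': 'Republican'}
cand = pd.Series(['Obama, Barack', 'Romney, Mitt', 'Doe, Jane'])

party = cand.map(parties)      # names not found in the dict map to NaN
print(party.tolist())          # ['Democrat', 'Republican', nan]
print(party.isna().sum())      # 1
```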
# Which distinct values appear in the party column
df['party'].unique()
array(['Republican', 'Democrat', 'Reform', 'Libertarian'], dtype=object)
# Count how many times each value appears in the party column
df['party'].value_counts()
Democrat     292400
Republican   237575
Reform         5364
Libertarian     702
Name: party, dtype: int64
# Total contributions (contb_receipt_amt) received by each party
df.groupby(by='party')['contb_receipt_amt'].sum()
party
Democrat       8.105758e+07
Libertarian    4.132769e+05
Reform         3.390338e+05
Republican     1.192255e+08
Name: contb_receipt_amt, dtype: float64
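The `.sum()` above gives one number per party, ordered alphabetically. `agg` can compute several statistics per group in a single pass, and `sort_values` can put the largest total first. The following is a sketch on made-up rows that only reuse this file's column names:

```python
import pandas as pd

# Hypothetical contributions; only the column names come from the real file.
toy = pd.DataFrame({
    'party': ['Democrat', 'Republican', 'Democrat', 'Reform'],
    'contb_receipt_amt': [250.0, 100.0, 50.0, 20.0],
})

# Total, number of contributions and mean amount per party, largest total first.
summary = (toy.groupby('party')['contb_receipt_amt']
              .agg(['sum', 'count', 'mean'])
              .sort_values('sum', ascending=False))
print(summary)
```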
# Total contributions (contb_receipt_amt) received by each party on each day
df.groupby(by=['contb_receipt_dt','party'])['contb_receipt_amt'].sum()
contb_receipt_dt  party
01-APR-11         Reform            50.00
                  Republican     12635.00
01-AUG-11         Democrat      175281.00
                  Libertarian     1000.00
                  Reform          1847.00
                  Republican    234598.46
01-DEC-11         Democrat      651532.82
                  Libertarian      725.00
                  Reform           875.00
                  Republican    486405.96
01-FEB-11         Republican       250.00
01-JAN-11         Republican      8600.00
01-JAN-12         Democrat       58098.80
                  Reform           515.00
                  Republican     75704.72
01-JUL-11         Democrat      165961.00
                  Libertarian     2000.00
                  Reform           100.00
                  Republican    115848.72
01-JUN-11         Democrat      145459.00
                  Libertarian      500.00
                  Reform            50.00
                  Republican    433109.20
01-MAR-11         Republican      1000.00
01-MAY-11         Democrat       82644.00
                  Reform           480.00
                  Republican     28663.87
01-NOV-11         Democrat      122529.87
                  Libertarian     3000.00
                  Reform          1792.00
...
30-OCT-11         Reform          3910.00
                  Republican     43913.16
30-SEP-11         Democrat     3373517.24
                  Libertarian      550.00
                  Reform          2050.00
                  Republican   4886331.76
31-AUG-11         Democrat      374387.44
                  Libertarian    10750.00
                  Reform           450.00
                  Republican   1017735.02
31-DEC-11         Democrat     3553072.57
                  Reform           695.00
                  Republican   1094376.72
31-JAN-11         Republican      6000.00
31-JAN-12         Democrat     1418410.31
                  Reform           150.00
                  Republican    869890.41
31-JUL-11         Democrat       20305.00
                  Reform           966.00
                  Republican     12781.02
31-MAR-11         Reform           200.00
                  Republican     62475.00
31-MAY-11         Democrat      351705.66
                  Libertarian      250.00
                  Reform           100.00
                  Republican    301339.80
31-OCT-11         Democrat      204996.87
                  Libertarian     4250.00
                  Reform          3105.00
                  Republican    734601.83
Name: contb_receipt_amt, Length: 1183, dtype: float64
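Grouping by two keys returns one long Series indexed by (date, party) pairs. Its rows are also sorted as text (01-APR-11, 01-AUG-11, ...), not by date. Comparing parties is easier if the inner index level is pivoted into columns with `Series.unstack`. The sketch below uses made-up rows and fills day/party combinations that had no contributions with 0:

```python
import pandas as pd

# Hypothetical rows reusing the real column names and date format.
toy = pd.DataFrame({
    'contb_receipt_dt': ['01-AUG-11', '01-AUG-11', '01-AUG-11', '02-AUG-11'],
    'party': ['Democrat', 'Republican', 'Democrat', 'Republican'],
    'contb_receipt_amt': [100.0, 40.0, 60.0, 10.0],
})

daily = toy.groupby(['contb_receipt_dt', 'party'])['contb_receipt_amt'].sum()
# Move the inner level (party) into columns; missing (day, party) pairs become 0.
table = daily.unstack(fill_value=0)
print(table)
```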
# Convert the dates from day-month-year (e.g. '20-JUN-11') to 'yyyy-mm-dd'
def transformDate(d):
    day, month, year = d.split('-')
    month = months[month]  # e.g. 'JUN' -> 6
    # zfill(2) zero-pads the month ('6' -> '06') so the result really is yyyy-mm-dd
    return '20' + year + '-' + str(month).zfill(2) + '-' + day
df['contb_receipt_dt'] = df['contb_receipt_dt'].apply(transformDate)
df.head()

| | cmte_id | cand_id | cand_nm | contbr_nm | contbr_city | contbr_st | contbr_zip | contbr_employer | contbr_occupation | contb_receipt_amt | contb_receipt_dt | receipt_desc | memo_cd | memo_text | form_tp | file_num | party |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C00410118 | P20002978 | Bachmann, Michelle | HARVEY, WILLIAM | MOBILE | AL | 3.6601e+08 | RETIRED | RETIRED | 250.0 | 2011-06-20 | NaN | NaN | NaN | SA17A | 736166 | Republican |
| 1 | C00410118 | P20002978 | Bachmann, Michelle | HARVEY, WILLIAM | MOBILE | AL | 3.6601e+08 | RETIRED | RETIRED | 50.0 | 2011-06-23 | NaN | NaN | NaN | SA17A | 736166 | Republican |
| 2 | C00410118 | P20002978 | Bachmann, Michelle | SMITH, LANIER | LANETT | AL | 3.68633e+08 | INFORMATION REQUESTED | INFORMATION REQUESTED | 250.0 | 2011-07-05 | NaN | NaN | NaN | SA17A | 749073 | Republican |
| 3 | C00410118 | P20002978 | Bachmann, Michelle | BLEVINS, DARONDA | PIGGOTT | AR | 7.24548e+08 | NONE | RETIRED | 250.0 | 2011-08-01 | NaN | NaN | NaN | SA17A | 749073 | Republican |
| 4 | C00410118 | P20002978 | Bachmann, Michelle | WARDENBURG, HAROLD | HOT SPRINGS NATION | AR | 7.19016e+08 | NONE | RETIRED | 300.0 | 2011-06-20 | NaN | NaN | NaN | SA17A | 736166 | Republican |
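Instead of building date strings by hand, pandas can parse these values directly into real datetimes. Real datetimes always sort chronologically and support `.dt` accessors such as `.dt.month`. This is a minimal sketch. It assumes every value follows the DD-MON-YY pattern seen above and that month names are parsed in an English locale. `%d-%b-%y` stands for day, abbreviated month name and two-digit year:

```python
import pandas as pd

raw = pd.Series(['20-JUN-11', '05-JUL-11', '31-JAN-12', '01-OCT-11'])

# %b matches English month abbreviations case-insensitively (JUN, JUL, ...);
# %y maps two-digit years 00-68 to 2000-2068.
dates = pd.to_datetime(raw, format='%d-%b-%y')
print(dates.sort_values().dt.strftime('%Y-%m-%d').tolist())
# ['2011-06-20', '2011-07-05', '2011-10-01', '2012-01-31']
```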
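# Whom do disabled veterans (by contributor occupation) mainly support, i.e. which candidates received the most money from them?
# 1. Select the rows whose contributor occupation is 'DISABLED VETERAN'
mask = df['contbr_occupation'] == 'DISABLED VETERAN'
old_bing = df.loc[mask]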
# 2. Group by candidate and sum the contribution amounts
old_bing.groupby(by='cand_nm')['contb_receipt_amt'].sum()
cand_nm
Cain, Herman       300.00
Obama, Barack     4205.00
Paul, Ron         2425.49
Santorum, Rick     250.00
Name: contb_receipt_amt, dtype: float64
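The grouped totals above come out in alphabetical order of candidate. Sorting them in descending order answers "who received the most" directly. The sketch below runs the same filter, group and sum steps on made-up rows and then adds `sort_values(ascending=False)`. The TEACHER row shows that rows with any other occupation are excluded:

```python
import pandas as pd

# Hypothetical rows reusing the real column names.
toy = pd.DataFrame({
    'cand_nm': ['Obama, Barack', 'Paul, Ron', 'Obama, Barack', 'Cain, Herman'],
    'contbr_occupation': ['DISABLED VETERAN', 'DISABLED VETERAN', 'TEACHER', 'DISABLED VETERAN'],
    'contb_receipt_amt': [200.0, 150.0, 500.0, 300.0],
})

# Boolean mask -> per-candidate totals -> largest total first.
veterans = toy[toy['contbr_occupation'] == 'DISABLED VETERAN']
by_cand = (veterans.groupby('cand_nm')['contb_receipt_amt']
                   .sum()
                   .sort_values(ascending=False))
print(by_cand.index[0])  # Cain, Herman
```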
# Largest single contribution amount
df['contb_receipt_amt'].max()
1944042.43
# Find the contributor (and occupation) behind the largest contribution, using query() with a filter condition
df.query('contb_receipt_amt == 1944042.43')

| | cmte_id | cand_id | cand_nm | contbr_nm | contbr_city | contbr_st | contbr_zip | contbr_employer | contbr_occupation | contb_receipt_amt | contb_receipt_dt | receipt_desc | memo_cd | memo_text | form_tp | file_num | party |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 176127 | C00431445 | P80003338 | Obama, Barack | OBAMA VICTORY FUND 2012 - UNITEMIZED | CHICAGO | IL | 60680 | NaN | NaN | 1944042.43 | 2011-12-31 | NaN | X | * | SA18 | 763233 | Democrat |
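The query above hard-codes the maximum that `max()` printed. A float typed by hand only matches if it equals the stored value exactly. The sketch below shows three ways to find that row without retyping the number, using made-up rows: `idxmax`, `query` with an `@` reference to a Python variable, and `nlargest` for the top few contributions:

```python
import pandas as pd

# Hypothetical rows; names 'A', 'B', 'C' are placeholders.
toy = pd.DataFrame({
    'contbr_nm': ['A', 'B', 'C'],
    'contbr_occupation': ['RETIRED', None, 'TEACHER'],
    'contb_receipt_amt': [250.0, 1944042.43, 50.0],
})

# Row label of the largest amount, then the full row.
top = toy.loc[toy['contb_receipt_amt'].idxmax()]
print(top['contbr_nm'])  # B

# Same idea with query(): @max_amt refers to the local Python variable.
max_amt = toy['contb_receipt_amt'].max()
print(toy.query('contb_receipt_amt == @max_amt').index.tolist())  # [1]

# The three largest contributions, largest first.
print(toy.nlargest(3, 'contb_receipt_amt')['contbr_nm'].tolist())  # ['B', 'A', 'C']
```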