| Data Wrangling |

# Combine the data from all cities into one DataFrame

import pandas as pd

files = ['BeijingPM20100101_20151231.csv','ChengduPM20100101_20151231.csv','GuangzhouPM20100101_20151231.csv','ShanghaiPM20100101_20151231.csv','ShenyangPM20100101_20151231.csv']
out_columns = ['No', 'year', 'month', 'day', 'hour', 'season', 'PM_US Post']

# Collect each city's DataFrame in a list

frames = []

# Iterate over the different files

for val in files:
    df = pd.read_csv(val)
    # keep only the columns of interest; copy so the assignments below do not touch a slice
    df = df[out_columns].copy()
    # create a city column from the file name, e.g. 'BeijingPM...' -> 'Beijing'
    df['city'] = val.split('P')[0]
    # map the season codes to season names
    df['season'] = df['season'].map({1:'Spring', 2:'Summer', 3:'Autumn', 4: 'Winter'})
    frames.append(df)

# merge all files into one (DataFrame.append was removed in pandas 2.0, so use pd.concat)

df_all_cities = pd.concat(frames)

# replace the spaces in column names with '_'

df_all_cities.columns = [c.replace(' ', '_') for c in df_all_cities.columns]
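The merge step can be tried without the real CSV files. The sketch below is only an illustration: the two in-memory "files" and their values are made up, and only a few of the columns are included. It uses `pd.concat` rather than `DataFrame.append`, since `append` was removed in pandas 2.0.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two of the city CSV files (made-up values).
fake_files = {
    'BeijingPM20100101_20151231.csv': "No,year,season,PM_US Post\n1,2010,4,129.0\n2,2010,4,\n",
    'ShanghaiPM20100101_20151231.csv': "No,year,season,PM_US Post\n1,2010,4,66.0\n",
}
season_names = {1: 'Spring', 2: 'Summer', 3: 'Autumn', 4: 'Winter'}

frames = []
for name, text in fake_files.items():
    df = pd.read_csv(io.StringIO(text))
    # city name is everything before the first 'P' in the file name
    df['city'] = name.split('P')[0]
    # replace the numeric season code with its name
    df['season'] = df['season'].map(season_names)
    frames.append(df)

# stack the per-city frames; ignore_index gives one continuous row index
combined = pd.concat(frames, ignore_index=True)
combined.columns = [c.replace(' ', '_') for c in combined.columns]

print(combined['city'].value_counts())
print(list(combined.columns))
```

Passing `ignore_index=True` is a choice made for this sketch; without it, each city's rows keep their original index values, so index labels repeat across cities.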

# Assignment: 

# Beijing_data is not defined in this section; it is assumed to be the Beijing CSV already loaded as a DataFrame
# print the number of rows in the data
print("The number of rows in this dataset is", len(Beijing_data.index))
# calculate the number of missing records in each column (total rows minus non-missing rows)
print("The number of missing data records in PM_Dongsi is:", len(Beijing_data.index) - len(Beijing_data['PM_Dongsi'].dropna()))
print("The number of missing data records in PM_Dongsihuan is:", len(Beijing_data.index) - len(Beijing_data['PM_Dongsihuan'].dropna()))
print("The number of missing data records in PM_Nongzhanguan is:", len(Beijing_data.index) - len(Beijing_data['PM_Nongzhanguan'].dropna()))
print("The number of missing data records in DEWP is:", len(Beijing_data.index) - len(Beijing_data['DEWP'].dropna()))
print("The number of missing data records in HUMI is:", len(Beijing_data.index) - len(Beijing_data['HUMI'].dropna()))
print("The number of missing data records in PRES is:", len(Beijing_data.index) - len(Beijing_data['PRES'].dropna()))
print("The number of missing data records in TEMP is:", len(Beijing_data.index) - len(Beijing_data['TEMP'].dropna()))
print("The number of missing data records in cbwd is:", len(Beijing_data.index) - len(Beijing_data['cbwd'].dropna()))
print("The number of missing data records in Iws is:", len(Beijing_data.index) - len(Beijing_data['Iws'].dropna()))
print("The number of missing data records in precipitation is:", len(Beijing_data.index) - len(Beijing_data['precipitation'].dropna()))
print("The number of missing data records in Iprec is:", len(Beijing_data.index) - len(Beijing_data['Iprec'].dropna()))
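The per-column counts can also be computed in one pass with `isna().sum()`, which returns the number of missing values for every column at once. The sketch below is a possible shorter form, not the assignment's required solution; `sample` is a hypothetical stand-in for `Beijing_data` with made-up values and only three of the columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a few of the Beijing columns (made-up values).
sample = pd.DataFrame({
    'PM_Dongsi': [np.nan, 12.0, np.nan, 30.0],
    'DEWP': [-21.0, -20.0, np.nan, -19.0],
    'cbwd': ['NW', 'NW', 'cv', 'SE'],
})

print("The number of rows in this dataset is", len(sample.index))

# isna() marks missing cells as True; sum() counts them per column
missing = sample.isna().sum()
for column, count in missing.items():
    print("The number of missing data records in", column, "is:", count)
```

With the real data, replacing `sample` by the loaded Beijing DataFrame would report every column, not just the ones listed by hand.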
