Learning notes | Data Analysis: 1.2 data wrangling
| Data Wrangling |
# Sort all the data into one file
files = ['BeijingPM20100101_20151231.csv','ChengduPM20100101_20151231.csv','GuangzhouPM20100101_20151231.csv','ShanghaiPM20100101_20151231.csv','ShenyangPM20100101_20151231.csv']
out_columns = ['No', 'year', 'month', 'day', 'hour', 'season', 'PM_US Post']
# Create a void dataframe
df_all_cities = pd.DataFrame()
# Iterate to write diffrent files
for inx, val in enumerate(files):
df = pd.read_csv(val)
df = df[out_columns]
# create a city column
df['city'] = val.split('P')[0]
# map season
df['season'] = df['season'].map({1:'Spring', 2:'Summer', 3:'Autumn', 4: 'Winter'})
# append each file and merge all files into one
df_all_cities = df_all_cities.append(df)
# replace the space in variable names with '_'
df_all_cities.columns = [c.replace(' ', '_') for c in df_all_cities.columns]
# Assignment:
# print the length of data
print("The number of row in this dataset is ",len(Beijing_data.index))
# calculating the number of records in column "PM_Dongsi"
print("There number of missing data records in PM_Dongsi is: ",len(Beijing_data.index) - len(Beijing_data['PM_Dongsi'].dropna()))
print("There number of missing data records in PM_Dongsihuan is: ",len(Beijing_data.index) - len(Beijing_data['PM_Dongsihuan'].dropna()))
print("There number of missing data records in PM_Nongzhanguan is: ",len(Beijing_data.index) - len(Beijing_data['PM_Nongzhanguan'].dropna()))
print("There number of missing data records in DEWP is: ",len(Beijing_data.index) - len(Beijing_data['DEWP'].dropna()))
print("There number of missing data records in HUMI is: ",len(Beijing_data.index) - len(Beijing_data['HUMI'].dropna()))
print("There number of missing data records in PRES is: ",len(Beijing_data.index) - len(Beijing_data['PRES'].dropna()))
print("There number of missing data records in TEMP is: ",len(Beijing_data.index) - len(Beijing_data['TEMP'].dropna()))
print("There number of missing data records in cbwd is: ",len(Beijing_data.index) - len(Beijing_data['cbwd'].dropna()))
print("There number of missing data records in Iws is: ",len(Beijing_data.index) - len(Beijing_data['Iws'].dropna()))
print("There number of missing data records in precipitation is: ",len(Beijing_data.index) - len(Beijing_data['precipitation'].dropna()))
print("There number of missing data records in Iprec is: ",len(Beijing_data.index) - len(Beijing_data['Iprec'].dropna()))
Learning notes | Data Analysis: 1.2 data wrangling的更多相关文章
- Learning notes | Data Analysis: 1.1 data evaluation
| Data Evaluation | - Use Shift + Enter or Shift + Return to run the upper box so as to make it disp ...
- How to use data analysis for machine learning (example, part 1)
In my last article, I stated that for practitioners (as opposed to theorists), the real prerequisite ...
- Learning Spark: Lightning-Fast Big Data Analysis 中文翻译
Learning Spark: Lightning-Fast Big Data Analysis 中文翻译行为纯属个人对于Spark的兴趣,仅供学习. 如果我的翻译行为侵犯您的版权,请您告知,我将停止 ...
- 用pandas进行数据清洗(二)(Data Analysis Pandas Data Munging/Wrangling)
在<用pandas进行数据清洗(一)(Data Analysis Pandas Data Munging/Wrangling)>中,我们介绍了数据清洗经常用到的一些pandas命令. 接下 ...
- An Introduction to Stock Market Data Analysis with R (Part 1)
Around September of 2016 I wrote two articles on using Python for accessing, visualizing, and evalua ...
- 学习笔记之Python for Data Analysis
Python for Data Analysis, 2nd Edition https://www.safaribooksonline.com/library/view/python-for-data ...
- 《利用Python进行数据分析: Python for Data Analysis 》学习随笔
NoteBook of <Data Analysis with Python> 3.IPython基础 Tab自动补齐 变量名 变量方法 路径 解释 ?解释, ??显示函数源码 ?搜索命名 ...
- Python for Data Analysis
Data Analysis with Python ch02 一些有趣的数据分析结果 Male描述的是美国新生儿男孩纸的名字的最后一个字母的分布 Female描述的是美国新生儿女孩纸的名字的最后一个字 ...
- 深入浅出数据分析 Head First Data Analysis Code 数据与代码
<深入浅出数据分析>英文名为Head First Data Analysis Code, 这本书中提供了学习使用的数据和程序,原书链接由于某些原因不 能打开,这里在提供一个下载的链接.去下 ...
随机推荐
- 用CSS写扫描二维码图标
代码如下: <style>.icon{margin:300px;width:30px;height:30px;position:relative}.icon .b{border:2px s ...
- Struts2学习-Ioc学习-spring
1.面向对象写法(带着面向过程的思维)电脑 computer = new 电脑(); [电脑代码中 new 打印机()]computer.打印文本("hello 140"); 电脑 ...
- db2巡检小脚本
写了下db2巡检的一个小脚本,只能做常规检查,减少日常工作量,脚本内容如下: #!/bash/bin echo "物理CPU个数为:"cat /proc/cpuinfo| grep ...
- python csv写入数据,消除空行
import csv rowlist=[{'first_name': 'mark', 'last_name': 'zhao','age':21}, {'first_name': 'tony', 'la ...
- Topic model的变种及其应用[1]
转: http://www.blogbus.com/krischow-logs/65749376.html LDA 着实 带领着 Topic model 火了一把. 但是其实我们华人世界内,也不乏 ...
- python UI自动化实战记录八:添加配置
添加配置文件写入测试地址等,当环境切换时只需修改配置文件即可. 1 在项目目录下添加文件 config.ini 写入: [Domain] domain = http://test.domain.cn ...
- 一个理解PHP面向对象编程(OOP)的实例
<?php //定义一个“人”类作为父类 class Person{ //声明一个新变量公共变量$name,可被任何包中的类访问 public $name;//人的名字 public $sex; ...
- HDU 2018 Multi-University Training Contest 3 Problem A. Ascending Rating 【单调队列优化】
任意门:http://acm.hdu.edu.cn/showproblem.php?pid=6319 Problem A. Ascending Rating Time Limit: 10000/500 ...
- if else 和 switch的效率
switch在判断分支时,没有判断所有的可能性,而是用一个静态表来解决这个问题,所以速度要比if-else快. 但是,switch对较复杂的表达式进行判断,所以当我们需要判断一些简单数值时,用swit ...
- elementUI之switch应用的坑
前言: 因为项目中用到了饿了么出品的element-ui这一套ui框架,所以很多地方都踩在了坑里,前面碰到了一些,今天着重聊一下switch这个组件. 首先switch接受Boolean类型的数据,莫 ...