Data

The zip file containing the data can be downloaded here:

The zip file contains 332 comma-separated-value (CSV) files containing pollution monitoring data for fine particulate matter (PM) air pollution at 332 locations in the United States. Each file contains data from a single monitor and the ID number for each monitor is contained in the file name. For example, data for monitor 200 is contained in the file "200.csv". Each file contains three variables:

  • Date: the date of the observation in YYYY-MM-DD format (year-month-day)
  • sulfate: the level of sulfate PM in the air on that date (measured in micrograms per cubic meter)
  • nitrate: the level of nitrate PM in the air on that date (measured in micrograms per cubic meter)

For this programming assignment you will need to unzip this file and create the directory 'specdata'. Once you have unzipped the zip file, do not make any modifications to the files in the 'specdata' directory. In each file you'll notice that there are many days where either sulfate or nitrate (or both) are missing (coded as NA). This is common with air pollution monitoring data in the United States.

Part 1

Write a function named 'pollutantmean' that calculates the mean of a pollutant (sulfate or nitrate) across a specified list of monitors. The function 'pollutantmean' takes three arguments: 'directory', 'pollutant', and 'id'. Given a vector monitor ID numbers, 'pollutantmean' reads that monitors' particulate matter data from the directory specified in the 'directory' argument and returns the mean of the pollutant across all of the monitors, ignoring any missing values coded as NA. A prototype of the function is as follows

pollutantmean <- function(directory, pollutant, id = 1:332) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files ## 'pollutant' is a character vector of length 1 indicating
## the name of the pollutant for which we will calculate the
## mean; either "sulfate" or "nitrate". ## 'id' is an integer vector indicating the monitor ID numbers
## to be used ## Return the mean of the pollutant across all monitors list
## in the 'id' vector (ignoring NA values)
## NOTE: Do not round the result!
}

You can see some example output from this function. The function that you write should be able to match this output. Please save your code to a file named pollutantmean.R.

pollutantmean <- function(directory, pollutant, id = 1:332){
tempsum <- 0
templen <- 0
for(i in id ){
fid <- sprintf("%03d",i)
filename <- paste(directory,"/",fid,".csv",sep="")
dat <- read.csv(filename)
src <- dat[pollutant]
src <- na.omit(src) # omit NA
tempsum <- tempsum + sum(src)
templen <- templen + dim(src)[1]
}
if(templen >0 ){
pollutantmean <- tempsum / templen
}
pollutantmean
}

Part 2

Write a function that reads a directory full of files and reports the number of completely observed cases in each data file. The function should return a data frame where the first column is the name of the file and the second column is the number of complete cases. A prototype of this function follows

complete <- function(directory, id = 1:332) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files ## 'id' is an integer vector indicating the monitor ID numbers
## to be used ## Return a data frame of the form:
## id nobs
## 1 117
## 2 1041
## ...
## where 'id' is the monitor ID number and 'nobs' is the
## number of complete cases
}

You can see some example output from this function. The function that you write should be able to match this output. Please save your code to a file named complete.R. To run the submit script for this part, make sure your working directory has the file complete.R in it.

complete <- function(directory, id = 1:332){
nobs <- NULL
for(i in id){
fid <- sprintf("%03d",i)
filename <- paste(directory,"/",fid,".csv",sep="")
dat <- read.csv(filename)
src <- na.omit(dat) # omit NA
nobs <- c(nobs,dim(src)[1])
}
complete <- data.frame(id,nobs)
}

Part 3

Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function should return a numeric vector of length 0. A prototype of this function follows

corr <- function(directory, threshold = 0) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files ## 'threshold' is a numeric vector of length 1 indicating the
## number of completely observed observations (on all
## variables) required to compute the correlation between
## nitrate and sulfate; the default is 0 ## Return a numeric vector of correlations
## NOTE: Do not round the result!
}

For this function you will need to use the 'cor' function in R which calculates the correlation between two vectors. Please read the help page for this function via '?cor' and make sure that you know how to use it.

You can see some example output from this function. The function that you write should be able to match this output. Please save your code to a file named corr.R. To run the submit script for this part, make sure your working directory has the file corr.R in it.

corr <- function(directory, threshold = 0) {
corr.list <- NULL
id <- 1:332
dat <- NULL
for(i in id){
fid <- sprintf("%03d",i)
filename <- paste(directory,"/",fid,".csv",sep="")
dat <- read.csv(filename)
src <- na.omit(dat) # omit NA'
dat <- src
len <- length(dat$ID);
if(len > threshold && threshold >=0 ){
corr.re <- cor(dat$sulfate, dat$nitrate)
corr.list=c(corr.list, corr.re)
}
}
corr.list
}

R 语言程序设计的更多相关文章

  1. R语言介绍

    R语言简介 R语言是一种为统计计算和图形显示而设计的语言环境,是贝尔实验室(Bell Laboratories)的Rick Becker.John Chambers和Allan Wilks开发的S语言 ...

  2. 统计计算与R语言的资料汇总(截止2016年12月)

    本文在Creative Commons许可证下发布. 在fedora Linux上断断续续使用R语言过了9年后,发现R语言在国内用的人逐渐多了起来.由于工作原因,直到今年暑假一个赴京工作的机会与一位统 ...

  3. 机器学习 1、R语言

    R语言 R是用于统计分析.绘图的语言和操作环境.R是属于GNU系统的一个自由.免费.源代码开放的软件,它是一个用于统计计算和统计制图的优秀工具. 特点介绍 •主要用于统计分析.绘图.数据挖掘 •R内置 ...

  4. 【R语言系列】R语言初识及安装

    一.R是什么 R语言是由新西兰奥克兰大学的Ross Ihaka和Robert Gentleman两个人共同发明. 其词法和语法分别源自Schema和S语言. R定义:一个能够自由幼小的用于统计计算和绘 ...

  5. # C语言程序设计第一次作业1234

    ---恢复内容开始--- C语言程序设计第一次作业 1.求圆面积和周长 输入圆的半径,计算圆的周长和面积 (1)流程图 (2)测试数据及运行结果 测试数据r=3 运行结果 2.判断闰年 输入一个四位年 ...

  6. 比较分析C++、Java、Python、R语言的面向对象特征,这些特征如何实现的?有什么相同点?

    一门课的课后题答案,在这里备份一下: 面向对象程序设计语言 –  比较分析C++.Java.Python.R语言的面向对象特征,这些特征如何实现的?有什么相同点? C++ 语言的面向对象特征: 对象模 ...

  7. R语言手册

    在R的官方教程里是这么给R下注解的:一个数据分析和图形显示的程序设计环境(A system for data analysis and visualization which is built bas ...

  8. 《数据挖掘:R语言实战》

    <数据挖掘:R语言实战> 基本信息 作者: 黄文    王正林 丛书名: 大数据时代的R语言 出版社:电子工业出版社 ISBN:9787121231223 上架时间:2014-6-6 出版 ...

  9. 【R】R语言常用函数

    R语言常用函数 基本 一.数据管理vector:向量 numeric:数值型向量 logical:逻辑型向量character:字符型向量 list:列表 data.frame:数据框c:连接为向量或 ...

随机推荐

  1. 【BZOJ-1787&1832】Meet紧急集合&聚会 倍增LCA

    1787: [Ahoi2008]Meet 紧急集合 Time Limit: 20 Sec  Memory Limit: 162 MBSubmit: 2259  Solved: 1023[Submit] ...

  2. BZOJ1040 [ZJOI2008]骑士

    Description Z国的骑士团是一个很有势力的组织,帮会中汇聚了来自各地的精英.他们劫富济贫,惩恶扬善,受到社会各 界的赞扬.最近发生了一件可怕的事情,邪恶的Y国发动了一场针对Z国的侵略战争.战 ...

  3. CCNET+ProGet+Windows Batch搭建全自动的内部包打包和推送及管理平台

    所要用的工具: 1.CCNET(用于检测SVN有改动提交时自动构建,并运行nuget的自动打包和推送批处理) 2.ProGet(目前见到最好用的nuget内部包管理平台) 3.Windows Batc ...

  4. 自动存储管理 ASM (转)

    文章转自:http://www.itpub.net/thread-1342473-1-1.html 自动存储管理 (ASM) ASM 是 Oracle 数据库 10g 中一个非常出色的新特性,它以平台 ...

  5. [NOIP2014] 提高组 洛谷P1328 生活大爆炸版石头剪刀布

    题目描述 石头剪刀布是常见的猜拳游戏:石头胜剪刀,剪刀胜布,布胜石头.如果两个人出拳一样,则不分胜负.在<生活大爆炸>第二季第8 集中出现了一种石头剪刀布的升级版游戏. 升级版游戏在传统的 ...

  6. Sublime Text以及Package Control安装方法

    官方下载:Sublime Text 中国论坛:Sublime 论坛 Sublime Text 是一个代码编辑器,具有漂亮的用户界面和强大的功能,并且它还是一个跨平台的编辑器,同时支持Windows.L ...

  7. 淘宝npm镜像

    来源:https://cnodejs.org/topic/4f9904f9407edba21468f31e 镜像使用方法(三种办法任意一种都能解决问题,建议使用第三种,将配置写死,下次用的时候配置还在 ...

  8. [COCI2012Final]Pro1

    校内OJ上的题. 数据范围非常小,用暴搜就可以,加点剪枝阶乘级别的复杂度竟然可以跑得比$O(N^4)$的算法还要快QAQ. 我用的是Floyd,参考了别人的代码.大概就是先跑个Floyd把点点之间路径 ...

  9. 卸载自己编译的程序(ubuntu14.04)

    cd 源代码目录make clean./configuremake uninstall

  10. Docker入门教程(一)介绍

    http://dockone.io/article/101 Docker入门教程(一)介绍 [编者的话]DockerOne组织翻译了Flux7的Docker入门教程,本文是系列入门教程的第一篇,介绍了 ...