Introduction to Writing Functions in R
在R中编写函数
为啥要用函数
为了代码更加可读,减少重复代码==lazy
You can type less code, saving effort and making your analyses more readable.
You can reuse your code from project to project.
所有的可重复执行的操作均可以放到函数里面
in R function are a type of variable ,so you assign functions using left arrow,just like anything else
args(函数名)
查询函数的参数,可以等同于help
> args(rank)
function (x, na.last = TRUE, ties.method = c("average", "first",
"last", "random", "max", "min"))
创建一个函数的步骤

# Update the function so heads have probability p_head
toss_coin <- function(n_flips,p_head) {
coin_sides <- c("head", "tail")
# Define a vector of weights
weights <- c(p_head,1-p_head)
# Modify the sampling to be weighted
sample(coin_sides, n_flips, replace = TRUE,prob=weights)
}
# Generate 10 coin tosses
toss_coin(10,0.8)
think of function as verbs
1.default args
在定义函数的时候就已经赋值了
cut_by_quantile <- function(x, n=5, na.rm, labels, interval_type) {
probs <- seq(0, 1, length.out = n + 1)
qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}
# Remove the n argument from the call
cut_by_quantile(
n_visits,
na.rm = FALSE,
labels = c("very low", "low", "medium", "high", "very high"),
interval_type = "(lo, hi]"
)
n是默认参数
若默认参数存在了,那么调用的时候只给形参传值好了
# Set the categories for interval_type to "(lo, hi]" and "[lo, hi)"
cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels = NULL,
interval_type = c("(lo, hi]", "[lo, hi)")) {
# Match the interval_type argument
interval_type <- match.arg(interval_type)
probs <- seq(0, 1, length.out = n + 1)
qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}
# Remove the interval_type argument from the call
cut_by_quantile(n_visits)
Passing arguments between functions
# From previous steps
get_reciprocal <- function(x) {
1 / x
}
calc_harmonic_mean <- function(x) {
x %>%
get_reciprocal() %>%
mean() %>%
get_reciprocal()
}
std_and_poor500 %>%
# Group by sector
group_by(sector) %>%
# Summarize, calculating harmonic mean of P/E ratio
summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio))
calc_harmonic_mean <- function(x, ...) {
x %>%
get_reciprocal() %>%
mean(...) %>%
get_reciprocal()
}
>
> std_and_poor500 %>%
# Group by sector
group_by(sector) %>%
# Summarize, calculating harmonic mean of P/E ratio
summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio,na.rm=TRUE))
# A tibble: 11 x 2
sector hmean_pe_ratio
<chr> <dbl>
1 Communication Services 17.5
2 Consumer Discretionary 15.2
3 Consumer Staples 19.8
4 Energy 13.7
5 Financials 12.9
6 Health Care 26.6
7 Industrials 18.2
8 Information Technology 21.6
9 Materials 16.3
10 Real Estate 32.5
11 Utilities 23.9
Checking arguments
You can use the assert_*() functions from assertive to check inputs and throw errors when they fail.
calc_harmonic_mean <- function(x, na.rm = FALSE) {
# Assert that x is numeric
assert_is_numeric(x)
x %>%
get_reciprocal() %>%
mean(na.rm = na.rm) %>%
get_reciprocal()
}
弹出报错
calc_harmonic_mean <- function(x, na.rm = FALSE) {
assert_is_numeric(x)
# Check if any values of x are non-positive
if(any(is_non_positive(x), na.rm = TRUE)) {
# Throw an error
stop("x contains non-positive values, so the harmonic mean makes no sense.")
}
x %>%
get_reciprocal() %>%
mean(na.rm = na.rm) %>%
get_reciprocal()
}
# See what happens when you pass it negative numbers
calc_harmonic_mean(std_and_poor500$pe_ratio - 20)
固定函数参数
一直存在的且默认
Returning values from functions
Sometimes, you don't need to run through the whole body of a function to get the answer. In that case you can return early from that function using return().
To check if x is divisible by n, you can use is_divisible_by(x, n) from assertive.
Alternatively, use the modulo operator, %%. x %% n gives the remainder when dividing x by n, so x %% n == 0 determines whether x is divisible by n. Try 1:10 %% 3 == 0 in the console.
To solve this exercise, you need to know that a leap year is every 400th year (like the year 2000) or every 4th year that isn't a century (like 1904 but not 1900 or 1905).
is_leap_year <- function(year) {
# If year is div. by 400 return TRUE
if(is_divisible_by(year, 400)) {
return(TRUE)
}
# If year is div. by 100 return FALSE
if(is_divisible_by(year, 100)) {
return(FALSE)
}
# If year is div. by 4 return TRUE
if(is_divisible_by(year, 4)) {
return(TRUE)
}
# Otherwise return FALSE
FALSE
}
# Define a pipeable plot fn with data and formula args
pipeable_plot <- function(data, formula) {
# Call plot() with the formula interface
plot(formula, data)
# Invisibly return the input dataset
invisible(data)
}
# Draw the scatter plot of dist vs. speed again
plt_dist_vs_speed <- cars %>%
pipeable_plot(dist ~ speed)
# Now the plot object has a value
plt_dist_vs_speed
augments
Given an R statistical model or other non-tidy object, add columns to the original dataset such as predictions, residuals and cluster assignments.
> # Use broom tools to get a list of 3 data frames
> list(
# Get model-level values
model = glance(model),
# Get coefficient-level values
coefficients = tidy(model),
# Get observation-level values
observations = augment(model)
)
$model
# A tibble: 1 x 7
null.deviance df.null logLik AIC BIC deviance df.residual
<dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
1 18850. 345 -6425. 12864. 12891. 11529. 339
$coefficients
# A tibble: 7 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.09 0.0279 146. 0.
2 genderfemale 0.374 0.0212 17.6 2.18e- 69
3 income($25k,$55k] -0.0199 0.0267 -0.746 4.56e- 1
4 income($55k,$95k] -0.581 0.0343 -16.9 3.28e- 64
5 income($95k,$Inf) -0.578 0.0310 -18.7 6.88e- 78
6 travel(0.25h,4h] -0.627 0.0217 -28.8 5.40e-183
7 travel(4h,Infh) -2.42 0.0492 -49.3 0.
$observations
# A tibble: 346 x 12
.rownames n_visits gender income travel .fitted .se.fit .resid .hat
<chr> <dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 25 2 female ($95k~ (4h,I~ 1.46 0.0502 -1.24 0.0109
2 26 1 female ($95k~ (4h,I~ 1.46 0.0502 -1.92 0.0109
3 27 1 male ($95k~ (0.25~ 2.88 0.0269 -5.28 0.0129
4 29 1 male ($95k~ (4h,I~ 1.09 0.0490 -1.32 0.00711
5 30 1 female ($55k~ (4h,I~ 1.46 0.0531 -1.92 0.0121
6 31 1 male [$0,$~ [0h,0~ 4.09 0.0279 -10.4 0.0465
7 33 80 female [$0,$~ [0h,0~ 4.46 0.0235 -0.710 0.0479
8 34 104 female ($95k~ [0h,0~ 3.88 0.0261 6.90 0.0332
9 35 55 male ($25k~ (0.25~ 3.44 0.0222 3.85 0.0153
10 36 350 female ($25k~ [0h,0~ 4.44 0.0206 21.5 0.0360
# ... with 336 more rows, and 3 more variables: .sigma <dbl>, .cooksd <dbl>,
# .std.resid <dbl>
Magnificent multi-assignment! Returning many values is as easy as collecting them into a list. The groomed model has data frames that are easy to program against.
# From previous step
groom_model <- function(model) {
list(
model = glance(model),
coefficients = tidy(model),
observations = augment(model)
)
}
# Call groom_model on model, assigning to 3 variables
c(mdl, cff, obs) %<-% groom_model(model)
# See these individual variables
mdl; cff; obs
Environments
> # From previous step
> rsa_lst <- list(
capitals = capitals,
national_parks = national_parks,
population = population
)
>
> # Convert the list to an environment
> rsa_env <- list2env(rsa_lst )
>
> # List the structure of each variable
> ls.str(rsa_env)
capitals : Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 2 variables:
$ city : chr "Cape Town" "Bloemfontein" "Pretoria"
$ type_of_capital: chr "Legislative" "Judicial" "Administrative"
national_parks : chr [1:22] "Addo Elephant National Park" "Agulhas National Park" ...
population : Time-Series [1:5] from 1996 to 2016: 40583573 44819778 47390900 51770560 55908900
查看父类的环境变量
> # From previous steps
> rsa_lst <- list(
capitals = capitals,
national_parks = national_parks,
population = population
)
> rsa_env <- list2env(rsa_lst)
>
> # Find the parent environment of rsa_env
> parent <- parent.env(rsa_env)
>
> # Print its name
> environmentName(parent )
[1] "R_GlobalEnv"
exists
查看是否定义了一个变量exists packages
> # Does population exist in rsa_env?
> exists("population", envir = rsa_env)
[1] TRUE
>
> # Does population exist in rsa_env, ignoring inheritance?
> exists("population", envir = rsa_env,inherits=FALSE)
[1] FALSE
局部变量和全局变量 这个解释可以参考之前的python学习笔记python 函数学习笔记
这里个人觉得编程语言是在某种程度上面是相通的,还是认真的学习好一门吧。不然最后都会放弃了,加油
单位换算
# Write a function to convert sq. meters to hectares
sq_meters_to_hectares <- function(sq_meters) {
sq_meters / 10000
}
# Write a function to convert acres to sq. yards
acres_to_sq_yards <- function(acres) {
4840 * acres
}
# Write a function to convert acres to sq. yards
acres_to_sq_yards <- function(acres) {
4840 * acres
}
常见农业单位换算
- There are 4840 square yards in an acre.
- There are 36 inches in a yard and one inch is 0.0254 meters.
- There are 10000 square meters in a hectare.
# Write a function to convert sq. yards to sq. meters
sq_yards_to_sq_meters <- function(sq_yards) {
sq_yards %>%
# Take the square root
sqrt() %>%
# Convert yards to meters
yards_to_meters() %>%
# Square it
raise_to_power(2)
}
单位换算有可以直接用的函数
# Load fns defined in previous steps
load_step3()
# Write a function to convert bushels to kg
bushels_to_kgs <- function(bushels, crop) {
bushels %>%
# Convert bushels to lbs for this crop
bushels_to_lbs(crop) %>%
# Convert lbs to kgs
lbs_to_kgs()
}
# Wrap this code into a function
fortify_with_metric_units <- function(data, crop) {
data %>%
mutate(
farmed_area_ha = acres_to_hectares(farmed_area_acres),
yield_kg_per_ha = bushels_per_acre_to_kgs_per_hectare(
yield_bushels_per_acre,
crop = crop
)
)
}
# Try it on the wheat dataset
fortify_with_metric_units(wheat, crop = "wheat")
绘图
# Using corn, plot yield (kg/ha) vs. year
> ggplot(corn, aes(year, yield_kg_per_ha)) +
# Add a line layer, grouped by state
geom_line(aes(group = state)) +
# Add a smooth trend layer
geom_smooth()
inner_join()
进行函数合并简书
inner_join连接后的记录数等于”共有的记录数,也就是求交集,作用和merge的作用是一样的
# Wrap this code into a function
fortify_with_census_region<-function(data){
data %>%
inner_join(usa_census_regions, by = "state")
}
# Try it on the wheat dataset
fortify_with_census_region(wheat)
# Wrap this code into a function
plot_yield_vs_year_by_region<-function(data){
plot_yield_vs_year(data) +
facet_wrap(vars(census_region))
}
# Try it on the wheat dataset
plot_yield_vs_year_by_region(wheat)
Modeling grain yields
# Wrap the model code into a function
run_gam_yield_vs_year_by_region<-function(data){
gam(yield_kg_per_ha ~ s(year) + census_region, data = corn)
}
# Try it on the wheat dataset
run_gam_yield_vs_year_by_region(wheat)


# Make predictions in 2050
predict_this <- data.frame(
year = 2050,
census_region = census_regions
)
# Predict the yield
pred_yield_kg_per_ha <- predict(corn_model, predict_this, type = "response")
predict_this %>%
# Add the prediction as a column of predict_this
mutate(pred_yield_kg_per_ha = pred_yield_kg_per_ha)
reference
Introduction to Writing Functions in R的更多相关文章
- 【翻译】Awesome R资源大全中文版来了,全球最火的R工具包一网打尽,超过300+工具,还在等什么?
0.前言 虽然很早就知道R被微软收购,也很早知道R在统计分析处理方面很强大,开始一直没有行动过...直到 直到12月初在微软技术大会,看到我软的工程师演示R的使用,我就震惊了,然后最近在网上到处了解和 ...
- R统计分析处理
[翻译]Awesome R资源大全中文版来了,全球最火的R工具包一网打尽,超过300+工具,还在等什么? 阅读目录 0.前言 1.集成开发环境 2.语法 3.数据操作 4.图形显示 5.HTML部件 ...
- R工具包一网打尽
这里有很多非常不错的R包和工具. 该想法来自于awesome-machine-learning. 这里是包的导航清单,看起来更方便 >>>导航清单 通过这些翻译了解这些工具包,以后干 ...
- Introduction to dnorm, pnorm, qnorm, and rnorm for new biostatisticians
原文:Introduction todnorm,pnorm,qnorm, andrnormfor new biostatisticians Today I was in Dan’s office ho ...
- Analytic Functions in Oracle
Contents Overview and IntroductionHow Analytic Functions WorkThe SyntaxExamplesCalculate a running T ...
- R工具包
直到12月初在微软技术大会,看到我软的工程师演示R的使用,我就震惊了,然后最近在网上到处了解和爬一些R的资料,看着看着就入迷了,这就是个大宝库了,以前怎么没发现,看来还是太狭隘了.直到前几天我看到这个 ...
- R2—《R in Nutshell》 读书笔记(连载)
R in Nutshell 前言 例子(nutshell包) 本书中的例子包括在nutshell的R包中,使用数据,需加载nutshell包 install.packages("nutshe ...
- a note of R software write Function
Functionals “To become significantly more reliable, code must become more transparent. In particular ...
- r testifying that your code will behave as you intend.
https://github.com/stretchr/testify Testify - Thou Shalt Write Tests Go code (golang) set of pack ...
随机推荐
- imx6ull+debian10 构建静态qt交叉编译环境
# PC:Ubuntu18.04# GCC: arm-linux-gnueabihf- 7.4.0# 开发板:# imx6ull+debian10(文件系统为野火提供的) # 此处需要编译的是带gst ...
- (未完成)【Android】MVP模式初见(一)
最近在阅读郭霖大神的公众号时,分类中架构引起了我的注意. 虽然是个人开发(水平很菜的那种),但最终都要向企业正式项目开发靠近.因此接下来一段时间,主要学习一下MVP架构.Retrofit以及RxJav ...
- 清北学堂—2020.1提高储备营—Day 3(图论初步(二))
qbxt Day 3 --2020.1.19 济南 主讲:李奥 目录一览 1.图论(kruskal算法,最短路径算法,拓扑排序) 总知识点:图论 一.kruskal算法 1.目的:求图的最小生成树 2 ...
- Java如何优雅地使用close()?
注:本文出自博主 Chloneda:个人博客 | 博客园 | Github | Gitee | 知乎 本文源链接:https://www.cnblogs.com/chloneda/p/java-clo ...
- 《C++Primer》第五版习题答案--第六章【学习笔记】
<C++Primer>第五版习题答案--第六章[学习笔记] ps:答案是个人在学习过程中书写,可能存在错漏之处,仅作参考. 作者:cosefy Date: 2020/1/16 第六章:函数 ...
- java设计模式学习笔记--单一职责原则
单一职责原则注意事项和细节 1.降低类的复杂度,一个类只负责一项职责 2.提高可读性,可维护性 3.降低变更引起的风险 4.通常情况下,我们应当遵守单一职责原则,只有逻辑足够简单,才可以在代码级违反单 ...
- 剑指offer-面试题55-平衡二叉树-递归
/* 题目: 判断二叉树是否为平衡二叉树. */ /* 思路: 判断二叉树的左子树和右子树高度相差是否为1. */ #include<iostream> #include<cstri ...
- 安装MongoDB到CentOS(YUM)
运行环境 系统版本:CentOS Linux release 7.3.1611 (Core) 软件版本:mongodb-org-4.0.8 硬件要求:无 安装过程 1.配置YUM-Mongodb存储库 ...
- 洛谷题解 P1592 【互质】
原题传送门 题目描述 输入两个正整数n和k,求与n互质的第k个正整数. 输入格式 仅一行,为两个正整数n(≤10^6)和k(≤10^8). 输出格式 一个正整数,表示与n互质的第k个正整数. 输入输出 ...
- laravel orWhere
场景描述 需要根据商品分类.商品名称和条形码这三个查询条件去取出结果集,这三者的关系是:商品分类 and (商品名称 or 条形码) 错误尝试 第一次写出来的代码是这样的: $goodsModel = ...