Introduction to Writing Functions in R
在R中编写函数
为啥要用函数
为了代码更加可读,减少重复代码==lazy
You can type less code, saving effort and making your analyses more readable.
You can reuse your code from project to project.
所有的可重复执行的操作均可以放到函数里面
in R function are a type of variable ,so you assign functions using left arrow,just like anything else
args(函数名)
查询函数的参数,可以等同于help
> args(rank)
function (x, na.last = TRUE, ties.method = c("average", "first",
"last", "random", "max", "min"))
创建一个函数的步骤
# Update the function so heads have probability p_head
toss_coin <- function(n_flips,p_head) {
coin_sides <- c("head", "tail")
# Define a vector of weights
weights <- c(p_head,1-p_head)
# Modify the sampling to be weighted
sample(coin_sides, n_flips, replace = TRUE,prob=weights)
}
# Generate 10 coin tosses
toss_coin(10,0.8)
think of function as verbs
1.default args
在定义函数的时候就已经赋值了
cut_by_quantile <- function(x, n=5, na.rm, labels, interval_type) {
probs <- seq(0, 1, length.out = n + 1)
qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}
# Remove the n argument from the call
cut_by_quantile(
n_visits,
na.rm = FALSE,
labels = c("very low", "low", "medium", "high", "very high"),
interval_type = "(lo, hi]"
)
n是默认参数
若默认参数存在了,那么调用的时候只给形参传值好了
# Set the categories for interval_type to "(lo, hi]" and "[lo, hi)"
cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels = NULL,
interval_type = c("(lo, hi]", "[lo, hi)")) {
# Match the interval_type argument
interval_type <- match.arg(interval_type)
probs <- seq(0, 1, length.out = n + 1)
qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}
# Remove the interval_type argument from the call
cut_by_quantile(n_visits)
Passing arguments between functions
# From previous steps
get_reciprocal <- function(x) {
1 / x
}
calc_harmonic_mean <- function(x) {
x %>%
get_reciprocal() %>%
mean() %>%
get_reciprocal()
}
std_and_poor500 %>%
# Group by sector
group_by(sector) %>%
# Summarize, calculating harmonic mean of P/E ratio
summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio))
calc_harmonic_mean <- function(x, ...) {
x %>%
get_reciprocal() %>%
mean(...) %>%
get_reciprocal()
}
>
> std_and_poor500 %>%
# Group by sector
group_by(sector) %>%
# Summarize, calculating harmonic mean of P/E ratio
summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio,na.rm=TRUE))
# A tibble: 11 x 2
sector hmean_pe_ratio
<chr> <dbl>
1 Communication Services 17.5
2 Consumer Discretionary 15.2
3 Consumer Staples 19.8
4 Energy 13.7
5 Financials 12.9
6 Health Care 26.6
7 Industrials 18.2
8 Information Technology 21.6
9 Materials 16.3
10 Real Estate 32.5
11 Utilities 23.9
Checking arguments
You can use the assert_*() functions from assertive to check inputs and throw errors when they fail.
calc_harmonic_mean <- function(x, na.rm = FALSE) {
# Assert that x is numeric
assert_is_numeric(x)
x %>%
get_reciprocal() %>%
mean(na.rm = na.rm) %>%
get_reciprocal()
}
弹出报错
calc_harmonic_mean <- function(x, na.rm = FALSE) {
assert_is_numeric(x)
# Check if any values of x are non-positive
if(any(is_non_positive(x), na.rm = TRUE)) {
# Throw an error
stop("x contains non-positive values, so the harmonic mean makes no sense.")
}
x %>%
get_reciprocal() %>%
mean(na.rm = na.rm) %>%
get_reciprocal()
}
# See what happens when you pass it negative numbers
calc_harmonic_mean(std_and_poor500$pe_ratio - 20)
固定函数参数
一直存在的且默认
Returning values from functions
Sometimes, you don't need to run through the whole body of a function to get the answer. In that case you can return early from that function using return().
To check if x is divisible by n, you can use is_divisible_by(x, n) from assertive.
Alternatively, use the modulo operator, %%. x %% n gives the remainder when dividing x by n, so x %% n == 0 determines whether x is divisible by n. Try 1:10 %% 3 == 0 in the console.
To solve this exercise, you need to know that a leap year is every 400th year (like the year 2000) or every 4th year that isn't a century (like 1904 but not 1900 or 1905).
is_leap_year <- function(year) {
# If year is div. by 400 return TRUE
if(is_divisible_by(year, 400)) {
return(TRUE)
}
# If year is div. by 100 return FALSE
if(is_divisible_by(year, 100)) {
return(FALSE)
}
# If year is div. by 4 return TRUE
if(is_divisible_by(year, 4)) {
return(TRUE)
}
# Otherwise return FALSE
FALSE
}
# Define a pipeable plot fn with data and formula args
pipeable_plot <- function(data, formula) {
# Call plot() with the formula interface
plot(formula, data)
# Invisibly return the input dataset
invisible(data)
}
# Draw the scatter plot of dist vs. speed again
plt_dist_vs_speed <- cars %>%
pipeable_plot(dist ~ speed)
# Now the plot object has a value
plt_dist_vs_speed
augments
Given an R statistical model or other non-tidy object, add columns to the original dataset such as predictions, residuals and cluster assignments.
> # Use broom tools to get a list of 3 data frames
> list(
# Get model-level values
model = glance(model),
# Get coefficient-level values
coefficients = tidy(model),
# Get observation-level values
observations = augment(model)
)
$model
# A tibble: 1 x 7
null.deviance df.null logLik AIC BIC deviance df.residual
<dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
1 18850. 345 -6425. 12864. 12891. 11529. 339
$coefficients
# A tibble: 7 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.09 0.0279 146. 0.
2 genderfemale 0.374 0.0212 17.6 2.18e- 69
3 income($25k,$55k] -0.0199 0.0267 -0.746 4.56e- 1
4 income($55k,$95k] -0.581 0.0343 -16.9 3.28e- 64
5 income($95k,$Inf) -0.578 0.0310 -18.7 6.88e- 78
6 travel(0.25h,4h] -0.627 0.0217 -28.8 5.40e-183
7 travel(4h,Infh) -2.42 0.0492 -49.3 0.
$observations
# A tibble: 346 x 12
.rownames n_visits gender income travel .fitted .se.fit .resid .hat
<chr> <dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 25 2 female ($95k~ (4h,I~ 1.46 0.0502 -1.24 0.0109
2 26 1 female ($95k~ (4h,I~ 1.46 0.0502 -1.92 0.0109
3 27 1 male ($95k~ (0.25~ 2.88 0.0269 -5.28 0.0129
4 29 1 male ($95k~ (4h,I~ 1.09 0.0490 -1.32 0.00711
5 30 1 female ($55k~ (4h,I~ 1.46 0.0531 -1.92 0.0121
6 31 1 male [$0,$~ [0h,0~ 4.09 0.0279 -10.4 0.0465
7 33 80 female [$0,$~ [0h,0~ 4.46 0.0235 -0.710 0.0479
8 34 104 female ($95k~ [0h,0~ 3.88 0.0261 6.90 0.0332
9 35 55 male ($25k~ (0.25~ 3.44 0.0222 3.85 0.0153
10 36 350 female ($25k~ [0h,0~ 4.44 0.0206 21.5 0.0360
# ... with 336 more rows, and 3 more variables: .sigma <dbl>, .cooksd <dbl>,
# .std.resid <dbl>
Magnificent multi-assignment! Returning many values is as easy as collecting them into a list. The groomed model has data frames that are easy to program against.
# From previous step
groom_model <- function(model) {
list(
model = glance(model),
coefficients = tidy(model),
observations = augment(model)
)
}
# Call groom_model on model, assigning to 3 variables
c(mdl, cff, obs) %<-% groom_model(model)
# See these individual variables
mdl; cff; obs
Environments
> # From previous step
> rsa_lst <- list(
capitals = capitals,
national_parks = national_parks,
population = population
)
>
> # Convert the list to an environment
> rsa_env <- list2env(rsa_lst )
>
> # List the structure of each variable
> ls.str(rsa_env)
capitals : Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 2 variables:
$ city : chr "Cape Town" "Bloemfontein" "Pretoria"
$ type_of_capital: chr "Legislative" "Judicial" "Administrative"
national_parks : chr [1:22] "Addo Elephant National Park" "Agulhas National Park" ...
population : Time-Series [1:5] from 1996 to 2016: 40583573 44819778 47390900 51770560 55908900
查看父类的环境变量
> # From previous steps
> rsa_lst <- list(
capitals = capitals,
national_parks = national_parks,
population = population
)
> rsa_env <- list2env(rsa_lst)
>
> # Find the parent environment of rsa_env
> parent <- parent.env(rsa_env)
>
> # Print its name
> environmentName(parent )
[1] "R_GlobalEnv"
exists
查看是否定义了一个变量exists packages
> # Does population exist in rsa_env?
> exists("population", envir = rsa_env)
[1] TRUE
>
> # Does population exist in rsa_env, ignoring inheritance?
> exists("population", envir = rsa_env,inherits=FALSE)
[1] FALSE
局部变量和全局变量 这个解释可以参考之前的python学习笔记python 函数学习笔记
这里个人觉得编程语言是在某种程度上面是相通的,还是认真的学习好一门吧。不然最后都会放弃了,加油
单位换算
# Write a function to convert sq. meters to hectares
sq_meters_to_hectares <- function(sq_meters) {
sq_meters / 10000
}
# Write a function to convert acres to sq. yards
acres_to_sq_yards <- function(acres) {
4840 * acres
}
# Write a function to convert acres to sq. yards
acres_to_sq_yards <- function(acres) {
4840 * acres
}
常见农业单位换算
- There are 4840 square yards in an acre.
- There are 36 inches in a yard and one inch is 0.0254 meters.
- There are 10000 square meters in a hectare.
# Write a function to convert sq. yards to sq. meters
sq_yards_to_sq_meters <- function(sq_yards) {
sq_yards %>%
# Take the square root
sqrt() %>%
# Convert yards to meters
yards_to_meters() %>%
# Square it
raise_to_power(2)
}
单位换算有可以直接用的函数
# Load fns defined in previous steps
load_step3()
# Write a function to convert bushels to kg
bushels_to_kgs <- function(bushels, crop) {
bushels %>%
# Convert bushels to lbs for this crop
bushels_to_lbs(crop) %>%
# Convert lbs to kgs
lbs_to_kgs()
}
# Wrap this code into a function
fortify_with_metric_units <- function(data, crop) {
data %>%
mutate(
farmed_area_ha = acres_to_hectares(farmed_area_acres),
yield_kg_per_ha = bushels_per_acre_to_kgs_per_hectare(
yield_bushels_per_acre,
crop = crop
)
)
}
# Try it on the wheat dataset
fortify_with_metric_units(wheat, crop = "wheat")
绘图
# Using corn, plot yield (kg/ha) vs. year
> ggplot(corn, aes(year, yield_kg_per_ha)) +
# Add a line layer, grouped by state
geom_line(aes(group = state)) +
# Add a smooth trend layer
geom_smooth()
inner_join()
进行函数合并简书
inner_join连接后的记录数等于”共有的记录数,也就是求交集,作用和merge的作用是一样的
# Wrap this code into a function
fortify_with_census_region<-function(data){
data %>%
inner_join(usa_census_regions, by = "state")
}
# Try it on the wheat dataset
fortify_with_census_region(wheat)
# Wrap this code into a function
plot_yield_vs_year_by_region<-function(data){
plot_yield_vs_year(data) +
facet_wrap(vars(census_region))
}
# Try it on the wheat dataset
plot_yield_vs_year_by_region(wheat)
Modeling grain yields
# Wrap the model code into a function
run_gam_yield_vs_year_by_region<-function(data){
gam(yield_kg_per_ha ~ s(year) + census_region, data = corn)
}
# Try it on the wheat dataset
run_gam_yield_vs_year_by_region(wheat)
# Make predictions in 2050
predict_this <- data.frame(
year = 2050,
census_region = census_regions
)
# Predict the yield
pred_yield_kg_per_ha <- predict(corn_model, predict_this, type = "response")
predict_this %>%
# Add the prediction as a column of predict_this
mutate(pred_yield_kg_per_ha = pred_yield_kg_per_ha)
reference
Introduction to Writing Functions in R的更多相关文章
- 【翻译】Awesome R资源大全中文版来了,全球最火的R工具包一网打尽,超过300+工具,还在等什么?
0.前言 虽然很早就知道R被微软收购,也很早知道R在统计分析处理方面很强大,开始一直没有行动过...直到 直到12月初在微软技术大会,看到我软的工程师演示R的使用,我就震惊了,然后最近在网上到处了解和 ...
- R统计分析处理
[翻译]Awesome R资源大全中文版来了,全球最火的R工具包一网打尽,超过300+工具,还在等什么? 阅读目录 0.前言 1.集成开发环境 2.语法 3.数据操作 4.图形显示 5.HTML部件 ...
- R工具包一网打尽
这里有很多非常不错的R包和工具. 该想法来自于awesome-machine-learning. 这里是包的导航清单,看起来更方便 >>>导航清单 通过这些翻译了解这些工具包,以后干 ...
- Introduction to dnorm, pnorm, qnorm, and rnorm for new biostatisticians
原文:Introduction todnorm,pnorm,qnorm, andrnormfor new biostatisticians Today I was in Dan’s office ho ...
- Analytic Functions in Oracle
Contents Overview and IntroductionHow Analytic Functions WorkThe SyntaxExamplesCalculate a running T ...
- R工具包
直到12月初在微软技术大会,看到我软的工程师演示R的使用,我就震惊了,然后最近在网上到处了解和爬一些R的资料,看着看着就入迷了,这就是个大宝库了,以前怎么没发现,看来还是太狭隘了.直到前几天我看到这个 ...
- R2—《R in Nutshell》 读书笔记(连载)
R in Nutshell 前言 例子(nutshell包) 本书中的例子包括在nutshell的R包中,使用数据,需加载nutshell包 install.packages("nutshe ...
- a note of R software write Function
Functionals “To become significantly more reliable, code must become more transparent. In particular ...
- r testifying that your code will behave as you intend.
https://github.com/stretchr/testify Testify - Thou Shalt Write Tests Go code (golang) set of pack ...
随机推荐
- 【python基础语法】函数的作用域和内置函数和关键字(第7天课堂笔记)
一.函数的作用域: 1.全局变量 2.局部变量 3.global 二.内置函数 1.常见的内置函数 print : 输出 input : 输入 type : 查看数据类型 id : 获取数据的内存地址 ...
- 硬件知识整理part2--电阻在反馈网络中的应用
学而不厌,诲人不倦,不知老之将至.--孔子 电阻作为电路中基本的元器件之一.在电路设计中,我们有时会使用欧姆定律来大致估计一下电阻值的大小,但是大多时候我们是不用去过多考虑电阻值的大小,像我这样大方的 ...
- 【daily】Java枚举 - fastjson对enum的处理
目的 1.枚举值转换成完全的json: 2.对象中的枚举成员完全转换成json: 3.枚举类的全部值转换成json: 枚举定义 public enum SongsEnum { SAFE_AND_SOU ...
- C语言 杂货整理
C语言 杂货整理 #define _CRT_SECURE_NO_WARNINGS #include <stdio.h> #include <string.h> #include ...
- CLion快捷键设置
CLion自动格式化的快捷键默认为Ctrl+Alt+L,代码前后跳转CRTL+ALT+Left或CRTL+ALT+Right 但有时会和系统的快捷键冲突,需要修改CLion里面的快捷键 如图:sett ...
- MY_0003:设置界面显示单位
1,设置单位
- Spring Aop和Spring Ioc(一)
Spring Aop Aop: 面向切面编程的本质:面向切面编程,指扩展功能不修改源代码,将功能代码从业务逻辑代码中分离出来. 1:主要功能:日志记录,性能统计,安全控制,事务处理,异常处理等等. 2 ...
- TChart-数据编辑(TChartListBox,TeeCommander)
先上图 功能代码: unit Unit1; interface uses Windows, Messages, SysUtils, Variants, Classes, Graphics, Contr ...
- JAVA->查询并显示输入根目录下全部的文件所在目录路径
public static boolean qf(File f,boolean a){ boolean b=false; if(a==true){ File[] fl=f.li ...
- open()和with open()的区别
open 1,打开文件 file=open("文件名",“读写模式”) 2,操作文件 *** 3,关闭文件 file.close() 注意事项: 使用open方法,文件操作完毕之后 ...