在R中编写函数

在R中编写函数

为啥要用函数

为了代码更加可读，减少重复代码==lazy

You can type less code, saving effort and making your analyses more readable.

You can reuse your code from project to project.

所有的可重复执行的操作均可以放到函数里面

in R function are a type of variable ,so you assign functions using left arrow,just like anything else

args（函数名）

查询函数的参数,可以等同于help

> args(rank)

function (x, na.last = TRUE, ties.method = c("average", "first",

    "last", "random", "max", "min"))

创建一个函数的步骤

# Update the function so heads have probability p_head

toss_coin <- function(n_flips,p_head) {

  coin_sides <- c("head", "tail")

  # Define a vector of weights

  weights <- c(p_head,1-p_head)

  # Modify the sampling to be weighted

  sample(coin_sides, n_flips, replace = TRUE,prob=weights)

}

# Generate 10 coin tosses

toss_coin(10,0.8)

think of function as verbs

1.default args

在定义函数的时候就已经赋值了

cut_by_quantile <- function(x, n=5, na.rm, labels, interval_type) {

  probs <- seq(0, 1, length.out = n + 1)

  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)

  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)

  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)

}

# Remove the n argument from the call

cut_by_quantile(

  n_visits, 

  na.rm = FALSE,

  labels = c("very low", "low", "medium", "high", "very high"),

  interval_type = "(lo, hi]"

)

n是默认参数

若默认参数存在了，那么调用的时候只给形参传值好了

# Set the categories for interval_type to "(lo, hi]" and "[lo, hi)"

cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels = NULL,

                            interval_type = c("(lo, hi]", "[lo, hi)")) {

  # Match the interval_type argument

  interval_type <- match.arg(interval_type)

  probs <- seq(0, 1, length.out = n + 1)

  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)

  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)

  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)

}

# Remove the interval_type argument from the call

cut_by_quantile(n_visits)

Passing arguments between functions

# From previous steps

get_reciprocal <- function(x) {

  1 / x

}

calc_harmonic_mean <- function(x) {

  x %>%

    get_reciprocal() %>%

    mean() %>%

    get_reciprocal()

}

std_and_poor500 %>%

  # Group by sector

  group_by(sector) %>%

  # Summarize, calculating harmonic mean of P/E ratio

  summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio))

calc_harmonic_mean <- function(x, ...) {

    x %>%

      get_reciprocal() %>%

      mean(...) %>%

      get_reciprocal()

  }

>

> std_and_poor500 %>%

    # Group by sector

    group_by(sector) %>%

    # Summarize, calculating harmonic mean of P/E ratio

    summarize(hmean_pe_ratio = calc_harmonic_mean(pe_ratio,na.rm=TRUE))

# A tibble: 11 x 2

   sector                 hmean_pe_ratio

   <chr>                           <dbl>

 1 Communication Services           17.5

 2 Consumer Discretionary           15.2

 3 Consumer Staples                 19.8

 4 Energy                           13.7

 5 Financials                       12.9

 6 Health Care                      26.6

 7 Industrials                      18.2

 8 Information Technology           21.6

 9 Materials                        16.3

10 Real Estate                      32.5

11 Utilities                        23.9

Checking arguments

You can use the assert_*() functions from assertive to check inputs and throw errors when they fail.

calc_harmonic_mean <- function(x, na.rm = FALSE) {

  # Assert that x is numeric

  assert_is_numeric(x)

  x %>%

    get_reciprocal() %>%

    mean(na.rm = na.rm) %>%

    get_reciprocal()

}

弹出报错

calc_harmonic_mean <- function(x, na.rm = FALSE) {

  assert_is_numeric(x)

  # Check if any values of x are non-positive

  if(any(is_non_positive(x), na.rm = TRUE)) {

    # Throw an error

    stop("x contains non-positive values, so the harmonic mean makes no sense.")

  }

  x %>%

    get_reciprocal() %>%

    mean(na.rm = na.rm) %>%

    get_reciprocal()

}

# See what happens when you pass it negative numbers

calc_harmonic_mean(std_and_poor500$pe_ratio - 20)

固定函数参数

一直存在的且默认

Returning values from functions

Sometimes, you don't need to run through the whole body of a function to get the answer. In that case you can return early from that function using return().

To check if x is divisible by n, you can use is_divisible_by(x, n) from assertive.

Alternatively, use the modulo operator, %%. x %% n gives the remainder when dividing x by n, so x %% n == 0 determines whether x is divisible by n. Try 1:10 %% 3 == 0 in the console.

To solve this exercise, you need to know that a leap year is every 400th year (like the year 2000) or every 4th year that isn't a century (like 1904 but not 1900 or 1905).

is_leap_year <- function(year) {

  # If year is div. by 400 return TRUE

  if(is_divisible_by(year, 400)) {

    return(TRUE)

  }

  # If year is div. by 100 return FALSE

  if(is_divisible_by(year, 100)) {

    return(FALSE)

  }

  # If year is div. by 4 return TRUE

  if(is_divisible_by(year, 4)) {

    return(TRUE)

  }

  # Otherwise return FALSE

  FALSE

}

# Define a pipeable plot fn with data and formula args

pipeable_plot <- function(data, formula) {

  # Call plot() with the formula interface

  plot(formula, data)

  # Invisibly return the input dataset

  invisible(data)

}

# Draw the scatter plot of dist vs. speed again

plt_dist_vs_speed <- cars %>%

  pipeable_plot(dist ~ speed)

# Now the plot object has a value

plt_dist_vs_speed

augments

Given an R statistical model or other non-tidy object, add columns to the original dataset such as predictions, residuals and cluster assignments.

> # Use broom tools to get a list of 3 data frames

> list(

    # Get model-level values

    model = glance(model),

    # Get coefficient-level values

    coefficients = tidy(model),

    # Get observation-level values

    observations = augment(model)

  )

$model

# A tibble: 1 x 7

  null.deviance df.null logLik    AIC    BIC deviance df.residual

          <dbl>   <int>  <dbl>  <dbl>  <dbl>    <dbl>       <int>

1        18850.     345 -6425. 12864. 12891.   11529.         339

$coefficients

# A tibble: 7 x 5

  term              estimate std.error statistic   p.value

  <chr>                <dbl>     <dbl>     <dbl>     <dbl>

1 (Intercept)         4.09      0.0279   146.    0.

2 genderfemale        0.374     0.0212    17.6   2.18e- 69

3 income($25k,$55k]  -0.0199    0.0267    -0.746 4.56e-  1

4 income($55k,$95k]  -0.581     0.0343   -16.9   3.28e- 64

5 income($95k,$Inf)  -0.578     0.0310   -18.7   6.88e- 78

6 travel(0.25h,4h]   -0.627     0.0217   -28.8   5.40e-183

7 travel(4h,Infh)    -2.42      0.0492   -49.3   0.       

$observations

# A tibble: 346 x 12

   .rownames n_visits gender income travel .fitted .se.fit  .resid    .hat

   <chr>        <dbl> <fct>  <fct>  <fct>    <dbl>   <dbl>   <dbl>   <dbl>

 1 25               2 female ($95k~ (4h,I~    1.46  0.0502  -1.24  0.0109

 2 26               1 female ($95k~ (4h,I~    1.46  0.0502  -1.92  0.0109

 3 27               1 male   ($95k~ (0.25~    2.88  0.0269  -5.28  0.0129

 4 29               1 male   ($95k~ (4h,I~    1.09  0.0490  -1.32  0.00711

 5 30               1 female ($55k~ (4h,I~    1.46  0.0531  -1.92  0.0121

 6 31               1 male   [$0,$~ [0h,0~    4.09  0.0279 -10.4   0.0465

 7 33              80 female [$0,$~ [0h,0~    4.46  0.0235  -0.710 0.0479

 8 34             104 female ($95k~ [0h,0~    3.88  0.0261   6.90  0.0332

 9 35              55 male   ($25k~ (0.25~    3.44  0.0222   3.85  0.0153

10 36             350 female ($25k~ [0h,0~    4.44  0.0206  21.5   0.0360

# ... with 336 more rows, and 3 more variables: .sigma <dbl>, .cooksd <dbl>,

#   .std.resid <dbl>

Magnificent multi-assignment! Returning many values is as easy as collecting them into a list. The groomed model has data frames that are easy to program against.

# From previous step

groom_model <- function(model) {

  list(

    model = glance(model),

    coefficients = tidy(model),

    observations = augment(model)

  )

}

# Call groom_model on model, assigning to 3 variables

c(mdl, cff, obs) %<-% groom_model(model)

# See these individual variables

mdl; cff; obs

Environments

> # From previous step

> rsa_lst <- list(

    capitals = capitals,

    national_parks = national_parks,

    population = population

  )

>

> # Convert the list to an environment

> rsa_env <- list2env(rsa_lst )

>

> # List the structure of each variable

> ls.str(rsa_env)

capitals : Classes 'tbl_df', 'tbl' and 'data.frame':	3 obs. of  2 variables:

 $ city           : chr  "Cape Town" "Bloemfontein" "Pretoria"

 $ type_of_capital: chr  "Legislative" "Judicial" "Administrative"

national_parks :  chr [1:22] "Addo Elephant National Park" "Agulhas National Park" ...

population :  Time-Series [1:5] from 1996 to 2016: 40583573 44819778 47390900 51770560 55908900

查看父类的环境变量

> # From previous steps

> rsa_lst <- list(

    capitals = capitals,

    national_parks = national_parks,

    population = population

  )

> rsa_env <- list2env(rsa_lst)

>

> # Find the parent environment of rsa_env

> parent <- parent.env(rsa_env)

>

> # Print its name

> environmentName(parent )

[1] "R_GlobalEnv"

exists

查看是否定义了一个变量exists packages

> # Does population exist in rsa_env?

> exists("population", envir = rsa_env)

[1] TRUE

>

> # Does population exist in rsa_env, ignoring inheritance?

> exists("population", envir = rsa_env,inherits=FALSE)

[1] FALSE

局部变量和全局变量这个解释可以参考之前的python学习笔记python 函数学习笔记

这里个人觉得编程语言是在某种程度上面是相通的，还是认真的学习好一门吧。不然最后都会放弃了，加油

单位换算

# Write a function to convert sq. meters to hectares

sq_meters_to_hectares <- function(sq_meters) {

  sq_meters / 10000

}

# Write a function to convert acres to sq. yards

acres_to_sq_yards <- function(acres) {

  4840 * acres

}

# Write a function to convert acres to sq. yards

acres_to_sq_yards <- function(acres) {

  4840 * acres

}

常见农业单位换算

There are 4840 square yards in an acre.
There are 36 inches in a yard and one inch is 0.0254 meters.
There are 10000 square meters in a hectare.

# Write a function to convert sq. yards to sq. meters

sq_yards_to_sq_meters <- function(sq_yards) {

  sq_yards %>%

    # Take the square root

    sqrt() %>%

    # Convert yards to meters

    yards_to_meters() %>%

    # Square it

    raise_to_power(2)

}

单位换算有可以直接用的函数

# Load fns defined in previous steps

load_step3()

# Write a function to convert bushels to kg

bushels_to_kgs <- function(bushels, crop) {

  bushels %>%

    # Convert bushels to lbs for this crop

    bushels_to_lbs(crop) %>%

    # Convert lbs to kgs

    lbs_to_kgs()

}

# Wrap this code into a function

fortify_with_metric_units <- function(data, crop) {

  data %>%

    mutate(

      farmed_area_ha = acres_to_hectares(farmed_area_acres),

      yield_kg_per_ha = bushels_per_acre_to_kgs_per_hectare(

        yield_bushels_per_acre,

        crop = crop

      )

    )

}

# Try it on the wheat dataset

fortify_with_metric_units(wheat, crop = "wheat")

绘图

# Using corn, plot yield (kg/ha) vs. year

> ggplot(corn, aes(year, yield_kg_per_ha)) +

    # Add a line layer, grouped by state

    geom_line(aes(group = state)) +

    # Add a smooth trend layer

    geom_smooth()

inner_join()

进行函数合并简书

inner_join连接后的记录数等于”共有的记录数,也就是求交集，作用和merge的作用是一样的

# Wrap this code into a function

fortify_with_census_region<-function(data){

  data %>%

    inner_join(usa_census_regions, by = "state")

}

# Try it on the wheat dataset

fortify_with_census_region(wheat)

# Wrap this code into a function

plot_yield_vs_year_by_region<-function(data){

  plot_yield_vs_year(data) +

    facet_wrap(vars(census_region))

}

# Try it on the wheat dataset

plot_yield_vs_year_by_region(wheat)

Modeling grain yields



# Wrap the model code into a function

run_gam_yield_vs_year_by_region<-function(data){

  gam(yield_kg_per_ha ~ s(year) + census_region, data = corn)

}

# Try it on the wheat dataset

run_gam_yield_vs_year_by_region(wheat)

# Make predictions in 2050

predict_this <- data.frame(

  year = 2050,

  census_region = census_regions

) 

# Predict the yield

pred_yield_kg_per_ha <- predict(corn_model, predict_this, type = "response")

predict_this %>%

  # Add the prediction as a column of predict_this

  mutate(pred_yield_kg_per_ha = pred_yield_kg_per_ha)

reference

Introduction to Writing Functions in R的更多相关文章

【翻译】Awesome R资源大全中文版来了，全球最火的R工具包一网打尽，超过300+工具，还在等什么？
0.前言虽然很早就知道R被微软收购,也很早知道R在统计分析处理方面很强大,开始一直没有行动过...直到直到12月初在微软技术大会,看到我软的工程师演示R的使用,我就震惊了,然后最近在网上到处了解和 ...
R统计分析处理
[翻译]Awesome R资源大全中文版来了,全球最火的R工具包一网打尽,超过300+工具,还在等什么? 阅读目录 0.前言 1.集成开发环境 2.语法 3.数据操作 4.图形显示 5.HTML部件 ...
R工具包一网打尽
这里有很多非常不错的R包和工具. 该想法来自于awesome-machine-learning. 这里是包的导航清单,看起来更方便 >>>导航清单通过这些翻译了解这些工具包,以后干 ...
Introduction to dnorm, pnorm, qnorm, and rnorm for new biostatisticians
原文:Introduction todnorm,pnorm,qnorm, andrnormfor new biostatisticians Today I was in Dan’s office ho ...
Analytic Functions in Oracle
Contents Overview and IntroductionHow Analytic Functions WorkThe SyntaxExamplesCalculate a running T ...
R工具包
直到12月初在微软技术大会,看到我软的工程师演示R的使用,我就震惊了,然后最近在网上到处了解和爬一些R的资料,看着看着就入迷了,这就是个大宝库了,以前怎么没发现,看来还是太狭隘了.直到前几天我看到这个 ...
R2—《R in Nutshell》读书笔记(连载)
R in Nutshell 前言例子(nutshell包) 本书中的例子包括在nutshell的R包中,使用数据,需加载nutshell包 install.packages("nutshe ...
a note of R software write Function
Functionals “To become significantly more reliable, code must become more transparent. In particular ...
r testifying that your code will behave as you intend.
https://github.com/stretchr/testify Testify - Thou Shalt Write Tests Go code (golang) set of pack ...

随机推荐

UESTC 1324 卿学姐与公主分块板子
#include<iostream> #include<cmath> using namespace std; ; //表示当前数在哪一块里面 int belong[maxn] ...
android中常用的布局管理器
Android中的几种常用的布局,主要介绍内容有: View视图 RelativeLayout 相对布局管理器 LinearLayout 线性布局管理器 FrameLayout ...
chorme输入框autocomplete(移动端)
输入框自动填充密码即使是type是text也别填充,尝试了 https://developer.mozilla.org/zh-CN/docs/Web/Security/Securing_your_si ...
Mac IDEA 2019.3.1下载及破解，可激活至2089年
背景目前IDEA已更新到2019.3.3,但是下载这个版本后使用目前网上的常见破解方法会出现各种问题.比如使用注册码方式提示license key is in legacy format,或者使用L ...
Enable VT-x in your BIOS security settings, ensure that HAXM is installed properly. Try disabling 3rd party security software if the problem still occurs
win7系统安装了了AndroidStudio3.5后用模拟器报错:Enable VT-x in your BIOS security settings, ensure that HAXM is in ...
Java-跳跃路线
题目: 小明参加了学校的趣味运动会,其中的一个项目是:跳格子.地上画着一些格子,每个格子里写一个字,如下所示: 从我做起振我做起振兴做起振兴中起振兴中华比赛时,先站在左上角的写着“从”字的格子里,可 ...
mybatis+maven自动生成代码框架
说明通过可配置化,通过数据库自动生成model,da和mapper文件,这对于可定制化开发来说是非常有用的,减少了很多重复的工作. 添加依赖 <properties> <proje ...
LAMP集成环境搭建
本教程是在centos6.8 64位系统下进行操作我们首先准备搭建LAMP环境 LAMP指的是Linux.Apache.MySQL 和 PHP 如果要安装PHP5.5 需要首先rpm -ivh h ...
树莓派查看ip地址（命令ifconfig）和退出ping
1.1树莓派查看ip地址用如下命令: ifconfig -a 结果如下图所示: 注意:树莓派查看ip地址是用命令ifconfig,而Windows的cmd命令查看ip地址是ipconfig.
Spring Batch 批处理原则与建议
Spring Batch 批处理原则与建议当我们构建一个批处理的过程时,必须注意以下原则: 通常情况下,批处理的过程对系统和架构的设计要够要求比较高,因此尽可能的使用通用架构来处理批量数据处理,降低 ...

Introduction to Writing Functions in R