In this blog post I share some lesser-known (at least I believe they are) tricks that mainly use functions from dplyr.

Removing unneeded columns

Did you know that you can use - in front of a column name to remove it from a data frame?

library(dplyr)

mtcars %>%
  select(-disp) %>%
  head()
##                    mpg cyl  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 105 2.76 3.460 20.22  1  0    3    1
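
The same idea extends to several columns at once. Here is a minimal sketch, working on a fresh copy of the built-in data set (the name mt is only for illustration and does not appear elsewhere in this post):

```r
library(dplyr)

# Fresh copy of the data, so the example does not depend on other steps
mt <- datasets::mtcars

# Drop two columns in one call by negating each name
smaller <- mt %>%
  select(-disp, -hp)

names(smaller)
```

Every column not mentioned is kept, in its original order.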

Re-ordering columns

Still using select(), it is easy to re-order columns in your data frame:

mtcars %>%
  select(cyl, disp, hp, everything()) %>%
  head()
##                   cyl disp  hp  mpg drat    wt  qsec vs am gear carb
## Mazda RX4           6  160 110 21.0 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       6  160 110 21.0 3.90 2.875 17.02  0  1    4    4
## Datsun 710          4  108  93 22.8 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      6  258 110 21.4 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   8  360 175 18.7 3.15 3.440 17.02  0  0    3    2
## Valiant             6  225 105 18.1 2.76 3.460 20.22  1  0    3    1

As its name implies, everything() simply selects all the other columns.
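
If your dplyr is version 1.0.0 or later (an assumption about your installation, not something this post relies on), relocate() is another way to get the same ordering. A small sketch on a fresh copy of the data, where mt is just an illustrative name:

```r
library(dplyr)

# Fresh copy of the data, so the example does not depend on other steps
mt <- datasets::mtcars

# relocate() moves the listed columns to the front by default
# and leaves the remaining columns in their original order
reordered <- mt %>%
  relocate(cyl, disp, hp)

names(reordered)[1:4]
```

The select() plus everything() form has the advantage of also working on older dplyr versions.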

Renaming columns with rename()

mtcars <- rename(mtcars, spam_mpg = mpg)
mtcars <- rename(mtcars, spam_disp = disp)
mtcars <- rename(mtcars, spam_hp = hp)

head(mtcars)
##                   spam_mpg cyl spam_disp spam_hp drat    wt  qsec vs am
## Mazda RX4             21.0   6       160     110 3.90 2.620 16.46  0  1
## Mazda RX4 Wag         21.0   6       160     110 3.90 2.875 17.02  0  1
## Datsun 710            22.8   4       108      93 3.85 2.320 18.61  1  1
## Hornet 4 Drive        21.4   6       258     110 3.08 3.215 19.44  1  0
## Hornet Sportabout     18.7   8       360     175 3.15 3.440 17.02  0  0
## Valiant               18.1   6       225     105 2.76 3.460 20.22  1  0
##                   gear carb
## Mazda RX4            4    4
## Mazda RX4 Wag        4    4
## Datsun 710           4    1
## Hornet 4 Drive       3    1
## Hornet Sportabout    3    2
## Valiant              3    1
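
The three calls can also be collapsed into one, since rename() accepts several new_name = old_name pairs. A minimal sketch on a fresh copy of the data (mt is an illustrative name only):

```r
library(dplyr)

# Fresh copy of the data, so the example does not depend on other steps
mt <- datasets::mtcars

# Rename three columns in a single call: new name on the left, old on the right
mt <- mt %>%
  rename(spam_mpg = mpg, spam_disp = disp, spam_hp = hp)

names(mt)[1:4]
```

Columns that are not mentioned keep their names and positions.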

Selecting columns with a regexp

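It is easy to select the columns whose names contain “spam” with some helper functions. Note that contains() matches a literal substring anywhere in the name; matches() is the helper that takes a regular expression: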

mtcars %>%
  select(contains("spam")) %>%
  head()
##                   spam_mpg spam_disp spam_hp
## Mazda RX4             21.0       160     110
## Mazda RX4 Wag         21.0       160     110
## Datsun 710            22.8       108      93
## Hornet 4 Drive        21.4       258     110
## Hornet Sportabout     18.7       360     175
## Valiant               18.1       225     105

Also take a look at starts_with(), ends_with(), contains(), matches(), num_range(), one_of() and everything().
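
To make the difference between two of these helpers concrete, here is a small sketch. It rebuilds the renamed columns on a fresh copy of the data (mt is an illustrative name), and the regular expression is just an example I picked:

```r
library(dplyr)

# Fresh copy of the data with the same "spam_" renames as in this post
mt <- datasets::mtcars %>%
  rename(spam_mpg = mpg, spam_disp = disp, spam_hp = hp)

# starts_with() matches a literal prefix
prefix_cols <- mt %>% select(starts_with("spam"))

# matches() takes a regular expression:
# here, names that start with "spam_" and end with "p"
regex_cols <- mt %>% select(matches("^spam_.*p$"))

names(prefix_cols)
names(regex_cols)
```

The prefix version keeps all three spam_ columns, while the regular expression keeps only spam_disp and spam_hp.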

Create new columns with mutate() and if_else()

mtcars %>%
  mutate(vs_new = if_else(
    vs == 1,
    "one",
    "zero",
    NA_character_)) %>%
  head()
##   spam_mpg cyl spam_disp spam_hp drat    wt  qsec vs am gear carb vs_new
## 1     21.0   6       160     110 3.90 2.620 16.46  0  1    4    4   zero
## 2     21.0   6       160     110 3.90 2.875 17.02  0  1    4    4   zero
## 3     22.8   4       108      93 3.85 2.320 18.61  1  1    4    1    one
## 4     21.4   6       258     110 3.08 3.215 19.44  1  0    3    1    one
## 5     18.7   8       360     175 3.15 3.440 17.02  0  0    3    2   zero
## 6     18.1   6       225     105 2.76 3.460 20.22  1  0    3    1    one
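
The fourth argument in the call above is if_else()'s missing argument: the value to return where the condition itself is NA. Since vs has no missing values, it does not show up in the output, so here is a tiny sketch on a made-up vector (toy data, not taken from mtcars):

```r
library(dplyr)

# Toy vector with one missing value
vs <- c(1, 0, NA)

# true, false and missing must all be of the same type (character here)
labels <- if_else(vs == 1, "one", "zero", "unknown")

labels
```

With NA_character_ instead of "unknown", the third element would simply stay NA.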

You might want to create a new variable conditionally on several values of another column:

mtcars %>%
  mutate(carb_new = case_when(.$carb == 1 ~ "one",
                              .$carb == 2 ~ "two",
                              .$carb == 4 ~ "four",
                              TRUE ~ "other")) %>%
  head(15)
##    spam_mpg cyl spam_disp spam_hp drat    wt  qsec vs am gear carb
## 1      21.0   6     160.0     110 3.90 2.620 16.46  0  1    4    4
## 2      21.0   6     160.0     110 3.90 2.875 17.02  0  1    4    4
## 3      22.8   4     108.0      93 3.85 2.320 18.61  1  1    4    1
## 4      21.4   6     258.0     110 3.08 3.215 19.44  1  0    3    1
## 5      18.7   8     360.0     175 3.15 3.440 17.02  0  0    3    2
## 6      18.1   6     225.0     105 2.76 3.460 20.22  1  0    3    1
## 7      14.3   8     360.0     245 3.21 3.570 15.84  0  0    3    4
## 8      24.4   4     146.7      62 3.69 3.190 20.00  1  0    4    2
## 9      22.8   4     140.8      95 3.92 3.150 22.90  1  0    4    2
## 10     19.2   6     167.6     123 3.92 3.440 18.30  1  0    4    4
## 11     17.8   6     167.6     123 3.92 3.440 18.90  1  0    4    4
## 12     16.4   8     275.8     180 3.07 4.070 17.40  0  0    3    3
## 13     17.3   8     275.8     180 3.07 3.730 17.60  0  0    3    3
## 14     15.2   8     275.8     180 3.07 3.780 18.00  0  0    3    3
## 15     10.4   8     472.0     205 2.93 5.250 17.98  0  0    3    4
##    carb_new
## 1      four
## 2      four
## 3       one
## 4       one
## 5       two
## 6       one
## 7      four
## 8       two
## 9       two
## 10     four
## 11     four
## 12    other
## 13    other
## 14    other
## 15     four

Mind the .$ before the variable carb. There is a GitHub issue about this, and it is already fixed in the development version of dplyr, which means that in the next version of dplyr, case_when() will work like any other specialized dplyr function inside mutate().
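
For reference, this is roughly what the same recoding looks like once the fix is in your installed dplyr, which I believe is the case for any current release. A sketch on a fresh copy of the data (mt is an illustrative name):

```r
library(dplyr)

# Fresh copy of the data, so the example does not depend on other steps
mt <- datasets::mtcars

# Column names can be used directly, without the .$ prefix
recoded <- mt %>%
  mutate(carb_new = case_when(
    carb == 1 ~ "one",
    carb == 2 ~ "two",
    carb == 4 ~ "four",
    TRUE      ~ "other"
  ))

head(recoded$carb_new)
```

The conditions are checked in order, so the final TRUE line acts as the catch-all for every value not matched earlier.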

Apply a function to certain columns only, by rows

mtcars %>%
  select(am, gear, carb) %>%
  purrr::by_row(sum, .collate = "cols", .to = "sum_am_gear_carb") -> mtcars2

head(mtcars2)
## # A tibble: 6 × 4
##      am  gear  carb sum_am_gear_carb
##   <dbl> <dbl> <dbl>            <dbl>
## 1     1     4     4                9
## 2     1     4     4                9
## 3     1     4     1                6
## 4     0     3     1                4
## 5     0     3     2                5
## 6     0     3     1                4

For this, I had to use purrr’s by_row() function. You can then add this column to your original data frame:
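
A word of caution: by_row() was later moved out of purrr into the separate purrrlyr package, so the call above may fail with a recent purrr. For a simple row sum like this one, a possible alternative that needs only dplyr (1.0.0 or later, for across()) is sketched below. It is a different technique from by_row(), not a drop-in replacement, and mt is an illustrative name:

```r
library(dplyr)

# Fresh copy of the data, so the example does not depend on other steps
mt <- datasets::mtcars

# across() picks the columns, rowSums() adds them up row by row
with_sum <- mt %>%
  mutate(sum_am_gear_carb = rowSums(across(c(am, gear, carb))))

head(with_sum$sum_am_gear_carb)
```

Because the new column is created inside mutate(), it is attached to the data frame directly and no separate binding step is needed.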

mtcars <- cbind(mtcars, "sum_am_gear_carb" = mtcars2$sum_am_gear_carb)
head(mtcars)
##                   spam_mpg cyl spam_disp spam_hp drat    wt  qsec vs am
## Mazda RX4             21.0   6       160     110 3.90 2.620 16.46  0  1
## Mazda RX4 Wag         21.0   6       160     110 3.90 2.875 17.02  0  1
## Datsun 710            22.8   4       108      93 3.85 2.320 18.61  1  1
## Hornet 4 Drive        21.4   6       258     110 3.08 3.215 19.44  1  0
## Hornet Sportabout     18.7   8       360     175 3.15 3.440 17.02  0  0
## Valiant               18.1   6       225     105 2.76 3.460 20.22  1  0
##                   gear carb sum_am_gear_carb
## Mazda RX4            4    4                9
## Mazda RX4 Wag        4    4                9
## Datsun 710           4    1                6
## Hornet 4 Drive       3    1                4
## Hornet Sportabout    3    2                5
## Valiant              3    1                4

Use do() to do any arbitrary operation

mtcars %>%
  group_by(cyl) %>%
  do(models = lm(spam_mpg ~ drat + wt, data = .)) %>%
  broom::tidy(models)
## Source: local data frame [9 x 6]
## Groups: cyl [3]
##
##     cyl        term   estimate  std.error  statistic     p.value
##   <dbl>       <chr>      <dbl>      <dbl>      <dbl>       <dbl>
## 1     4 (Intercept) 33.2493403 17.0987286  1.9445504 0.087727622
## 2     4        drat  1.3244329  3.4519717  0.3836743 0.711215433
## 3     4          wt -5.2400608  2.2150213 -2.3656932 0.045551615
## 4     6 (Intercept) 30.6544931  7.5141648  4.0795609 0.015103868
## 5     6        drat -0.4435744  1.1740862 -0.3778039 0.724768945
## 6     6          wt -2.9902720  1.5685053 -1.9064468 0.129274249
## 7     8 (Intercept) 29.6519180  7.0878976  4.1834574 0.001527613
## 8     8        drat -1.4698722  1.6285054 -0.9025897 0.386081744
## 9     8          wt -2.4518017  0.7985112 -3.0704664 0.010651044

do() is useful when you want to use any R function (user defined functions work too!) with dplyr functions. First I grouped the observations by cyl and then ran a linear model for each group. Then I converted the output to a tidy data frame using broom::tidy().
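
To illustrate the remark about user defined functions, here is a small sketch that avoids broom altogether. The helper fit_coefs() is hypothetical (I made it up for this example), and the data is a fresh copy of mtcars with its original column names (mt is an illustrative name):

```r
library(dplyr)

# Fresh copy of the data, so the example does not depend on other steps
mt <- datasets::mtcars

# Hypothetical helper: fit the model on one group and return the
# coefficients as a data frame with one row per term
fit_coefs <- function(df) {
  fit <- lm(mpg ~ drat + wt, data = df)
  data.frame(term = names(coef(fit)), estimate = unname(coef(fit)))
}

# When the expression inside do() is unnamed, it must return a data frame;
# do() then stacks the per-group results and adds the grouping column
coefs <- mt %>%
  group_by(cyl) %>%
  do(fit_coefs(.))

coefs
```

This should give three rows per value of cyl, one for each coefficient of the model.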

Using dplyr functions inside your own functions

extract_vars <- function(data, some_string){

  data %>%
    select_(lazyeval::interp(~contains(some_string))) -> data

  return(data)
}

extract_vars(mtcars, "spam")
##                     spam_mpg spam_disp spam_hp
## Mazda RX4               21.0     160.0     110
## Mazda RX4 Wag           21.0     160.0     110
## Datsun 710              22.8     108.0      93
## Hornet 4 Drive          21.4     258.0     110
## Hornet Sportabout       18.7     360.0     175
## Valiant                 18.1     225.0     105
## Duster 360              14.3     360.0     245
## Merc 240D               24.4     146.7      62
## Merc 230                22.8     140.8      95
## Merc 280                19.2     167.6     123
## Merc 280C               17.8     167.6     123
## Merc 450SE              16.4     275.8     180
## Merc 450SL              17.3     275.8     180
## Merc 450SLC             15.2     275.8     180
## Cadillac Fleetwood      10.4     472.0     205
## Lincoln Continental     10.4     460.0     215
## Chrysler Imperial       14.7     440.0     230
## Fiat 128                32.4      78.7      66
## Honda Civic             30.4      75.7      52
## Toyota Corolla          33.9      71.1      65
## Toyota Corona           21.5     120.1      97
## Dodge Challenger        15.5     318.0     150
## AMC Javelin             15.2     304.0     150
## Camaro Z28              13.3     350.0     245
## Pontiac Firebird        19.2     400.0     175
## Fiat X1-9               27.3      79.0      66
## Porsche 914-2           26.0     120.3      91
## Lotus Europa            30.4      95.1     113
## Ford Pantera L          15.8     351.0     264
## Ferrari Dino            19.7     145.0     175
## Maserati Bora           15.0     301.0     335
## Volvo 142E              21.4     121.0     109

You can read more about this last point here.
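
One more note on this: select_() and lazyeval belong to dplyr's older standard-evaluation interface, which later versions deprecated. In a current dplyr, a string held in a function argument can, as far as I can tell, be passed straight to a select helper. A sketch, where extract_vars2() and mt are names I made up for the example:

```r
library(dplyr)

# Same idea as extract_vars(), without select_() or lazyeval:
# some_string is an ordinary argument, evaluated before contains() uses it
extract_vars2 <- function(data, some_string) {
  data %>%
    select(contains(some_string))
}

# Fresh copy of the data with the same "spam_" renames as in this post
mt <- datasets::mtcars %>%
  rename(spam_mpg = mpg, spam_disp = disp, spam_hp = hp)

names(extract_vars2(mt, "spam"))
```

This only covers the case where the argument is a string handed to a helper; passing bare column names into your own functions is a separate topic.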

Hope you liked this small list of tricks!

Reposted from: http://www.brodrigues.co/blog/2017-02-17-lesser_known_tricks/
