In this blog post I share some lesser-known (at least I believe they are) tricks that use mainly functions from dplyr.

Removing unneeded columns

Did you know that you can use - in front of a column name to remove it from a data frame?

mtcars %>%
select(-disp) %>%
head()
##                    mpg cyl  hp drat    wt  qsec vs am gear carb
## Mazda RX4 21.0 6 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 105 2.76 3.460 20.22 1 0 3 1

Re-ordering columns

Still using select(), it is easy te re-order columns in your data frame:

mtcars %>%
select(cyl, disp, hp, everything()) %>%
head()
##                   cyl disp  hp  mpg drat    wt  qsec vs am gear carb
## Mazda RX4 6 160 110 21.0 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 6 160 110 21.0 3.90 2.875 17.02 0 1 4 4
## Datsun 710 4 108 93 22.8 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 6 258 110 21.4 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 8 360 175 18.7 3.15 3.440 17.02 0 0 3 2
## Valiant 6 225 105 18.1 2.76 3.460 20.22 1 0 3 1

As its name implies everything() simply means all the other columns.

Renaming columns with rename()

mtcars <- rename(mtcars, spam_mpg = mpg)
mtcars <- rename(mtcars, spam_disp = disp)
mtcars <- rename(mtcars, spam_hp = hp) head(mtcars)
##                   spam_mpg cyl spam_disp spam_hp drat    wt  qsec vs am
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0
## gear carb
## Mazda RX4 4 4
## Mazda RX4 Wag 4 4
## Datsun 710 4 1
## Hornet 4 Drive 3 1
## Hornet Sportabout 3 2
## Valiant 3 1

Selecting columns with a regexp

It is easy to select the columns that start with “spam” with some helper functions:

mtcars %>%
select(contains("spam")) %>%
head()
##                   spam_mpg spam_disp spam_hp
## Mazda RX4 21.0 160 110
## Mazda RX4 Wag 21.0 160 110
## Datsun 710 22.8 108 93
## Hornet 4 Drive 21.4 258 110
## Hornet Sportabout 18.7 360 175
## Valiant 18.1 225 105

take also a look at starts_with()ends_with()contains()matches()num_range()one_of() and everything().

Create new columns with mutate() and if_else()

mtcars %>%
mutate(vs_new = if_else(
vs == 1,
"one",
"zero",
NA_character_)) %>%
head()
##   spam_mpg cyl spam_disp spam_hp drat    wt  qsec vs am gear carb vs_new
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 zero
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 zero
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 one
## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 one
## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 zero
## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 one

You might want to create a new variable conditionally on several values of another column:

mtcars %>%
mutate(carb_new = case_when(.$carb == 1 ~ "one",
.$carb == 2 ~ "two",
.$carb == 4 ~ "four",
TRUE ~ "other")) %>%
head(15)
##    spam_mpg cyl spam_disp spam_hp drat    wt  qsec vs am gear carb
## 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## carb_new
## 1 four
## 2 four
## 3 one
## 4 one
## 5 two
## 6 one
## 7 four
## 8 two
## 9 two
## 10 four
## 11 four
## 12 other
## 13 other
## 14 other
## 15 four

Mind the .$ before the variable carb. There is a github issue about this, and it is already fixed in the development version of dplyr, which means that in the next version of dplyrcase_when() will work as any other specialized dplyr function inside mutate().

Apply a function to certain columns only, by rows

mtcars %>%
select(am, gear, carb) %>%
purrr::by_row(sum, .collate = "cols", .to = "sum_am_gear_carb") -> mtcars2
head(mtcars2)
## # A tibble: 6 × 4
## am gear carb sum_am_gear_carb
## <dbl> <dbl> <dbl> <dbl>
## 1 1 4 4 9
## 2 1 4 4 9
## 3 1 4 1 6
## 4 0 3 1 4
## 5 0 3 2 5
## 6 0 3 1 4

For this, I had to use purrr’s by_row() function. You can then add this column to your original data frame:

mtcars <- cbind(mtcars, "sum_am_gear_carb" = mtcars2$sum_am_gear_carb)
head(mtcars)
##                   spam_mpg cyl spam_disp spam_hp drat    wt  qsec vs am
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0
## gear carb sum_am_gear_carb
## Mazda RX4 4 4 9
## Mazda RX4 Wag 4 4 9
## Datsun 710 4 1 6
## Hornet 4 Drive 3 1 4
## Hornet Sportabout 3 2 5
## Valiant 3 1 4

Use do() to do any arbitrary operation

mtcars %>%
group_by(cyl) %>%
do(models = lm(spam_mpg ~ drat + wt, data = .)) %>%
broom::tidy(models)
## Source: local data frame [9 x 6]
## Groups: cyl [3]
##
## cyl term estimate std.error statistic p.value
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 4 (Intercept) 33.2493403 17.0987286 1.9445504 0.087727622
## 2 4 drat 1.3244329 3.4519717 0.3836743 0.711215433
## 3 4 wt -5.2400608 2.2150213 -2.3656932 0.045551615
## 4 6 (Intercept) 30.6544931 7.5141648 4.0795609 0.015103868
## 5 6 drat -0.4435744 1.1740862 -0.3778039 0.724768945
## 6 6 wt -2.9902720 1.5685053 -1.9064468 0.129274249
## 7 8 (Intercept) 29.6519180 7.0878976 4.1834574 0.001527613
## 8 8 drat -1.4698722 1.6285054 -0.9025897 0.386081744
## 9 8 wt -2.4518017 0.7985112 -3.0704664 0.010651044

do() is useful when you want to use any R function (user defined functions work too!) with dplyr functions. First I grouped the observations by cyl and then ran a linear model for each group. Then I converted the output to a tidy data frame usingbroom::tidy().

Using dplyr() functions inside your own functions

extract_vars <- function(data, some_string){

  data %>%
select_(lazyeval::interp(~contains(some_string))) -> data return(data)
} extract_vars(mtcars, "spam")
##                     spam_mpg spam_disp spam_hp
## Mazda RX4 21.0 160.0 110
## Mazda RX4 Wag 21.0 160.0 110
## Datsun 710 22.8 108.0 93
## Hornet 4 Drive 21.4 258.0 110
## Hornet Sportabout 18.7 360.0 175
## Valiant 18.1 225.0 105
## Duster 360 14.3 360.0 245
## Merc 240D 24.4 146.7 62
## Merc 230 22.8 140.8 95
## Merc 280 19.2 167.6 123
## Merc 280C 17.8 167.6 123
## Merc 450SE 16.4 275.8 180
## Merc 450SL 17.3 275.8 180
## Merc 450SLC 15.2 275.8 180
## Cadillac Fleetwood 10.4 472.0 205
## Lincoln Continental 10.4 460.0 215
## Chrysler Imperial 14.7 440.0 230
## Fiat 128 32.4 78.7 66
## Honda Civic 30.4 75.7 52
## Toyota Corolla 33.9 71.1 65
## Toyota Corona 21.5 120.1 97
## Dodge Challenger 15.5 318.0 150
## AMC Javelin 15.2 304.0 150
## Camaro Z28 13.3 350.0 245
## Pontiac Firebird 19.2 400.0 175
## Fiat X1-9 27.3 79.0 66
## Porsche 914-2 26.0 120.3 91
## Lotus Europa 30.4 95.1 113
## Ford Pantera L 15.8 351.0 264
## Ferrari Dino 19.7 145.0 175
## Maserati Bora 15.0 301.0 335
## Volvo 142E 21.4 121.0 109

About this last point, you can read more about it here.

Hope you liked this small list of tricks!

转自:http://www.brodrigues.co/blog/2017-02-17-lesser_known_tricks/

Lesser known dplyr tricks的更多相关文章

  1. Lesser known purrr tricks

    purrr is package that extends R's functional programming capabilities. It brings a lot of new stuff ...

  2. R语言数据处理包dplyr、tidyr笔记

    dplyr包是Hadley Wickham的新作,主要用于数据清洗和整理,该包专注dataframe数据格式,从而大幅提高了数据处理速度,并且提供了与其它数据库的接口:tidyr包的作者是Hadley ...

  3. testng 教程之使用参数的一些tricks配合使用reportng

    前两次的总结:testng annotation生命周期 http://www.cnblogs.com/tobecrazy/p/4579414.html testng.xml的使用和基本配置http: ...

  4. (转) How to Train a GAN? Tips and tricks to make GANs work

    How to Train a GAN? Tips and tricks to make GANs work 转自:https://github.com/soumith/ganhacks While r ...

  5. R语言数据处理利器——dplyr简介

    dplyr是由Hadley Wickham主持开发和维护的一个主要针对数据框快速计算.整合的函数包,同时提供一些常用函数的高速写法以及几个开源数据库的连接.此包是plyr包的深化功能包,其名字中的字母 ...

  6. Matlab tips and tricks

    matlab tips and tricks and ... page overview: I created this page as a vectorization helper but it g ...

  7. dplyr包--数据操作与清洗

    1.简介 在我们数据分析的实际应用中,我们可能会花费大量的时间在数据清洗上,而如果使用 R 里面自带的一些函数(base 包的 transform 等),可能会觉得力不从心,或者不是很人性化.好在我们 ...

  8. LoadRunner AJAX TruClient协议Tips and Tricks

    LoadRunner AJAX TruClient协议Tips and Trickshttp://automationqa.com/forum.php?mod=viewthread&tid=2 ...

  9. 【翻译】C# Tips & Tricks: Weak References - When and How to Use Them

    原文:C# Tips & Tricks: Weak References - When and How to Use Them Sometimes you have an object whi ...

随机推荐

  1. SystemVerilog搭建验证平台使用DPI时遇到的问题及解决方案

    本文目的在于分享一下把DPI稿能用了的过程,主要说一下平台其他部分搭建好之后,在完成DPI相关工作阶段遇到的问题,以及解决的办法. 工作环境:win10 64bit, Questasim 10.1b ...

  2. Maven工程webinfo下面的JSP页面无法加载.js、.css文件的解决方案

    --下面是我的工程路径 --我jsp的写法 -----启动工程,访问js文件的路径是这样的, href="http://localhost:8080/activiti/css/public. ...

  3. 一个只有99行代码的JS流程框架

    张镇圳,腾讯Web前端高级工程师,对内部系统前端建设有多年经验,喜欢钻研捣鼓各种前端组件和框架. 最近一直在想一个问题,如何能让js代码写起来更语义化和更具有可读性. 上周末的时候突发奇想,当代码在运 ...

  4. CTR预估中的贝叶斯平滑方法(二)参数估计和代码实现

    1. 前言 前面博客介绍了CTR预估中的贝叶斯平滑方法的原理http://www.cnblogs.com/bentuwuying/p/6389222.html. 这篇博客主要是介绍如何对贝叶斯平滑的参 ...

  5. Hbuilder开发移动App(1)

    奇妙的前端,奇妙的js 众所周知,自从js有nodejs后,前端人员可以华丽的转身,去开发高并发非阻塞的服务端程序, 随着html5的出现,伴随一些amazing的特性,h5开发app的技术越发的成熟 ...

  6. Apache日志分割

    1.cronolog安装 采用 cronolog 工具进行 apache 日志分割 http://download.chinaunix.net/download.php?id=3457&Res ...

  7. Vue.js动画在项目使用的两个示例

    欢迎大家关注腾讯云技术社区-博客园官方主页,我们将持续在博客园为大家推荐技术精品文章哦~ 李萌,16年毕业,Web前端开发从业者,目前就职于腾讯,喜欢node.js.vue.js等技术,热爱新技术,热 ...

  8. xmlplus 组件设计系列之四 - 列表

    列表组件是极其常用的一类组件,是许多视图组件系统的必须包含的.列表可以做的很简单,只显示简洁的内容.列表也可以做的很复杂,用于展示非常丰富的内容. 组成元素 列表离不开列表项以及包含列表项的容器.下面 ...

  9. LinkBar选中项字体颜色

    通过控制disabledColor样式来控制,选中项字体的颜色.

  10. CentOS 6.8下安装docker并使用

    Docker是一个开源的应用容器引擎,可以轻松的为任何应用创建一个轻量级的.可移植的.自给自足的容器.利用Linux的LXC.AUFS.Go语言.cgroup实现了资源的独立,可以很轻松的实现文件.资 ...