Is the “*apply” family really not vectorized?

Question:

We routinely tell every new R user that "apply isn't vectorized, check out Patrick Burns's
R Inferno, Circle 4", which says (I quote):

A common reflex is to use a function in the apply family. This is not vectorization, it is loop-hiding. The apply function has a for loop in its definition. The lapply function buries the loop, but execution times tend to be roughly equal to an explicit for loop.

Indeed, a quick look at the apply source code reveals the loop:

grep("for", capture.output(getAnywhere("apply")), value = TRUE)
## [1] "        for (i in 1L:d2) {"  "    else for (i in 1L:d2) {"

OK so far, but a look at lapply or vapply reveals a completely different picture:

lapply
## function (X, FUN, ...)
## {
##     FUN <- match.fun(FUN)
##     if (!is.vector(X) || is.object(X))
##        X <- as.list(X)
##     .Internal(lapply(X, FUN))
## }
## <bytecode: 0x000000000284b618>
## <environment: namespace:base>

So apparently there is no R for loop hiding there; instead, lapply calls an internal function written in C.
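The same check that exposed the loop in apply can be turned on lapply. A minimal sketch, assuming a hypothetical helper has_for (not part of the question) that searches the deparsed body of a function for the keyword for:

```r
# has_for is a hypothetical helper: TRUE if the R-level body of `f`
# contains the keyword `for` as a whole word.
has_for <- function(f) {
  any(grepl("\\bfor\\b", deparse(body(f)), perl = TRUE))
}

has_for(apply)   # apply loops at the R level
has_for(lapply)  # lapply hands the loop to .Internal
```

This only inspects the R-level body; it says nothing about what the C code behind .Internal does on each iteration.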

A quick look down the rabbit hole reveals pretty much the same picture.

Moreover, take the colMeans function, for example, which has never been accused of not being vectorised:

colMeans
# function (x, na.rm = FALSE, dims = 1L)
# {
#   if (is.data.frame(x))
#     x <- as.matrix(x)
#   if (!is.array(x) || length(dn <- dim(x)) < 2L)
#     stop("'x' must be an array of at least two dimensions")
#   if (dims < 1L || dims > length(dn) - 1L)
#     stop("invalid 'dims'")
#   n <- prod(dn[1L:dims])
#   dn <- dn[-(1L:dims)]
#   z <- if (is.complex(x))
#     .Internal(colMeans(Re(x), n, prod(dn), na.rm)) + (0+1i) *
#     .Internal(colMeans(Im(x), n, prod(dn), na.rm))
#   else .Internal(colMeans(x, n, prod(dn), na.rm))
#   if (length(dn) > 1L) {
#     dim(z) <- dn
#     dimnames(z) <- dimnames(x)[-(1L:dims)]
#   }
#   else names(z) <- dimnames(x)[[dims + 1]]
#   z
# }
# <bytecode: 0x0000000008f89d20>
#   <environment: namespace:base>

Huh? It also just calls .Internal(colMeans(..., which we can also find in the
rabbit hole. So how is this different from .Internal(lapply(...?

Actually, a quick benchmark reveals that sapply performs no worse than
colMeans and much better than a for loop on a big data set:

m <- as.data.frame(matrix(1:1e7, ncol = 1e5))

system.time(colMeans(m))
# user  system elapsed
# 1.69    0.03    1.73

system.time(sapply(m, mean))
# user  system elapsed
# 1.50    0.03    1.60

system.time(apply(m, 2, mean))
# user  system elapsed
# 3.84    0.03    3.90

system.time(for(i in 1:ncol(m)) mean(m[, i]))
# user  system elapsed
# 13.78    0.01   13.93

In other words, is it correct to say that lapply and vapply
are actually vectorised (compared to apply, which is a
for loop that also calls lapply), and what did Patrick Burns really mean to say?

Answer:

First of all, your example runs its tests on a "data.frame", which is not fair to
colMeans, apply and "[.data.frame", since they carry an overhead:

system.time(as.matrix(m))  #called by `colMeans` and `apply`
#   user  system elapsed
#   1.03    0.00    1.05

system.time(for(i in 1:ncol(m)) m[, i])  #in the `for` loop
#   user  system elapsed
#  12.93    0.01   13.07

On a matrix, the picture is a bit different:

mm = as.matrix(m)

system.time(colMeans(mm))
#   user  system elapsed
#   0.01    0.00    0.01

system.time(apply(mm, 2, mean))
#   user  system elapsed
#   1.48    0.03    1.53

system.time(for(i in 1:ncol(mm)) mean(mm[, i]))
#   user  system elapsed
#   1.22    0.00    1.21

Regarding the main part of the question, the main difference between lapply/mapply/etc. and straightforward R loops is where the looping is done. As Roland notes, both C and R loops need to evaluate an R function in each iteration, which is the most costly part. The really fast C functions are those that do everything in C, so, I guess, this should be what "vectorised" is about? An example where we find the mean of each element of a "list":

#all computations in C
all_C = inline::cfunction(sig = c(R_ls = "list"), body = '
    SEXP tmp, ans;
    PROTECT(ans = allocVector(REALSXP, LENGTH(R_ls)));

    double *ptmp, *pans = REAL(ans);

    for(int i = 0; i < LENGTH(R_ls); i++) {
        pans[i] = 0.0;

        PROTECT(tmp = coerceVector(VECTOR_ELT(R_ls, i), REALSXP));
        ptmp = REAL(tmp);

        for(int j = 0; j < LENGTH(tmp); j++) pans[i] += ptmp[j];

        pans[i] /= LENGTH(tmp);

        UNPROTECT(1);
    }

    UNPROTECT(1);
    return(ans);
')

#a very simple `lapply(x, mean)`
C_and_R = inline::cfunction(sig = c(R_ls = "list"), body = '
    SEXP call, ans, ret;

    PROTECT(call = allocList(2));
    SET_TYPEOF(call, LANGSXP);
    SETCAR(call, install("mean"));

    PROTECT(ans = allocVector(VECSXP, LENGTH(R_ls)));
    PROTECT(ret = allocVector(REALSXP, LENGTH(ans)));

    for(int i = 0; i < LENGTH(R_ls); i++) {
        SETCADR(call, VECTOR_ELT(R_ls, i));
        SET_VECTOR_ELT(ans, i, eval(call, R_GlobalEnv));
    }

    double *pret = REAL(ret);
    for(int i = 0; i < LENGTH(ans); i++) pret[i] = REAL(VECTOR_ELT(ans, i))[0];

    UNPROTECT(3);
    return(ret);
')

R_lapply = function(x) unlist(lapply(x, mean))

R_loop = function(x)
{
    ans = numeric(length(x))
    for(i in seq_along(x)) ans[i] = mean(x[[i]])
    return(ans)
}

R_loopcmp = compiler::cmpfun(R_loop)

set.seed(007); myls = replicate(1e4, runif(1e3), simplify = FALSE)

all.equal(all_C(myls), C_and_R(myls))
#[1] TRUE
all.equal(all_C(myls), R_lapply(myls))
#[1] TRUE
all.equal(all_C(myls), R_loop(myls))
#[1] TRUE
all.equal(all_C(myls), R_loopcmp(myls))
#[1] TRUE

microbenchmark::microbenchmark(all_C(myls),
                               C_and_R(myls),
                               R_lapply(myls),
                               R_loop(myls),
                               R_loopcmp(myls),
                               times = 15)
#Unit: milliseconds
#            expr       min        lq    median        uq      max neval
#     all_C(myls)  37.29183  38.19107  38.69359  39.58083  41.3861    15
#   C_and_R(myls) 117.21457 123.22044 124.58148 130.85513 169.6822    15
#  R_lapply(myls)  98.48009 103.80717 106.55519 109.54890 116.3150    15
#    R_loop(myls) 122.40367 130.85061 132.61378 138.53664 178.5128    15
# R_loopcmp(myls) 105.63228 111.38340 112.16781 115.68909 128.1976    15
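As a rough illustration of the "everything in C" point without writing any C, and only under the assumption that every list element has the same length (as in myls above), the whole computation can be handed to colMeans in a single call. The names small_ls, per_element and all_at_once are hypothetical, and the data set is kept small so that the sketch runs quickly:

```r
# Assumption: all list elements have the same length, so they can be
# bound column-wise into one matrix.
set.seed(1)
small_ls <- replicate(100, runif(50), simplify = FALSE)

# The loop runs in C, but the R function `mean` is still evaluated
# once per element.
per_element <- vapply(small_ls, mean, numeric(1))

# A single call; the loop over columns and the arithmetic both happen in C.
all_at_once <- colMeans(do.call(cbind, small_ls))

# Both approaches should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(per_element, all_at_once)))
```

This is not a drop-in replacement for lapply(x, mean): it does not apply to ragged lists, and binding the elements into a matrix has a cost of its own.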
