Why is 'x' in ('x',) faster than 'x' == 'x'?

Question:

>>> timeit.timeit("'x' in ('x',)")

0.04869917374131205

>>> timeit.timeit("'x' == 'x'")

0.06144205736110564

This also works for multiple options; both seem to grow linearly:

>>> timeit.timeit("'x' in ('x', 'y')")

0.04866674801541748

>>> timeit.timeit("'x' == 'x' or 'x' == 'y'")

0.06565782838087131

>>> timeit.timeit("'x' in ('y', 'x')")

0.08975995576448526

>>> timeit.timeit("'x' == 'y' or 'x' == 'y'")

0.12992391047427532

Based on this, I think I should start using in everywhere instead of ==!

Answer:

As I mentioned to David Wolever, there's more to this than meets the eye: both methods dispatch to is. You can prove this by doing

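from timeit import Timer

# Same object on both sides: the comparison can return on identity alone.
min(Timer("x == x", setup="x = 'a' * 1000000").repeat(10, 10000))

#>>> 0.00045456900261342525

# Equal but distinct objects: the full contents have to be compared.
min(Timer("x == y", setup="x = 'a' * 1000000; y = 'a' * 1000000").repeat(10, 10000))

#>>> 0.5256857610074803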

The first can only be so fast because it checks by identity.

To find out why one would take longer than the other, let's trace through execution.
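Before going into the C source, the bytecode itself can be inspected from Python. This is a minimal sketch using the standard dis module; the names a, b and t are placeholders that are only compiled, never evaluated, and opnames is a helper made up for this example. The opcode names depend on the interpreter version: the trace that follows is from a CPython where both expressions compile to COMPARE_OP, while newer releases (3.9 onward, as far as I know) give in its own CONTAINS_OP.

```python
import dis

def opnames(source):
    """Return the opcode names CPython compiles `source` to."""
    return [ins.opname for ins in dis.get_instructions(source)]

# a, b and t are placeholder names; the code is compiled but never run.
eq_ops = opnames("a == b")
in_ops = opnames("a in t")

print(eq_ops)
print(in_ops)
```

If in shows up as CONTAINS_OP on your interpreter, the exact entry point in ceval.c differs from the one traced here, so treat the walk-through as describing the older layout.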

They both start in ceval.c, from COMPARE_OP, since that is the bytecode involved:

TARGET(COMPARE_OP) {

    PyObject *right = POP();

    PyObject *left = TOP();

    PyObject *res = cmp_outcome(oparg, left, right);

    Py_DECREF(left);

    Py_DECREF(right);

    SET_TOP(res);

    if (res == NULL)

        goto error;

    PREDICT(POP_JUMP_IF_FALSE);

    PREDICT(POP_JUMP_IF_TRUE);

    DISPATCH();

}

This pops the values from the stack (technically it only pops one; the left operand is read with TOP() and stays in place until SET_TOP overwrites it):

PyObject *right = POP();

PyObject *left = TOP();

and runs the compare:

PyObject *res = cmp_outcome(oparg, left, right);

cmp_outcome is this:

static PyObject *

cmp_outcome(int op, PyObject *v, PyObject *w)

{

    int res = 0;

    switch (op) {

    case PyCmp_IS: ...

    case PyCmp_IS_NOT: ...

    case PyCmp_IN:

        res = PySequence_Contains(w, v);

        if (res < 0)

            return NULL;

        break;

    case PyCmp_NOT_IN: ...

    case PyCmp_EXC_MATCH: ...

    default:

        return PyObject_RichCompare(v, w, op);

    }

    v = res ? Py_True : Py_False;

    Py_INCREF(v);

    return v;

}

This is where the paths split. The PyCmp_IN branch calls PySequence_Contains, which does

int

PySequence_Contains(PyObject *seq, PyObject *ob)

{

    Py_ssize_t result;

    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;

    if (sqm != NULL && sqm->sq_contains != NULL)

        return (*sqm->sq_contains)(seq, ob);

    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);

    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);

}
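The two branches of PySequence_Contains can be observed from Python, as a rough illustration. In this sketch the classes WithContains and OnlyIter are made up for the example: the first defines __contains__ (which fills the sq_contains slot), the second only defines __iter__, so in has to fall back to the iteration search.

```python
class WithContains:
    """Defines __contains__, so `in` calls it directly."""

    def __init__(self, items):
        self.items = list(items)
        self.calls = 0

    def __contains__(self, ob):
        self.calls += 1
        return ob in self.items


class OnlyIter:
    """No __contains__, so `in` falls back to iterating."""

    def __init__(self, items):
        self.items = list(items)
        self.yielded = 0

    def __iter__(self):
        for item in self.items:
            self.yielded += 1
            yield item


fast = WithContains("xyz")
slow = OnlyIter("xyz")

found_fast = "y" in fast   # one call to __contains__
found_slow = "y" in slow   # iterates until the first match

print(found_fast, fast.calls)
print(found_slow, slow.yielded)
```

The fallback stops iterating as soon as it finds a match, which is why only the first two items are pulled from OnlyIter.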

Note that a tuple is defined as

static PySequenceMethods tuple_as_sequence = {

    ...

    (objobjproc)tuplecontains,                  /* sq_contains */

};

PyTypeObject PyTuple_Type = {

    ...

    &tuple_as_sequence,                         /* tp_as_sequence */

    ...

};

So the branch

if (sqm != NULL && sqm->sq_contains != NULL)

will be taken, and *sqm->sq_contains, which is the function (objobjproc)tuplecontains, will be called.

This does

static int

tuplecontains(PyTupleObject *a, PyObject *el)

{

    Py_ssize_t i;

    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)

        cmp = PyObject_RichCompareBool(el, PyTuple_GET_ITEM(a, i),

                                           Py_EQ);

    return cmp;

}

...Wait, wasn't that PyObject_RichCompareBool what the other branch took? Nope, that was PyObject_RichCompare.

That code path was short so it likely just comes down to the speed of these two. Let's compare.

int

PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)

{

    PyObject *res;

    int ok;

    /* Quick result when objects are the same.

       Guarantees that identity implies equality. */

    if (v == w) {

        if (op == Py_EQ)

            return 1;

        else if (op == Py_NE)

            return 0;

    }

    ...

}
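The identity shortcut is visible from Python with a value that is not equal to itself. The sketch below is a rough Python model of the C code, not a translation: rich_compare_bool_eq and tuple_contains are names made up here to mirror PyObject_RichCompareBool (for Py_EQ only) and tuplecontains, and error handling is left out.

```python
def rich_compare_bool_eq(v, w):
    """Rough model of PyObject_RichCompareBool(v, w, Py_EQ)."""
    if v is w:           # the identity shortcut
        return True
    return bool(v == w)  # otherwise do the full comparison


def tuple_contains(tup, el):
    """Rough model of tuplecontains: stop at the first match."""
    for item in tup:
        if rich_compare_bool_eq(el, item):
            return True
    return False


nan = float("nan")

print(nan == nan)                   # False: NaN never compares equal
print(nan in (nan,))                # True: same object, shortcut taken
print(tuple_contains((nan,), nan))  # True: the model agrees
```

NaN makes a handy probe because == on it is always False, so a True from in can only come from the pointer comparison.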

The code path in PyObject_RichCompareBool pretty much immediately terminates. For PyObject_RichCompare, it does

PyObject *

PyObject_RichCompare(PyObject *v, PyObject *w, int op)

{

    PyObject *res;

    assert(Py_LT <= op && op <= Py_GE);

    if (v == NULL || w == NULL) { ... }

    if (Py_EnterRecursiveCall(" in comparison"))

        return NULL;

    res = do_richcompare(v, w, op);

    Py_LeaveRecursiveCall();

    return res;

}

The Py_EnterRecursiveCall/Py_LeaveRecursiveCall combo is not taken in the previous path, but these are relatively quick macros that'll short-circuit after incrementing and decrementing some globals.

do_richcompare does:

static PyObject *

do_richcompare(PyObject *v, PyObject *w, int op)

{

    richcmpfunc f;

    PyObject *res;

    int checked_reverse_op = 0;

    if (v->ob_type != w->ob_type && ...) { ... }

    if ((f = v->ob_type->tp_richcompare) != NULL) {

        res = (*f)(v, w, op);

        if (res != Py_NotImplemented)

            return res;

        ...

    }

    ...

}

This does some quick checks before calling v->ob_type->tp_richcompare, which is

PyTypeObject PyUnicode_Type = {

    ...

    PyUnicode_RichCompare,      /* tp_richcompare */

    ...

};

which does

PyObject *

PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

{

    int result;

    PyObject *v;

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))

        Py_RETURN_NOTIMPLEMENTED;

    if (PyUnicode_READY(left) == -1 ||

        PyUnicode_READY(right) == -1)

        return NULL;

    if (left == right) {

        switch (op) {

        case Py_EQ:

        case Py_LE:

        case Py_GE:

            /* a string is equal to itself */

            v = Py_True;

            break;

        case Py_NE:

        case Py_LT:

        case Py_GT:

            v = Py_False;

            break;

        default:

            ...

        }

    }

    else if (...) { ... }

    else { ...}

    Py_INCREF(v);

    return v;

}

Namely, this shortcuts on left == right... but only after doing

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))

    if (PyUnicode_READY(left) == -1 ||

        PyUnicode_READY(right) == -1)

All in all, the paths then look something like this (manually recursively inlining, unrolling and pruning known branches), first for in and then for ==:

POP()                           # Stack stuff

TOP()                           #

                                #

case PyCmp_IN:                  # Dispatch on operation

                                #

sqm != NULL                     # Dispatch to builtin op

sqm->sq_contains != NULL        #

*sqm->sq_contains               #

                                #

cmp == 0                        # Do comparison in loop

i < Py_SIZE(a)                  #

v == w                          #

op == Py_EQ                     #

++i                             #

cmp == 0                        #

                                #

res < 0                         # Convert to Python-space

res ? Py_True : Py_False        #

Py_INCREF(v)                    #

                                #

Py_DECREF(left)                 # Stack stuff

Py_DECREF(right)                #

SET_TOP(res)                    #

res == NULL                     #

DISPATCH()                      #

The == path:

POP()                           # Stack stuff

TOP()                           #

                                #

default:                        # Dispatch on operation

                                #

Py_LT <= op                     # Checking operation

op <= Py_GE                     #

v == NULL                       #

w == NULL                       #

Py_EnterRecursiveCall(...)      # Recursive check

                                #

v->ob_type != w->ob_type        # More operation checks

f = v->ob_type->tp_richcompare  # Dispatch to builtin op

f != NULL                       #

                                #

!PyUnicode_Check(left)          # ...More checks

!PyUnicode_Check(right)         #

PyUnicode_READY(left) == -1     #

PyUnicode_READY(right) == -1    #

left == right                   # Finally, doing comparison

case Py_EQ:                     # Immediately short circuit

Py_INCREF(v);                   #

                                #

res != Py_NotImplemented        #

                                #

Py_LeaveRecursiveCall()         # Recursive check

                                #

Py_DECREF(left)                 # Stack stuff

Py_DECREF(right)                #

SET_TOP(res)                    #

res == NULL                     #

DISPATCH()                      #

Now, PyUnicode_Check and PyUnicode_READY are pretty cheap since they only check a couple of fields, but it should be obvious that the top one is a smaller code path: it has fewer function calls, only one switch statement, and is just a bit thinner.

TL;DR:

Both dispatch to if (left_pointer == right_pointer); the difference is just how much work they do to get there. in just does less.
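The numbers in the question come from one interpreter build, so it is worth re-measuring before drawing conclusions. Here is a minimal sketch, assuming only the standard timeit module: best_time is a helper made up for this example, and the repeat and number values are arbitrary and smaller than the timeit default of one million used in the question. Taking the minimum over several repeats reduces noise from other processes.

```python
import timeit

STATEMENTS = ["'x' in ('x',)", "'x' == 'x'"]

def best_time(stmt, repeat=5, number=100_000):
    """Best-of-`repeat` wall time, in seconds, for `number` runs of stmt."""
    return min(timeit.repeat(stmt, repeat=repeat, number=number))

results = {stmt: best_time(stmt) for stmt in STATEMENTS}

for stmt, seconds in results.items():
    print(f"{stmt:<16} {seconds:.6f}")
```

Which statement wins, and by how much, can change between CPython versions, since the comparison paths traced above have been reworked over time.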
