Reference: https://impythonist.wordpress.com/2016/02/21/building-high-performance-django-systems/

The main motto of Django web framework is:

The web framework for perfectionists with deadlines

It is true. Django always gives a polished product within the time. Today all Django developers are racing to finish the project development with Python as their favorite choice. But evil of wrong development practices can slow down the project by significant amount.

These days perfectionism is falling for deadlines. The eagerness to finish task dominates the efficiency and optimization. People complain too much about Django’s code abstraction which makes it slow. But it is not true. I am going to prove my statement here. I will show how to optimize the Django code and where to optimize. We need to hit the sweet spot and do repair there.

The techniques those can improve our Django website performance:

Advanced & Correct Django ORM usage
Query caching
Django template caching
Non-blocking code
Alternate data stores
* DJANGO ORM (DOCTOR EVIL OF NEW COMERS)

Django ORM is the easiest thing to link an application and a database(MySQL, PostreSQL). For any web stack communication between web application and database is the slowest part. With bad ORM usage practices we are making it even much slower. Django is a very good framework which gives you full customization of how you define business logic. I am going to show here how we can fall into traps of ORM, which in turn turns our website not scalable.

* SELECT ALL ILLUSION

When a developer new to Django writes code, she usually have a bad habit of doing this.

from django.db import models

class Person(models.Model):
first_name = models.CharField(max_length=30)
last_name = models.CharField(max_length=30)
city = models.CharField(max_length=30)

# Find number of persons in DB. Very bad thing
>>> count = len(Person.objects.all())

# How an amateur should do that in a right way
>>> count = Person.objects.count()
Loading objects into memory and processing is a bad thing. SQL is an excellent querying language to filter and process data. There is no need for us to bring raw data and process them. If possible use the ORM functions which maps one-to-one to the SQL. If we see above example with one hundred thousand records in MySQL times will be

See the time difference for both ORM queries
See the time difference for both ORM queries
Journey from almost no time to nearly 9 seconds. If you insert the second query in 20 places, website will be dead slow even with high resources. There is a chance for experienced people not doing this silly mistake. But there is a “select * from db illusion” that got taught in our first database class and widely used. People even though need few fields, fetches objects with full data from DB to make overhead. It is like doing

mysql> select first_name from person
mysql> select * from person
Here we have only one additional field. But in reality we need 5 fields out of 40. Then querying all fields loads the memory with unnecessary data. There is a solution for this. Let us fetch only the first names of people who live in Hyderabad city.

# This query fetches only id, first_name from DB
>>> p1 = Person.objects.filter(city="Hyderabad").values("id","first_name")[0]
>>> print p1["first_name]

# This fetches all fields information
>>> p1 = Person.objects.filter(city="Hyderabad")[0]
>>> print p1["first_name]
This query only fetches two columns id, first_name instead of fetching all. It will save memory of unwanted fields from just filtering.

* REPETITIVE DATABASE CALLS

In SQL, joins are used to fetch data in a single shot from related tables. We can apply inner joins to combine results from multiple tables matching a criteria. Django provides advanced constructs like select_related and prefetch_related to optimize the related object queries. I will show here why we need to use them.

from django.db import models

class Author(models.Model):
name = models.CharField(max_length=30)
# ...

class Book(models.Model):
name = models.CharField(max_length=30)
author = models.ForeignKey(Author, on_delete=models.CASCADE)
# ...
Here Book has a foreign key of Author. So we can query books in this way.

# Hits the DB for first time
>>> book = Book.objects.get(id=1)
>>> book.name

# Hits the DB again
>>> book.author.name
If you are querying a set of books and then trying to access all their related authors, it is a bunch of queries suffocating the DB.

from django.utils import timezone

# Find all the authors who published books
authors = set()

for e in Book.objects.filter(pub_date__lt=timezone.now()):
# For each published book make a DB query to fetch author.
authors.add(e.author)
It means if there are 300 books, 300 queries are going to be hit.
What is Solution?
You should use select_related in that case. It fetches all related fields specified using Joins Similarly
>>> book = Book.objects.select_related('author').get(id=1)
# This won't cost another query
>>> book.author.name
Similarly you can use prefetch_related for many to many fields since select_related can only used for one to one field. For thorough inspection of how Django ORM is making SQL calls use connection.queries from django.db library
>>> from django import db

# It gives a list of raw SQL queries those executed by django on DB
>>> print db.connection.queries

# Clear that list and start listening to SQL
>>> db.reset_queries()
muftaba
For more advanced tips for optimization of ORM visit these official django docs.
https://docs.djangoproject.com/en/1.9/topics/performance/
* CACHING (SWISS KNIFE)

Caching is the best method to reduce the DB hits as many as possible. There are different kinds of caching implementations in Django.

cached property on model
template caching
query caching
CACHED PROPERTY ON MODEL

We all use properties on Django models. They are the functions which returns calculated properties from a particular model. For example let us have a fullName property which returns complete name by appending first_name + last_name. Each time you compute fullName on a model, some processing needs to be done on a model data.

from django.db import models

class Person(models.Model):
first_name = models.CharField(max_length=30)
last_name = models.CharField(max_length=30)

@property
def fullName(self):
# Any expensive calculation on instance data
return self.first_name + " " + self.last_name

>>> naren = Person.objects.get(pk = 1)

# Now it calculates fullName from first_name and last_name data of instance
>>> naren.fullName
Naren Aryan
And if we call it in template, once again value is calculated from data.

<p> Name: {{ naren.fullName }} </p>
If you know that for a particular model instance, calculated property won’t change then you can cache that result instead of calculating it once again. So modify code to….
from django.utils.functional import cached_property
# ...
@cached_property
def fullName(self):
# Any expensive calculation on instance data
# This returning value is cached and not calculated again
return self.first_name + " " + self.last_name
Now if you call the fullName property model returns a cached value instead of returning computed first_name + last_name. You can invalidate the data by deleting the property on a model instance. Here appending first_name and last_name is a simple thing. It is very useful in optimizing a heavily computation task that processed in a property.
QUERY CACHING

Many times we call the same queries to fetch data. If data is not changing rapidly we can cache the QuerySet which is returned by a particular query. Caching systems generates a hash of SQL query and maps them to cached results. So whenever ORM tries to call the model query sets the cached query sets will be called. There are two good caching libraries in available in Django.

Johnny Cache
Cache Machine
Using cache machine with Redis as back-end store, we can cache the QuerySets. Usage is very simple. But invalidating data here is done by timeouts. Invalidating data and refreshing query set can also be done effectively using post-save data hook on a model. For example

from django.db import models

from caching.base import CachingManager, CachingMixin

class Person(CachingMixin, models.Model):
name = CharField(max_length=30)
objects = CachingManager()
We can cache all QuerySets generated for the Person model by simple syntax as above. It is a good feature if you have more reads over writes. And remember to invalidate a Query set when new data is saved. Use timeouts according to actual situation.
TEMPLATE CACHING

If you have web pages whose content won’t change for longer periods of time then cache the parts like sub menu page or navigation bar of website which remains constant. For a news website, content remains same on side pane etc. You can give time out for a particular fragment of template. Until the timeout happen, only cached page will be returned reducing the DB hits. We can use cache machine once again for doing this task. Django also has an inbuilt caching available. This is a small but effective step in optimizing the Django web site.

{% load cache %}
{% cache 500 sidebar %}
.. sidebar ..
{% endcache %}
For more information visit this link for Per-view caching and many more.

https://docs.djangoproject.com/en/1.9/topics/cache/
NON BLOCKING CODE

When your Django project size is growing and different teams are cluttering your code, the main problem comes with adding synchronous API calls in between the code. There is another case where Django code got blocked in doing “No Hurry” things (like sending email, converting invoice HTML to PDF) and instant necessities (show web page) are not being served .In both the cases you need to follow asynchronous task completion which removes burden from your main Django’s python interpreter. Use following

Messaging Queues + Worker management (Rabbit MQ + Celery)
Async IO – Python 3 (or) Python future-requests -Python 2.7
I wrote a practical guide of how to use celery and Redis to do that in my article.integrating Mailgun Email service with Django

* SCALING INFRASTRUCTURE

In additional to coding standards for optimization, stack also plays a vital role in scaling a Django website. But it is waste to set up huge stack with all bad practices. Here I am going to briefly show which stack allows us to scale.

django_sky1

But think of having all these when you really need them.The essential components those should be in your stack are:

Load Balancers (HAProxy)
Web accelarators (Varnish)
Caching backends (Redis)
JSON stores (PostgreSQL JSON store)
Caching back-end like Redis can be used for multiple purposes. Storing cache results from multiple caching systems and to store frequent data of small size for verifying the users etc. Varnish is a good static file caching system. You can have heartbeat based load balancers that shares load between multiple web application servers intelligently. There are lot of good open source tools available too for tuning a website and analyzing the week points. I prefer postgreSQL JSON store than Mongo DB for storing JSON documents.

All this proves that a Django website can live happily with minimal stack with correct ORM implementation standards. If actually needed, then right infrastructure will comes to the rescue. Many of these patterns are also applicable to other language web frameworks too.

If you have any query, comment below or mail me at narenarya@live.com

REFERENCES

https://docs.djangoproject.com/en/1.9/ref/utils/#django.utils.functional.cached_property

https://docs.djangoproject.com/en/1.9/topics/cache/

https://github.com/jmoiron/johnny-cache

http://blog.narenarya.in

https://highperformancedjango.com/

https://dzone.com/articles/milk-your-caching-all-its

Django performance的更多相关文章

  1. High Performance Django

    构建高性能Django站点   性能 可用 伸缩 扩展 安全 build 1.审慎引入第三方库(是否活跃.是否带入query.是否容易缓存) 2.db:减少query次数 减少耗时query 减小返回 ...

  2. Python自动化之django的ORM操作——Python源码

    """ The main QuerySet implementation. This provides the public API for the ORM. " ...

  3. Apache部署django项目

    在此之前,我们一直使用django的manage.py 的runserver 命令来运行django应用,但这只是我们的开发环境,当项目真正部署上线的时候这做就不可行了,必须将我们的项目部署到特定的w ...

  4. django - from django.db.models import F - class F

    F() 的执行不经过 python解释器,不经过本机内存,是生成 SQL语句的执行. # Tintin filed a news story! reporter = Reporters.objects ...

  5. win7下,使用django运行django-admin.py无法创建网站

    安装django的步骤: 1.安装python,选择默认安装在c盘即可.设置环境变量path,值添加python的安装路径. 2.下载ez_setup.py,下载地址:http://peak.tele ...

  6. Django Model field reference

    ===================== Model field reference ===================== .. module:: django.db.models.field ...

  7. Django访问量和页面点击数统计

    http://blog.csdn.net/pipisorry/article/details/47396311 下面是在模板中做一个简单的页面点击数统计.model阅读量统计.用户访问量统计的方法 简 ...

  8. Django的admin管理系统写入中文出错的解决方法/1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation ‘locate’

    Django的admin管理系统写入中文出错的解决方法 解决错误: 1267  Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and ( ...

  9. Django 详解 中间件Middleware

    Django中间件 还是涉及到django的请求生命周期.middle ware 请求穿过中间件到达url,再经过中间件返回给用户. 简单实例 django项目根目录新建一个Middle文件夹,再新建 ...

随机推荐

  1. 《Peering Inside the PE: A Tour of the Win32 Portable Executable File Format》阅读笔记二

    Common Sections The .text section is where all general-purpose code emitted by the compiler or assem ...

  2. PHP中将ip地址转成十进制数的两种实用方法

    As we all know that long2ip works as ip1.ip2.ip3.ip4 (123.131.231.212) long ip => (ip1 * 256 * 25 ...

  3. java网络之tcp

    简单tcp传输 package pack; /* 演示tcp传输. 1,tcp分客户端和服务端. 2,客户端对应的对象是Socket. 服务端对应的对象是ServerSocket. 客户端, 通过查阅 ...

  4. 款待奶牛(treat)

    款待奶牛(treat) 题目描述 FJ有n(1≤n≤2000)个美味的食物,他想卖掉它们来赚钱给奶牛.这些食物放在一些箱子里,它们有些有趣的特性:(1)这些食物被编号为1-n,每一天FJ可以从这排箱子 ...

  5. HDU 2444 The Accomodation of Students(二分图判定+最大匹配)

    这是一个基础的二分图,题意比较好理解,给出n个人,其中有m对互不了解的人,先让我们判断能不能把这n对分成两部分,这就用到的二分图的判断方法了,二分图是没有由奇数条边构成环的图,这里用bfs染色法就可以 ...

  6. Quartz总结(三):动态修改定时器一

    package com.mc.bsframe.job; import org.quartz.Scheduler; import org.quartz.SchedulerException; impor ...

  7. AIDL原理解析

    首先为什么需要aidl? 下面是不需要aidl 的binder的IPC通讯过程,表面上结构很简单,但是有个困难就是,客户端和服务端进行通讯,你得先将你的通讯请求转换成序列化的数据,然后调用transa ...

  8. swfobject.embedSWF属性与用法

    JS+flash的焦点幻灯片既能大方得体的展示焦点信息,也能美轮美奂的展示图片,越来越多的网站使用这种焦点幻灯的表现方法.很多童鞋在下载这方面的素材代码的时候,往往会因为展示出来的是flash,觉得难 ...

  9. LB 高可扩展性集群(负载均衡集群)

    一.什么是负载均衡 首先我们先介绍一下什么是负载均衡: 负载平衡(Load balancing)是一种计算机网络技术,用来在多个计算机(计算机集群).网络连接.CPU.磁盘驱动器或其他资源中分配负载, ...

  10. java 读excel xlsx

    http://bbs.csdn.net/topics/380257685 import java.io.File; import java.io.IOException; import java.io ...