keras 自适应分配显存 & 清理不用的变量释放 GPU 显存

Intro

Are you running out of GPU memory when using keras or tensorflow deep learning models, but only some of the time?

Are you curious about exactly how much GPU memory your tensorflow model uses during training?

Are you wondering if you can run two or more keras models on your GPU at the same time?

Background

By default, tensorflow pre-allocates nearly all of the available GPU memory, which is bad for a variety of use cases, especially production and memory profiling.

When keras uses tensorflow for its back-end, it inherits this behavior.

Setting tensorflow GPU memory options

For new models

Thankfully, tensorflow allows you to change how it allocates GPU memory, and to set a limit on how much GPU memory it is allowed to allocate.

Let’s set GPU options on keras‘s example Sequence classification with LSTM network

 
## keras example imports
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM ## extra imports to set GPU options
import tensorflow as tf
from keras import backend as k ###################################
# TensorFlow wizardry
config = tf.ConfigProto() # Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True # Only allow a total of half the GPU memory to be allocated
#config.gpu_options.per_process_gpu_memory_fraction = 0.5 # Create a session with the above options specified.
k.tensorflow_backend.set_session(tf.Session(config=config))
################################### model = Sequential()
model.add(Embedding(max_features, output_dim=256))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid')) model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy']) model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)

After the above, when we create the sequence classification model, it won’t use half the GPU memory automatically, but rather will allocate GPU memory as-needed during the calls to model.fit() and model.evaluate().

Additionally, with the per_process_gpu_memory_fraction = 0.5tensorflow will only allocate a total of half the available GPU memory.

If it tries to allocate more than half of the total GPU memory, tensorflow will throw a ResourceExhaustedError, and you’ll get a lengthy stack trace.

If you have a Linux machine and an nvidia card, you can watch nvidia-smi to see how much GPU memory is in use, or can configure a monitoring tool like monitorix to generate graphs for you.

GPU memory usage, as shown in Monitorix for Linux

For a model that you’re loading

We can even set GPU memory management options for a model that’s already created and trained, and that we’re loading from disk for deployment or for further training.

For that, let’s tweak keras‘s load_model example:

 
# keras example imports
from keras.models import load_model ## extra imports to set GPU options
import tensorflow as tf
from keras import backend as k ###################################
# TensorFlow wizardry
config = tf.ConfigProto() # Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True # Only allow a total of half the GPU memory to be allocated
config.gpu_options.per_process_gpu_memory_fraction = 0.5 # Create a session with the above options specified.
k.tensorflow_backend.set_session(tf.Session(config=config))
################################### # returns a compiled model
# identical to the previous one
model = load_model('my_model.h5') # TODO: classify all the things

Now, with your loaded model, you can open your favorite GPU monitoring tool and watch how the GPU memory usage changes under different loads.

Conclusion

Good news everyone! That sweet deep learning model you just made doesn’t actually need all that memory it usually claims!

And, now that you can tell tensorflow not to pre-allocate memory, you can get a much better idea of what kind of rig(s) you need in order to deploy your model into production.

Is this how you’re handling GPU memory management issues with tensorflow or keras?

Did I miss a better, cleaner way of handling GPU memory allocation with tensorflow and keras?

Let me know in the comments!

 
 
====================================================================================
 

How to remove stale models from GPU memory

import gc
m = Model(.....)
m.save(tmp_model_name)
del m
K.clear_session()
gc.collect()
m = load_model(tmp_model_name)

参考: https://michaelblogscode.wordpress.com/2017/10/10/reducing-and-profiling-gpu-memory-usage-in-keras-with-tensorflow-backend/

https://github.com/keras-team/keras/issues/5345

Reducing and Profiling GPU Memory Usage in Keras with TensorFlow Backend的更多相关文章

  1. GPU Memory Usage占满而GPU-Util却为0的调试

    最近使用github上的一个开源项目训练基于CNN的翻译模型,使用THEANO_FLAGS='floatX=float32,device=gpu2,lib.cnmem=1' python run_nn ...

  2. Allowing GPU memory growth

    By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICE ...

  3. Redis: Reducing Memory Usage

    High Level Tips for Redis Most of Stream-Framework's users start out with Redis and eventually move ...

  4. Android 性能优化(21)*性能工具之「GPU呈现模式分析」Profiling GPU Rendering Walkthrough:分析View显示是否超标

    Profiling GPU Rendering Walkthrough 1.In this document Prerequisites Profile GPU Rendering $adb shel ...

  5. Memory usage of a Java process java Xms Xmx Xmn

    http://www.oracle.com/technetwork/java/javase/memleaks-137499.html 3.1 Meaning of OutOfMemoryError O ...

  6. Shell script for logging cpu and memory usage of a Linux process

    Shell script for logging cpu and memory usage of a Linux process http://www.unix.com/shell-programmi ...

  7. 5 commands to check memory usage on Linux

    Memory Usage On linux, there are commands for almost everything, because the gui might not be always ...

  8. SHELL:Find Memory Usage In Linux (统计每个程序内存使用情况)

    转载一个shell统计linux系统中每个程序的内存使用情况,因为内存结构非常复杂,不一定100%精确,此shell可以在Ghub上下载. [root@db231 ~]# ./memstat.sh P ...

  9. Why does the memory usage increase when I redeploy a web application?

    That is because your web application has a memory leak. A common issue are "PermGen" memor ...

随机推荐

  1. How to manage IntelliJ IDEA projects under Version Control Systems

    如何在版本控制系统中管理 IntelliJ IDEA 项目文件 IntelliJ IDEA 设置详细,功能强大.在实际工作中,我们有时会遇到跟同事共享项目文件的情况. 那么,有哪些项目文件应该加入到版 ...

  2. pymysql模块使用

    一.写函数的原因 写这个函数的原因就是为了能够不每次在用Python用数据库的时候还要在写一遍  做个通用函数做保留,也给大家做个小小的分享,函数不是最好的,希望有更好的代码的朋友能提出 互相学习 二 ...

  3. POJ 1102

    #include<iostream>// cheng da cai zi 11.14 using namespace std; int main() { int i; int j; int ...

  4. 剑指offer二十九之最小的K个数

    一.题目 输入n个整数,找出其中最小的K个数.例如输入4,5,1,6,2,7,3,8这8个数字,则最小的4个数字是1,2,3,4,. 二.思路 详解代码. 三.代码 import java.util. ...

  5. python基础笔记之注释三种方法

    ---恢复内容开始--- 1,,单行注释  用# 2,多行注释 用 “”” dddd""" 3,较长行虽然分行写但是只是注释,最终显示为一行:用 \ ---恢复内容结束- ...

  6. Windows环境下执行hadoop命令出现Error: JAVA_HOME is incorrectly set Please update D:\SoftWare\hadoop-2.6.0\conf\hadoop-env.cmd错误的解决办法(图文详解)

    不多说,直接上干货! 导读   win下安装hadoop 大家,别小看win下的安装大数据组件和使用  玩过dubbo和disconf的朋友们,都知道,在win下安装zookeeper是经常的事   ...

  7. java-TreeSet进行排序的2种方式

    TreeSet和HashSet的区别在于, TreeSet可以进行排序, 默认使用字典顺序排序, 也可以进行自定义排序 1, 自然排序 2, 比较器排序 自然排序: 1, 需要被排序的类实现Compa ...

  8. STL 容器简介

    一.概述 STL 对定义的通用容器分三类:顺序性容器.关联式容器和容器适配器. 顺序性容器是一种各元素之间有顺序关系的线性表.元素在顺序容器中保存元素置入容器时的逻辑顺序,除非用删除或插入的操作改变这 ...

  9. Oracle驱动classes12.jar 与ojdbc14.jar的区别

    简单的说,如果使用jdk1.2和jdk1.3就使用classes12.jar:如果使用的jdk1.4和jdk1.5的,就选用ojdbc14.jar. 驱动包classes12.jar用于JDK 1.2 ...

  10. 【Xmail】使用Xmail搭建局域网邮件服务器

    下载地址:  http://www.xmailserver.org/xmail-1.27.win32bin.zip,当前最新版本  1.27. 解压文件:xmail-1.27.win32bin.zip ...