Tested with:

Keras: 2.2.4
Python: 3.6.9
Tensorflow: 1.12.0

==================

Problem:

Using code from https://github.com/matterport/Mask_RCNN

Setting GPU_COUNT > 1 raises this error:

RuntimeError: It looks like you are subclassing `Model` and you forgot to call `super(YourClass, self).__init__()`. Always start with this line.
Traceback (most recent call last):
File "D:\Anaconda33\lib\site-packages\keras\engine\network.py", line 313, in __setattr__
is_graph_network = self._is_graph_network
File "parallel_model.py", line 46, in __getattribute__
return super(ParallelModel, self).__getattribute__(attrname)
AttributeError: 'ParallelModel' object has no attribute '_is_graph_network'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "parallel_model.py", line 159, in <module>
model = ParallelModel(model, GPU_COUNT)
File "parallel_model.py", line 35, in __init__
self.inner_model = keras_model
File "D:\Anaconda33\lib\site-packages\keras\engine\network.py", line 316, in __setattr__
'It looks like you are subclassing `Model` and you '
RuntimeError: It looks like you are subclassing `Model` and you forgot to call `super(YourClass, self).__init__()`. Always start with this line.

Solution 1:

Change the constructor in mrcnn/parallel_model.py as follows:

class ParallelModel(KM.Model):
    def __init__(self, keras_model, gpu_count):
        """Class constructor.
        keras_model: The Keras model to parallelize
        gpu_count: Number of GPUs. Must be > 1
        """
        # Initialize the base class first so that attributes can be assigned.
        super(ParallelModel, self).__init__()
        self.inner_model = keras_model
        self.gpu_count = gpu_count
        merged_outputs = self.make_parallel()
        # Initialize again, this time with the real inputs and merged outputs.
        super(ParallelModel, self).__init__(inputs=self.inner_model.inputs,
                                            outputs=merged_outputs)
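Why calling the base constructor first helps can be shown with a toy example. This is only a sketch: `GuardedBase`, `Broken`, and `Fixed` are hypothetical names, and the guard is a simplified stand-in for the check seen in the traceback above, not the actual Keras code.

```python
class GuardedBase:
    """Toy base class that refuses attribute assignment before __init__ ran."""

    def __init__(self, inputs=None, outputs=None):
        # Bypass the guard to set the flag the guard itself looks for.
        object.__setattr__(self, "_initialized", True)
        self.inputs = inputs
        self.outputs = outputs

    def __setattr__(self, name, value):
        try:
            self._initialized
        except AttributeError:
            raise RuntimeError("forgot to call super().__init__()") from None
        object.__setattr__(self, name, value)


class Broken(GuardedBase):
    def __init__(self, inner):
        # Assigning before the base constructor has run triggers the guard.
        self.inner = inner
        super().__init__(inputs="in", outputs="out")


class Fixed(GuardedBase):
    def __init__(self, inner):
        super().__init__()                # first call: makes assignment legal
        self.inner = inner
        merged = "merged(" + inner + ")"  # stands in for make_parallel()
        super().__init__(inputs="in", outputs=merged)  # second call: real setup
```

`Broken("m")` raises `RuntimeError`, while `Fixed("m")` succeeds and ends up with both `inner` and `outputs` set.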

If you then get an error asking for two arguments, inputs and outputs, upgrade Keras to 2.2.4.
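To check whether the installed Keras is older than 2.2.4, the version strings can be compared numerically. A minimal sketch: `version_tuple` and `needs_upgrade` are hypothetical helper names, and it assumes a plain dotted numeric version string such as "2.2.4".

```python
def version_tuple(version):
    """Convert a dotted version string such as '2.2.4' into a tuple of ints.

    Assumes purely numeric components; suffixes like 'rc1' are not handled.
    """
    return tuple(int(part) for part in version.split(".")[:3])


def needs_upgrade(installed, required="2.2.4"):
    """Return True if the installed version is older than the required one."""
    return version_tuple(installed) < version_tuple(required)


# Typical use (requires Keras to be installed):
#   import keras
#   print(needs_upgrade(keras.__version__))
```

Comparing tuples of integers avoids the pitfall of plain string comparison, where "2.10.0" would sort before "2.2.4".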

If you get this error:

No node-device colocations were active during op 'tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0' creation.
Device assignments active during op 'tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0' creation:
with tf.device(/gpu:1): <M:\new\mrcnn\parallel_model.py:70>

No node-device colocations were active during op 'anchors/Variable' creation.
No device assignments were active during op 'anchors/Variable' creation.

Traceback (most recent call last):
File "D:\Anaconda33\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
return fn(*args)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _run_fn
self._extend_graph()
File "D:\Anaconda33\lib\site-packages\tensorflow\python\client\session.py", line 1352, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot colocate nodes {{colocation_node tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0}} and {{colocation_node anchors/Variable}}: Cannot merge devices with incompatible ids: '/device:GPU:0' and '/device:GPU:1'
[[{{node tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0}} = Identity[T=DT_FLOAT, _class=["loc:@anchors/Variable"], _device="/device:GPU:1"](tower_1/mask_rcnn/anchors/Variable/cond/Merge)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train_mul.py", line 448, in <module>
"mrcnn_bbox", "mrcnn_mask"])
File "M:\new\mrcnn\model.py", line 2132, in load_weights
saving.load_weights_from_hdf5_group_by_name(f, layers)
File "D:\Anaconda33\lib\site-packages\keras\engine\saving.py", line 1022, in load_weights_from_hdf5_group_by_name
K.batch_set_value(weight_value_tuples)
File "D:\Anaconda33\lib\site-packages\keras\backend\tensorflow_backend.py", line 2440, in batch_set_value
get_session().run(assign_ops, feed_dict=feed_dict)
File "D:\Anaconda33\lib\site-packages\keras\backend\tensorflow_backend.py", line 197, in get_session
[tf.is_variable_initialized(v) for v in candidate_vars])
File "D:\Anaconda33\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
run_metadata_ptr)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
run_metadata)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot colocate nodes node tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0 (defined at M:\new\mrcnn\model.py:1936) having device Device assignments active during op 'tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0' creation:
with tf.device(/gpu:1): <M:\new\mrcnn\parallel_model.py:70> and node anchors/Variable (defined at M:\new\mrcnn\model.py:1936) having device No device assignments were active during op 'anchors/Variable' creation. : Cannot merge devices with incompatible ids: '/device:GPU:0' and '/device:GPU:1'
[[node tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0 (defined at M:\new\mrcnn\model.py:1936) = Identity[T=DT_FLOAT, _class=["loc:@anchors/Variable"], _device="/device:GPU:1"](tower_1/mask_rcnn/anchors/Variable/cond/Merge)]]

No node-device colocations were active during op 'tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0' creation.
Device assignments active during op 'tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0' creation:
with tf.device(/gpu:1): <M:\new\mrcnn\parallel_model.py:70>

No node-device colocations were active during op 'anchors/Variable' creation.
No device assignments were active during op 'anchors/Variable' creation.

Caused by op 'tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0', defined at:
File "train_mul.py", line 417, in <module>
model_dir=MODEL_DIR)
File "M:\new\mrcnn\model.py", line 1839, in __init__
self.keras_model = self.build(mode=mode, config=config)
File "M:\new\mrcnn\model.py", line 2064, in build
model = ParallelModel(model, config.GPU_COUNT)
File "M:\new\mrcnn\parallel_model.py", line 36, in __init__
merged_outputs = self.make_parallel()
File "M:\new\mrcnn\parallel_model.py", line 80, in make_parallel
outputs = self.inner_model(inputs)
File "D:\Anaconda33\lib\site-packages\keras\engine\base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "D:\Anaconda33\lib\site-packages\keras\engine\network.py", line 570, in call
output_tensors, _, _ = self.run_internal_graph(inputs, masks)
File "D:\Anaconda33\lib\site-packages\keras\engine\network.py", line 724, in run_internal_graph
output_tensors = to_list(layer.call(computed_tensor, **kwargs))
File "D:\Anaconda33\lib\site-packages\keras\layers\core.py", line 682, in call
return self.function(inputs, **arguments)
File "M:\new\mrcnn\model.py", line 1936, in <lambda>
anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\ops\variables.py", line 183, in __call__
return cls._variable_v1_call(*args, **kwargs)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\ops\variables.py", line 146, in _variable_v1_call
aggregation=aggregation)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\ops\variables.py", line 125, in <lambda>
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2444, in default_variable_creator
expected_shape=expected_shape, import_scope=import_scope)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\ops\variables.py", line 187, in __call__
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\ops\variables.py", line 1329, in __init__
constraint=constraint)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\ops\variables.py", line 1480, in _init_from_args
self._initial_value),
File "D:\Anaconda33\lib\site-packages\tensorflow\python\ops\variables.py", line 2177, in _try_guard_against_uninitialized_dependencies
return self._safe_initial_value_from_tensor(initial_value, op_cache={})
File "D:\Anaconda33\lib\site-packages\tensorflow\python\ops\variables.py", line 2195, in _safe_initial_value_from_tensor
new_op = self._safe_initial_value_from_op(op, op_cache)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\ops\variables.py", line 2241, in _safe_initial_value_from_op
name=new_op_name, attrs=op.node_def.attr)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
op_def=op_def)
File "D:\Anaconda33\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Cannot colocate nodes node tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0 (defined at M:\new\mrcnn\model.py:1936) having device Device assignments active during op 'tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0' creation:
with tf.device(/gpu:1): <M:\new\mrcnn\parallel_model.py:70> and node anchors/Variable (defined at M:\new\mrcnn\model.py:1936) having device No device assignments were active during op 'anchors/Variable' creation. : Cannot merge devices with incompatible ids: '/device:GPU:0' and '/device:GPU:1'
[[node tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0 (defined at M:\new\mrcnn\model.py:1936) = Identity[T=DT_FLOAT, _class=["loc:@anchors/Variable"], _device="/device:GPU:1"](tower_1/mask_rcnn/anchors/Variable/cond/Merge)]]

No node-device colocations were active during op 'tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0' creation.
Device assignments active during op 'tower_1/mask_rcnn/anchors/Variable/anchors/Variable/read_tower_1/mask_rcnn/anchors/Variable_0' creation:
with tf.device(/gpu:1): <M:\new\mrcnn\parallel_model.py:70>

No node-device colocations were active during op 'anchors/Variable' creation.
No device assignments were active during op 'anchors/Variable' creation.

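Add these lines before the model is built, so that TensorFlow may place an op on another device when the requested one cannot be used:

import tensorflow as tf
import keras.backend.tensorflow_backend as KTF

# Enable soft placement and make Keras use the configured session.
config = tf.ConfigProto()
config.allow_soft_placement = True
session = tf.Session(config=config)
KTF.set_session(session)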

Solution 2 (not recommended):

Downgrade Keras to 2.1.3:

conda install keras=2.1.3

(This works for some people, but it did not work for me.)

References:

https://github.com/matterport/Mask_RCNN/issues/921

https://github.com/tensorflow/tensorflow/issues/2285
