I. Module Overview

At the end of the previous section, we obtained a limited set of proposals with the following call:

    # [IMAGES_PER_GPU, num_rois, (y1, x1, y2, x2)]
    # IMAGES_PER_GPU replaces batch; from here on, "batch" always means IMAGES_PER_GPU
    rpn_rois = ProposalLayer(
        proposal_count=proposal_count,
        nms_threshold=config.RPN_NMS_THRESHOLD,  # 0.7
        name="ROI",
        config=config)([rpn_class, rpn_bbox, anchors])

To recap, an anchor is kept as an RPN positive if (see the IoU sketch below):

its IoU with a GT box exceeds 0.7

it is the anchor with the highest IoU with some GT box
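For reference, IoU (intersection over union) can be computed as in this minimal helper; it is illustrative only and is not taken from the Mask_RCNN repo:

    # A minimal IoU helper, for illustration only (not from the Mask_RCNN repo).
    # Boxes are (y1, x1, y2, x2) in pixel coordinates.
    def iou(box_a, box_b):
        y1 = max(box_a[0], box_b[0]); x1 = max(box_a[1], box_b[1])
        y2 = min(box_a[2], box_b[2]); x2 = min(box_a[3], box_b[3])
        inter = max(0, y2 - y1) * max(0, x2 - x1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / float(area_a + area_b - inter)

    print(iou((0, 0, 10, 10), (2, 2, 10, 10)))  # 0.64 -> below the 0.7 threshold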

Going further, we follow the R-CNN recipe and use the proposals to perform the ROI operation on the shared features. Mask R-CNN introduces two innovations at this step:

ROI Align replaces the earlier ROI Pooling

the shared features change from a single layer to the multi-level FPN pyramid, i.e. mrcnn_feature_maps = [P2, P3, P4, P5]

Innovation 2 means that different proposals draw their ROI features from different pyramid levels, so we must:

assign each proposal to a feature level according to its height and width

perform the ROI operation on that level's feature map

II. Implementation Analysis

The high-dimensional slicing function tf.gather_nd is used below; a detailed walkthrough is available in the companion post 『TensorFlow』高级高维切片gather_nd, and a minimal illustration follows.
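As a quick sketch (TF 1.x session style assumed; the toy values are made up), tf.gather_nd indexes a tensor with full coordinate tuples, one output row per tuple:

    import numpy as np
    import tensorflow as tf

    # boxes: [batch=2, num_boxes=3, 4]; each row of ix is an (image, box) coordinate pair
    boxes = tf.constant(np.arange(24, dtype=np.float32).reshape(2, 3, 4))
    ix = tf.constant([[0, 1], [1, 2]])  # box 1 of image 0, box 2 of image 1
    picked = tf.gather_nd(boxes, ix)    # shape [2, 4]

    with tf.Session() as sess:
        print(sess.run(picked))  # [[ 4.  5.  6.  7.], [20. 21. 22. 23.]]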

Continuing from the build function covered earlier, this section's functionality is invoked as follows:

    if mode == "training":
        ...
    else:
        # Network Heads
        # Proposal classifier and BBox regressor heads
        # output shapes:
        #     mrcnn_class_logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax)
        #     mrcnn_class: [batch, num_rois, NUM_CLASSES] classifier probabilities
        #     mrcnn_bbox(deltas): [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
        mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
            fpn_classifier_graph(rpn_rois, mrcnn_feature_maps, input_image_meta,
                                 config.POOL_SIZE, config.NUM_CLASSES,
                                 train_bn=config.TRAIN_BN,
                                 fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)

The FPN classifier head function in full:

    ############################################################
    #  Feature Pyramid Network Heads
    ############################################################

    def fpn_classifier_graph(rois, feature_maps, image_meta,
                             pool_size, num_classes, train_bn=True,
                             fc_layers_size=1024):
        """Builds the computation graph of the feature pyramid network classifier
        and regressor heads.

        rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
              coordinates.
        feature_maps: List of feature maps from different layers of the pyramid,
                      [P2, P3, P4, P5]. Each has a different resolution.
        image_meta: [batch, (meta data)] Image details. See compose_image_meta()
        pool_size: The width of the square feature map generated from ROI Pooling.
        num_classes: number of classes, which determines the depth of the results
        train_bn: Boolean. Train or freeze Batch Norm layers
        fc_layers_size: Size of the 2 FC layers

        Returns:
            logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax)
            probs: [batch, num_rois, NUM_CLASSES] classifier probabilities
            bbox_deltas: [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
                         Deltas to apply to proposal boxes
        """
        # ROI Pooling
        # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
        x = PyramidROIAlign([pool_size, pool_size],
                            name="roi_align_classifier")([rois, image_meta] + feature_maps)

        # Two 1024 FC layers (implemented with Conv2D for consistency)
        # TimeDistributed splits the input along axis 1 and applies an identical model
        # to each slice independently. Concretely, the next line applies the same
        # convolution to num_rois inputs of shape [batch, POOL_SIZE, POOL_SIZE, channels]
        # and merges the results.
        x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                               name="mrcnn_class_conv1")(x)  # [batch, num_rois, 1, 1, 1024]
        x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn1')(x, training=train_bn)
        x = KL.Activation('relu')(x)
        x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (1, 1)),
                               name="mrcnn_class_conv2")(x)
        x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn2')(x, training=train_bn)
        x = KL.Activation('relu')(x)

        shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2),
                           name="pool_squeeze")(x)  # [batch, num_rois, 1024]

        # Classifier head
        mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes),
                                                name='mrcnn_class_logits')(shared)
        mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"),
                                         name="mrcnn_class")(mrcnn_class_logits)

        # BBox head
        # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))]
        x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'),
                               name='mrcnn_bbox_fc')(shared)
        # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
        s = K.int_shape(x)
        # Reshape internally does: K.reshape(inputs, (K.shape(inputs)[0],) + self.target_shape)
        mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)

        return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox

Now let us analyze this function. On entry, it first calls PyramidROIAlign:

    # ROI Pooling
    # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)

This class implements essentially all the functionality described in the overview: assigning proposals to pyramid levels and performing the ROI operation there.

The PyramidROIAlign Class

First, following the third section of 『计算机视觉』FPN特征金字塔网络, we assign each proposal to a pyramid level. Note that the h and w used inside the network are normalized (with the original image's height and width as the unit length), so they must be converted back for this computation. The assignment rule is Equation 1 of the FPN paper:

    k = floor(k0 + log2(sqrt(w * h) / 224))

where w and h are the ROI's width and height in pixels, k is the pyramid level the ROI is assigned to, and k0 is the level that a w, h = 224, 224 ROI maps to (k0 = 4 here; the code uses round instead of floor and clamps k to [2, 5]).
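A quick numeric sanity check of the equation, using hypothetical numbers (a 1024x1024 input and a 224x224-pixel ROI given in normalized coordinates), mirrors what the code computes:

    import numpy as np

    image_area = 1024.0 * 1024.0      # hypothetical IMAGE_SHAPE of [1024 1024 3]
    h = w = 224.0 / 1024.0            # a 224x224-pixel ROI, normalized
    k = 4 + np.round(np.log2(np.sqrt(h * w) / (224.0 / np.sqrt(image_area))))
    k = int(np.clip(k, 2, 5))         # clamp to the available levels P2..P5
    print(k)                          # -> 4: a 224x224 ROI maps to P4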

Note two key tensors, level_boxes and box_indices: the first records the coordinates of every box assigned to the current feature level, while the second records, for each of those boxes, the index of the image it belongs to within the batch (one tensor ties each box to its image, the other ties each box to its coordinates). Combined, they can index the coordinate information of every box on every image; a toy run follows.
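Here is a toy run with made-up level assignments showing what ix, level_boxes and box_indices hold for a single level (TF 1.x session style assumed):

    import numpy as np
    import tensorflow as tf

    roi_level = tf.constant([[4, 5], [5, 4]])  # [batch=2, num_boxes=2], made-up levels
    boxes = tf.constant(np.arange(16, dtype=np.float32).reshape(2, 2, 4))

    ix = tf.where(tf.equal(roi_level, 5))      # -> [[0, 1], [1, 0]]: (image, box) pairs
    level_boxes = tf.gather_nd(boxes, ix)      # coordinates of the level-5 boxes
    box_indices = tf.cast(ix[:, 0], tf.int32)  # -> [0, 1]: the image each box belongs to

    with tf.Session() as sess:
        print(sess.run([level_boxes, box_indices]))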

As for ROI Align itself, it is in essence bilinear interpolation, and can be implemented directly with the built-in API tf.image.crop_and_resize, as the short demo below shows.
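A minimal demo of the API on a toy feature map (TF 1.x; the interpolation method defaults to bilinear):

    import numpy as np
    import tensorflow as tf

    feat = tf.constant(np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1))
    crop_boxes = tf.constant([[0.0, 0.0, 1.0, 1.0]])  # (y1, x1, y2, x2), normalized
    box_ind = tf.constant([0])                        # crop from image 0 of the batch
    crops = tf.image.crop_and_resize(feat, crop_boxes, box_ind, crop_size=[2, 2])

    with tf.Session() as sess:
        print(sess.run(crops)[..., 0])  # a 2x2 bilinear resampling of the 4x4 map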

This point is the dividing line between the RPN and the R-CNN head: level_boxes and box_indices are results computed by the RPN, yet the tensor obtained by applying them to the features is the input of the R-CNN part. Gradients must not flow between the two parts, so tf.stop_gradient() is used to cut off gradient propagation; the toy graph below illustrates the effect.
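A toy graph with made-up values, showing what tf.stop_gradient changes:

    import tensorflow as tf

    x = tf.Variable(3.0)
    y = tf.stop_gradient(x * 2.0)  # treat the "RPN-side" result as a constant
    z = y * x                      # the "RCNN-side" computation
    grad = tf.gradients(z, x)[0]   # dz/dx = y = 6.0; without stop_gradient it
                                   # would be 4*x = 12.0

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(grad))      # 6.0: no gradient flows back through y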

    ############################################################
    #  ROIAlign Layer
    ############################################################

    def log2_graph(x):
        """Implementation of Log2. TF doesn't have a native implementation."""
        return tf.log(x) / tf.log(2.0)


    class PyramidROIAlign(KE.Layer):
        """Implements ROI Pooling on multiple levels of the feature pyramid.

        Params:
        - pool_shape: [pool_height, pool_width] of the output pooled regions. Usually [7, 7]

        Inputs:
        - boxes: [batch, num_boxes, (y1, x1, y2, x2)] in normalized
                 coordinates. Possibly padded with zeros if not enough
                 boxes to fill the array.
        - image_meta: [batch, (meta data)] Image details. See compose_image_meta()
        - feature_maps: List of feature maps from different levels of the pyramid.
                        Each is [batch, height, width, channels]

        Output:
        Pooled regions in the shape: [batch, num_boxes, pool_height, pool_width, channels].
        The width and height are those specified in the pool_shape in the layer
        constructor.
        """

        def __init__(self, pool_shape, **kwargs):
            super(PyramidROIAlign, self).__init__(**kwargs)
            self.pool_shape = tuple(pool_shape)

        def call(self, inputs):
            # num_boxes is the number of proposals. All of them apply to every image,
            # but different proposals map to different feature levels, so we loop over
            # the levels, collect the matching proposals, and apply ROIAlign.
            # Crop boxes [batch, num_boxes, (y1, x1, y2, x2)] in normalized coords
            boxes = inputs[0]

            # Image meta
            # Holds details about the image. See compose_image_meta()
            image_meta = inputs[1]

            # Feature Maps. List of feature maps from different level of the
            # feature pyramid. Each is [batch, height, width, channels]
            feature_maps = inputs[2:]

            # Assign each ROI to a level in the pyramid based on the ROI area.
            y1, x1, y2, x2 = tf.split(boxes, 4, axis=2)
            h = y2 - y1
            w = x2 - x1
            # Use shape of first image. Images in a batch must have the same size.
            image_shape = parse_image_meta_graph(image_meta)['image_shape'][0]  # h, w, c
            # Equation 1 in the Feature Pyramid Networks paper. Account for
            # the fact that our coordinates are normalized here.
            # e.g. a 224x224 ROI (in pixels) maps to P4
            image_area = tf.cast(image_shape[0] * image_shape[1], tf.float32)
            roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area)))  # h, w are already normalized
            roi_level = tf.minimum(5, tf.maximum(
                2, 4 + tf.cast(tf.round(roi_level), tf.int32)))  # clamp the level to [2, 5]
            roi_level = tf.squeeze(roi_level, 2)  # [batch, num_boxes]

            # Loop through levels and apply ROI pooling to each. P2 to P5.
            pooled = []
            box_to_level = []
            for i, level in enumerate(range(2, 6)):
                # tf.where returns [coord1, coord2, ...]
                # np.where returns [[coord1.x, coord2.x, ...], [coord1.y, coord2.y, ...]]
                ix = tf.where(tf.equal(roi_level, level))  # each row means: proposal i of image n
                level_boxes = tf.gather_nd(boxes, ix)  # [num_boxes_this_level, 4]

                # Box indices for crop_and_resize.
                box_indices = tf.cast(ix[:, 0], tf.int32)  # image index for each proposal

                # Keep track of which box is mapped to which level
                box_to_level.append(ix)

                # Stop gradient propagation to ROI proposals
                level_boxes = tf.stop_gradient(level_boxes)
                box_indices = tf.stop_gradient(box_indices)

                # Crop and Resize
                # From Mask R-CNN paper: "We sample four regular locations, so
                # that we can evaluate either max or average pooling. In fact,
                # interpolating only a single value at each bin center (without
                # pooling) is nearly as effective."
                #
                # Here we use the simplified approach of a single value per bin,
                # which is how it's done in tf.crop_and_resize()
                # Result: [this_level_num_boxes, pool_height, pool_width, channels]
                pooled.append(tf.image.crop_and_resize(
                    feature_maps[i], level_boxes, box_indices, self.pool_shape,
                    method="bilinear"))
                # Argument shapes:
                #   [batch, image_height, image_width, channels]
                #   [this_level_num_boxes, 4]
                #   [this_level_num_boxes]
                #   [pool_height, pool_width]

            # Pack pooled features into one tensor
            pooled = tf.concat(pooled, axis=0)  # [batch*num_boxes, pool_height, pool_width, channels]

            # Pack box_to_level mapping into one array and add another
            # column representing the order of pooled boxes
            box_to_level = tf.concat(box_to_level, axis=0)  # [batch*num_boxes, 2]
            box_range = tf.expand_dims(tf.range(tf.shape(box_to_level)[0]), 1)  # [batch*num_boxes, 1]
            box_to_level = tf.concat([tf.cast(box_to_level, tf.int32), box_range],
                                     axis=1)  # [batch*num_boxes, 3]

            # At this point, pooled holds the full set of ROIAlign feature crops and
            # box_to_level holds their bookkeeping info. Because the crops were
            # gathered level by level, they are no longer in the original order
            # (by batch, then by box index), so we restore that order below.
            # Rearrange pooled features to match the order of the original boxes
            # Sort box_to_level by batch then box index
            # TF doesn't have a way to sort by two columns, so merge them and sort.
            # box_to_level[i, 0] is the image index of feature i; box_to_level[i, 1] is its box index
            sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1]  # [batch*num_boxes]
            ix = tf.nn.top_k(sorting_tensor, k=tf.shape(
                box_to_level)[0]).indices[::-1]
            ix = tf.gather(box_to_level[:, 2], ix)
            pooled = tf.gather(pooled, ix)

            # Re-add the batch dimension
            # [batch, num_boxes, (y1, x1, y2, x2)], [batch*num_boxes, pool_height, pool_width, channels]
            shape = tf.concat([tf.shape(boxes)[:2], tf.shape(pooled)[1:]], axis=0)
            pooled = tf.reshape(pooled, shape)
            return pooled  # [batch, num_boxes, pool_height, pool_width, channels]
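The two-key sort at the end deserves a closer look: tf.nn.top_k sorts descending, so the code reverses its indices with [::-1] to get ascending (image, box) order. A small numpy equivalent with made-up values:

    import numpy as np

    # rows of box_to_level: (image index, box index, position in pooled)
    box_to_level = np.array([[0, 2, 0], [1, 0, 1], [0, 0, 2], [1, 1, 3]])
    sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1]

    order = np.argsort(sorting_tensor)  # ascending by (image, box), like top_k + [::-1]
    ix = box_to_level[order, 2]         # positions to gather from pooled
    print(ix)                           # [2 0 1 3]: pooled[ix] restores (batch, box) order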

Preliminary Classification and Regression

After the ROI step we hold a large number of small feature maps, all of the same shape. To obtain classification and regression results for them, we construct a series of parallel networks to process them:

    # Two 1024 FC layers (implemented with Conv2D for consistency)
    # TimeDistributed splits the input along axis 1 and applies an identical model
    # to each slice independently. Concretely, the next line applies the same
    # convolution to num_rois inputs of shape [batch, POOL_SIZE, POOL_SIZE, channels]
    # and merges the results.
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                           name="mrcnn_class_conv1")(x)  # [batch, num_rois, 1, 1, 1024]
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (1, 1)),
                           name="mrcnn_class_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn2')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2),
                       name="pool_squeeze")(x)  # [batch, num_rois, 1024]

    # Classifier head
    mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes),
                                            name='mrcnn_class_logits')(shared)
    mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"),
                                     name="mrcnn_class")(mrcnn_class_logits)

    # BBox head
    # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))]
    x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'),
                           name='mrcnn_bbox_fc')(shared)
    # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
    s = K.int_shape(x)
    # Reshape internally does: K.reshape(inputs, (K.shape(inputs)[0],) + self.target_shape)
    mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)

    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox

The returns are:

    mrcnn_class_logits:  [batch, num_rois, NUM_CLASSES]   classifier logits (before softmax)
    mrcnn_class:         [batch, num_rois, NUM_CLASSES]   classifier probabilities
    mrcnn_bbox (deltas): [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]

KL.TimeDistributed builds a series of parallel networks of identical architecture: given an input of shape [batch, N, ...], it applies the same inner layer independently to each of the N slices of shape [batch, ...] and stacks the N outputs back together. A minimal sketch:
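This sketch assumes Keras 2.x with the same KL alias as the repo; the shapes are chosen to mirror the classifier head above:

    import keras.layers as KL
    from keras.models import Model

    inp = KL.Input(shape=(5, 7, 7, 256))                  # [batch, num_rois, H, W, C]
    out = KL.TimeDistributed(KL.Conv2D(16, (7, 7)))(inp)  # same conv applied to all 5 ROIs
    model = Model(inp, out)
    print(model.output_shape)                             # (None, 5, 1, 1, 16)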

Appendix: build Function in Full

    def build(self, mode, config):
        """Build Mask R-CNN architecture.
        input_shape: The shape of the input image.
        mode: Either "training" or "inference". The inputs and
              outputs of the model differ accordingly.
        """
        assert mode in ['training', 'inference']

        # Image size must be dividable by 2 multiple times
        h, w = config.IMAGE_SHAPE[:2]  # [1024 1024 3]
        # This restriction guarantees that downsampling introduces no coordinate error
        if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6):
            raise Exception("Image size must be dividable by 2 at least 6 times "
                            "to avoid fractions when downscaling and upscaling."
                            "For example, use 256, 320, 384, 448, 512, ... etc. ")

        # Inputs
        input_image = KL.Input(
            shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")
        input_image_meta = KL.Input(shape=[config.IMAGE_META_SIZE],
                                    name="input_image_meta")
        if mode == "training":
            # RPN GT
            input_rpn_match = KL.Input(
                shape=[None, 1], name="input_rpn_match", dtype=tf.int32)
            input_rpn_bbox = KL.Input(
                shape=[None, 4], name="input_rpn_bbox", dtype=tf.float32)

            # Detection GT (class IDs, bounding boxes, and masks)
            # 1. GT Class IDs (zero padded)
            input_gt_class_ids = KL.Input(
                shape=[None], name="input_gt_class_ids", dtype=tf.int32)
            # 2. GT Boxes in pixels (zero padded)
            # [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates
            input_gt_boxes = KL.Input(
                shape=[None, 4], name="input_gt_boxes", dtype=tf.float32)
            # Normalize coordinates
            gt_boxes = KL.Lambda(lambda x: norm_boxes_graph(
                x, K.shape(input_image)[1:3]))(input_gt_boxes)
            # 3. GT Masks (zero padded)
            # [batch, height, width, MAX_GT_INSTANCES]
            if config.USE_MINI_MASK:
                input_gt_masks = KL.Input(
                    shape=[config.MINI_MASK_SHAPE[0],
                           config.MINI_MASK_SHAPE[1], None],
                    name="input_gt_masks", dtype=bool)
            else:
                input_gt_masks = KL.Input(
                    shape=[config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1], None],
                    name="input_gt_masks", dtype=bool)
        elif mode == "inference":
            # Anchors in normalized coordinates
            input_anchors = KL.Input(shape=[None, 4], name="input_anchors")

        # Build the shared convolutional layers.
        # Bottom-up Layers
        # Returns a list of the last layers of each stage, 5 in total.
        # Don't create the head (stage 5), so we pick the 4th item in the list.
        if callable(config.BACKBONE):
            _, C2, C3, C4, C5 = config.BACKBONE(input_image, stage5=True,
                                                train_bn=config.TRAIN_BN)
        else:
            _, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE,
                                             stage5=True, train_bn=config.TRAIN_BN)
        # Top-down Layers
        # TODO: add assert to verify feature map sizes match what's in config
        P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)  # 256
        P4 = KL.Add(name="fpn_p4add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
        P3 = KL.Add(name="fpn_p3add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
        P2 = KL.Add(name="fpn_p2add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])
        # Attach 3x3 conv to all P layers to get the final feature maps.
        P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
        P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
        P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
        P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
        # P6 is used for the 5th anchor scale in RPN. Generated by
        # subsampling from P5 with stride of 2.
        P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)

        # Note that P6 is used in RPN, but not in the classifier heads.
        rpn_feature_maps = [P2, P3, P4, P5, P6]
        mrcnn_feature_maps = [P2, P3, P4, P5]

        # Anchors
        if mode == "training":
            anchors = self.get_anchors(config.IMAGE_SHAPE)
            # Duplicate across the batch dimension because Keras requires it
            # TODO: can this be optimized to avoid duplicating the anchors?
            anchors = np.broadcast_to(anchors, (config.BATCH_SIZE,) + anchors.shape)
            # A hack to get around Keras's bad support for constants
            anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
        else:
            anchors = input_anchors

        # RPN Model; returns a Keras Model object. Keras Model objects are callable.
        rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,  # 1 3 256
                              len(config.RPN_ANCHOR_RATIOS), config.TOP_DOWN_PYRAMID_SIZE)
        # Loop through pyramid layers
        layer_outputs = []  # list of lists
        for p in rpn_feature_maps:
            layer_outputs.append(rpn([p]))  # RPN outputs for each pyramid feature map
        # Concatenate layer outputs
        # Convert from list of lists of level outputs to list of lists
        # of outputs across levels.
        # e.g. [[a1, b1, c1], [a2, b2, c2]] => [[a1, a2], [b1, b2], [c1, c2]]
        output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
        outputs = list(zip(*layer_outputs))  # [[logits2..6], [class2..6], [bbox2..6]]
        outputs = [KL.Concatenate(axis=1, name=n)(list(o))
                   for o, n in zip(outputs, output_names)]

        # [batch, num_anchors, 2/4]
        # num_anchors is the total anchor count over all feature levels
        rpn_class_logits, rpn_class, rpn_bbox = outputs

        # Generate proposals
        # Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
        # and zero padded.
        # POST_NMS_ROIS_INFERENCE = 1000
        # POST_NMS_ROIS_TRAINING = 2000
        proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training"\
            else config.POST_NMS_ROIS_INFERENCE
        # [IMAGES_PER_GPU, num_rois, (y1, x1, y2, x2)]
        # IMAGES_PER_GPU replaces batch; from here on "batch" always means IMAGES_PER_GPU
        rpn_rois = ProposalLayer(
            proposal_count=proposal_count,
            nms_threshold=config.RPN_NMS_THRESHOLD,  # 0.7
            name="ROI",
            config=config)([rpn_class, rpn_bbox, anchors])

        if mode == "training":
            # Class ID mask to mark class IDs supported by the dataset the image
            # came from.
            active_class_ids = KL.Lambda(
                lambda x: parse_image_meta_graph(x)["active_class_ids"]
                )(input_image_meta)

            if not config.USE_RPN_ROIS:
                # Ignore predicted ROIs and use ROIs provided as an input.
                input_rois = KL.Input(shape=[config.POST_NMS_ROIS_TRAINING, 4],
                                      name="input_roi", dtype=np.int32)
                # Normalize coordinates
                target_rois = KL.Lambda(lambda x: norm_boxes_graph(
                    x, K.shape(input_image)[1:3]))(input_rois)
            else:
                target_rois = rpn_rois

            # Generate detection targets
            # Subsamples proposals and generates target outputs for training
            # Note that proposal class IDs, gt_boxes, and gt_masks are zero
            # padded. Equally, returned rois and targets are zero padded.
            rois, target_class_ids, target_bbox, target_mask =\
                DetectionTargetLayer(config, name="proposal_targets")([
                    target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])

            # Network Heads
            # TODO: verify that this handles zero padded ROIs
            mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
                fpn_classifier_graph(rois, mrcnn_feature_maps, input_image_meta,
                                     config.POOL_SIZE, config.NUM_CLASSES,
                                     train_bn=config.TRAIN_BN,
                                     fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)

            mrcnn_mask = build_fpn_mask_graph(rois, mrcnn_feature_maps,
                                              input_image_meta,
                                              config.MASK_POOL_SIZE,
                                              config.NUM_CLASSES,
                                              train_bn=config.TRAIN_BN)

            # TODO: clean up (use tf.identify if necessary)
            output_rois = KL.Lambda(lambda x: x * 1, name="output_rois")(rois)

            # Losses
            rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
                [input_rpn_match, rpn_class_logits])
            rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
                [input_rpn_bbox, input_rpn_match, rpn_bbox])
            class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
                [target_class_ids, mrcnn_class_logits, active_class_ids])
            bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
                [target_bbox, target_class_ids, mrcnn_bbox])
            mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
                [target_mask, target_class_ids, mrcnn_mask])

            # Model
            inputs = [input_image, input_image_meta,
                      input_rpn_match, input_rpn_bbox, input_gt_class_ids, input_gt_boxes, input_gt_masks]
            if not config.USE_RPN_ROIS:
                inputs.append(input_rois)
            outputs = [rpn_class_logits, rpn_class, rpn_bbox,
                       mrcnn_class_logits, mrcnn_class, mrcnn_bbox, mrcnn_mask,
                       rpn_rois, output_rois,
                       rpn_class_loss, rpn_bbox_loss, class_loss, bbox_loss, mask_loss]
            model = KM.Model(inputs, outputs, name='mask_rcnn')
        else:
            # Network Heads
            # Proposal classifier and BBox regressor heads
            # output shapes:
            #     mrcnn_class_logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax)
            #     mrcnn_class: [batch, num_rois, NUM_CLASSES] classifier probabilities
            #     mrcnn_bbox(deltas): [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
            mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
                fpn_classifier_graph(rpn_rois, mrcnn_feature_maps, input_image_meta,
                                     config.POOL_SIZE, config.NUM_CLASSES,
                                     train_bn=config.TRAIN_BN,
                                     fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)

            # Detections
            # output is [batch, num_detections, (y1, x1, y2, x2, class_id, score)] in
            # normalized coordinates
            detections = DetectionLayer(config, name="mrcnn_detection")(
                [rpn_rois, mrcnn_class, mrcnn_bbox, input_image_meta])

            # Create masks for detections
            detection_boxes = KL.Lambda(lambda x: x[..., :4])(detections)
            mrcnn_mask = build_fpn_mask_graph(detection_boxes, mrcnn_feature_maps,
                                              input_image_meta,
                                              config.MASK_POOL_SIZE,
                                              config.NUM_CLASSES,
                                              train_bn=config.TRAIN_BN)

            model = KM.Model([input_image, input_image_meta, input_anchors],
                             [detections, mrcnn_class, mrcnn_bbox,
                              mrcnn_mask, rpn_rois, rpn_class, rpn_bbox],
                             name='mask_rcnn')

        # Add multi-GPU support.
        if config.GPU_COUNT > 1:
            from mrcnn.parallel_model import ParallelModel
            model = ParallelModel(model, config.GPU_COUNT)

        return model
