maskrcnn_benchmark Code Analysis (2)

The maskrcnn_benchmark training process

-> Training command:

python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1
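The overrides in this command look like a multi-GPU schedule rescaled for a single GPU with 2 images per batch. A minimal sketch of that linear scaling rule, assuming a baseline of 16 images per batch, a base learning rate of 0.02, 90000 iterations, and decay steps at (60000, 80000). These baseline values and the helper name scale_schedule are assumptions for illustration; they are not read from the config file named in the command.

```python
def scale_schedule(base_batch, base_lr, base_max_iter, base_steps, new_batch):
    """Rescale a training schedule when the global batch size changes.

    The learning rate shrinks by the same factor as the batch size, while the
    iteration counts grow by that factor, so the number of epochs is unchanged.
    """
    factor = base_batch / new_batch
    return {
        "IMS_PER_BATCH": new_batch,
        "BASE_LR": base_lr / factor,
        "MAX_ITER": int(round(base_max_iter * factor)),
        "STEPS": tuple(int(round(s * factor)) for s in base_steps),
    }


# Assumed baseline: 16 images per batch, lr 0.02, 90000 iterations.
schedule = scale_schedule(16, 0.02, 90000, (60000, 80000), new_batch=2)
print(schedule)
```

Under those assumptions the result reproduces the values passed on the command line.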

-> train_net.py is invoked; its train() function builds the model, optimizer, data loader, checkpointer, and so on, then enters the core training code in trainer.py:

def do_train(

    model,

    data_loader,

    optimizer,

    scheduler,

    checkpointer,

    device,

    checkpoint_period,

    arguments,

):

    logger = logging.getLogger("maskrcnn_benchmark.trainer")

    logger.info("Start training")

    meters = MetricLogger(delimiter="  ")

    max_iter = len(data_loader)

    start_iter = arguments["iteration"]

    model.train()

    start_training_time = time.time()

    end = time.time()

    for iteration, (images, targets, _) in enumerate(data_loader, start_iter):

        data_time = time.time() - end

        arguments["iteration"] = iteration

        scheduler.step()

        images = images.to(device)

        targets = [target.to(device) for target in targets]

        loss_dict = model(images, targets)

        ipdb.set_trace()  # debug breakpoint inserted for this walkthrough

        losses = sum(loss for loss in loss_dict.values())

        # reduce losses over all GPUs for logging purposes

        loss_dict_reduced = reduce_loss_dict(loss_dict)

        losses_reduced = sum(loss for loss in loss_dict_reduced.values())

        meters.update(loss=losses_reduced, **loss_dict_reduced)

        optimizer.zero_grad()

        losses.backward()

        optimizer.step()

        batch_time = time.time() - end

        end = time.time()

        meters.update(time=batch_time, data=data_time)

        eta_seconds = meters.time.global_avg * (max_iter - iteration)

        eta_string = str(datetime.timedelta(seconds=int(eta_seconds)))

        if iteration % 20 == 0 or iteration == (max_iter - 1):

            logger.info(

                meters.delimiter.join(

                    [

                        "eta: {eta}",

                        "iter: {iter}",

                        "{meters}",

                        "lr: {lr:.6f}",

                        "max mem: {memory:.0f}",

                    ]

                ).format(

                    eta=eta_string,

                    iter=iteration,

                    meters=str(meters),

                    lr=optimizer.param_groups[0]["lr"],

                    memory=torch.cuda.max_memory_allocated() / 1024.0 / 1024.0,

                )

            )

        if iteration % checkpoint_period == 0 and iteration > 0:

            checkpointer.save("model_{:07d}".format(iteration), **arguments)

    checkpointer.save("model_{:07d}".format(iteration), **arguments)

    total_training_time = time.time() - start_training_time

    total_time_str = str(datetime.timedelta(seconds=total_training_time))

    logger.info(

        "Total training time: {} ({:.4f} s / it)".format(

            total_time_str, total_training_time / (max_iter)

        )

    )

-> Variables printed during one iteration; targets holds the ground truth for a batch of 2 images:

ipdb> loss_dict

{'loss_box_reg': tensor(0.1005, device='cuda:0', grad_fn=<DivBackward0>), 'loss_rpn_box_reg': tensor(0.0486, device='cuda:0', grad_fn=<DivBackward0>), 'loss_objectness': tensor(0.0165, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>), 'loss_classifier': tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>), 'loss_mask': tensor(0.2332, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)}

ipdb> images

<maskrcnn_benchmark.structures.image_list.ImageList object at 0x7f9cb9190668>

ipdb> targets

[BoxList(num_boxes=3, image_width=1066, image_height=800, mode=xyxy), BoxList(num_boxes=17, image_width=1199, image_height=800, mode=xyxy)]
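The scalar that gets back-propagated is simply the sum of the entries in loss_dict. A small sketch using plain floats copied from the output above (the real values are CUDA tensors), together with the ETA arithmetic from the logging code. The average time per iteration and the iteration number are made-up numbers for illustration.

```python
import datetime

# Loss values copied from the ipdb output above, as plain floats.
loss_dict = {
    "loss_box_reg": 0.1005,
    "loss_rpn_box_reg": 0.0486,
    "loss_objectness": 0.0165,
    "loss_classifier": 0.2494,
    "loss_mask": 0.2332,
}

# Same reduction as `losses = sum(loss for loss in loss_dict.values())`.
total_loss = sum(loss for loss in loss_dict.values())
print("total loss: {:.4f}".format(total_loss))


def format_eta(avg_seconds_per_iter, max_iter, iteration):
    """Mirror the ETA computation used in the training log line."""
    eta_seconds = avg_seconds_per_iter * (max_iter - iteration)
    return str(datetime.timedelta(seconds=int(eta_seconds)))


# Hypothetical numbers: 0.5 s per iteration, 20000 iterations done.
eta_string = format_eta(0.5, 720000, 20000)
print("eta:", eta_string)
```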

Stepping inside the model:

-> In generalized_rcnn.py, the backbone network extracts the features: features = self.backbone(images.tensors)

ipdb> features[0].size()

torch.Size([2, 256, 200, 336])

ipdb> features[1].size()

torch.Size([2, 256, 100, 168])

ipdb> features[2].size()

torch.Size([2, 256, 50, 84])

ipdb> features[3].size()

torch.Size([2, 256, 25, 42])

ipdb> features[4].size()

torch.Size([2, 256, 13, 21])
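These shapes are consistent with a padded batch of 800 x 1344 and FPN strides of 4, 8, 16, 32, and 64. Both the padded size and the strides are inferred from the printed shapes rather than read from the config, so treat the following as a sanity check under those assumptions:

```python
import math


def fpn_feature_sizes(height, width, strides=(4, 8, 16, 32, 64)):
    """Spatial size of each pyramid level, rounding up at every stride."""
    return [(math.ceil(height / s), math.ceil(width / s)) for s in strides]


# Assumed padded batch size (height, width) = (800, 1344).
sizes = fpn_feature_sizes(800, 1344)
for level, (h, w) in enumerate(sizes):
    print("features[{}]: {} x {}".format(level, h, w))
```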

RPN network

->proposals, proposal_losses = self.rpn(images, features, targets)

    def forward(self, images, features, targets=None):

        """

        Arguments:

            images (ImageList): images for which we want to compute the predictions

            features (list[Tensor]): features computed from the images that are

                used for computing the predictions. Each tensor in the list

                correspond to different feature levels

            targets (list[BoxList]): ground-truth boxes present in the image (optional)

        Returns:

            boxes (list[BoxList]): the predicted boxes from the RPN, one BoxList per

                image.

            losses (dict[Tensor]): the losses for the model during training. During

                testing, it is an empty dict.

        """

        objectness, rpn_box_regression = self.head(features)

        anchors = self.anchor_generator(images, features)

        if self.training:

            return self._forward_train(anchors, objectness, rpn_box_regression, targets)

        else:

            return self._forward_test(anchors, objectness, rpn_box_regression)

    def _forward_train(self, anchors, objectness, rpn_box_regression, targets):
        if self.cfg.MODEL.RPN_ONLY:
            # When training an RPN-only model, the loss is determined by the
            # predicted objectness and rpn_box_regression values and there is
            # no need to transform the anchors into predicted boxes; this is an
            # optimization that avoids the unnecessary transformation.
            boxes = anchors
        else:
            # For end-to-end models, anchors must be transformed into boxes and
            # sampled into a training batch.
            with torch.no_grad():
                boxes = self.box_selector_train(
                    anchors, objectness, rpn_box_regression, targets
                )
        loss_objectness, loss_rpn_box_reg = self.loss_evaluator(
            anchors, objectness, rpn_box_regression, targets
        )
        losses = {
            "loss_objectness": loss_objectness,
            "loss_rpn_box_reg": loss_rpn_box_reg,
        }
        return boxes, losses

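-> First, every feature level passes through the rpn_head (a 3×3 conv followed by 1×1 convs for classification and regression); the outputs are then combined with the generated anchors to compute the loss.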

-> objectness, rpn_box_regression = self.head(features) returns the classification and regression outputs for the 5 feature levels, with 3 anchors per location at each level

ipdb> objectness[0].size()

torch.Size([2, 3, 200, 336])  (200*336*3 = 201600 anchors per image at this level)

ipdb> objectness[1].size()

torch.Size([2, 3, 100, 168])

ipdb> objectness[2].size()

torch.Size([2, 3, 50, 84])

ipdb> objectness[3].size()

torch.Size([2, 3, 25, 42])

ipdb> objectness[4].size()

torch.Size([2, 3, 13, 21])

ipdb> objectness[5].size()

*** IndexError: list index out of range

ipdb> rpn_box_regression[0].size()

torch.Size([2, 12, 200, 336])

ipdb> rpn_box_regression[4].size()

torch.Size([2, 12, 13, 21])

-> anchors = self.anchor_generator(images, features) generates the anchors

ipdb> anchors[1][0]

BoxList(num_boxes=201600, image_width=1204, image_height=800, mode=xyxy)

ipdb> anchors[1][1]

BoxList(num_boxes=50400, image_width=1204, image_height=800, mode=xyxy)

ipdb> anchors[0][1]

BoxList(num_boxes=50400, image_width=1333, image_height=794, mode=xyxy)

ipdb> anchors[1][2]

BoxList(num_boxes=12600, image_width=1204, image_height=800, mode=xyxy)

ipdb> anchors[1][3]

BoxList(num_boxes=3150, image_width=1204, image_height=800, mode=xyxy)

ipdb> anchors[1][4]

BoxList(num_boxes=819, image_width=1204, image_height=800, mode=xyxy)
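The anchor counts follow directly from the feature map sizes: each level contributes 3 x H x W anchors per image. A quick check of that arithmetic, using the shapes printed above:

```python
# (H, W) of each objectness map, taken from the printed shapes.
feature_sizes = [(200, 336), (100, 168), (50, 84), (25, 42), (13, 21)]
anchors_per_location = 3  # the A dimension of the objectness tensors

anchors_per_level = [anchors_per_location * h * w for h, w in feature_sizes]
anchors_per_image = sum(anchors_per_level)
batch_size = 2

print(anchors_per_level)
print(anchors_per_image)
print(batch_size * anchors_per_image)
```

The last number, 537138, is the length of the flattened objectness and labels tensors printed further down.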

-> boxes = self.box_selector_train(anchors, objectness, rpn_box_regression, targets) selects the boxes used to train the Fast R-CNN heads; this step runs under torch.no_grad(), so no gradients flow through it

ipdb> boxes

[BoxList(num_boxes=316, image_width=1333, image_height=794, mode=xyxy), BoxList(num_boxes=1696, image_width=1204, image_height=800, mode=xyxy)]

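-> loss_objectness, loss_rpn_box_reg = self.loss_evaluator(anchors, objectness, rpn_box_regression, targets): when computing the loss, anchors are sampled with a target positive:negative ratio of 1:1 to train the RPN (the actual counts printed below are 69 positives and 443 negatives)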

-> 512 samples are selected here: _C.MODEL.RPN.BATCH_SIZE_PER_IMAGE = 256, times two images

ipdb> sampled_pos_inds

tensor([ 16477,  16480,  16483,  16486,  17485,  17488,  17491,  17494,  18493,

         18496,  18499,  18502,  19501,  19504,  19507,  19510, 217452, 217453,

        217455, 217456, 217458, 217459, 217960, 268151, 529150, 534017, 534020,

        534143, 534146, 534586, 534607, 534712, 534733, 534838, 534859, 535356,

        535359, 535362, 535365, 535368, 536602, 536652, 536655, 536658, 536661,

        536664, 536667, 536670, 536715, 536718, 536721, 536724, 536727, 536730,

        536733, 536778, 536781, 536784, 536787, 536790, 536793, 536796, 536841,

        536844, 536847, 536850, 536853, 536856, 536859], device='cuda:0')

ipdb> sampled_neg_inds

tensor([  3045,   4275,   5323,   6555,   7538,   8406,   8469,   9761,  11316,

         11684,  12319,  13195,  13354,  15405,  20431,  25105,  26405,  26786,

         27324,  30698,  33503,  38168,  39244,  40064,  40535,  41046,  41162,

         41203,  41864,  43170,  44060,  44416,  44905,  45161,  47299,  48043,

         49890,  49900,  50992,  51248,  52082,  52236,  52371,  52568,  54079,

         54207,  55251,  56973,  57135,  58376,  59816,  61509,  62473,  62942,

         64722,  65548,  66681,  67925,  68650,  71368,  72610,  73268,  74727,

         75655,  77795,  78937,  79115,  80101,  80808,  81001,  83846,  87064,

         89891,  91207,  92579,  92771,  93113,  94118,  94526,  94586,  95822,

         96850,  97256,  97303,  97500,  98194,  98338, 101724, 102082, 103835,

        103947, 104678, 105168, 105630, 106132, 108751, 108933, 109684, 110552,

        111373, 111965, 114691, 114736, 115213, 115468, 120710, 121785, 123138,

        126383, 126957, 128197, 128282, 129449, 130472, 132269, 133131, 133384,

        135197, 135926, 136468, 137306, 137620, 138671, 141848, 142643, 145618,

        147402, 148283, 148353, 149313, 150389, 150528, 151949, 154413, 156156,

        157155, 158716, 160001, 160227, 160428, 160496, 160920, 161023, 162605,

        163131, 166371, 166561, 167200, 171280, 174531, 175690, 175957, 175996,

        179025, 179766, 180781, 182893, 182980, 183152, 183159, 183531, 183785,

        184531, 185565, 186520, 187194, 187772, 188100, 191068, 191289, 191419,

        192022, 193388, 194892, 196902, 204682, 206878, 207981, 208066, 208366,

        210761, 210862, 211624, 213567, 213627, 214601, 214651, 214770, 215032,

        216806, 218299, 220127, 220221, 221133, 222489, 223512, 224844, 225115,

        225225, 225337, 228044, 228580, 228691, 229787, 231390, 231405, 231666,

        233068, 233379, 233416, 234464, 236145, 238078, 239161, 239633, 240260,

        240492, 241033, 241702, 241758, 242546, 243372, 244102, 248078, 248632,

        255377, 256325, 257079, 258010, 259857, 260872, 261896, 271659, 274495,

        275822, 276450, 276728, 278865, 279179, 279338, 279735, 280208, 280216,

        282300, 283240, 283717, 285074, 285157, 287528, 287804, 288191, 289901,

        290179, 294877, 296999, 298420, 301631, 301890, 303575, 304982, 305983,

        305992, 307922, 312438, 313507, 314289, 316348, 318599, 319751, 321304,

        321735, 321748, 326308, 326315, 327131, 327290, 327671, 328439, 332674,

        333130, 333144, 334633, 336337, 337399, 340980, 341619, 347289, 347364,

        347579, 353057, 353309, 354001, 355039, 355271, 355597, 356617, 359064,

        359068, 360402, 362098, 362652, 363356, 363741, 364744, 365997, 370109,

        370949, 372977, 373248, 373992, 374786, 375293, 376785, 377661, 377761,

        378991, 379663, 380167, 380817, 382269, 383560, 387387, 388389, 389665,

        389862, 390138, 391941, 394183, 399113, 400423, 402411, 404907, 405436,

        406457, 407348, 408005, 408356, 409728, 411376, 411571, 412210, 412426,

        415363, 415453, 415601, 418159, 418174, 418928, 419064, 419394, 419783,

        421039, 421405, 423287, 426369, 429895, 430293, 431338, 432330, 432745,

        433529, 433699, 433738, 435389, 437567, 438410, 439164, 440481, 442532,

        445424, 446074, 446146, 446550, 447703, 449683, 450601, 451138, 452505,

        455922, 457464, 460557, 461150, 461431, 462641, 463544, 471945, 472032,

        473327, 474938, 475450, 477505, 477917, 478033, 479038, 480127, 481613,

        482384, 484433, 484542, 484556, 484588, 487380, 490897, 492173, 493279,

        493464, 494139, 498077, 498172, 498426, 499201, 500289, 500739, 503145,

        506227, 506661, 509266, 509355, 509382, 509556, 510331, 510346, 511426,

        511604, 512428, 512560, 513306, 514096, 515320, 516682, 516949, 517815,

        517984, 524421, 525174, 525384, 525697, 526692, 527047, 527576, 532272,

        535005, 535582], device='cuda:0')

ipdb> sampled_pos_inds.size()

torch.Size([69])

ipdb> sampled_neg_inds.size()

torch.Size([443])
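The 69 positives and 443 negatives above add up to 512, which suggests the sampler tops up with negatives when an image has fewer positives than its quota. A minimal sketch of such a balanced sampler for one image, assuming labels of 1 (positive), 0 (negative), and -1 (ignored), a batch of 256, and a positive fraction of 0.5. The function name and the pure-Python implementation are illustrative, not the library's code.

```python
import random


def sample_pos_neg(labels, batch_size_per_image=256, positive_fraction=0.5, seed=0):
    """Pick anchor indices for one image, capping positives at their quota."""
    rng = random.Random(seed)
    positive = [i for i, label in enumerate(labels) if label == 1]
    negative = [i for i, label in enumerate(labels) if label == 0]

    # Positives get at most batch_size * fraction slots ...
    num_pos = min(len(positive), int(batch_size_per_image * positive_fraction))
    # ... and negatives fill whatever is left of the batch.
    num_neg = min(len(negative), batch_size_per_image - num_pos)

    return sorted(rng.sample(positive, num_pos)), sorted(rng.sample(negative, num_neg))


# Hypothetical image: 10 positive, 1000 negative, and 50 ignored anchors.
labels = [1] * 10 + [0] * 1000 + [-1] * 50
pos_inds, neg_inds = sample_pos_neg(labels)
print(len(pos_inds), len(neg_inds))
```

Anchors labeled -1 never enter either list, so they contribute nothing to the loss.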

-> This calls rpn/loss.py: class RPNLossComputation(object):

    def __call__(self, anchors, objectness, box_regression, targets):

        """

        Arguments:

            anchors (list[BoxList])

            objectness (list[Tensor])

            box_regression (list[Tensor])

            targets (list[BoxList])

        Returns:

            objectness_loss (Tensor)

            box_loss (Tensor)

        """

        anchors = [cat_boxlist(anchors_per_image) for anchors_per_image in anchors]

        labels, regression_targets = self.prepare_targets(anchors, targets)

        sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)

        sampled_pos_inds = torch.nonzero(torch.cat(sampled_pos_inds, dim=0)).squeeze(1)

        sampled_neg_inds = torch.nonzero(torch.cat(sampled_neg_inds, dim=0)).squeeze(1)

        sampled_inds = torch.cat([sampled_pos_inds, sampled_neg_inds], dim=0)

        objectness_flattened = []

        box_regression_flattened = []

        # for each feature level, permute the outputs to make them be in the

        # same format as the labels. Note that the labels are computed for

        # all feature levels concatenated, so we keep the same representation

        # for the objectness and the box_regression

        for objectness_per_level, box_regression_per_level in zip(

            objectness, box_regression

        ):

            N, A, H, W = objectness_per_level.shape

            objectness_per_level = objectness_per_level.permute(0, 2, 3, 1).reshape(

                N, -1

            )

            box_regression_per_level = box_regression_per_level.view(N, -1, 4, H, W)

            box_regression_per_level = box_regression_per_level.permute(0, 3, 4, 1, 2)

            box_regression_per_level = box_regression_per_level.reshape(N, -1, 4)

            objectness_flattened.append(objectness_per_level)

            box_regression_flattened.append(box_regression_per_level)

        # concatenate on the first dimension (representing the feature levels), to

        # take into account the way the labels were generated (with all feature maps

        # being concatenated as well)

        objectness = cat(objectness_flattened, dim=1).reshape(-1)

        box_regression = cat(box_regression_flattened, dim=1).reshape(-1, 4)

        labels = torch.cat(labels, dim=0)

        regression_targets = torch.cat(regression_targets, dim=0)

        box_loss = smooth_l1_loss(

            box_regression[sampled_pos_inds],

            regression_targets[sampled_pos_inds],

            beta=1.0 / 9,

            size_average=False,

        ) / (sampled_inds.numel())

        objectness_loss = F.binary_cross_entropy_with_logits(

            objectness[sampled_inds], labels[sampled_inds]

        )

        return objectness_loss, box_loss
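The box loss applies a smooth L1 penalty with beta = 1/9, summed over the positive anchors and divided by the total number of sampled anchors. A scalar sketch of the usual smooth L1 definition (quadratic below beta, linear above). It is written from the standard formula and is an assumption about what smooth_l1_loss computes, since its body is not shown here.

```python
def smooth_l1(diff, beta=1.0 / 9):
    """Smooth L1 penalty for a single residual."""
    n = abs(diff)
    if n < beta:
        return 0.5 * n * n / beta  # quadratic region near zero
    return n - 0.5 * beta  # linear region for large residuals


def rpn_box_loss(predictions, targets, num_sampled, beta=1.0 / 9):
    """Sum the penalty over all coordinates, then normalize by the sample count."""
    total = sum(
        smooth_l1(p - t, beta)
        for pred_box, target_box in zip(predictions, targets)
        for p, t in zip(pred_box, target_box)
    )
    return total / num_sampled


# Two hypothetical positive anchors, normalized by 512 sampled anchors.
preds = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
tgts = [[0.05, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
loss = rpn_box_loss(preds, tgts, num_sampled=512)
print(loss)
```

Dividing by the number of all sampled anchors, rather than by the number of positives, keeps the regression term on a scale comparable to the objectness term, which is averaged over the same samples.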

-> Printed variables; in the end only the selected sampled_inds are used to compute the RPN loss:

ipdb> objectness

tensor([-1.7661,  1.3304, -3.6243,  ...,  0.0558,  1.1206,  0.6639],

       device='cuda:0', grad_fn=<AsStridedBackward>)

ipdb> objectness.shape

torch.Size([537138])

ipdb> labels

tensor([-1., -1., -1.,  ..., -1., -1., -1.], device='cuda:0')

ipdb> labels.shape

torch.Size([537138])

ipdb> box_regression

tensor([[-0.1721, -0.2121,  0.1083, -0.5830],

        [-0.1728, -0.0665, -0.6760, -0.8508],

        [-0.0958, -0.0096, -0.1450,  0.2591],

        ...,

        [-0.0041,  0.0209,  0.2075, -0.0639],

        [ 0.0016,  0.0539, -0.1746, -0.1428],

        [ 0.0038, -0.0308, -0.0916,  0.0726]], device='cuda:0',

       grad_fn=<AsStridedBackward>)

ipdb> box_regression.shape

torch.Size([537138, 4])

ipdb> regression_targets

tensor([[10.3858, 12.5126,  1.8582,  3.0168],

        [15.5788,  9.3845,  2.2637,  2.7292],

        [20.7717,  6.2563,  2.5514,  2.3237],

        ...,

        [-1.0482, -1.0875, -1.2006, -0.7158],

        [-1.4904, -0.7816, -0.8487, -1.0460],

        [-2.1197, -0.5558, -0.4964, -1.3870]], device='cuda:0')

ipdb> regression_targets.shape

torch.Size([537138, 4])
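The permute-then-reshape step in the loss puts the anchor dimension last, so within one level the flattened position of anchor a at location (h, w) is (h * W + w) * A + a. A small sketch with nested lists standing in for a single-image tensor; the helper name is illustrative.

```python
def flatten_objectness(level):
    """Flatten an [A][H][W] nested list in (H, W, A) order.

    This mirrors permute(0, 2, 3, 1).reshape(N, -1) for one image.
    """
    A, H, W = len(level), len(level[0]), len(level[0][0])
    return [level[a][h][w] for h in range(H) for w in range(W) for a in range(A)]


# Toy level with A=3 anchors on a 2 x 2 map; each value encodes (a, h, w).
A, H, W = 3, 2, 2
level = [[[100 * a + 10 * h + w for w in range(W)] for h in range(H)] for a in range(A)]
flat = flatten_objectness(level)

a, h, w = 2, 1, 0
print(flat[(h * W + w) * A + a])  # the entry for anchor 2 at location (1, 0)
```

Keeping the same ordering for objectness, box_regression, and labels is what lets a single index tensor such as sampled_inds address all three.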

-> Finally, the RPN returns:

ipdb> loss_objectness

tensor(0.0268, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)

ipdb> loss_rpn_box_reg

tensor(0.0690, device='cuda:0', grad_fn=<DivBackward0>)

ipdb> boxes

[BoxList(num_boxes=316, image_width=1333, image_height=794, mode=xyxy), BoxList(num_boxes=1696, image_width=1204, image_height=800, mode=xyxy)]

Fast RCNN+Mask

-> In generalized_rcnn.py: x, result, detector_losses = self.roi_heads(features, proposals, targets)

-> RPN output for a different pair of images:

ipdb> proposals

[BoxList(num_boxes=571, image_width=1201, image_height=800, mode=xyxy), BoxList(num_boxes=1468, image_width=1199, image_height=800, mode=xyxy)]

ipdb> proposal_losses

{'loss_objectness': tensor(0.0656, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>), 'loss_rpn_box_reg': tensor(0.2036, device='cuda:0', grad_fn=<DivBackward0>)}

-> roi_heads.py is split into a box part and a mask part:

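-> Because an FPN is used here, feature extraction (RoI pooling) for the box and mask heads pools from the individual pyramid levels; the feature extractor's parameters can also be shared between the two heads.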

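-> So the mask_features fed to the mask branch are the original backbone features (when the feature extractor is not shared); the loss is simply computed on the detections regions that come out of the box branch.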

    def forward(self, features, proposals, targets=None):

        losses = {}

        # TODO rename x to roi_box_features, if it doesn't increase memory consumption

        x, detections, loss_box = self.box(features, proposals, targets)

        losses.update(loss_box)

        if self.cfg.MODEL.MASK_ON:

            mask_features = features

            # optimization: during training, if we share the feature extractor between

            # the box and the mask heads, then we can reuse the features already computed

            if (

                self.training

                and self.cfg.MODEL.ROI_MASK_HEAD.SHARE_BOX_FEATURE_EXTRACTOR

            ):

                mask_features = x

            # During training, self.box() will return the unaltered proposals as "detections"

            # this makes the API consistent during training and testing

            x, detections, loss_mask = self.mask(mask_features, detections, targets)

            losses.update(loss_mask)

        return x, detections, losses
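With an FPN, each RoI is pooled from a single pyramid level chosen by its size. A sketch of the level-assignment heuristic from the FPN paper, assuming a canonical box size of 224 mapped to level 4 and pooling levels 2 through 5. These constants and the function name are assumptions for illustration and are not taken from the code shown here.

```python
import math


def fpn_level_index(box, k_min=2, k_max=5, canonical_scale=224, canonical_level=4):
    """Return the index (0-based from k_min) of the pyramid level for one box."""
    x1, y1, x2, y2 = box
    scale = math.sqrt(max(x2 - x1, 0.0) * max(y2 - y1, 0.0))
    # Small epsilon keeps log2 defined for degenerate boxes.
    level = math.floor(canonical_level + math.log2(scale / canonical_scale + 1e-6))
    level = min(max(level, k_min), k_max)
    return level - k_min


test_boxes = [(0, 0, 10, 10), (0, 0, 112, 112), (0, 0, 224, 224), (0, 0, 1000, 1000)]
level_indices = [fpn_level_index(box) for box in test_boxes]
for box, index in zip(test_boxes, level_indices):
    print(box, "-> level index", index)
```

Small boxes land on the high-resolution levels and large boxes on the coarse ones, so every pooled RoI covers a comparable number of feature cells.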

-> x, detections, loss_box = self.box(features, proposals, targets) is the classification and regression part of Fast R-CNN:

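-> x = self.feature_extractor(features, proposals): feature extraction here consists of RoI pooling followed by a head that turns each pooled region into a per-RoI feature; it can be shared with the mask branch, after which the network splits into two loss branches (classification + regression, and mask).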

    def forward(self, features, proposals, targets=None):

        """

        Arguments:

            features (list[Tensor]): feature-maps from possibly several levels

            proposals (list[BoxList]): proposal boxes

            targets (list[BoxList], optional): the ground-truth targets.

        Returns:

            x (Tensor): the result of the feature extractor

            proposals (list[BoxList]): during training, the subsampled proposals

                are returned. During testing, the predicted boxlists are returned

            losses (dict[Tensor]): During training, returns the losses for the

                head. During testing, returns an empty dict.

        """

        if self.training:

            # Faster R-CNN subsamples during training the proposals with a fixed

            # positive / negative ratio

            with torch.no_grad():

                proposals = self.loss_evaluator.subsample(proposals, targets)

        # extract features that will be fed to the final classifier. The

        # feature_extractor generally corresponds to the pooler + heads

        x = self.feature_extractor(features, proposals)

        # final classifier that converts the features into predictions

        class_logits, box_regression = self.predictor(x)

        if not self.training:

            result = self.post_processor((class_logits, box_regression), proposals)

            return x, result, {}

        loss_classifier, loss_box_reg = self.loss_evaluator(

            [class_logits], [box_regression]

        )

        return (

            x,

            proposals,

            dict(loss_classifier=loss_classifier, loss_box_reg=loss_box_reg),

        )

-> During training, 512 boxes are selected per image; the outputs are class logits of shape [1024, 81] and box regression of shape [1024, 324], since 81×4 = 324.

ipdb> x.shape

torch.Size([1024, 1024])

ipdb> proposals

[BoxList(num_boxes=512, image_width=1201, image_height=800, mode=xyxy), BoxList(num_boxes=512, image_width=1199, image_height=800, mode=xyxy)]

ipdb> class_logits.shape

torch.Size([1024, 81])

ipdb> box_regression

tensor([[ 1.2481e-02, -1.5032e-02,  2.6849e-03,  ...,  2.6986e-03,

          1.4723e-01, -5.2207e-01],

        [-5.7448e-03, -7.5938e-03, -2.6571e-03,  ...,  1.3588e-01,

         -3.1587e-01,  6.2171e-01],

        [-6.6426e-03, -3.4121e-03, -9.5814e-04,  ..., -4.7817e-01,

         -2.8117e-03,  1.6653e-01],

        ...,

        [-1.1446e-02, -4.6574e-03, -8.0981e-04,  ..., -5.0460e-01,

          6.2465e-01, -4.1426e-01],

        [ 6.0940e-05, -1.2032e-02, -5.0753e-03,  ...,  1.0396e+00,

         -1.9913e-01, -1.2819e+00],

        [-4.9718e-03, -6.6546e-03, -2.5202e-03,  ...,  3.9986e-02,

         -6.0675e-02, -1.1396e-01]], device='cuda:0', grad_fn=<AddmmBackward>)

ipdb> box_regression.shape

torch.Size([1024, 324])

ipdb> loss_classifier

tensor(0.3894, device='cuda:0', grad_fn=<NllLossBackward>)

ipdb> loss_box_reg

tensor(0.1674, device='cuda:0', grad_fn=<DivBackward0>)
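The regression output has 81 x 4 = 324 columns because a separate set of 4 deltas is predicted per class. A sketch of picking the 4 deltas that belong to a given class, assuming the columns are laid out class by class (columns 4c to 4c+3 for class c). That layout and the helper name are assumptions, since the box loss code is not shown here.

```python
NUM_CLASSES = 81  # matches the second dimension of class_logits above


def deltas_for_class(box_regression_row, class_id):
    """Slice out the (dx, dy, dw, dh) entries predicted for one class."""
    start = 4 * class_id
    return box_regression_row[start:start + 4]


# Toy row of 324 values where each entry equals its own column index.
row = list(range(NUM_CLASSES * 4))
print(len(row))
print(deltas_for_class(row, 17))
```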

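-> Overall output of x, detections, loss_box = self.box(features, proposals, targets): x is the per-RoI feature for the box branch (and for the mask branch when the extractor is shared); 512 boxes per image are selected to compute the loss and are passed on to the mask branch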

ipdb> x.shape

torch.Size([1024, 1024])

ipdb> proposals

[BoxList(num_boxes=571, image_width=1201, image_height=800, mode=xyxy), BoxList(num_boxes=1468, image_width=1199, image_height=800, mode=xyxy)]

ipdb> detections

[BoxList(num_boxes=512, image_width=1201, image_height=800, mode=xyxy), BoxList(num_boxes=512, image_width=1199, image_height=800, mode=xyxy)]

ipdb> loss_box

{'loss_box_reg': tensor(0.1674, device='cuda:0', grad_fn=<DivBackward0>), 'loss_classifier': tensor(0.3894, device='cuda:0', grad_fn=<NllLossBackward>)}

-> x, detections, loss_mask = self.mask(mask_features, detections, targets) is the mask branch:

-> Only the proposals that contain an object (positive_inds) are used;

    def forward(self, features, proposals, targets=None):

        """

        Arguments:

            features (list[Tensor]): feature-maps from possibly several levels

            proposals (list[BoxList]): proposal boxes

            targets (list[BoxList], optional): the ground-truth targets.

        Returns:

            x (Tensor): the result of the feature extractor

            proposals (list[BoxList]): during training, the original proposals

                are returned. During testing, the predicted boxlists are returned

                with the `mask` field set

            losses (dict[Tensor]): During training, returns the losses for the

                head. During testing, returns an empty dict.

        """

        if self.training:

            # during training, only focus on positive boxes

            all_proposals = proposals

            proposals, positive_inds = keep_only_positive_boxes(proposals)

        if self.training and self.cfg.MODEL.ROI_MASK_HEAD.SHARE_BOX_FEATURE_EXTRACTOR:

            x = features

            x = x[torch.cat(positive_inds, dim=0)]

        else:

            x = self.feature_extractor(features, proposals)

        mask_logits = self.predictor(x)

        if not self.training:

            result = self.post_processor(mask_logits, proposals)

            return x, result, {}

        loss_mask = self.loss_evaluator(proposals, mask_logits, targets)

        return x, all_proposals, dict(loss_mask=loss_mask)

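-> Variable values: only the positives enter the loss, so far fewer boxes remain; the pooled features then have shape [171, 256, 14, 14] (because only 43 + 128 = 171 boxes were kept)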

-> During training, the only return value that really matters is loss_mask; at test time the post-processed result is returned.

ipdb> all_proposals

[BoxList(num_boxes=512, image_width=1201, image_height=800, mode=xyxy), BoxList(num_boxes=512, image_width=1199, image_height=800, mode=xyxy)]

ipdb> proposals

[BoxList(num_boxes=43, image_width=1201, image_height=800, mode=xyxy), BoxList(num_boxes=128, image_width=1199, image_height=800, mode=xyxy)]

ipdb> positive_inds.shape

*** AttributeError: 'list' object has no attribute 'shape'

ipdb> positive_inds[0].shape

torch.Size([512])

ipdb> x.shape

torch.Size([171, 256, 14, 14])

ipdb> mask_logits.shape

torch.Size([171, 81, 28, 28])

ipdb> targets[0]

BoxList(num_boxes=4, image_width=1201, image_height=800, mode=xyxy)

ipdb> targets[1]

BoxList(num_boxes=35, image_width=1199, image_height=800, mode=xyxy)

ipdb> loss_mask

tensor(0.3287, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
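The shapes above show why positive_inds[0] still has 512 entries while proposals[0] holds only 43 boxes: the former is a per-box mask over the sampled proposals, the latter is what remains after applying it. A sketch with plain lists, assuming each proposal carries a label where 0 means background. The function below is an illustrative stand-in, not the body of keep_only_positive_boxes.

```python
def keep_positive(boxes_per_image, labels_per_image):
    """Filter each image's boxes down to those with a non-background label."""
    positive_boxes, positive_masks = [], []
    for boxes, labels in zip(boxes_per_image, labels_per_image):
        mask = [label > 0 for label in labels]  # one flag per sampled proposal
        positive_boxes.append([box for box, keep in zip(boxes, mask) if keep])
        positive_masks.append(mask)
    return positive_boxes, positive_masks


# Hypothetical batch mirroring the counts above: 512 proposals per image,
# of which 43 and 128 are positives.
labels = [[1] * 43 + [0] * 469, [5] * 128 + [0] * 384]
boxes = [[(0, 0, 10, 10)] * 512, [(0, 0, 20, 20)] * 512]
kept, masks = keep_positive(boxes, labels)
print([len(k) for k in kept], [len(m) for m in masks])
```

Concatenating the kept boxes gives 43 + 128 = 171 rows, the leading dimension of x and mask_logits.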

-> At this point the entire training loss has been computed; iteration continues...

Summary:

1. The model cleanly separates the training and testing paths, and the results returned after processing differ between the two.

2. Some of the data structures and data-handling details still need a closer look later.