Single Shot Multibox Detection (SSD) in Practice (Part 2)

2. Training


2.1. Data Reading and Initialization


batch_size = 32
train_iter, _ = d2l.load_data_pikachu(batch_size)

ctx, net = d2l.try_gpu(), TinySSD(num_classes=1)
net.initialize(init=init.Xavier(), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.2, 'wd': 5e-4})

2.2. Defining Loss and Evaluation Functions


cls_loss = gluon.loss.SoftmaxCrossEntropyLoss()
bbox_loss = gluon.loss.L1Loss()

def calc_loss(cls_preds, cls_labels, bbox_preds, bbox_labels, bbox_masks):
    cls = cls_loss(cls_preds, cls_labels)
    # The mask zeroes out the offsets of negative anchor boxes
    bbox = bbox_loss(bbox_preds * bbox_masks, bbox_labels * bbox_masks)
    return cls + bbox

def cls_eval(cls_preds, cls_labels):
    # Because the category prediction results are placed in the final
    # dimension, argmax must specify this dimension
    return float((cls_preds.argmax(axis=-1) == cls_labels).sum())

def bbox_eval(bbox_preds, bbox_labels, bbox_masks):
    return float((np.abs((bbox_labels - bbox_preds) * bbox_masks)).sum())
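To see what the mask and the two evaluation functions compute, here is a minimal sketch in plain NumPy (not MXNet) on made-up toy values. It assumes that bbox_masks is 1 for the offsets of positive anchor boxes and 0 for negative ones, so only positive anchor boxes contribute to the offset error. All array values are illustrative.

```python
import numpy as np

# Toy values: 1 image, 3 anchor boxes, 4 offsets per anchor (flattened).
bbox_preds = np.array([[0.1, 0.2, 0.0, 0.4,
                        0.9, 0.9, 0.9, 0.9,
                        0.0, 0.1, 0.2, 0.3]])
bbox_labels = np.array([[0.0, 0.2, 0.1, 0.2,
                         0.0, 0.0, 0.0, 0.0,
                         0.1, 0.1, 0.2, 0.1]])
# Assumption: the second anchor is a negative (background) anchor,
# so its four offsets are masked out.
bbox_masks = np.array([[1, 1, 1, 1,
                        0, 0, 0, 0,
                        1, 1, 1, 1]], dtype=float)

# Same computation as bbox_eval: sum of masked absolute errors.
abs_err_sum = np.abs((bbox_labels - bbox_preds) * bbox_masks).sum()
# The training loop divides by the total number of offset labels.
bbox_mae = abs_err_sum / bbox_labels.size

# Class scores with shape (batch, anchors, classes); argmax over the last axis.
cls_preds = np.array([[[2.0, 0.5], [1.5, 0.1], [0.2, 3.0]]])
cls_labels = np.array([[1, 0, 1]])
num_correct = (cls_preds.argmax(axis=-1) == cls_labels).sum()
cls_err = 1 - num_correct / cls_labels.size

print(abs_err_sum, bbox_mae, num_correct, cls_err)
```

The large errors on the masked anchor (0.9 on every offset) do not affect the result, which is the point of multiplying by the mask.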

2.3. Training the Model


num_epochs, timer = 20, d2l.Timer()
animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                        legend=['class error', 'bbox mae'])
for epoch in range(num_epochs):
    # accuracy_sum, num_examples, mae_sum, num_labels
    metric = d2l.Accumulator(4)
    train_iter.reset()  # Read data from the start.
    for batch in train_iter:
        X = batch.data[0].as_in_ctx(ctx)
        Y = batch.label[0].as_in_ctx(ctx)
        with autograd.record():
            # Generate multiscale anchor boxes and predict the category and
            # offset of each
            anchors, cls_preds, bbox_preds = net(X)
            # Label the category and offset of each anchor box
            bbox_labels, bbox_masks, cls_labels = npx.multibox_target(
                anchors, Y, cls_preds.transpose(0, 2, 1))
            # Calculate the loss function using the predicted and labeled
            # category and offset values
            l = calc_loss(cls_preds, cls_labels, bbox_preds, bbox_labels,
                          bbox_masks)
        l.backward()
        trainer.step(batch_size)
        metric.add(cls_eval(cls_preds, cls_labels), cls_labels.size,
                   bbox_eval(bbox_preds, bbox_labels, bbox_masks),
                   bbox_labels.size)
    cls_err, bbox_mae = 1-metric[0]/metric[1], metric[2]/metric[3]
    animator.add(epoch+1, (cls_err, bbox_mae))
print('class err %.2e, bbox mae %.2e' % (cls_err, bbox_mae))
print('%.1f examples/sec on %s' % (train_iter.num_image/timer.stop(), ctx))

class err 2.35e-03, bbox mae 2.68e-03

4315.5 examples/sec on gpu(0)

3. Prediction


img = image.imread('../img/pikachu.jpg')
feature = image.imresize(img, 256, 256).astype('float32')
X = np.expand_dims(feature.transpose(2, 0, 1), axis=0)
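The transpose and expand_dims calls above convert an image from (height, width, channel) layout into a batch of one in (batch, channel, height, width) layout. A minimal sketch in plain NumPy, using a zero array as a stand-in for the resized image:

```python
import numpy as np

# Stand-in for a resized RGB image in (height, width, channel) layout.
feature = np.zeros((256, 256, 3), dtype='float32')

chw = feature.transpose(2, 0, 1)   # reorder to (channel, height, width)
X = np.expand_dims(chw, axis=0)    # prepend the batch dimension

print(X.shape)  # (1, 3, 256, 256)
```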


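def predict(X):
    anchors, cls_preds, bbox_preds = net(X.as_in_ctx(ctx))
    # Convert class scores to probabilities, then move the class axis
    # in front of the anchor axis
    cls_probs = npx.softmax(cls_preds).transpose(0, 2, 1)
    output = npx.multibox_detection(cls_probs, bbox_preds, anchors)
    # Drop rows whose class index is -1
    idx = [i for i, row in enumerate(output[0]) if row[0] != -1]
    return output[0, idx]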

output = predict(X)
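The two array manipulations inside predict can be tried out without a trained network. The following is a minimal sketch in plain NumPy with made-up values. It assumes that each detection row has the layout [class_id, score, xmin, ymin, xmax, ymax] and that a class_id of -1 marks a row to discard; the softmax helper and the toy arrays are illustrative, not MXNet's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy class scores: (batch=1, anchors=3, classes=2), i.e. background + 1 class.
cls_preds = np.array([[[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]])
# Probabilities over the class axis, then reordered to (batch, classes, anchors).
cls_probs = softmax(cls_preds).transpose(0, 2, 1)

# Hypothetical detection output with shape (batch, anchors, 6).
output = np.array([[[ 0.0, 0.90, 0.1, 0.1, 0.5, 0.5],
                    [-1.0, 0.80, 0.1, 0.1, 0.5, 0.6],
                    [-1.0, 0.40, 0.0, 0.0, 1.0, 1.0]]])

# Keep only rows whose class index is not -1.
idx = [i for i, row in enumerate(output[0]) if row[0] != -1]
kept = output[0, idx]

print(cls_probs.shape, idx, kept.shape)
```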


def display(img, output, threshold):
    d2l.set_figsize((5, 5))
    fig = d2l.plt.imshow(img.asnumpy())
    for row in output:
        score = float(row[1])
        # Skip detections below the confidence threshold
        if score < threshold:
            continue
        h, w = img.shape[0:2]
        # Scale the normalized box coordinates to pixels
        bbox = [row[2:6] * np.array((w, h, w, h), ctx=row.ctx)]
        d2l.show_bboxes(fig.axes, bbox, '%.2f' % score, 'w')

display(img, output, threshold=0.3)
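The thresholding and coordinate scaling done in display can be checked on their own. The sketch below uses plain Python and made-up rows. It assumes each row is [class_id, score, xmin, ymin, xmax, ymax] with box coordinates normalized to [0, 1]; to_pixel_boxes is a hypothetical helper, not part of d2l.

```python
def to_pixel_boxes(rows, height, width, threshold):
    """Drop low-confidence rows and scale normalized boxes to pixels."""
    boxes = []
    for class_id, score, xmin, ymin, xmax, ymax in rows:
        if score < threshold:
            continue  # same filter as in display()
        # x coordinates scale with the width, y coordinates with the height.
        boxes.append((score, [xmin * width, ymin * height,
                              xmax * width, ymax * height]))
    return boxes

rows = [[0, 0.9, 0.25, 0.5, 0.75, 1.0],
        [0, 0.2, 0.0, 0.0, 0.5, 0.5]]
boxes = to_pixel_boxes(rows, height=400, width=800, threshold=0.3)
print(boxes)
```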

4. Loss


For the predicted offsets, replace the L1 norm loss with the smooth L1 loss. This
loss function uses a square function around zero for greater smoothness. The
width of this quadratic region is controlled by the hyperparameter σ:

    f(x) = (σx)^2 / 2         if |x| < 1/σ^2
    f(x) = |x| - 0.5/σ^2      otherwise

When σ is large, this loss is similar to the L1 norm loss. When σ is small,
the loss function is smoother.

sigmas = [10, 1, 0.5]
lines = ['-', '--', '-.']
x = np.arange(-2, 2, 0.1)

for l, s in zip(lines, sigmas):
    y = npx.smooth_l1(x, scalar=s)
    d2l.plt.plot(x.asnumpy(), y.asnumpy(), l, label='sigma=%.1f' % s)
d2l.plt.legend()
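For readers who want to check individual values rather than read them off a plot, here is a minimal scalar sketch of the smooth L1 function in plain Python. It assumes the piecewise definition f(x) = (σx)^2 / 2 for |x| < 1/σ^2 and |x| - 0.5/σ^2 otherwise; it is a reference sketch, not the MXNet operator itself.

```python
def smooth_l1(x, sigma):
    """Scalar smooth L1: quadratic near zero, linear elsewhere."""
    if abs(x) < 1.0 / sigma ** 2:
        return 0.5 * (sigma * x) ** 2
    return abs(x) - 0.5 / sigma ** 2

# With a large sigma the quadratic region is tiny, so the loss is close to |x|.
print(smooth_l1(1.0, sigma=10))
# With a small sigma the quadratic region is wide and the loss is smoother.
print(smooth_l1(1.0, sigma=0.5))
```

The two branches meet at |x| = 1/σ^2, where both evaluate to 0.5/σ^2, so the function is continuous.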


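For category prediction, the cross-entropy loss -log p, where p is the
predicted probability of the true category, can be replaced with the focal
loss -(1 - p)^γ log p. With γ = 0 this is the cross-entropy loss; a larger γ
shrinks the loss for anchor boxes whose true category is already predicted
with high probability.

def focal_loss(gamma, x):
    return -(1 - x) ** gamma * np.log(x)

x = np.arange(0.01, 1, 0.01)
for l, gamma in zip(lines, [0, 1, 5]):
    d2l.plt.plot(x.asnumpy(), focal_loss(gamma, x).asnumpy(), l,
                 label='gamma=%.1f' % gamma)
d2l.plt.legend()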
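The effect of gamma is easier to see on a few concrete probabilities. This is a minimal scalar sketch in plain Python of the same formula, -(1 - p)^gamma * log(p), where p is taken to be the predicted probability of the true category; the chosen probabilities are illustrative.

```python
import math

def focal_loss(gamma, p):
    """Scalar focal loss; gamma = 0 gives the cross-entropy loss -log(p)."""
    return -(1 - p) ** gamma * math.log(p)

# Compare a poorly classified (p=0.1) and a well classified (p=0.9) example.
for p in (0.1, 0.5, 0.9):
    losses = [round(focal_loss(g, p), 4) for g in (0, 1, 5)]
    print('p=%.1f' % p, losses)
```

For p = 0.9 the loss falls off much faster with gamma than for p = 0.1, so easy examples contribute less to the total loss.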


Training and Prediction

When an object is relatively large compared to the image, the model normally adopts
a larger input image size.

This generally produces a large
number of negative anchor boxes when labeling anchor box categories. We can
sample the negative anchor boxes to better balance the data categories. To do
this, we can set the MultiBoxTarget function’s negative_mining_ratio parameter.
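The idea of sampling negative anchor boxes can be sketched in a few lines of plain Python. The sketch assumes one common strategy: keep every positive anchor box, and keep only the negatives with the largest classification loss, up to ratio times the number of positives. The function sample_negatives and its inputs are hypothetical and illustrate the idea only; this is not how MXNet implements negative_mining_ratio internally.

```python
def sample_negatives(is_positive, cls_losses, ratio):
    """Return sorted indices of anchors kept for the classification loss."""
    pos = [i for i, p in enumerate(is_positive) if p]
    neg = [i for i, p in enumerate(is_positive) if not p]
    # Hardest negatives first: those with the largest classification loss.
    neg.sort(key=lambda i: cls_losses[i], reverse=True)
    num_neg = int(ratio * len(pos))
    return sorted(pos + neg[:num_neg])

# One positive anchor and five negatives with made-up per-anchor losses.
is_positive = [True, False, False, False, False, False]
cls_losses = [0.7, 0.1, 2.3, 0.05, 1.1, 0.4]
kept = sample_negatives(is_positive, cls_losses, ratio=3)
print(kept)
```

With a ratio of 3, the single positive is kept together with the three highest-loss negatives, and the two easiest negatives are dropped.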

Assign hyper-parameters with different weights to the
anchor box category loss and positive anchor box offset loss in the loss
function.
Refer to the SSD paper. What methods
can be used to evaluate the precision of object detection models?

5. Summary

  • SSD is a multiscale object detection model. This model generates different
    numbers of anchor boxes of different sizes based on the base network block
    and each multiscale feature block and predicts the categories and offsets
    of the anchor boxes to detect objects of different sizes.
  • During SSD model training, the loss function is calculated using the
    predicted and labeled category and offset values.

