cs231n assignment 2

20210913 - 20211005.

fully-connected nets

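Basic idea

Wrapping each kind of layer in its own pair of functions makes modular programming possible.

forward takes the inputs of a computational-graph node and returns the node's output plus whatever has to be cached for the backward pass.

backward takes the upstream derivative, i.e. the derivative with respect to the node's output, and returns the derivatives with respect to each of the node's inputs.

In backward, the chain rule reduces to matrix multiplications whose operand order and transposes can be worked out from the shapes alone.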
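
A minimal sketch of this pattern, assuming an affine layer followed by a ReLU. The function names follow the usual forward/backward naming, but the bodies are an illustration written for this note rather than the assignment's reference code.

```python
import numpy as np

def affine_forward(x, w, b):
    """out = x_flat @ w + b; the cache keeps what backward will need."""
    out = x.reshape(x.shape[0], -1).dot(w) + b
    return out, (x, w, b)

def affine_backward(dout, cache):
    """Map the upstream gradient dout (N, M) to gradients of x, w and b."""
    x, w, b = cache
    dx = dout.dot(w.T).reshape(x.shape)           # (N, M) @ (M, D), then back to x's shape
    dw = x.reshape(x.shape[0], -1).T.dot(dout)    # (D, N) @ (N, M) -> (D, M)
    db = dout.sum(axis=0)                         # (M,)
    return dx, dw, db

def relu_forward(x):
    return np.maximum(0, x), x                    # the cache is just the input

def relu_backward(dout, cache):
    return dout * (cache > 0)                     # gradient passes only where x > 0

def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    return out, (fc_cache, relu_cache)

def affine_relu_backward(dout, cache):
    fc_cache, relu_cache = cache
    # Backward runs through the layers in reverse order.
    return affine_backward(relu_backward(dout, relu_cache), fc_cache)
```

Because every layer exposes the same two functions, a composite layer is just the forward calls chained in order and the backward calls chained in reverse.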

Implementation details

x_rsp = x.reshape(x.shape[0], -1) # (N, d1, d2, ...) -> (N, D), one sample per row

A = B.dot(C) # matrix multiplication

dx = dx.reshape(x.shape) # reshape dx back to the shape of x

out = x * (x >= 0) # ReLU: keep the values >= 0, a compact NumPy idiom

dx = (x > 0) * dout # ReLU backprop

On the dimensions of w in the fully-connected layers:

layer_input_dim = input_dim

for i, hd in enumerate(hidden_dims):

    self.params['W%d'%(i+1)] = weight_scale * np.random.randn(layer_input_dim, hd)

    self.params['b%d'%(i+1)] = np.zeros(hd)

    if self.use_batchnorm:

        self.params['gamma%d'%(i+1)] = np.ones(hd)

        self.params['beta%d'%(i+1)] = np.zeros(hd)

    layer_input_dim = hd

self.params['W%d'%(self.num_layers)] = weight_scale * np.random.randn(layer_input_dim, num_classes)

self.params['b%d'%(self.num_layers)] = np.zeros(num_classes)

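Stochastic gradient descent with momentum:

v = config['momentum'] * v - config['learning_rate'] * dw

# decay the old velocity by the momentum factor (e.g. 0.9), then add the step along the negative gradient

next_w = w + v # update w with the velocity

config['velocity'] = v # store the updated velocity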
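
The update can be wrapped into a small function and tried on a toy problem. This is a sketch: the default hyperparameters and the quadratic objective f(w) = 0.5 * ||w||^2 (whose gradient is w itself) are assumptions made for illustration.

```python
import numpy as np

def sgd_momentum(w, dw, config=None):
    """One momentum step; config carries the hyperparameters and the velocity."""
    config = {} if config is None else config
    config.setdefault('learning_rate', 1e-2)
    config.setdefault('momentum', 0.9)
    v = config.get('velocity', np.zeros_like(w))
    v = config['momentum'] * v - config['learning_rate'] * dw
    next_w = w + v
    config['velocity'] = v
    return next_w, config

# Toy problem: minimise 0.5 * ||w||^2, so the gradient at w is w.
w = np.array([1.0, -2.0])
config = None
for _ in range(200):
    w, config = sgd_momentum(w, dw=w, config=config)
print(np.linalg.norm(w))  # shrinks towards 0 as the iterations proceed
```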

RMSProp:

config['cache'] = config['decay_rate'] * config['cache'] + (1 - config['decay_rate']) * (dx**2)

# cache: weighted average of the old cache and dx squared, with weight decay_rate

next_x = x - config['learning_rate'] * dx / (np.sqrt(config['cache']) + config['epsilon'])

# next_x: take a step of size learning_rate along the negative of dx / (sqrt(cache) + epsilon); the small epsilon prevents division by zero.

Adam:

config['t'] += 1

# t: incremented on every update; used by the bias correction of m and v below

config['m'] = config['beta1'] * config['m'] + (1 - config['beta1']) * dx

# m: weighted average of the old m and dx, with weight beta1

config['v'] = config['beta2'] * config['v'] + (1 - config['beta2']) * (dx**2)

# v: weighted average of the old v and dx squared, with weight beta2

mb = config['m'] / (1 - config['beta1']**config['t'])

# mb: m divided by 1 - beta1**t, which makes it slightly larger. As t grows, beta1**t shrinks and 1 - beta1**t approaches 1, so the correction fades and mb approaches m.

vb = config['v'] / (1 - config['beta2']**config['t'])

# vb: v divided by 1 - beta2**t, the same correction as above.

next_x = x - config['learning_rate'] * mb / (np.sqrt(vb) + config['epsilon'])

# next_x: take a step of size learning_rate along the negative of mb / (sqrt(vb) + epsilon).

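So how does Adam work? Like this:

For stochastic gradient descent we pick a descent direction and take a step of size learning rate.
Which direction? The direction of mb / (sqrt(vb) + epsilon).
What is mb? It is m / (1 - β1^t).
- Dividing by (1 - β1^t) is a bias correction. m starts at 0, so in the first few steps it underestimates the average gradient. As t grows, β1^t shrinks and 1 - β1^t rises from 1 - β1 towards 1, so the correction is large at first and then fades away.
- m itself is the momentum term: its update is a weighted average of the old m and the current dx.
And what is vb? It is v / (1 - β2^t).
- Dividing by (1 - β2^t) is the same bias correction applied to v. It scales v up in the early steps, so there is no contradiction with vb sitting in the denominator of the step: without it, an underestimated v would make the first steps too large.
- v is the RMSProp term: its update is a weighted average of the old v and the current dx².
So Adam combines momentum and RMSProp: it moves along the momentum direction, divides by the root of the averaged squared gradient, and applies the 1 / (1 - β^t) bias correction to both.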
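
The effect of dividing by (1 - β^t) is easiest to see with a constant gradient. The sketch below is an illustration only: `adam_moments` is a hypothetical helper, and the constant gradient 2.0 and the β values are assumptions chosen for the demonstration.

```python
def adam_moments(grads, beta1=0.9, beta2=0.999):
    """Track Adam's raw and bias-corrected moments for a sequence of scalar gradients."""
    m = v = 0.0
    history = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g          # raw first moment, starts near 0
        v = beta2 * v + (1 - beta2) * g ** 2     # raw second moment, starts near 0
        mb = m / (1 - beta1 ** t)                # bias-corrected first moment
        vb = v / (1 - beta2 ** t)                # bias-corrected second moment
        history.append((t, m, mb, v, vb))
    return history

for t, m, mb, v, vb in adam_moments([2.0] * 5):
    print(f"t={t}  m={m:.4f}  mb={mb:.4f}  v={v:.6f}  vb={vb:.4f}")
```

With a constant gradient g, the raw average after t steps is g * (1 - β1^t), so m only creeps up from 0.2 while mb recovers g at every step; the same holds for v and vb with g².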

Review: multiclass svm loss and softmax loss

multiclass svm loss & derivative

Also known as hinge loss.

N = x.shape[0]

correct_class_scores = x[np.arange(N), y] # scores of the correct classes, a vector of shape (N,)

margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)

margins[np.arange(N), y] = 0 # only the wrong classes count

loss = np.sum(margins) / N # sum the loss over the N samples, then average

num_pos = np.sum(margins > 0, axis=1)

dx = np.zeros_like(x) # all-zero matrix with the shape of x

dx[margins > 0] = 1 # direction of increasing loss: the wrong class's score goes up

dx[np.arange(N), y] -= num_pos

# direction of increasing loss: every wrong class with a positive margin also pushes the correct class's score down

dx /= N # average over the N samples

return loss, dx

softmax loss & derivative

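Also known as cross entropy loss.

shifted_logits = x - np.max(x, axis=1, keepdims=True)

# equivalent to dividing exp(x) by exp(max(x))

Z = np.sum(np.exp(shifted_logits), axis=1, keepdims=True)

# the sum of exp(x) / exp(max(x)), i.e. sum(exp(x)) / exp(max(x))

log_probs = shifted_logits - np.log(Z)

# the log of exp(x) / sum(exp(x)), i.e. the log-probability; the exp(max(x)) factors cancel, and working in log space saves a lot of exp calls

# why subtract max(x), which looks pointless: it keeps exp from overflowing without changing the result, see https://zhuanlan.zhihu.com/p/92714192

probs = np.exp(log_probs) # the probabilities

N = x.shape[0]

loss = -np.sum(log_probs[np.arange(N), y]) / N

# the loss is -log(probability of the correct class), averaged over the N samples

dx = probs.copy() # start from the computed probabilities

dx[np.arange(N), y] -= 1 # then subtract 1 at the correct class of every sample

# this is the gradient of -log(p_y) with respect to the scores: p - one_hot(y)

dx /= N # finally average over the N samples, since each sample contributes 1/N of the loss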

return loss, dx
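
For reference, a short derivation of why the gradient is the probabilities with 1 subtracted at the correct class. The symbols are introduced here for illustration: x_k is the score of class k for one sample, y its correct class, and p_k the softmax probability (probs in the code).

```latex
\[
L_i = -\log p_{y} = -x_{y} + \log\sum_{j} e^{x_j},
\qquad p_k = \frac{e^{x_k}}{\sum_{j} e^{x_j}}
\]
\[
\frac{\partial L_i}{\partial x_k}
= -\mathbb{1}[k = y] + \frac{e^{x_k}}{\sum_{j} e^{x_j}}
= p_k - \mathbb{1}[k = y]
\]
\[
L = \frac{1}{N}\sum_{i=1}^{N} L_i
\quad\Longrightarrow\quad
\frac{\partial L}{\partial x_{ik}} = \frac{1}{N}\bigl(p_{ik} - \mathbb{1}[k = y_i]\bigr)
\]
```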

batch normalization

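Basic idea

Normalize each feature of a minibatch to zero mean and unit variance, then multiply by γ and add β. It is a layer of its own.

It is usually placed before the ReLU layer.

Implementation details

forward:

sample_mean = np.mean(x,axis=0)

sample_var = np.var(x,axis=0)

x_hat = (x - sample_mean) / (np.sqrt(sample_var+eps))

out = gamma * x_hat + beta

cache = (gamma, x, sample_mean, sample_var, eps, x_hat) # eps is included because backward unpacks it

running_mean = momentum * running_mean + (1-momentum) * sample_mean

running_var = momentum * running_var + (1-momentum) * sample_var

# at test time

scale = gamma / (np.sqrt(running_var  + eps))

out = x * scale + (beta - running_mean * scale)

# same result as normalizing with the running statistics and then applying gamma and beta; folding everything into one scale and one shift just saves a little computation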

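backward:

# I probably won't be able to follow this the next time I read it

# The gist: when x changes, the mean and the variance change too, so the derivative has to account for those paths as well.

gamma, x, sample_mean, sample_var, eps, x_hat = cache

N = x.shape[0]

dbeta = np.sum(dout, axis=0) # yes, a sum: accumulate the contribution of every sample

dgamma = np.sum(dout*x_hat, axis=0)

dy_wrt_dmean = np.sum(-gamma / np.sqrt(sample_var+eps) * dout, axis=0) # the mean is shared by the whole batch, so sum over the samples

dy_wrt_dvar = np.sum(-0.5 * gamma * np.power(sample_var+eps,-1.5) * (x-sample_mean) * dout, axis=0) # likewise for the variance

dmean_wrt_dx = 1.0 / N # yes, every sample contributes 1/N; writing 1.0 avoids integer division under Python 2

dvar_wrt_dx = 2.0 / N * (x-sample_mean) # from the variance formula

dy_wrt_dx = gamma / np.sqrt(sample_var+eps) * dout

dx = dy_wrt_dx + dy_wrt_dmean * dmean_wrt_dx + dy_wrt_dvar * dvar_wrt_dx

# the sums over axis 0 matter: without them the gradient check reports an error

Variance formula: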

\[s^2=\frac{(x_1-\mu)^2+(x_2-\mu)^2+\cdots+(x_N-\mu)^2}{N}
\]
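
A batchnorm backward pass is easy to get subtly wrong, so it is worth validating against a numerical gradient. The sketch below is self-contained; `bn_forward_train`, `bn_backward` and `numeric_grad` are hypothetical names for this note, and only the training-mode forward pass is modelled (no running statistics).

```python
import numpy as np

def bn_forward_train(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta, (gamma, x, mu, var, eps, x_hat)

def bn_backward(dout, cache):
    gamma, x, mu, var, eps, x_hat = cache
    N = x.shape[0]
    dbeta = dout.sum(axis=0)
    dgamma = (dout * x_hat).sum(axis=0)
    dx_hat = dout * gamma
    # The variance and the mean are shared by the batch: sum over the samples.
    dvar = np.sum(dx_hat * (x - mu), axis=0) * -0.5 * (var + eps) ** -1.5
    dmean = -np.sum(dx_hat, axis=0) / np.sqrt(var + eps)
    dx = dx_hat / np.sqrt(var + eps) + dvar * 2.0 * (x - mu) / N + dmean / N
    return dx, dgamma, dbeta

def numeric_grad(f, x, dout, h=1e-5):
    """Central-difference gradient of sum(f(x) * dout) with respect to x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        pos = f(x)
        x[idx] = old - h
        neg = f(x)
        x[idx] = old
        grad[idx] = np.sum((pos - neg) * dout) / (2 * h)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))
gamma, beta = rng.standard_normal(3), rng.standard_normal(3)
dout = rng.standard_normal((5, 3))
_, cache = bn_forward_train(x, gamma, beta)
dx, dgamma, dbeta = bn_backward(dout, cache)
dx_num = numeric_grad(lambda a: bn_forward_train(a, gamma, beta)[0], x, dout)
print(np.max(np.abs(dx - dx_num)))  # should be tiny if the backward pass is right
```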

dropout

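Basic idea

Vanilla dropout: at training time, set each neuron to 0 with probability p; at test time, multiply the output of the whole layer by (1-p).

Inverted dropout: at training time, set each neuron to 0 with probability p, so a fraction (1-p) of the values survives, then divide all values by (1-p). Scaling up the survivors keeps the expected activation unchanged, as if nothing had happened, so nothing needs to be done at test time.

Network structure: affine - [batch norm] - relu - [dropout].

Implementation details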

# forward

mask = (np.random.rand(*x.shape) >= p) / (1-p)

out = x * mask

# backward

dx = dout * mask

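Adding dropout to the [fully connect - batch norm - relu - dropout] block: in the forward pass, apply dropout to the output at the very end; in the backward pass, run the dropout backward on the upstream gradient first, before the relu, batch norm and affine backward passes.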
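
A small experiment shows why inverted dropout needs no rescaling at test time. This is a sketch under assumptions: the function signatures, the all-ones input and the seeded generator are choices made for illustration, with p meaning the probability of dropping a unit as in the notes above.

```python
import numpy as np

def dropout_forward(x, p, train=True, rng=None):
    """Inverted dropout; p is the probability of dropping a unit."""
    if not train:
        return x, None                        # test time: identity
    rng = np.random.default_rng() if rng is None else rng
    mask = (rng.random(x.shape) >= p) / (1 - p)
    return x * mask, mask

def dropout_backward(dout, mask):
    # The same mask that scaled the activations scales the gradients.
    return dout if mask is None else dout * mask

x = np.ones((1000, 100))
out, mask = dropout_forward(x, p=0.3, rng=np.random.default_rng(0))
print(out.mean())          # close to 1.0, the mean of x
print((out == 0).mean())   # close to 0.3, the drop probability
```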

convolutional networks

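Basic idea

convolution

The input has shape (N, C, H, W), where N is the number of samples, C the number of channels (e.g. RGB), and H, W the height and width.

The filters have shape (F, C, HH, WW): F is the number of filters, HH the filter height, WW the filter width.

The output has shape (N, F, H_out, W_out): every sample is convolved with each of the F filters, so the first dimension is N and the second is F. H_out and W_out are the height and width after convolution.

The convolution also has a bias parameter, a vector of length F, whose f-th entry shifts the whole f-th output map.

There are two more hyperparameters: stride and pad.

H_out and W_out are computed as:

H_out = 1 + (H + 2 * pad - HH) // stride

W_out = 1 + (W + 2 * pad - WW) // stride

The convolution result is computed like this (naive version):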

out[:, f, i, j] = np.sum(x_masked * w[f,:,:,:], axis=(1,2,3))

max pooling

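The input has shape (N, C, H, W); the pooling parameters are HH, WW and stride.

Each step looks at an HH*WW window, records the maximum of that window, and moves on by stride.

The output has shape (N, C, H_out, W_out), where H_out and W_out are computed as for convolution, without padding: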

H_out = 1 + (H - HH) // stride

W_out = 1 + (W - WW) // stride

The max is computed with np.max(x_masked, axis=(2,3)).
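
As a quick check of the two output-size formulas, the sketch below wraps them in helper functions (the names `conv_out_size` and `pool_out_size` are made up for this note) and traces a 32x32 image through the layer stacks used in the PyTorch and TensorFlow sections of this post. It assumes that 'same' padding for a 3x3 kernel at stride 1 corresponds to pad = 1.

```python
def conv_out_size(size, kernel, stride=1, pad=0):
    """Output height or width of a convolution."""
    return 1 + (size + 2 * pad - kernel) // stride

def pool_out_size(size, kernel, stride):
    """Output height or width of a pooling layer (no padding)."""
    return 1 + (size - kernel) // stride

# PyTorch model: conv 5x5 -> pool 2x2/2 -> conv 3x3 -> pool 2x2/2, 32 output channels
s = conv_out_size(32, 5)       # 28
s = pool_out_size(s, 2, 2)     # 14
s = conv_out_size(s, 3)        # 12
s = pool_out_size(s, 2, 2)     # 6
flat_torch = 32 * s * s        # 1152, the input size of nn.Linear(1152, 200)

# TensorFlow model: two 3x3 'same' convs -> pool 2x2/2, 64 output channels
t = conv_out_size(32, 3, stride=1, pad=1)   # 32
t = conv_out_size(t, 3, stride=1, pad=1)    # 32
t = pool_out_size(t, 2, 2)                  # 16
flat_tf = 64 * t * t                        # 16384, the size used in tf.reshape

print(flat_torch, flat_tf)
```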

spatial batch normalization

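Let the input be a 4-D tensor (N, C, H, W). In a CNN each feature map is treated as one feature (one neuron), so for spatial batchnorm the effective mini-batch size is N*H*W, and each feature map has only two learnable parameters: γ and β.

In other words, compute the mean and variance over all neurons of one feature map across all samples, then normalize the neurons of that feature map with them.

https://blog.csdn.net/hjimce/article/details/50866313

Implementation details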

convolution:

# forward

N, C, H, W = x.shape

F, _, HH, WW = w.shape

stride, pad = conv_param['stride'], conv_param['pad']

H_out = 1 + (H + 2 * pad - HH) // stride

W_out = 1 + (W + 2 * pad - WW) // stride

out = np.zeros((N, F, H_out, W_out))

x_pad = np.pad(x, ((0,0), (0,0), (pad,pad), (pad,pad)), mode='constant',constant_values=0)

"""

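np.pad pads the edges of an array.

The first argument is the array to pad.

The second argument is the pad width, given as ((before_1, after_1), … (before_N, after_N)), where (before_1, after_1) means that axis 1 gets before_1 values added in front and after_1 values added behind.

The last arguments select the padding mode, here a constant value of 0.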

"""

for i in range(H_out):

    for j in range(W_out):

        x_pad_masked = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]

        for k in range(F):

            out[:, k, i, j] = np.sum(x_pad_masked * w[k, :, :, :], axis=(1,2,3))

for k in range(F):

    out[:, k, :, :] += b[k]

# backward

x, w, b, conv_param = cache

N, C, H, W = x.shape

F, _, HH, WW = w.shape

stride, pad = conv_param['stride'], conv_param['pad']

N, F, H_out, W_out = dout.shape

x_pad = np.pad(x, ((0,0), (0,0), (pad,pad), (pad,pad)), mode='constant', constant_values=0)

dx = np.zeros_like(x)

dx_pad = np.zeros_like(x_pad)

dw = np.zeros_like(w)

db = np.sum(dout, axis=(0,2,3))

for i in range(H_out):

    for j in range(W_out):

        x_pad_masked = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]

        for k in range(F): # compute dw

            dw[k,:,:,:] += np.sum(x_pad_masked * (dout[:,k,i,j])[:, None, None, None], axis=0)

            # for each filter, the sum accumulates the contribution of the N samples

        for n in range(N): # compute dx_pad

            dx_pad[n, :, i*stride:i*stride+HH, j*stride:j*stride+WW] += np.sum((w[:,:,:,:] * (dout[n, :, i, j])[:,None ,None, None]), axis=0)

            # for each sample, the sum accumulates the gradients from the F filters

dx = dx_pad[:, :, pad:pad+H, pad:pad+W] # strip the padding; unlike pad:-pad this also works when pad == 0

max pooling:

# forward

N, C, H, W = x.shape

HH, WW, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']

H_out = (H - HH) // stride + 1

W_out = (W - WW) // stride + 1

out = np.zeros((N, C, H_out, W_out))

for i in range(H_out):

    for j in range(W_out):

        x_masked = x[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]

        out[:,:,i,j] = np.max(x_masked, axis=(2,3))

# backward

x, pool_param = cache

N, C, H, W = x.shape

HH, WW, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']

N, C, H_out, W_out = dout.shape

dx = np.zeros_like(x)

for i in range(H_out):

    for j in range(W_out):

        x_masked = x[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]

        max_x_masked = np.max(x_masked,axis=(2,3))

        temp_binary_mask = (x_masked == (max_x_masked)[:,:,None,None])

        # if several entries tie for the max, each of them receives the gradient

        dx[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW] += temp_binary_mask * (dout[:,:,i,j])[:,:,None,None]

spatial batch normalization:

# forward

N, C, H, W = x.shape

temp_output, cache = batchnorm_forward(x.transpose(0,3,2,1).reshape((N*H*W,C)), gamma, beta, bn_param)

out = temp_output.reshape(N,W,H,C).transpose(0,3,2,1)

# backward
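
The notes stop at the backward pass. One possible sketch follows, assuming the same trick as the forward pass: move C to the last axis, flatten to (N*H*W, C), call the ordinary batchnorm backward, and undo the reshaping. `bn_train_forward` and `bn_backward_compact` are hypothetical stand-ins for the assignment's `batchnorm_forward` and `batchnorm_backward`; they cover training mode only, and the backward uses the compact closed form of the batchnorm gradient. The permutation (0, 2, 3, 1) is used here instead of (0, 3, 2, 1); any permutation that puts C last works as long as the result is permuted back with its inverse.

```python
import numpy as np

def bn_train_forward(x, gamma, beta, eps=1e-5):
    mu, var = x.mean(axis=0), x.var(axis=0)
    std = np.sqrt(var + eps)
    x_hat = (x - mu) / std
    return gamma * x_hat + beta, (gamma, x_hat, std)

def bn_backward_compact(dout, cache):
    gamma, x_hat, std = cache
    N = dout.shape[0]
    dbeta = dout.sum(axis=0)
    dgamma = (dout * x_hat).sum(axis=0)
    # Closed form of the gradient through the mean and the variance.
    dx = gamma / (N * std) * (N * dout - dbeta - x_hat * dgamma)
    return dx, dgamma, dbeta

def spatial_bn_forward(x, gamma, beta):
    N, C, H, W = x.shape
    flat = x.transpose(0, 2, 3, 1).reshape(-1, C)        # (N*H*W, C)
    out, cache = bn_train_forward(flat, gamma, beta)
    return out.reshape(N, H, W, C).transpose(0, 3, 1, 2), cache

def spatial_bn_backward(dout, cache):
    N, C, H, W = dout.shape
    flat = dout.transpose(0, 2, 3, 1).reshape(-1, C)     # same layout as in forward
    dx, dgamma, dbeta = bn_backward_compact(flat, cache)
    return dx.reshape(N, H, W, C).transpose(0, 3, 1, 2), dgamma, dbeta
```

dgamma and dbeta come out with shape (C,), one entry per feature map, which matches the two learnable parameters per feature map described above.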

PyTorch quick start

First, import a bunch of things:

import torch

import torch.nn as nn

import torch.optim as optim

from torch.autograd import Variable

from torch.utils.data import DataLoader

from torch.utils.data import sampler

import torchvision.datasets as dset

import torchvision.transforms as T

import numpy as np

import timeit

Next, since I have no GPU, set the data type to the CPU one:

dtype = torch.FloatTensor # the CPU datatype

torch.cuda.is_available() # checks for a GPU; returns True if one is available

gpu_dtype = torch.cuda.FloatTensor # the GPU datatype

Then we define a Flatten module, which unrolls an input of shape (N, C, H, W) into shape (N, C*H*W); it is just an np.reshape(x,(x.shape[0],-1)) operation.

class Flatten(nn.Module):

    def forward(self, x):

        N, C, H, W = x.size() # read in N, C, H, W

        return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image

Next, we define the model:

'''

architecture:

[conv - ReLU - BatchNorm - MaxPool] -

[conv - ReLU - BatchNorm - MaxPool] -

[affine - BatchNorm - ReLU] -

[affine - softmax]

'''

model_base = nn.Sequential(nn.Conv2d(in_channels=3,out_channels=16, kernel_size=5, stride=1),

                           nn.ReLU(inplace=True),

                           nn.BatchNorm2d(num_features=16),

                           nn.MaxPool2d(kernel_size=2,stride=2),

                           nn.Conv2d(in_channels=16,out_channels=32, kernel_size=3, stride=1),

                           nn.ReLU(inplace=True),

                           nn.BatchNorm2d(num_features=32),

                           nn.MaxPool2d(kernel_size=2,stride=2),

                           Flatten(),

                           nn.Linear(1152,200),  # 1152=32*6*6 input size

                           nn.BatchNorm1d(num_features=200),

                           nn.ReLU(inplace=True),

                           nn.Linear(200, 10), # affine layer

                          )

model = model_base.type(dtype) # define the base model first, then cast it to the concrete data type

loss_fn = nn.CrossEntropyLoss().type(dtype)

optimizer = optim.Adam(model.parameters(), lr=1e-3)

cs231n provides functions for training and for checking accuracy; we copy them as they are:

def train(model, loss_fn, optimizer, num_epochs = 1):

    for epoch in range(num_epochs):

        print('Starting epoch %d / %d' % (epoch + 1, num_epochs))

        model.train()

        for t, (x, y) in enumerate(loader_train):

            x_var = Variable(x.type(dtype))

            y_var = Variable(y.type(dtype).long())

            scores = model(x_var)

            loss = loss_fn(scores, y_var)

            if (t + 1) % print_every == 0:

                print('t = %d, loss = %.4f' % (t + 1, loss.item()))

            optimizer.zero_grad()

            loss.backward()

            optimizer.step()

def check_accuracy(model, loader):

    if loader.dataset.train:

        print('Checking accuracy on validation set')

    else:

        print('Checking accuracy on test set')

    num_correct = 0

    num_samples = 0

    model.eval() # Put the model in test mode (the opposite of model.train(), essentially)

    for x, y in loader:

        with torch.no_grad():

            x_var = Variable(x.type(dtype))

            scores = model(x_var) # forward pass inside no_grad, so no graph is built during evaluation

        _, preds = scores.data.cpu().max(1)

        num_correct += (preds == y).sum()

        num_samples += preds.size(0)

    acc = float(num_correct) / num_samples

    print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))

Then we start training:

train(model, loss_fn, optimizer, num_epochs=10)

check_accuracy(model, loader_val) # validation

check_accuracy(best_model, loader_test) # test

TensorFlow quick start

First, import a bunch of things:

import tensorflow.compat.v1 as tf

tf.compat.v1.disable_eager_execution()

import numpy as np

import math

import timeit

import matplotlib.pyplot as plt

%matplotlib inline

Next, we declare X and y with placeholders.

X = tf.placeholder(tf.float32, [None, 32, 32, 3])

y = tf.placeholder(tf.int64, [None])

is_training = tf.placeholder(tf.bool) # batchnorm behaves differently at train and test time, so the mode has to be fed in

Declare the model:

def my_model(X,y,is_training):

    # Conv-Relu-BN

    conv1act = tf.layers.conv2d(inputs=X, filters=32, padding='same', kernel_size=3, strides=1, activation=tf.nn.relu)

    bn1act = tf.layers.batch_normalization(inputs=conv1act, training=is_training)

    # Conv-Relu-BN

    conv2act = tf.layers.conv2d(inputs=bn1act, filters=64, padding='same', kernel_size=3, strides=1,

                                activation=tf.nn.relu)

    bn2act = tf.layers.batch_normalization(inputs=conv2act, training=is_training)

    # Maxpool

    maxpool1act = tf.layers.max_pooling2d(inputs=bn2act, pool_size=2, strides=2)

    # Flatten

    flatten1 = tf.reshape(maxpool1act,[-1,16384])

    # FC-Relu-BN

    fc1 = tf.layers.dense(inputs=flatten1, units=1024, activation=tf.nn.relu)

    bn3act = tf.layers.batch_normalization(inputs=fc1, training=is_training)

    # Output FC

    y_out = tf.layers.dense(inputs=bn3act, units=10, activation=None)

    return y_out

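Next, declare the loss and the optimizer.

# clear old variables; run this before creating the placeholders above, since tensors from the old graph cannot be used in the new one

tf.reset_default_graph()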

y_out = my_model(X,y,is_training)

mean_loss = tf.losses.softmax_cross_entropy(logits=y_out, onehot_labels=tf.one_hot(y,10))

optimizer = tf.train.AdamOptimizer(learning_rate=0.001)

# batch normalization in tensorflow requires this extra dependency: the moving-average update ops have to run together with each training step

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

with tf.control_dependencies(extra_update_ops):

    train_step = optimizer.minimize(mean_loss)

cs231n provides the training function, pasted here as it is:

def run_model(session, predict, loss_val, Xd, yd,

              epochs=1, batch_size=64, print_every=100,

              training=None, plot_losses=False):

    # have tensorflow compute accuracy

    correct_prediction = tf.equal(tf.argmax(predict,1), y)

    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # shuffle indicies

    train_indicies = np.arange(Xd.shape[0])

    np.random.shuffle(train_indicies)

    training_now = training is not None

    # setting up variables we want to compute (and optimizing)

    # if we have a training function, add that to things we compute

    variables = [mean_loss,correct_prediction,accuracy]

    if training_now:

        variables[-1] = training

    # counter

    iter_cnt = 0

    for e in range(epochs):

        # keep track of losses and accuracy

        correct = 0

        losses = []

        # make sure we iterate over the dataset once

        for i in range(int(math.ceil(Xd.shape[0]/batch_size))):

            # generate indicies for the batch

            start_idx = (i*batch_size)%Xd.shape[0]

            idx = train_indicies[start_idx:start_idx+batch_size]

            # create a feed dictionary for this batch

            feed_dict = {X: Xd[idx,:],

                         y: yd[idx],

                         is_training: training_now }

            # get batch size

            actual_batch_size = yd[idx].shape[0]

            # have tensorflow compute loss and correct predictions

            # and (if given) perform a training step

            loss, corr, _ = session.run(variables,feed_dict=feed_dict)

            # aggregate performance stats

            losses.append(loss*actual_batch_size)

            correct += np.sum(corr)

            # print every now and then

            if training_now and (iter_cnt % print_every) == 0:

                print("Iteration {0}: with minibatch training loss = {1:.3g} and accuracy of {2:.2g}"\

                      .format(iter_cnt,loss,np.sum(corr)/actual_batch_size))

            iter_cnt += 1

        total_correct = correct/Xd.shape[0]

        total_loss = np.sum(losses)/Xd.shape[0]

        print("Epoch {2}, Overall loss = {0:.3g} and accuracy of {1:.3g}"\

              .format(total_loss,total_correct,e+1))

        if plot_losses:

            plt.plot(losses)

            plt.grid(True)

            plt.title('Epoch {} Loss'.format(e+1))

            plt.xlabel('minibatch number')

            plt.ylabel('minibatch loss')

            plt.show()

    return total_loss,total_correct

Let's start training:

sess = tf.Session() # a session encapsulates the state and control of the compute graph

sess.run(tf.global_variables_initializer())

print('Training')

run_model(sess,y_out,mean_loss,X_train,y_train,10,64,100,train_step,True)

print('Validation')

run_model(sess,y_out,mean_loss,X_val,y_val,1,64)

Run the test:

print('Test')

run_model(sess,y_out,mean_loss,X_test,y_test,1,64)
