More articles related to [PaperReading20200223]

CanChen ggchen@mail.ustc.edu.cn   AdaBatch Motivation: Current stochastic gradient descent methods use a fixed batch size. A small batch size with a small learning rate leads to fast convergence, while a large batch size offers more parallelism. This paper propo…