PaperReading20200223

CanChen ggchen@mail.ustc.edu.cn AdaBatch Motivation: Current stochastic gradient descent methods use a fixed batch size. A small batch size with a small learning rate leads to fast convergence, while a large batch size offers more parallelism. This paper propo…
