批量梯度下降的逻辑回归可以参考这篇文章:http://blog.csdn.net/pakko/article/details/37878837

看了一些Scala语法后,打算看看MlLib的机器学习算法的并行化,那就是逻辑回归,找到package org.apache.spark.mllib.classification下的LogisticRegressionWithSGD这个类,直接搜train()函数。

  def train(
input: RDD[LabeledPoint],
numIterations: Int,
stepSize: Double,
miniBatchFraction: Double,
initialWeights: Vector): LogisticRegressionModel = {
new LogisticRegressionWithSGD(stepSize, numIterations, 0.0, miniBatchFraction)
.run(input, initialWeights)
}

发现它调用了GeneralizedLinearAlgorithm下的一个run函数,这个类GeneralizedLinearAlgorithm是个抽象类,并且在GeneralizedLinearAlgorithm.scala文件下,并且类LogisticRegressionWithSGD是继承了GeneralizedLinearAlgorithm

  def run(input: RDD[LabeledPoint], initialWeights: Vector): M = {

    if (numFeatures < 0) {
numFeatures = input.map(_.features.size).first()
} if (input.getStorageLevel == StorageLevel.NONE) {
logWarning("The input data is not directly cached, which may hurt performance if its"
+ " parent RDDs are also uncached.")
} // Check the data properties before running the optimizer
if (validateData && !validators.forall(func => func(input))) {
throw new SparkException("Input validation failed.")
} /**
* Scaling columns to unit variance as a heuristic to reduce the condition number:
*
* During the optimization process, the convergence (rate) depends on the condition number of
* the training dataset. Scaling the variables often reduces this condition number
* heuristically, thus improving the convergence rate. Without reducing the condition number,
* some training datasets mixing the columns with different scales may not be able to converge.
*
* GLMNET and LIBSVM packages perform the scaling to reduce the condition number, and return
* the weights in the original scale.
* See page 9 in http://cran.r-project.org/web/packages/glmnet/glmnet.pdf
*
* Here, if useFeatureScaling is enabled, we will standardize the training features by dividing
* the variance of each column (without subtracting the mean), and train the model in the
* scaled space. Then we transform the coefficients from the scaled space to the original scale
* as GLMNET and LIBSVM do.
*
* Currently, it's only enabled in LogisticRegressionWithLBFGS
*/
val scaler = if (useFeatureScaling) {
new StandardScaler(withStd = true, withMean = false).fit(input.map(_.features))
} else {
null
} // Prepend an extra variable consisting of all 1.0's for the intercept.
// TODO: Apply feature scaling to the weight vector instead of input data.
val data =
if (addIntercept) {
if (useFeatureScaling) {
input.map(lp => (lp.label, appendBias(scaler.transform(lp.features)))).cache()
} else {
input.map(lp => (lp.label, appendBias(lp.features))).cache()
}
} else {
if (useFeatureScaling) {
input.map(lp => (lp.label, scaler.transform(lp.features))).cache()
} else {
input.map(lp => (lp.label, lp.features))
}
} /**
* TODO: For better convergence, in logistic regression, the intercepts should be computed
* from the prior probability distribution of the outcomes; for linear regression,
* the intercept should be set as the average of response.
*/
val initialWeightsWithIntercept = if (addIntercept && numOfLinearPredictor == 1) {
appendBias(initialWeights)
} else {
/** If `numOfLinearPredictor > 1`, initialWeights already contains intercepts. */
initialWeights
} val weightsWithIntercept = optimizer.optimize(data, initialWeightsWithIntercept) //这里进入优化 val intercept = if (addIntercept && numOfLinearPredictor == 1) {
weightsWithIntercept(weightsWithIntercept.size - 1)
} else {
0.0
} var weights = if (addIntercept && numOfLinearPredictor == 1) {
Vectors.dense(weightsWithIntercept.toArray.slice(0, weightsWithIntercept.size - 1))
} else {
weightsWithIntercept
} /**
* The weights and intercept are trained in the scaled space; we're converting them back to
* the original scale.
*
* Math shows that if we only perform standardization without subtracting means, the intercept
* will not be changed. w_i = w_i' / v_i where w_i' is the coefficient in the scaled space, w_i
* is the coefficient in the original space, and v_i is the variance of the column i.
*/
if (useFeatureScaling) {
if (numOfLinearPredictor == 1) {
weights = scaler.transform(weights)
} else {
/**
* For `numOfLinearPredictor > 1`, we have to transform the weights back to the original
* scale for each set of linear predictor. Note that the intercepts have to be explicitly
* excluded when `addIntercept == true` since the intercepts are part of weights now.
*/
var i = 0
val n = weights.size / numOfLinearPredictor
val weightsArray = weights.toArray
while (i < numOfLinearPredictor) {
val start = i * n
val end = (i + 1) * n - { if (addIntercept) 1 else 0 } val partialWeightsArray = scaler.transform(
Vectors.dense(weightsArray.slice(start, end))).toArray System.arraycopy(partialWeightsArray, 0, weightsArray, start, partialWeightsArray.size)
i += 1
}
weights = Vectors.dense(weightsArray)
}
} // Warn at the end of the run as well, for increased visibility.
if (input.getStorageLevel == StorageLevel.NONE) {
logWarning("The input data was not directly cached, which may hurt performance if its"
+ " parent RDDs are also uncached.")
} // Unpersist cached data
if (data.getStorageLevel != StorageLevel.NONE) {
data.unpersist(false)
} createModel(weights, intercept)
}

在上面代码中的optimizer.optimize,传入了数据data和初始化的theta,然后optimizer在LogisticRegressionWithSGD中被初始化为:

class LogisticRegressionWithSGD private[mllib] (
private var stepSize: Double,
private var numIterations: Int,
private var regParam: Double,
private var miniBatchFraction: Double)
extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable { private val gradient = new LogisticGradient()
private val updater = new SquaredL2Updater()
@Since("0.8.0")
override val optimizer = new GradientDescent(gradient, updater)
.setStepSize(stepSize)
.setNumIterations(numIterations)
.setRegParam(regParam)
.setMiniBatchFraction(miniBatchFraction)
override protected val validators = List(DataValidators.binaryLabelValidator) /**
* Construct a LogisticRegression object with default parameters: {stepSize: 1.0,
* numIterations: 100, regParm: 0.01, miniBatchFraction: 1.0}.
*/
@Since("0.8.0")
def this() = this(1.0, 100, 0.01, 1.0) override protected[mllib] def createModel(weights: Vector, intercept: Double) = {
new LogisticRegressionModel(weights, intercept)
}
}

optimizer被赋值为GradientDescent(gradient, updater),然后又看GradientDescent这个类:

class GradientDescent private[spark] (private var gradient: Gradient, private var updater: Updater)
extends Optimizer with Logging { private var stepSize: Double = 1.0
private var numIterations: Int = 100
private var regParam: Double = 0.0
private var miniBatchFraction: Double = 1.0
private var convergenceTol: Double = 0.001 ...
@DeveloperApi
def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Vector = {
val (weights, _) = GradientDescent.runMiniBatchSGD(
data,
gradient,
updater,
stepSize,
numIterations,
regParam,
miniBatchFraction,
initialWeights,
convergenceTol)
weights
}
}

发现调用的是随机梯度下降的miniBatch方法,runMiniBatchSGD:

  def runMiniBatchSGD(
data: RDD[(Double, Vector)],
gradient: Gradient,
updater: Updater,
stepSize: Double,
numIterations: Int,
regParam: Double,
miniBatchFraction: Double,
initialWeights: Vector,
convergenceTol: Double): (Vector, Array[Double]) = { // convergenceTol should be set with non minibatch settings
if (miniBatchFraction < 1.0 && convergenceTol > 0.0) {
logWarning("Testing against a convergenceTol when using miniBatchFraction " +
"< 1.0 can be unstable because of the stochasticity in sampling.")
} val stochasticLossHistory = new ArrayBuffer[Double](numIterations)
// Record previous weight and current one to calculate solution vector difference var previousWeights: Option[Vector] = None
var currentWeights: Option[Vector] = None val numExamples = data.count() // if no data, return initial weights to avoid NaNs
if (numExamples == 0) {
logWarning("GradientDescent.runMiniBatchSGD returning initial weights, no data found")
return (initialWeights, stochasticLossHistory.toArray)
} if (numExamples * miniBatchFraction < 1) {
logWarning("The miniBatchFraction is too small")
} // Initialize weights as a column vector
var weights = Vectors.dense(initialWeights.toArray)
val n = weights.size /**
* For the first iteration, the regVal will be initialized as sum of weight squares
* if it's L2 updater; for L1 updater, the same logic is followed.
*/
var regVal = updater.compute(
weights, Vectors.zeros(weights.size), 0, 1, regParam)._2 //计算正则化的值 var converged = false // indicates whether converged based on convergenceTol
var i = 1
while (!converged && i <= numIterations) { //迭代开始,在小于最大迭代数的时候不断运行
val bcWeights = data.context.broadcast(weights)
// Sample a subset (fraction miniBatchFraction) of the total data
// compute and sum up the subgradients on this subset (this is one map-reduce)
val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
.treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
seqOp = (c, v) => {
// c: (grad, loss, count), v: (label, features)
val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1)) //计算一个batch中每条数据的梯度
(c._1, c._2 + l, c._3 + 1)
},
combOp = (c1, c2) => {
// c: (grad, loss, count)
(c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3) //将batch中所有数据的梯度相加,损失函数值相加,记录batch的size
}) if (miniBatchSize > 0) {
/**
* lossSum is computed using the weights from the previous iteration
* and regVal is the regularization value computed in the previous iteration as well.
*/
stochasticLossHistory.append(lossSum / miniBatchSize + regVal) //原来损失函数是这样计算batch的总损失值除以batchSize再加上正则化值
val update = updater.compute(
weights, Vectors.fromBreeze(gradientSum / miniBatchSize.toDouble), //更新权重和下次的正则化值
stepSize, i, regParam)
weights = update._1
regVal = update._2 previousWeights = currentWeights
currentWeights = Some(weights)
if (previousWeights != None && currentWeights != None) {
converged = isConverged(previousWeights.get,
currentWeights.get, convergenceTol)
}
} else {
logWarning(s"Iteration ($i/$numIterations). The size of sampled batch is zero")
}
i += 1
} logInfo("GradientDescent.runMiniBatchSGD finished. Last 10 stochastic losses %s".format(
stochasticLossHistory.takeRight(10).mkString(", "))) (weights, stochasticLossHistory.toArray) }

发现要对Batch中每一条数据计算梯度,调用的是gradient.compute函数,对于二值分类:

  override def compute(
data: Vector,
label: Double,
weights: Vector,
cumGradient: Vector): Double = {
val dataSize = data.size // (weights.size / dataSize + 1) is number of classes
require(weights.size % dataSize == 0 && numClasses == weights.size / dataSize + 1)
numClasses match {
case 2 =>
/**
* For Binary Logistic Regression.
*
* Although the loss and gradient calculation for multinomial one is more generalized,
* and multinomial one can also be used in binary case, we still implement a specialized
* binary version for performance reason.
*/
val margin = -1.0 * dot(data, weights)
val multiplier = (1.0 / (1.0 + math.exp(margin))) - label
axpy(multiplier, data, cumGradient) //梯度的计算就是multiplier * data即,(h(x) - y)*x
if (label > 0) {
// The following is equivalent to log(1 + exp(margin)) but more numerically stable.
MLUtils.log1pExp(margin) //返回损失函数值
} else {
MLUtils.log1pExp(margin) - margin
}
... //下面有多分类,还没看
}

利用treeAggregate并行化batch所有数据后,得到gradientSum要除以miniBatchSize,然后进入updater.compute进行权重theta和正则化值的更新,为了下一次迭代:

@DeveloperApi
class SquaredL2Updater extends Updater {
override def compute(
weightsOld: Vector,
gradient: Vector,
stepSize: Double,
iter: Int,
regParam: Double): (Vector, Double) = {
// add up both updates from the gradient of the loss (= step) as well as
// the gradient of the regularizer (= regParam * weightsOld)
// w' = w - thisIterStepSize * (gradient + regParam * w)
// w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient //这个就是权重更新的迭代式子,这个是L2正则化后的更新,神奇的是(1 - thisIterStepSize * regParam)
val thisIterStepSize = stepSize / math.sqrt(iter) //记得更新式子不是w‘ = w - alpha*gradient alpha就是学习率也就是thisIterStepSize
val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector //你会发现alpha = thisIterStepSize = 1/sqrt(iter)也就是随着迭代次数越多学习率越低,迈出的步伐越小
brzWeights :*= (1.0 - thisIterStepSize * regParam)
brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
val norm = brzNorm(brzWeights, 2.0) (Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm) //正则化值就是w'的二范数的平方乘以正则化参数regParam乘以0.5
}
}

MlLib--逻辑回归笔记的更多相关文章

  1. Spark Mllib逻辑回归算法分析

    原创文章,转载请注明: 转载自http://www.cnblogs.com/tovin/p/3816289.html 本文以spark 1.0.0版本MLlib算法为准进行分析 一.代码结构 逻辑回归 ...

  2. Andrew Ng机器学习课程笔记(二)之逻辑回归

    Andrew Ng机器学习课程笔记(二)之逻辑回归 版权声明:本文为博主原创文章,转载请指明转载地址 http://www.cnblogs.com/fydeblog/p/7364636.html 前言 ...

  3. Coursera公开课笔记: 斯坦福大学机器学习第六课“逻辑回归(Logistic Regression)” 清晰讲解logistic-good!!!!!!

    原文:http://52opencourse.com/125/coursera%E5%85%AC%E5%BC%80%E8%AF%BE%E7%AC%94%E8%AE%B0-%E6%96%AF%E5%9D ...

  4. Spark MLlib回归算法------线性回归、逻辑回归、SVM和ALS

    Spark MLlib回归算法------线性回归.逻辑回归.SVM和ALS 1.线性回归: (1)模型的建立: 回归正则化方法(Lasso,Ridge和ElasticNet)在高维和数据集变量之间多 ...

  5. [吴恩达机器学习笔记]12支持向量机1从逻辑回归到SVM/SVM的损失函数

    12.支持向量机 觉得有用的话,欢迎一起讨论相互学习~Follow Me 参考资料 斯坦福大学 2014 机器学习教程中文笔记 by 黄海广 12.1 SVM损失函数 从逻辑回归到支持向量机 为了描述 ...

  6. ufldl学习笔记与编程作业:Logistic Regression(逻辑回归)

    ufldl学习笔记与编程作业:Logistic Regression(逻辑回归) ufldl出了新教程,感觉比之前的好,从基础讲起.系统清晰,又有编程实践. 在deep learning高质量群里面听 ...

  7. [机器学习] Coursera ML笔记 - 逻辑回归(Logistic Regression)

    引言 机器学习栏目记录我在学习Machine Learning过程的一些心得笔记,涵盖线性回归.逻辑回归.Softmax回归.神经网络和SVM等等.主要学习资料来自Standford Andrew N ...

  8. Coursera Deep Learning笔记 逻辑回归典型的训练过程

    Deep Learning 用逻辑回归训练图片的典型步骤. 笔记摘自:https://xienaoban.github.io/posts/59595.html 1. 处理数据 1.1 向量化(Vect ...

  9. Python_sklearn机器学习库学习笔记(三)logistic regression(逻辑回归)

    # 逻辑回归 ## 逻辑回归处理二元分类 %matplotlib inline import matplotlib.pyplot as plt #显示中文 from matplotlib.font_m ...

  10. Machine Learning 学习笔记 (1) —— 线性回归与逻辑回归

    本系列文章允许转载,转载请保留全文! [请先阅读][说明&总目录]http://www.cnblogs.com/tbcaaa8/p/4415055.html 1. 梯度下降法 (Gradien ...

随机推荐

  1. ORACLE使用GV_$TEMP_SPACE_HEADER统计临时表空使用情况不准确的问题

    以前写了一篇ORACLE临时表空间总结的文章, 里面介绍了几个查看临时表空间使用情况的脚本,其中一个脚本如下所示: SELECT TU.TABLESPACE_NAME                  ...

  2. 使用管道符在PowerShell中进行各种数据操作

    最近在培训PowerShell,在讲到Pipeline的时候,对于我这种长期和数据(数据库)打交道的人来说,觉得很实用,所以写此博文,记录一下. 无论是在Linux中写Bash脚本还是在Window上 ...

  3. SQL Server 2008 R2——使用FULL OUTER JOIN实现多表信息汇总

    =================================版权声明================================= 版权声明:原创文章 谢绝转载  请通过右侧公告中的“联系邮 ...

  4. jquery $.trim()去除字符串空格详解

    jquery $.trim()去除字符串空格详解 语法 jQuery.trim()函数用于去除字符串两端的空白字符. 作用 该函数可以去除字符串开始和末尾两端的空白字符(直到遇到第一个非空白字符串为止 ...

  5. WPF 自定义ListBox

     如题,要实现一个如下的列表,该如何实现? 在设计过程中,会遇到如下问题: 1.ListBox中ListBoxItem的模板设计 2.ListBox中ListBoxItem的模板容器设计 3.List ...

  6. 附加数据库后无法创建发布,error 2812 解决

    日前,由于业务需要,我要把一个数据库直接迁移到新的实例上 用的方法比较古老,就是直接停旧服务器,将数据文件复制到新服务器上,附加数据库 当然这个附加没有什么可说的了,但是在附加后需要将原来库上的发布重 ...

  7. [转]html超链接打开的窗口大小

    <a href="#" onclick="javascript:window.open('http://www.baidu.com','','height=20,w ...

  8. Facebook不相信所谓的员工能力等级。《长效商业英雄》(《哈佛商业评论》2016年11期),4星。

    老牌管理杂志.本期我给4星.以下是书中一些信息的摘抄: 1:爱因斯坦曾说:“任何傻瓜都能让事情更复杂,只有天才能让事情变简单.”单就这一点来看,乔布斯无疑是天才中的天才.#137 2:通过让苹果聚焦于 ...

  9. spring mvc 快速入门

    ---------- 转自尚学堂 高淇 --------- Spring  MVC 背景介绍 Spring 框架提供了构建 Web 应用程序的全功能 MVC 模块.使用 Spring 可插入的 MVC ...

  10. POJ2455Secret Milking Machine[最大流 无向图 二分答案]

    Secret Milking Machine Time Limit: 1000MS   Memory Limit: 65536K Total Submissions: 11865   Accepted ...