Proximal Algorithms
1. Introduction
Much as Newton's method is a standard tool for solving unconstrained smooth minimization problems of modest size, proximal algorithms can be viewed as an analogous tool for nonsmooth, constrained, large-scale, or distributed versions of these problems. They are very generally applicable, but they are especially well suited to problems of recent and widespread interest involving large or high-dimensional datasets.
Proximal methods sit at a higher level of abstraction than classical optimization algorithms like Newton’s method. In the latter, the base operations are low-level, consisting of linear algebra operations and the computation of gradients and Hessians. In proximal algorithms, the base operation is evaluating the proximal operator of a function, which involves solving a small convex optimization problem. These subproblems can be solved with standard methods, but they often admit closed-form solutions or can be solved very quickly with simple specialized methods. We will also see that proximal operators and proximal algorithms have a number of interesting interpretations and are connected to many different topics in optimization and applied mathematics.
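For reference, the proximal operator that these methods evaluate is defined, for a closed proper convex function $f$ and a parameter $\lambda>0$, as the solution of a small strongly convex problem:
$$prox_{\lambda f}(v)=\arg\min_{x}\left(f(x)+\frac{1}{2\lambda}||x-v||_{2}^2\right)$$
Because the objective is strongly convex in $x$, the minimizer exists and is unique. For example, when $f$ is the indicator function of a nonempty closed convex set $C$, $prox_{\lambda f}$ is simply the Euclidean projection onto $C$.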
2. Algorithms
Consider the convex optimization problem
$$\min_{x}f(x)+g(x)$$
where $f$ is smooth and convex, and $g:R^n\rightarrow R\cup \{+\infty\}$ is closed, proper, and convex.
Several proximal methods can be used to solve this problem.
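A standard fact explains why these methods aim at the right point: for any $\lambda>0$, $x^\star$ minimizes $f+g$ if and only if it is a fixed point of a proximal gradient step,
$$x^\star=prox_{\lambda g}\left(x^\star-\lambda\nabla f(x^\star)\right),$$
because $z=prox_{\lambda g}(v)$ holds exactly when $(v-z)/\lambda\in\partial g(z)$, and with $v=x^\star-\lambda\nabla f(x^\star)$ this is the optimality condition $0\in\nabla f(x^\star)+\partial g(x^\star)$.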
- Proximal Gradient Method
$$x^{k+1}:=prox_{\lambda^k g}(x^k-\lambda^k \nabla f(x^k))$$
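which converges at rate $O(1/k)$ when $\nabla f$ is Lipschitz continuous with constant $L$ and the step sizes are fixed at $\lambda^k=\lambda\in(0,1/L]$. If $L$ is not known, the step size can be found by backtracking line search: start from $\lambda:=\lambda^{k-1}$, compute $z:=prox_{\lambda g}(x^k-\lambda\nabla f(x^k))$, and stop if $f(z)\le\hat{f}_{\lambda}(z,x^k)$; otherwise set $\lambda:=\beta\lambda$ and repeat. Finally, set $\lambda^k:=\lambda$ and $x^{k+1}:=z$.
Here $\beta\in(0,1)$, with a typical value of $1/2$, and
$$\hat{f}_{\lambda}(x,y)=f(y)+\nabla f(y)^T(x-y)+\frac{1}{2\lambda}||x-y||_{2}^2$$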
- Accelerated Proximal Gradient Method
$$y^{k+1}:=x^k+\omega^k(x^k-x^{k-1})$$
$$x^{k+1}:=prox_{\lambda^k g}(y^{k+1}-\lambda^k \nabla f(y^{k+1}))$$
This works with $\omega^k=k/(k+3)$ and a line search similar to the one above, applied at $y^{k+1}$ instead of $x^k$.
The method has a faster $O(1/k^2)$ convergence rate and originated with Nesterov (1983).
- ADMM
$$x^{k+1}:=prox_{\lambda f}(z^k-u^k)$$
$$z^{k+1}:=prox_{\lambda g}(x^{k+1}+u^k)$$
$$u^{k+1}:=u^k+x^{k+1}-z^{k+1}$$
ADMM converges under very mild conditions and has an $O(1/k)$ rate in general. If $f$ and $g$ are both indicator functions, it becomes a variation on alternating projections.
This method originates from the work of Gabay, Mercier, Glowinski, and Marrocco in the 1970s.
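The proximal gradient iteration with the backtracking line search described above can be sketched in Python/NumPy as follows. This is a minimal sketch, not a reference implementation: the names `soft_threshold` and `proximal_gradient`, the initial step `lam=1.0`, and the stopping rule (stop once $||x^{k+1}-x^k||$ drops below `tol`) are assumptions made for illustration. The test problem is hypothetical, $f(x)=\frac{1}{2}\sum_i d_i x_i^2-a^Tx$ with $g=\gamma||x||_1$, chosen because its minimizer is known in closed form (soft thresholding of $a$ at level $\gamma$, divided elementwise by $d$).

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft thresholding (v - t)_+ - (-v - t)_+, i.e. the prox of t*||.||_1."""
    return np.maximum(v - t, 0.0) - np.maximum(-v - t, 0.0)


def proximal_gradient(f, grad_f, prox_g, x0, lam=1.0, beta=0.5,
                      max_iter=1000, tol=1e-8):
    """Proximal gradient method with backtracking line search (sketch).

    prox_g(v, lam) is assumed to return prox_{lam*g}(v).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx, gx = f(x), grad_f(x)
        while True:
            z = prox_g(x - lam * gx, lam)
            diff = z - x
            # Accept lam once f(z) <= f_hat_lam(z, x), the quadratic upper model.
            if f(z) <= fx + gx @ diff + (diff @ diff) / (2.0 * lam):
                break
            lam *= beta  # step too long: shrink it and retry
        if np.linalg.norm(diff) <= tol:  # negligible progress: stop
            return z
        x = z
    return x


# Hypothetical separable test problem with exact minimizer soft_threshold(a, gamma) / d.
d = np.array([1.0, 2.0, 4.0])
a = np.array([3.0, -0.5, 1.2])
gamma = 1.0
x_hat = proximal_gradient(
    f=lambda x: 0.5 * x @ (d * x) - a @ x,
    grad_f=lambda x: d * x - a,
    prox_g=lambda v, lam: soft_threshold(v, lam * gamma),
    x0=np.zeros(3),
)
print(x_hat)  # close to [2.0, 0.0, 0.05]
```

Keeping `lam` across outer iterations, rather than resetting it, follows the rule of starting each search from $\lambda^{k-1}$, so the step size never increases.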
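The accelerated variant changes only the extrapolation step, with the sufficient-decrease test taken at $y^{k+1}$. The sketch below assumes a hypothetical interface in which `prox_g(v, lam)` returns $prox_{\lambda g}(v)$, uses $\omega^k=k/(k+3)$, and stops when the proximal gradient step taken from $y^{k+1}$ becomes negligible; the function names, defaults, stopping rule, and the separable quadratic plus $\ell_1$ test problem are illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise prox of t * ||.||_1: (v - t)_+ - (-v - t)_+."""
    return np.maximum(v - t, 0.0) - np.maximum(-v - t, 0.0)


def accelerated_proximal_gradient(f, grad_f, prox_g, x0, lam=1.0, beta=0.5,
                                  max_iter=2000, tol=1e-8):
    """Accelerated proximal gradient with omega^k = k / (k + 3) (sketch).

    prox_g(v, lam) is assumed to return prox_{lam*g}(v).
    """
    x = x_prev = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        omega = k / (k + 3.0)
        y = x + omega * (x - x_prev)  # extrapolation (momentum) step
        fy, gy = f(y), grad_f(y)
        while True:
            z = prox_g(y - lam * gy, lam)
            diff = z - y
            # Same sufficient-decrease test as the basic method, taken at y.
            if f(z) <= fy + gy @ diff + (diff @ diff) / (2.0 * lam):
                break
            lam *= beta
        x_prev, x = x, z
        if np.linalg.norm(diff) <= tol:  # prox-gradient step at y is negligible
            break
    return x


# Hypothetical separable test problem with exact minimizer soft_threshold(a, gamma) / d.
d = np.array([1.0, 2.0, 4.0])
a = np.array([3.0, -0.5, 1.2])
gamma = 1.0
x_acc = accelerated_proximal_gradient(
    f=lambda x: 0.5 * x @ (d * x) - a @ x,
    grad_f=lambda x: d * x - a,
    prox_g=lambda v, lam: soft_threshold(v, lam * gamma),
    x0=np.zeros(3),
)
print(x_acc)  # close to [2.0, 0.0, 0.05]
```

Because the iterates are not monotone, the stopping test looks at the step from $y^{k+1}$ rather than at $||x^{k+1}-x^k||$, which could briefly be small while the momentum reverses direction.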
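The three ADMM updates above translate almost line for line into code. In this sketch the names `admm`, `prox_f`, and `prox_g`, the zero initialization of $z$ and $u$, and the stopping rule based on $||x-z||$ and the change in $z$ are illustrative assumptions. It is applied to a hypothetical test problem $f(x)=\frac{1}{2}\sum_i d_i x_i^2-a^Tx$, $g=\gamma||x||_1$, for which $prox_{\lambda f}(v)=(\lambda a+v)/(\lambda d+1)$ elementwise and $prox_{\lambda g}$ is soft thresholding at level $\lambda\gamma$.

```python
import numpy as np


def admm(prox_f, prox_g, n, lam=1.0, max_iter=1000, tol=1e-10):
    """Scaled-form ADMM for min f(x) + g(x) (sketch).

    prox_f(v, lam) and prox_g(v, lam) are assumed to return
    prox_{lam f}(v) and prox_{lam g}(v).
    """
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(max_iter):
        x = prox_f(z - u, lam)
        z_old = z
        z = prox_g(x + u, lam)
        u = u + x - z  # running sum of the residuals x - z
        # Stop once the primal residual x - z and the change in z are tiny.
        if np.linalg.norm(x - z) <= tol and np.linalg.norm(z - z_old) <= tol:
            break
    return z


# Hypothetical separable test problem.
d = np.array([1.0, 2.0, 4.0])
a = np.array([3.0, -0.5, 1.2])
gamma = 1.0


def prox_f(v, lam):
    # argmin_x f(x) + ||x - v||^2 / (2 lam)  =>  (lam * d + 1) x = lam * a + v
    return (lam * a + v) / (lam * d + 1.0)


def prox_g(v, lam):
    # Soft thresholding with threshold lam * gamma.
    return np.maximum(v - lam * gamma, 0.0) - np.maximum(-v - lam * gamma, 0.0)


x_admm = admm(prox_f, prox_g, n=3)
print(x_admm)  # close to [2.0, 0.0, 0.05]
```

Unlike the proximal gradient methods, this loop never evaluates $\nabla f$; it only needs the proximal operators of $f$ and $g$, which is why ADMM also applies when $f$ is nonsmooth.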
3. Example
Consider solving the following optimization problem:
$$\min_{x}\frac{1}{2}x^TAx+b^Tx+c+\gamma||x||_{1}$$
where
$$A=\begin{pmatrix} 2 & 0.25 \\ 0.25 & 0.2 \end{pmatrix},\;b=\begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix},\; c=-1.5, \; \gamma=0.2$$
For this problem, take $f(x)=\frac{1}{2}x^TAx+b^Tx+c$ and $g(x)=\gamma||x||_{1}$; then
$$\nabla f(x)=Ax+b$$
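Since $\nabla f$ is linear, its Lipschitz constant is the largest eigenvalue of $A$. With $\operatorname{tr}A=2.2$ and $\det A=2\cdot 0.2-0.25^2=0.3375$,
$$L=\frac{\operatorname{tr}A+\sqrt{(\operatorname{tr}A)^2-4\det A}}{2}=\frac{2.2+\sqrt{3.49}}{2}\approx 2.034,$$
so a fixed step size $\lambda^k=\lambda\le 1/L\approx 0.49$ satisfies the convergence condition stated for the proximal gradient method. The smaller eigenvalue is about $0.166>0$, so $A$ is positive definite and $f$ is strongly convex.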
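For $g=||\cdot||_{1}$, the proximal operator is elementwise soft thresholding:
$$prox_{\lambda g}(v)=(v-\lambda)_{+}-(-v-\lambda)_{+}$$
and for $g=\gamma||\cdot||_{1}$ the threshold $\lambda$ is replaced by $\lambda\gamma$.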
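To check that this closed form really is the minimizer of $|x|+\frac{1}{2\lambda}(x-v)^2$, the short NumPy sketch below compares it with a brute-force minimization over a fine grid. The $\ell_1$ norm is separable, so each coordinate can be handled on its own. The function names, the grid, and the test vector are hypothetical choices for illustration.

```python
import numpy as np

# Fine 1-D grid used only for the brute-force check (spacing 5e-5).
GRID = np.linspace(-5.0, 5.0, 200001)


def prox_l1(v, lam):
    """Closed form from the text, applied elementwise: (v - lam)_+ - (-v - lam)_+."""
    v = np.asarray(v, dtype=float)
    return np.maximum(v - lam, 0.0) - np.maximum(-v - lam, 0.0)


def prox_l1_bruteforce(v, lam):
    """Directly minimize |x| + (x - v)^2 / (2 lam) over GRID, one coordinate at a time."""
    out = []
    for vi in np.atleast_1d(np.asarray(v, dtype=float)):
        objective = np.abs(GRID) + (GRID - vi) ** 2 / (2.0 * lam)
        out.append(GRID[np.argmin(objective)])
    return np.array(out)


v = np.array([1.3, -0.05, 0.1, -2.0])
lam = 0.2
print(prox_l1(v, lam))             # values 1.1, 0, 0, -1.8 (entries with |v| <= lam become 0)
print(prox_l1_bruteforce(v, lam))  # agrees up to the grid spacing
```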
So the update step is
$$x^{k+1}:=prox_{\lambda^k \gamma||\cdot||_{1}}(x^k-\lambda^k \nabla f(x^k))$$
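As a minimal sketch of this update on the example, the NumPy code below runs the iteration and records the trajectory of iterates, which could then be drawn over a contour plot of the objective. It assumes that the value 0.2 listed with the problem data is the $\ell_1$ weight $\gamma$, uses a fixed step $\lambda=1/L$ instead of a line search, and starts from a hypothetical point $(1,1)$; the function names are also hypothetical.

```python
import numpy as np

# Data from the example; gamma = 0.2 is assumed to be the l1 weight.
A = np.array([[2.0, 0.25], [0.25, 0.2]])
b = np.array([0.5, 0.5])
c = -1.5
gamma = 0.2


def objective(x):
    """f(x) + g(x) = 0.5 x^T A x + b^T x + c + gamma * ||x||_1."""
    return 0.5 * x @ A @ x + b @ x + c + gamma * np.abs(x).sum()


def soft_threshold(v, t):
    """prox of t * ||.||_1, elementwise: (v - t)_+ - (-v - t)_+."""
    return np.maximum(v - t, 0.0) - np.maximum(-v - t, 0.0)


def proximal_gradient_example(x0, num_iter=500):
    """Run x <- prox_{lam*gamma*||.||_1}(x - lam * (A x + b)) with fixed lam = 1/L."""
    lam = 1.0 / np.linalg.eigvalsh(A).max()  # L = largest eigenvalue of A
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(num_iter):
        x = soft_threshold(x - lam * (A @ x + b), lam * gamma)
        trajectory.append(x.copy())
    return x, np.array(trajectory)


x_final, path = proximal_gradient_example([1.0, 1.0])  # hypothetical start
print(x_final, objective(x_final))  # approaches (0, -1.5) and -1.725
```

The limit can be confirmed from the optimality condition $0\in Ax+b+\gamma\partial||x||_1$: at $x^\star=(0,-1.5)$ we have $Ax^\star+b=(0.125,0.2)$, the second entry is cancelled by $\gamma\cdot\mathrm{sign}(-1.5)=-0.2$, and $|0.125|\le\gamma$ covers the zero first entry. The corresponding objective value is $0.225-0.75-1.5+0.3=-1.725$.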
Finally, the following figure shows the 2D contour plot of the objective function together with the trajectory of the iterates.
When the proximal gradient method is instead run with exact line search, the result is:
These results show that the proximal gradient method solves this nonsmooth convex optimization problem successfully, and that in this example the variant with exact line search converges faster than the one with backtracking line search.
For further reading, see the monograph "Proximal Algorithms" by N. Parikh and S. Boyd and its accompanying website: http://web.stanford.edu/~boyd/papers/prox_algs.html
References
Parikh, Neal, and Stephen P. Boyd. "Proximal Algorithms." Foundations and Trends in Optimization 1.3 (2014): 127-239.