# Numerical Optimization: Classical Stochastic Optimization Algorithms, Their Convergence, and Complexity Analysis

## 1 Overview of Stochastic Optimization Algorithms

Consider a finite-sum objective, here instantiated as least squares over $n$ samples:

$f(w) = \frac{1}{n}\sum_{i=1}^n f_i(w) = \frac{1}{n}\sum_{i=1}^n (w^T x_i - y_i)^2$

Standard (full) gradient descent uses the exact gradient, which touches all $n$ samples at every step:

$w^{t+1} = w^t - \eta \nabla f(w^t) = w^t - \frac{2\eta}{n}\sum_{i=1}^n x_i\left((w^t)^T x_i - y_i\right)$
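As a concrete sketch of this update (NumPy, with a synthetic dataset and a step size of my own choosing), gradient descent on the least-squares objective can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.standard_normal((n, d))      # rows are the samples x_i
y = X @ rng.standard_normal(d)       # noiseless targets, for illustration

def loss(w):
    """f(w) = (1/n) * sum_i (w^T x_i - y_i)^2."""
    return np.mean((X @ w - y) ** 2)

# The smoothness constant of f is 2*sigma_max(X)^2/n; eta = 1/L guarantees descent.
L = 2 * np.linalg.norm(X, 2) ** 2 / n
eta = 1.0 / L

w = np.zeros(d)
for _ in range(500):
    # w <- w - (2*eta/n) * sum_i x_i (w^T x_i - y_i), vectorized as X^T (Xw - y)
    w -= (2 * eta / n) * X.T @ (X @ w - y)
```

Each step costs $\mathcal{O}(nd)$, which is exactly what motivates the stochastic variants below.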

## 2 Stochastic Gradient Descent

### 2.1 Algorithm Description

Stochastic gradient descent (SGD) replaces the full gradient with the gradient of a single uniformly sampled component $f_{i^t}$:

$w^{t+1} = w^t - \eta^t \nabla f_{i^t}(w^t)$

The mini-batch variant averages over a sampled index set $\mathcal{I}^t$:

$w^{t+1} = w^t - \eta^t\nabla f_{\mathcal{I}^t}(w^t) = w^t-\frac{\eta^t}{|\mathcal{I}^t|} \sum_{i\in\mathcal{I}^t} \nabla f_i (w^t)$
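A minimal mini-batch SGD sketch on the same kind of least-squares problem (the batch size, step-size schedule, and data are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)  # noisy targets

def loss(w):
    return np.mean((X @ w - y) ** 2)

w = np.zeros(d)
b = 10                                                   # mini-batch size |I^t|
for t in range(1, 2001):
    idx = rng.choice(n, size=b, replace=False)           # sample I^t uniformly
    g = (2 / b) * X[idx].T @ (X[idx] @ w - y[idx])       # (1/|I^t|) sum ∇f_i(w^t)
    w -= (0.1 / np.sqrt(t)) * g                          # decaying step size eta^t
```

Each step costs $\mathcal{O}(bd)$ instead of $\mathcal{O}(nd)$; the decaying step size damps the gradient noise so the iterates settle near the optimum.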

### 2.2 Convergence and Computational Complexity Analysis

Convergence for $$L$$-Lipschitz convex functions: suppose the objective $$f: \mathbb{R}^d\rightarrow \mathbb{R}$$ is convex and $$L$$-Lipschitz continuous, and let $$w^* = \underset{\lVert w\rVert\leqslant D}{\text{arg min}}f(w)$$. With step size $$\eta^t = \sqrt{\frac{D^2}{L^2t}}$$ and a given iteration budget $$T$$, stochastic gradient descent attains a sublinear convergence rate of $$\mathcal{O}(\frac{1}{\sqrt{T}})$$:

$\mathbb{E}\left[ \frac{1}{T}\sum_{t=1}^T f(w^t) - f(w^*) \right] \leqslant \frac{LD}{\sqrt{T}}$

If, in addition, $$f$$ is $$\alpha$$-strongly convex and $$\beta$$-smooth with the stochastic gradients bounded as $$\mathbb{E}[\lVert\nabla f_{i}(w)\rVert^2]\leqslant G^2$$, a step size decaying as $$\eta^t = \mathcal{O}(\frac{1}{\alpha t})$$ improves this to an $$\mathcal{O}(\frac{1}{T})$$ rate:

$\mathbb{E}[ f(w^T) - f(w^*) ] \leqslant \frac{2\beta G^2}{\alpha^2T}$

## 3 Stochastic Coordinate Descent

### 3.1 Algorithm Description

Stochastic coordinate descent samples a coordinate $j^t$ uniformly at each iteration and updates only that entry of $w$:

$w^{t+1}_{j^t} = w^t_{j^t} - \eta^t \nabla_{j^t}f(w^t)$

The block version updates a sampled index block $J^t$ at once:

$w^{t+1}_{J^t} = w^t_{J^t} - \eta^t\nabla_{J^t} f(w^t)$

### 3.2 Convergence and Computational Complexity Analysis

Assume each partial gradient is coordinate-wise Lipschitz, i.e. for every coordinate $j$ there is a $\beta_j > 0$ such that for all $w$ and $\delta$:

$|\nabla_j f(w + \delta e^j) - \nabla_j f(w)| \leqslant \beta_j |\delta|$

and write $\beta_{\text{max}} = \max_j \beta_j$.

For a convex objective this yields an $\mathcal{O}(\frac{d}{T})$ sublinear rate:

$\mathbb{E}[f(w^T) - f(w^*)] \leqslant \frac{2d\beta_{\text{max}}D^2}{T}$

For an $\alpha$-strongly convex objective the rate improves to linear:

$\mathbb{E}[f(w^T) - f(w^*)] \leqslant \left(1-\frac{\alpha}{d\beta_{\text{max}}}\right)^T (f(w^0) - f(w^*))$

For the least-squares objective the coordinate update reads:

$$\begin{aligned} w^{t+1}_j & = w^t_j - \eta^t\nabla_j f(w^t)\\ & = w^t_j - \frac{2\eta^t}{n}\sum_{i=1}^n x_{i,j}\left((w^t)^Tx_i - y_i\right) \end{aligned}$$
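A runnable sketch of randomized coordinate descent for this least-squares case (data and iteration budget are illustrative; the step size $\eta^t = 1/\beta_j$ uses the coordinate-wise smoothness constant $\beta_j = 2\lVert X_{:,j}\rVert^2/n$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)

def loss(w):
    return np.mean((X @ w - y) ** 2)

beta = 2 * np.sum(X ** 2, axis=0) / n   # coordinate-wise smoothness beta_j
w = np.zeros(d)
r = X @ w - y                           # maintain the residual r = Xw - y
for t in range(2000):
    j = rng.integers(d)                 # sample coordinate j^t uniformly
    g_j = (2 / n) * X[:, j] @ r         # partial gradient ∇_j f(w^t)
    step = g_j / beta[j]                # eta^t = 1/beta_j: exact minimization in w_j
    w[j] -= step
    r -= step * X[:, j]                 # O(n) residual update, no full recompute
```

Maintaining the residual makes each coordinate step $\mathcal{O}(n)$ rather than $\mathcal{O}(nd)$, which is where the method's advantage over full-gradient steps comes from.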

## 4 Stochastic Quasi-Newton Methods

### 4.1 Algorithm Description

Stochastic quasi-Newton methods precondition the mini-batch gradient with an inverse-Hessian approximation $M^t$:

$w^{t+1} = w^t - \eta^t M^t \left(\frac{1}{|\mathcal{I}^t|} \sum_{i\in \mathcal{I}^t}\nabla f_i(w^t) \right)$

The correction pair at update $s$ is formed from the iterate displacement and a subsampled Hessian-vector product:

$$\begin{aligned} &\delta^s_1 = w^s - w^{s-1} \\ &\delta^s_2 = \frac{1}{|\mathcal{I}^t|}\sum_{i\in \mathcal{I}^t}\nabla^2 f_i(w^s)(w^s - w^{s-1}) \end{aligned}$$

The matrix $M$ is then refreshed by the BFGS-style recursion, with $\rho^s = 1/\left((\delta^s_2)^T\delta^s_1\right)$:

$M \leftarrow(I - \rho^s\delta^s_1(\delta^s_2)^T)M(I - \rho^s\delta^s_2(\delta^s_1)^T) + \rho^s \delta^s_1(\delta^s_1)^T$
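This recursion is the standard BFGS inverse-Hessian update applied to the stochastic pair $(\delta^s_1, \delta^s_2)$. A small self-contained check (the synthetic Hessian and all names are my own) that the update enforces the secant condition $M\delta^s_2 = \delta^s_1$ and keeps $M$ symmetric positive definite:

```python
import numpy as np

def bfgs_inverse_update(M, d1, d2):
    """M <- (I - rho d1 d2^T) M (I - rho d2 d1^T) + rho d1 d1^T,
    with rho = 1/(d2^T d1), the standard BFGS choice."""
    rho = 1.0 / (d2 @ d1)
    V = np.eye(len(d1)) - rho * np.outer(d2, d1)
    return V.T @ M @ V + rho * np.outer(d1, d1)

rng = np.random.default_rng(0)
dim = 6
A = rng.standard_normal((dim, dim))
H = A @ A.T + dim * np.eye(dim)   # stand-in for a subsampled Hessian (SPD)
d1 = rng.standard_normal(dim)     # delta_1 = w^s - w^{s-1}
d2 = H @ d1                       # delta_2 = Hessian-vector product with delta_1
M = bfgs_inverse_update(np.eye(dim), d1, d2)
```

Because $\delta^s_2$ is a Hessian-vector product, $(\delta^s_2)^T\delta^s_1 = (\delta^s_1)^T \nabla^2 f\, \delta^s_1 > 0$ whenever the subsampled Hessian is positive definite, so $\rho^s$ is well defined and positive definiteness of $M$ is preserved.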

### 4.2 Convergence and Computational Complexity Analysis

The convergence analysis rests on the following assumptions:

• The objective $$f(w)$$ is twice continuously differentiable;
• There exist $$\lambda_2 > \lambda_1 > 0$$ such that for all $$w\in \mathbb{R}^d$$: $$\lambda_1 I \prec \nabla^2 f(w) \prec \lambda_2I$$;
• There exist $$\mu_2 > \mu_1 > 0$$ such that for all $$w\in \mathbb{R}^d$$: $$\mu_1 I \prec (\nabla^2 f(w))^{-1} \prec \mu_2I$$;
• There exists $$G>0$$ such that for all $$w\in \mathbb{R}^d$$: $$\mathbb{E}[\lVert \nabla f_{i}(w) \rVert^2] \leqslant G^2$$.

Under these assumptions, with a decaying step size $$\eta^t = \frac{a}{t}$$, the method attains a sublinear rate, where $$Q(a)$$ is a constant determined by $$a$$ and the problem constants above:

$\mathbb{E}[f(w^T)-f(w^*)] \leqslant \frac{Q(a)}{T}$

The per-iteration computational cost, relative to a mini-batch SGD step with gradient batch size $b$ (here $M$ is the limited-memory size, $b_H$ the Hessian mini-batch size, and $L$ the correction interval, not the Lipschitz constant), scales as:

$1 + \frac{2M}{b} + \frac{3b_H}{2bL}$

## 5 Stochastic Dual Coordinate Ascent

### 5.1 Algorithm Description

Consider the regularized loss-minimization problem

$\underset{w\in\mathbb{R}^d}{\text{min}}\ f(w) = \frac{1}{n}\sum_{i=1}^n\phi_i(w^Tx_i) + \frac{\lambda}{2}\lVert w\rVert^2$

The corresponding dual problem is

$\underset{\alpha \in\mathbb{R}^n}{\text{max}}\ D(\alpha) = \frac{1}{n}\sum_{i=1}^n - \phi_i^*(-\alpha_i) - \frac{\lambda n}{2}\left\lVert \frac{1}{\lambda n}\sum_{i=1}^n\alpha_ix_i \right\rVert^2$

Defining $w(\alpha) = \frac{1}{\lambda n}\sum_{i=1}^n\alpha_ix_i$, strong duality gives

$w(\alpha^*) = w^*, \quad f(w^*) = D(\alpha^*)$

Each iteration of stochastic dual coordinate ascent then proceeds as follows:

• Sample an index $$i$$ uniformly at random and compute

$\Delta \alpha_i = \underset{z}{\text{argmax}} \left\{ -\phi_i^*(-\alpha_i^t + z) - \frac{\lambda n}{2} \lVert w^t + \frac{1}{\lambda n} zx_i \rVert^2 \right\}$

• Update the dual variable: $$\alpha^{t+1} = \alpha^t + \Delta \alpha_ie_i$$
• Update the primal variable: $$w^{t+1} = w^t + \frac{1}{\lambda n} \Delta \alpha_i x_i$$
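For a concrete instance, take the squared loss $\phi_i(z) = (z - y_i)^2$ (ridge regression). The inner $\text{argmax}$ then has the closed form $\Delta\alpha_i = \frac{y_i - (w^t)^Tx_i - \alpha_i^t/2}{1/2 + \lVert x_i\rVert^2/(\lambda n)}$, obtained by setting the derivative in $z$ to zero. A sketch with illustrative data and hyperparameters of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 5, 0.1
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)

def primal(w):
    """f(w) = (1/n) sum (w^T x_i - y_i)^2 + (lam/2) ||w||^2."""
    return np.mean((X @ w - y) ** 2) + 0.5 * lam * w @ w

alpha = np.zeros(n)
w = np.zeros(d)                    # invariant: w = (1/(lam*n)) * sum_i alpha_i x_i
sq = np.sum(X ** 2, axis=1)        # ||x_i||^2, precomputed
for t in range(100 * n):
    i = rng.integers(n)            # sample a coordinate of the dual vector
    delta = (y[i] - X[i] @ w - alpha[i] / 2) / (0.5 + sq[i] / (lam * n))
    alpha[i] += delta              # dual update
    w += delta * X[i] / (lam * n)  # matching primal update keeps the invariant
```

Keeping the primal iterate in sync with $\alpha$ makes each dual coordinate step $\mathcal{O}(d)$, and the duality gap $f(w^t) - D(\alpha^t)$ can serve as a computable stopping criterion.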

## 6 Summary of Stochastic Optimization Algorithms

• When the dataset is large, stochastic gradient descent is more efficient than gradient descent.
• If the objective is separable across coordinates, stochastic coordinate descent is more efficient than gradient descent.
• If the objective is separable and the data dimension is high, stochastic coordinate descent is more efficient than stochastic gradient descent.
• The efficiency of the stochastic quasi-Newton method is of the same order as that of stochastic gradient descent.


Original author: orion-orion
Original post: https://www.cnblogs.com/orion-orion/p/16403084.html