STAT 207: Advanced Optimization Topics¶

In general,

  • unconstrained optimization problems are easier to solve than constrained optimization problems, and

  • equality-constrained problems are easier to solve than inequality-constrained problems.

Inequality constraint - interior point method¶

  • We consider the constrained optimization problem $$ \min\; f_0(\mathbf{x}) $$ subject to $$ \begin{aligned} f_j(\mathbf{x}) &\le 0,\quad 1 \leq j \leq m; \\ \mathbf{A}\mathbf{x} &= \mathbf{b}, \end{aligned} $$ where $f_0,\ldots,f_m: \mathbb{R}^n \rightarrow \mathbb{R}$ are convex and twice continuously differentiable, and $\mathbf{A}$ has full row rank.

  • Assume the problem is solvable with optimal point $\mathbf{x}^*$ and optimal value $f_0(\mathbf{x}^*) = p^*$.

  • KKT conditions:

$$ \begin{aligned} \mathbf{A}\mathbf{x}^* = \mathbf{b},\; f_i(\mathbf{x}^*) &\le 0, \quad i=1,\ldots,m \quad \text{(primal feasibility)} \\ \lambda_i^* &\ge 0, \quad i=1,\ldots,m \quad \text{(dual feasibility)} \\ \nabla f_0(\mathbf{x}^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(\mathbf{x}^*) + \mathbf{A}^T\boldsymbol{\nu}^* &= \mathbf{0} \quad \text{(stationarity)} \\ \lambda_i^*f_i(\mathbf{x}^*) &= 0, \quad i=1,\ldots,m \quad \text{(complementary slackness)}. \end{aligned} $$
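As a quick illustrative check (not part of the original notes), consider the one-dimensional problem $\min\,(x-2)^2$ subject to $x - 1 \le 0$ (no equality constraint). The pair $x^* = 1$, $\lambda^* = 2$ satisfies $$ \begin{aligned} f_1(x^*) &= 0 \le 0, \qquad \lambda^* = 2 \ge 0, \\ \nabla f_0(x^*) + \lambda^* \nabla f_1(x^*) &= 2(1-2) + 2 = 0, \\ \lambda^* f_1(x^*) &= 0, \end{aligned} $$ so all four conditions hold and $x^* = 1$ is optimal with $p^* = 1$.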

Barrier method¶

  • Reformulate the problem so that the inequality constraints are implicit in the objective: minimize $$ f_0(\mathbf{x}) + \sum_{i=1}^m I_-(f_i(\mathbf{x})) $$ subject to $\mathbf{A}\mathbf{x} = \mathbf{b}$, where $$ I_-(u) = \begin{cases} 0 & \text{if } u \leq 0 \\ \infty & \text{if } u > 0 \end{cases}. $$

  • Then approximate $I_-$ by the differentiable function $$ \widehat{I}_-(u) = -(1/t)\log(-u), \quad u<0, $$ where $t>0$ is a parameter controlling the approximation accuracy. As $t$ increases, the approximation becomes more accurate.

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import warnings

# log(0) at the right endpoint u = 0 triggers a RuntimeWarning; ignore it
warnings.filterwarnings("ignore", category=RuntimeWarning)

# Log-barrier approximation I_hat_minus(u) = -(1/t) log(-u), defined for u < 0
def I_hat_minus(u, t):
    return -(1/t) * np.log(-u)

u = np.linspace(-3, 0, 100)
t_values = [0.5, 1, 2]  # larger t gives a closer approximation to I_-

for t in t_values:
    plt.plot(u, I_hat_minus(u, t), label=f't={t}')

plt.xlabel('u')
plt.ylabel(r'$\widehat{I}_-(u)$')
plt.legend()
plt.title('Log-barrier approximation of $I_-(u)$')
plt.grid(False)
plt.show()
[Figure: plot of $\widehat{I}_-(u)$ for $t = 0.5, 1, 2$.]
  • The barrier method solves a sequence of equality-constrained problems $$ \begin{aligned} \min \quad & t f_0(\mathbf{x}) - \sum_{i=1}^{m} \log(-f_i(\mathbf{x})) \\ \text{subject to} \quad & \mathbf{A}\mathbf{x} = \mathbf{b}, \end{aligned} $$ increasing the parameter $t$ at each step and starting each Newton minimization at the solution for the previous value of $t$.

  • The function $\phi(\mathbf{x}) = - \sum_{i=1}^{m} \log(-f_i(\mathbf{x}))$ is called the logarithmic barrier or log barrier function.

  • Denote the solution at $t$ by $\mathbf{x}^*(t)$. Using duality theory, one can show $$ f_0(\mathbf{x}^*(t)) - p^* \le m/t. $$ A toy implementation of the full scheme is sketched after the figure below.

[Figure: illustration of the barrier method (barrier.jpg).]
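To make the recipe concrete, here is a minimal sketch of the barrier method for a toy quadratic objective with three affine inequality constraints and no equality constraint. The objective, the constraint data `F` and `g`, the starting point, and the parameters `t`, `mu`, and the tolerances are illustrative assumptions, not part of the notes; each outer pass re-centers with damped Newton steps started from the previous solution and stops once the gap bound $m/t$ falls below tolerance.

In [ ]:
import numpy as np

# Illustrative barrier-method sketch (toy problem, not the notes' implementation):
#   minimize (x1 - 1)^2 + (x2 - 2)^2
#   subject to x1 + x2 <= 2, x1 >= 0, x2 >= 0,
# whose solution is x* = (0.5, 1.5).

def f0(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

def grad_f0(x):
    return 2.0 * (x - np.array([1.0, 2.0]))

def hess_f0(x):
    return 2.0 * np.eye(2)

# Inequality constraints f_i(x) <= 0 written as F @ x - g <= 0
F = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
g = np.array([2.0, 0.0, 0.0])

def barrier_obj(x, t):
    s = g - F @ x                        # slacks -f_i(x); must stay positive
    return t * f0(x) - np.sum(np.log(s))

def barrier_method(x, t=1.0, mu=10.0, tol=1e-6):
    m = len(g)
    while m / t > tol:                   # stop when the duality-gap bound m/t is small
        for _ in range(50):              # inner Newton (centering) iterations
            s = g - F @ x
            grad = t * grad_f0(x) + F.T @ (1.0 / s)
            hess = t * hess_f0(x) + F.T @ np.diag(1.0 / s ** 2) @ F
            dx = np.linalg.solve(hess, -grad)
            if -grad @ dx / 2.0 < 1e-10:       # Newton decrement small -> centered
                break
            step = 1.0                         # backtracking: stay strictly feasible
            while (np.any(g - F @ (x + step * dx) <= 0) or
                   barrier_obj(x + step * dx, t) >
                   barrier_obj(x, t) + 0.25 * step * grad @ dx):
                step *= 0.5
            x = x + step * dx
        t *= mu                          # increase t and re-center
    return x

x_star = barrier_method(np.array([0.5, 0.5]))   # strictly feasible starting point
print(x_star, f0(x_star))                       # approx [0.5 1.5]  0.5

Printing the intermediate iterates would show $f_0(\mathbf{x}^*(t))$ approaching the optimal value $p^* = 0.5$ at the rate governed by the bound $m/t$.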

  • The barrier method has to start from a strictly feasible point. We can find such a point by solving the phase-I problem $$ \min\; s $$ subject to $$ \begin{aligned} f_j(\mathbf{x}) &\le s,\quad 1 \leq j \leq m; \\ \mathbf{A}\mathbf{x} &= \mathbf{b}, \end{aligned} $$ by the barrier method; any solution with $s < 0$ yields a strictly feasible $\mathbf{x}$.
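For the special case of affine inequalities $\mathbf{F}\mathbf{x} \le \mathbf{g}$ with no equality constraint, this phase-I problem is itself a linear program in $(\mathbf{x}, s)$ and can be handed to an LP solver directly; the sketch below reuses the toy constraint data from the barrier example above and is illustrative only.

In [ ]:
import numpy as np
from scipy.optimize import linprog

# Phase I for affine inequalities F x <= g (same toy data as above):
# minimize s subject to F x - s <= g; any optimum with s < 0 gives a
# strictly feasible starting point for the barrier method.
F = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
g = np.array([2.0, 0.0, 0.0])

n = F.shape[1]
c = np.concatenate([np.zeros(n), [1.0]])            # objective: minimize s
A_ub = np.hstack([F, -np.ones((len(g), 1))])        # F x - s <= g
res = linprog(c, A_ub=A_ub, b_ub=g,
              bounds=[(None, None)] * (n + 1))      # x and s are free variables
x0, s = res.x[:n], res.x[n]
print(x0, s)        # s < 0, so x0 = (2/3, 2/3) is strictly feasible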

Penalty Method¶

  • Unlike the barrier method that works from the interior of the feasible region, the penalty method works from the outside of the feasible region inward.

  • Construct a continuous nonnegative penalty $p(x)$ that is 0 on the feasible region and positive outside it.

  • Optimize $$ f_0(x) + \lambda_k p(x) $$ for an increasing sequence $\lambda_k \to \infty$.

Example (Linear Regression with Linear Constraints) Consider the regression problem of minimizing $\|Y - X\beta \|_2^2$ subject to the linear constraints $V\beta = d$. If we take the penalty function $p(\beta) = \|V\beta - d\|_2^2$, then we must minimize at each stage the function $$ h_k(\beta) = \|Y - X\beta\|_2^2 + \lambda_k \|V\beta - d\|_2^2. $$ Setting the gradient $$ \nabla h_k(\beta) = -2X^T(Y - X\beta) + 2\lambda_k V^T(V\beta - d) = 0 $$ yields the sequence of solutions $$ \beta_k = (X^TX + \lambda_kV^TV)^{-1}(X^TY + \lambda_kV^T d). $$
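A small numerical sketch of this example follows; the simulated data, the sum-to-zero constraint $V\beta = d$, and the grid of $\lambda_k$ values are illustrative assumptions. As $\lambda_k$ grows, the constraint violation shrinks toward zero while the residual sum of squares increases, in line with the ascent and descent properties discussed next.

In [ ]:
import numpy as np

# Penalty-method sketch for linearly constrained least squares (illustrative data).
rng = np.random.default_rng(0)
n, p = 100, 4
X = rng.standard_normal((n, p))
Y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.standard_normal(n)

# Illustrative linear constraint V beta = d: coefficients sum to zero.
V = np.ones((1, p))
d = np.array([0.0])

XtX, XtY = X.T @ X, X.T @ Y
for lam in [1e0, 1e2, 1e4, 1e6, 1e8]:
    # beta_k = (X'X + lambda_k V'V)^{-1} (X'Y + lambda_k V'd)
    beta_k = np.linalg.solve(XtX + lam * V.T @ V, XtY + lam * V.T @ d)
    rss = np.sum((Y - X @ beta_k) ** 2)
    violation = np.abs(V @ beta_k - d).item()
    print(f"lambda = {lam:.0e}: RSS = {rss:.4f}, |V beta - d| = {violation:.2e}")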

  • Ascent and descent properties of the penalty and barrier methods.

Proposition 16.2.1: Consider two real-valued functions $f(x)$ and $g(x)$ on a common domain and two positive constants $\alpha < \omega$. Suppose the linear combination $f(x) + \alpha g(x)$ attains its minimum value at $y$, and the linear combination $f(x) + \omega g(x)$ attains its minimum value at $z$. Then, $f(y) \leq f(z)$ and $g(y) \ge g(z)$.

  • Global convergence for the penalty method.

Proposition 16.2.2 Suppose that both the objective function $f(x)$ and the penalty function $p(x)$ are continuous on $\mathbb{R}^m$, and the penalized functions $h_k(x) = f(x) + \lambda_k p(x)$ are coercive on $\mathbb{R}^m$. Then, one can extract a corresponding sequence of minimum points $x_k$ such that $f(x_k) \le f(x_{k+1})$. Furthermore, any cluster point of this sequence resides in the feasible region $C = \{x : p(x) = 0\}$ and attains the minimum value of $f(x)$ within $C$. Finally, if $f(x)$ is coercive and possesses a unique minimum point in $C$, then the sequence $x_k$ converges to that point.

  • Global convergence for the barrier method.

Proposition 16.2.3 Suppose the real-valued function $f(x)$ is continuous on the bounded open set $U$ and its closure $V$. Additionally, suppose the barrier function $b(x)$ is continuous and coercive on $U$. If the tuning constants $\mu_k$ decrease to 0, then the linear combinations $h_k(x) = f(x) + \mu_k b(x)$ attain their minima at a sequence of points $x_k$ in $U$ satisfying the descent property $f(x_{k+1}) \leq f(x_k)$. Furthermore, any cluster point of the sequence furnishes the minimum value of $f(x)$ on $V$. If the minimum point of $f(x)$ in $V$ is unique, then the sequence $x_k$ converges to this point.

  • Possible defects of the penalty and barrier methods:

    • iterations within iterations

    • choosing the tuning parameter sequence

    • numerical instability
