STAT 207: Vectors and matrices¶

Zhe Fei (zhe.fei@ucr.edu)¶

  • NAS Chapter 6

Many scientific computational problems involve vectors and matrices.

The distinction between the set of real numbers, $\mathbb{R}$, and the set of floating-point numbers, $\mathbb{F}$, that we use in the computer has important implications for numerical computations.

We will consider the floating-point representation of elements of vectors and matrices and the computations in $\mathbb{F}$.

In multidimensional calculus, vector and matrix norms quantify notions of topology and convergence.

  • Norms are devices for deriving explicit bounds; theoretical developments in numerical analysis rely heavily on them.

  • They are particularly useful in establishing convergence and in estimating rates of convergence of iterative methods for solving linear and nonlinear equations.

  • Norms also arise in almost every other branch of theoretical numerical analysis. Functional analysis, which deals with infinite-dimensional vector spaces, uses norms on functions.

Elementary Properties of Vector Norms¶

Euclidean vector norm: $$ \| x\|_2 = \sqrt{\sum_{i=1}^m x_i^2 } $$

A norm on $\mathbb{R}^m$ is formally defined by four properties:

(a) $\| x\| \ge 0$,

(b) $\| x\| = 0$ if and only if $x = \mathbf{0}$,

(c) $\| cx\| = |c|\cdot \|x\|$ for all real numbers $c$,

(d) $\| x+y\| \le \|x\| + \|y\|$.

$\ell$-1 norm: $$ \| x\|_1 = \sum_{i=1}^m |x_i|, $$

$\ell$-$\infty$ norm: $$ \| x\|_\infty = \max_{i} |x_i|. $$

For each of the norms $\| x\|_p, p = 1, 2$, and $\infty$, a sequence of vectors $x_n$ converges to a vector $y$ if and only if each component sequence $x_{ni}$ converges to $y_i$. Thus, all three norms give the same topology on $\mathbb{R}^m$.

NAS Proposition 6.2.1 Let $\| x\|$ be any norm on $\mathbb{R}^m$. Then there exist positive constants $k_l$ and $k_u$ such that $k_l\| x\|_1 \le \| x\|\le k_u\| x\|_1$ holds for all $x \in \mathbb{R}^m$.

  • $\sup_{x\ne 0} \| x\|/\| x\|^*$ is finite for any pair of norms $\| x\|$ and $\| x\|^*$.

  • For $p<q$ from $\{1,2,\infty\}$, $$ \| x\|_q \le \| x\|_p \quad\text{and}\quad \| x\|_p \le m^{1/p - 1/q} \| x\|_q. $$
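
These inequalities are easy to check numerically; below is a quick sketch with numpy, using an arbitrary random vector of length $m = 5$:

In [ ]:
import numpy as np

# Check the equivalence inequalities for p < q drawn from {1, 2, inf}.
rng = np.random.default_rng(0)
m = 5
x = rng.standard_normal(m)
n1, n2, ninf = (np.linalg.norm(x, p) for p in (1, 2, np.inf))

print(n2 <= n1, n1 <= m**(1 - 1/2) * n2)       # p = 1, q = 2
print(ninf <= n2, n2 <= m**(1/2) * ninf)       # p = 2, q = inf (1/q = 0)
print(ninf <= n1, n1 <= m**1 * ninf)           # p = 1, q = inf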

Elementary Properties of Matrix Norms¶

An $m\times m$ matrix $A = (a_{ij})$ can be viewed as a vector in $\mathbb{R}^{m^2}$.

We prefer matrix norms that are compatible with matrix multiplication.

In addition to (a) - (d), we require

(e) $\|AB\| \le \|A\| \cdot \|B\|$

for any product of $m\times m$ matrices $A$ and $B$.

  • Frobenius norm: $\|A\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^m a_{ij}^2 } = \sqrt{\operatorname{tr}(AA^T) } = \sqrt{\operatorname{tr}(A^TA) }$

  • Matrix norm induced from any vector norm $\|x\|$ on $\mathbb{R}^{m}$: $$ \|A\| = \sup_{x\ne 0} \frac{\|Ax\|}{\|x\|} = \sup_{\|x\| = 1} \|Ax\|. $$

  • What are $\|I\|$ and $\|I\|_F$? (Any induced norm gives $\|I\| = 1$, while $\|I\|_F = \sqrt{m}$; hence the Frobenius norm is not induced by any vector norm.)

In infinite-dimensional settings, induced matrix norms are called operator norms.

NAS Proposition 6.3.1 If $A = (a_{ij})$ is an $m \times m$ matrix, then

(a) $\|A\|_1 = \max_j \sum_i |a_{ij}|$,

(b) $\|A\|_2 = \sqrt{\rho(A^TA)}$, which reduces to $\rho(A)$ if $A$ is symmetric,

(c) $\|A\|_2 = \max_{\|u\|_2=1, \|v\|_2=1} u^TAv$,

(d) $\|A\|_\infty = \max_i\sum_j|a_{ij}|$.
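
These formulas can be verified against numpy's built-in matrix norms; the sketch below uses an arbitrary $2\times 2$ matrix:

In [ ]:
import numpy as np

# Compare the closed-form induced norms with numpy's built-in versions.
A = np.array([[1., -2.], [3., 4.]])

print(np.abs(A).sum(axis=0).max(), np.linalg.norm(A, 1))       # max column sum
print(np.abs(A).sum(axis=1).max(), np.linalg.norm(A, np.inf))  # max row sum
print(np.sqrt(np.linalg.eigvalsh(A.T @ A).max()),              # sqrt(rho(A^T A))
      np.linalg.norm(A, 2))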

Norm Preserving Linear Transformations¶

  • Orthogonal matrix: $OO^T = I$;

  • Orthonormal set of vectors $S$: every vector in $S$ has magnitude 1, and the vectors are mutually orthogonal.

Since $(\det O)^2 = 1$, we can divide the orthogonal matrices into rotations with $\det O = 1$ and reflections with $\det O = -1$. Take $2 \times 2$ matrices as an example:

  • A rotation by angle $\theta$ can be represented as: $$ O = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \\ \end{pmatrix} $$

  • A reflection of a point across the line making angle $\theta/2$ with the $x$-axis: $$ O = \begin{pmatrix} \cos \theta & \sin \theta \\ \sin \theta & -\cos \theta \\ \end{pmatrix} $$

Remarks

  • The set of orthogonal matrices forms a group under matrix multiplication.

  • The identity matrix is the unit of the group.

  • The rotations constitute a subgroup of the orthogonal group, but the reflections do not since the product of two reflections is a rotation.

Orthogonal transformations preserve inner products and Euclidean norms: $(Ou)^TOv = u^TO^TOv = u^Tv$.

As a result, all eigenvalues of $O$ lie on the unit circle in the complex plane, i.e. $|\lambda| = 1$.

Norm invariance for vectors also leads to norm invariance for matrices.

  • $\|OA\|_2^2 = \|A\|_2^2$

  • $\|AO\|_2^2 = \|A\|_2^2$

  • $\|OA\|_F^2 = \|A\|_F^2$

  • $\|AO\|_F^2 = \|A\|_F^2$
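
A quick numerical confirmation of these identities (a sketch; the rotation angle and the matrix $A$ are arbitrary choices):

In [ ]:
import numpy as np

# The 2-norm and Frobenius norm of A are unchanged by multiplying
# with a rotation matrix O on either side.
theta = 0.7
O = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = np.array([[1., 2.], [3., 4.]])

for M in (A, O @ A, A @ O):
    print(np.linalg.norm(M, 2), np.linalg.norm(M, 'fro'))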

Householder matrices¶

Given a unit vector $u$, the Householder matrix $H = I - 2uu^T$ represents a reflection across the plane perpendicular to $u$.

$H$ is orthogonal and symmetric: since $\|u\|_2 = 1$, $$ HH^T = I - 4uu^T + 4u\|u\|_2^2 u^T = I. $$

  • $Hv = v$ whenever $v$ lies in the plane perpendicular to $u$.

  • $Hu = -u$.

  • $H$ has one eigenvalue equal to $-1$ and all others equal to 1.

Example: Let $$ u = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. $$

Then $$ H = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - 2\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. $$

For any vector $v = \begin{pmatrix} x \\ y \end{pmatrix}$, the transformation is $$ Hv = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ -y \end{pmatrix}. $$
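
The example is easy to verify numerically; a quick sketch:

In [ ]:
import numpy as np

# Build H = I - 2uu^T for u = (0, 1) and verify the reflection properties.
u = np.array([0., 1.])
H = np.identity(2) - 2 * np.outer(u, u)

print(H)                       # [[1, 0], [0, -1]]
print(H @ np.array([3., 4.]))  # reflection across the x-axis: [3, -4]
print(H @ H.T)                 # orthogonality: the identity
print(np.linalg.eigvalsh(H))   # eigenvalues -1 and 1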

Applications:

  • Matrix decompositions: QR, SVD (see the sketch below),
  • Orthogonalization,
  • Stabilizing numerical algorithms.
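
As a sketch of the QR connection: choosing $u$ proportional to $x - \|x\|_2 e_1$ yields a Householder matrix that reflects $x$ onto $\|x\|_2 e_1$, the basic step used to zero out subdiagonal entries in QR.

In [ ]:
import numpy as np

# Sketch: reflect x onto ||x||_2 * e1, the building block of Householder QR.
# (Production code picks the sign of ||x||_2 to avoid cancellation.)
x = np.array([3., 4., 0.])
e1 = np.array([1., 0., 0.])
v = x - np.linalg.norm(x) * e1
u = v / np.linalg.norm(v)
H = np.identity(3) - 2 * np.outer(u, u)

print(H @ x)                   # [5, 0, 0], i.e. ||x||_2 * e1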

Iterative Solution of Linear Equations¶

Many numerical problems involve iterative schemes of the form $$ x_n = Bx_{n-1} + w \tag{1} $$ for solving the vector-matrix equation $(I-B)x = w$. The map $f(x) = Bx + w$ satisfies $$ \| f(y) - f(x)\| = \|B (y-x) \| \le \|B\|\cdot \|y - x\| $$ and is contractive for a vector norm $\|x\|$ if $\|B\|<1$ holds for the induced matrix norm.

NAS Proposition 6.5.1 Let $B$ be an arbitrary matrix with spectral radius $\rho(B)$ (largest absolute value of the eigenvalues). Then $\rho(B) < 1$ if and only if $\| B \| < 1$ for some induced matrix norm. The inequality $\| B \| < 1$ implies:

(a) $\lim_{n\rightarrow\infty} \left\| B^n \right\| = 0$,

(b) $\left( I - B \right)^{-1} = \sum_{n=0}^{\infty} B^n$,

(c) $\frac{1}{1+\| B \|}\le \| \left( I - B \right)^{-1} \| \le \frac{1}{1-\| B \|}$.
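
A numerical illustration of parts (a) and (b), using an arbitrary small matrix $B$ with $\|B\|_\infty = 0.5$:

In [ ]:
import numpy as np

# Partial sums of the Neumann series sum_n B^n converge to (I - B)^{-1}.
B = np.array([[0.2, 0.3], [0.1, 0.4]])      # ||B||_inf = 0.5 < 1
S = np.identity(2)
term = np.identity(2)
for _ in range(100):
    term = term @ B                         # B^n -> 0
    S = S + term

print(S)
print(np.linalg.inv(np.identity(2) - B))    # agrees with the series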

Linear iteration is especially useful in solving the equation $Ax = b$ for $x$ when an approximation $C$ to $A^{-1}$ is known: let $B = I-CA$ and $w = Cb$, and iterate equation (1). The fixed point satisfies $(I-B)x = CAx = Cb$, which recovers $Ax = b$ whenever $C$ is invertible.

Jacobi’s Iteration¶

  • Jacobi’s method offers a typical example of this strategy.

  • Suppose $A = (a_{ij})$ is strictly diagonally dominant ($|a_{ii}|>\sum_{j\ne i} |a_{ij}|$ for all $i$), and let $D = \text{diag}(a_{ii})$ be its diagonal part. Then $C = D^{-1}$ is an approximate inverse of $A$, and $B = I-CA$ satisfies $\|B\|_\infty <1$.

In [1]:
import numpy as np

def jacobi(A, b, x0, tol=1e-6, max_iter=1000):
    """
    Solve the linear system Ax = b by Jacobi's method.
    Assumes A is strictly diagonally dominant.
    """
    n = A.shape[0]
    C = np.diag(1 / np.diag(A))   # approximate inverse C = D^{-1}
    B = np.identity(n) - C @ A    # iteration matrix with ||B||_inf < 1
    w = C @ b

    x = B @ x0 + w
    k = 1
    while np.max(np.abs(x - x0)) > tol and k < max_iter:
        x0 = x
        x = B @ x0 + w
        k += 1

    if k < max_iter:
        print(f'Converged in {k} iterations.')
    else:
        print('Maximum iterations reached.')
    return x
In [2]:
# Example usage
A = np.array([[4, 1, 1], [2, 5, 1], [1, 2, 4]])
b = np.array([4, 1, 2])
x0 = np.array([0, 0, 0])

x = jacobi(A, b, x0)
print(f'Solution: {x}')

print(A@x)
Converged in 28 iterations.
Solution: [ 0.96874976 -0.26562527  0.39062468]
[3.99999845 0.99999787 1.99999795]

Landweber’s Iteration Scheme¶

In practice, the approximate inverse $C$ can be rather crude; Landweber's scheme takes $C = \epsilon A^T$ for $\epsilon$ small and positive.

  • For invertible $A$, $A^TA$ is positive definite with eigenvalues $0< \lambda_1\le \cdots \le \lambda_m$.

  • $I - \epsilon A^TA$ has eigenvalues $1-\epsilon \lambda_1, \ldots, 1-\epsilon \lambda_m$.

  • We need $1-\epsilon \lambda_m > -1$, i.e. $0 < \epsilon < 2/\lambda_m$, so that $\|I - \epsilon A^TA\|_2 < 1$.
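
A minimal sketch of Landweber's scheme, reusing the $3\times 3$ system from the Jacobi example; the step size $\epsilon = 1/\lambda_m$ is one admissible choice:

In [ ]:
import numpy as np

# Landweber iteration: x_n = (I - eps*A^T A) x_{n-1} + eps*A^T b.
A = np.array([[4., 1., 1.], [2., 5., 1.], [1., 2., 4.]])
b = np.array([4., 1., 2.])

lam_max = np.linalg.eigvalsh(A.T @ A).max()
eps = 1.0 / lam_max                # any eps in (0, 2/lam_max) works
B = np.identity(3) - eps * (A.T @ A)
w = eps * (A.T @ b)

x = np.zeros(3)
for _ in range(2000):
    x = B @ x + w
print(x, A @ x)                    # x approximates the solution, A @ x ~ b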

Equilibrium Distribution of a Markov Chain¶

Movement among the $m$ states of a Markov chain is governed by its $m\times m$ transition matrix $P = (p_{ij})$, whose entries are nonnegative and satisfy $\sum_j p_{ij} = 1$ for all $i$.

A column vector $x$ with nonnegative entries and norm $\|x\|_1 = \sum_i x_i = 1$ is said to be an equilibrium (stationary) distribution for $P$ provided $x^TP = x^T$, or equivalently $Qx = x$ for $Q = P^T$.

Let $S = \{x: x_i\ge 0, \ i=1,\ldots,m, \ \sum_i x_i =1 \}$. Assume that for some power $k$, $Q^k$ has all positive entries, and consider the matrix $R = Q^k - c\mathbf{1}\mathbf{1}^T$, where $c>0$ is no larger than the smallest entry of $Q^k$. For $x, y \in S$ we have $\mathbf{1}^T(x-y) = 0$, so $Q^kx - Q^ky = R(x-y)$; since $R$ has nonnegative entries and column sums $1-mc$, it follows that $\|R(x-y)\|_1 \le (1-mc)\|x-y\|_1$. Consequently:

  • The map $x \rightarrow Q^k x$ is contractive on $S$ with unique fixed point $x_\infty$.

  • $Qx_\infty = x_\infty$.

  • $\lim_{n\rightarrow \infty} Q^n x = x_\infty$ for all $x\in S$.
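
A small sketch of this power method, using a hypothetical 3-state transition matrix $P$ (rows summing to one):

In [ ]:
import numpy as np

# Iterate x -> Qx with Q = P^T; the iterates stay in S and approach x_inf.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
Q = P.T

x = np.ones(3) / 3                 # start from the uniform distribution in S
for _ in range(100):
    x = Q @ x

print(x)                           # equilibrium distribution x_inf
print(Q @ x - x)                   # residual near zero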

This power method is also used in PageRank:

  • Eldén L (2007) Matrix Methods in Data Mining and Pattern Recognition. SIAM, Philadelphia

  • Langville AN, Meyer CD (2006) Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, Princeton NJ

Condition Number of a Matrix¶

Consider the symmetric positive definite matrix $$ A = \begin{pmatrix} 10 & 7& 8& 7 \\ 7& 5& 6& 5 \\ 8& 6& 10& 9\\ 7& 5& 9& 10 \end{pmatrix} $$ and the linear system $Ax = b$.

  • for $b = (32,23,33,31)^T$, we find $x = (1,1,1,1)^T$,

  • for $b+\Delta b=(32.1, 22.9, 33.1, 30.9)^T$, the solution is violently perturbed to $x+\Delta x = (9.2, -12.6, 4.5, -1.1)^T$,

  • for $b = (4,3,3,1)^T$, $x = (1,-1,1,-1)^T$,

  • if we perturb $A$ to $A + 0.01I$, the solution is $x+\Delta x = (0.59, -0.32, 0.82, -0.89)^T$.

How do we explain these patterns? Define the condition number $$ \text{cond}(A) = \|A\|\cdot\|A^{-1} \| $$ of matrix $A$ relative to the given norm. One can show:

  • $$ \frac{\|\Delta x\|}{\|x\|}\le \text{cond}(A) \frac{\|\Delta b\|}{\|b\|}, $$ if $Ax=b$ and $A(x+\Delta x) = b + \Delta b$.

  • $$ \frac{\|\Delta x\|}{\|x + \Delta x\|}\le \text{cond}(A) \frac{\|\Delta A\|}{\|A\|}, $$ if $Ax=b$ and $(A + \Delta A)(x+\Delta x) = b$.

For a symmetric positive definite matrix, the condition number $\text{cond}_2(A)$ relative to the matrix norm $\|A\|_2$ is the ratio of its largest and smallest eigenvalues. Here $\lambda_1 = 0.01015$ and $\lambda_4 = 30.2887$, so $\text{cond}_2(A) = \lambda_4/\lambda_1 \approx 2984$.
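
The whole example can be reproduced with numpy (a sketch; note that np.linalg.cond uses the 2-norm by default):

In [ ]:
import numpy as np

# Reproduce the perturbation examples and compute cond_2(A).
A = np.array([[10., 7., 8., 7.],
              [7., 5., 6., 5.],
              [8., 6., 10., 9.],
              [7., 5., 9., 10.]])
b = np.array([32., 23., 33., 31.])
db = np.array([0.1, -0.1, 0.1, -0.1])

print(np.linalg.solve(A, b))       # [1, 1, 1, 1]
print(np.linalg.solve(A, b + db))  # [9.2, -12.6, 4.5, -1.1]
print(np.linalg.solve(A + 0.01 * np.identity(4), np.array([4., 3., 3., 1.])))
print(np.linalg.cond(A))           # about 2984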
