STAT 207: Eigenvalues and Eigenvectors¶

Zhe Fei (zhe.fei@ucr.edu)¶

  • NAS Chapter 8

Finding the eigenvalues and eigenvectors of a symmetric matrix is one of the basic tasks of computational statistics.

  • Application 1: PCA. For a random $m$-vector $X$ with covariance matrix $\Omega$, write $$ \Omega = UDU^T, $$ where $D = \text{diag}(\lambda_1,\lambda_2,\ldots,\lambda_m)$ and $U$ is the orthogonal matrix of eigenvectors; the principal components are the entries of $U^TX$.



  • Application 2: If $\Omega$ is the covariance matrix of a normally distributed random vector $X$ with mean $E(X) = \mu$, then the quadratic form and the determinant $$(x - \mu)^T\Omega^{-1}(x - \mu) = [U^T(x - \mu)]^TD^{-1}U^T(x - \mu) $$ $$ \det(\Omega) = \prod_{i} \lambda_i$$ appear in the density of $X$.
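A quick numerical check of both facts, as a minimal sketch (the covariance matrix `Omega` below is made up for illustration, and `np.linalg.eigh` plays the role of the eigendecomposition):

In [ ]:
import numpy as np

# a small made-up covariance matrix (symmetric positive definite)
Omega = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

# spectral decomposition Omega = U D U^T (eigenvalues in ascending order)
lam, U = np.linalg.eigh(Omega)
print(np.allclose(Omega, U @ np.diag(lam) @ U.T))       # True

# det(Omega) is the product of the eigenvalues
print(np.isclose(np.linalg.det(Omega), np.prod(lam)))   # True

# quadratic form two ways: (x-mu)^T Omega^{-1} (x-mu) = z^T D^{-1} z, z = U^T (x-mu)
mu = np.zeros(3)
x = np.array([1.0, -2.0, 0.5])
z = U.T @ (x - mu)
print(np.isclose((x - mu) @ np.linalg.solve(Omega, x - mu), z @ (z / lam)))  # True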

Jacobi’s Method¶

  • ideas for proving convergence of iterative methods in general

  • easy to implement with parallel computing

Basic idea:

  • Repeatedly apply a sequence of similarity transformations (plane rotations) to the matrix until its off-diagonal elements are sufficiently small to be considered zero.

  • The diagonal elements of the resulting matrix are its eigenvalues.

  • The columns of the accumulated product of rotations are the corresponding eigenvectors.

Apply the rotation $$ R = \begin{pmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \\ \end{pmatrix} $$ to any row $k$ and column $\ell$ of the $m \times m$ symmetric matrix $A = (a_{ij})$. WLOG, we take $k=1, \ell=2$ and $$ U = \begin{pmatrix} R & 0 \\ 0^T & I_{m-2} \\ \end{pmatrix}; $$ then the upper-left block of $B = U^TAU$ becomes:

$$ \begin{aligned} b_{11} &= a_{11} \cos^2\theta - 2a_{12}\cos\theta\sin\theta + a_{22}\sin^2\theta \\ b_{12} &= (a_{11} - a_{22})\cos\theta\sin\theta + a_{12}(\cos^2\theta - \sin^2\theta) \\ b_{22} &= a_{11}\sin^2\theta + 2a_{12}\cos\theta\sin\theta + a_{22}\cos^2\theta. \end{aligned} $$

By the double-angle formulas, $$ b_{12} = \frac{a_{11}-a_{22}}{2} \sin(2\theta) + a_{12}\cos(2\theta). $$ To force $b_{12} = 0$, choose $$ \left\{ \begin{array}{ll} \tan(2\theta) = \frac{2a_{12}}{a_{22}-a_{11}} & \text{if } a_{22}-a_{11} \ne 0 \\ \theta = \pi/4 & \text{if } a_{22}-a_{11} = 0. \end{array} \right. $$ With this choice, $$ b_{11} = a_{11} - a_{12}\tan\theta, \qquad b_{22} = a_{22} + a_{12}\tan\theta. $$

Because the Frobenius norm is invariant under orthogonal similarity, $\|B\|_F^2 = \|A\|_F^2$, and since $b_{12} = 0$, $$ b_{11}^2 + b_{22}^2 = a_{11}^2 + a_{22}^2 + 2a_{12}^2, $$ which implies that the off-diagonal sum of squares drops to $$ \text{off}(B) = \text{off}(A) - 2a_{12}^2. $$ Repeating such rotations produces $$ \Omega_n = U_n^T \cdots U_1^T \Omega\, U_1 \cdots U_n. $$
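The single-rotation algebra is easy to verify numerically; here is a minimal sketch with a made-up $3 \times 3$ symmetric matrix, rotating rows and columns 1 and 2 (indices 0 and 1 in numpy):

In [ ]:
import numpy as np

A = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 5.0]])
a11, a12, a22 = A[0, 0], A[0, 1], A[1, 1]

# rotation angle that forces b_12 = 0
theta = np.pi / 4 if a22 == a11 else 0.5 * np.arctan(2 * a12 / (a22 - a11))
c, s = np.cos(theta), np.sin(theta)

# embed the 2x2 rotation R in the upper-left block of U
U = np.eye(3)
U[:2, :2] = [[c, s], [-s, c]]
B = U.T @ A @ U

off = lambda M: np.sum(M**2) - np.sum(np.diag(M)**2)
print(np.isclose(B[0, 1], 0.0))                          # b_12 is zeroed
print(np.isclose(B[0, 0], a11 - a12 * np.tan(theta)))    # b_11 = a_11 - a_12 tan(theta)
print(np.isclose(off(B), off(A) - 2 * a12**2))           # off(B) = off(A) - 2 a_12^2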

Classical Jacobi: search for the largest $|a_{ij}|$ at each iteration. What is the rate of convergence? Since there are $m^2 - m$ off-diagonal entries, the largest pivot satisfies $a_{ij}^2 \geq \text{off}(A)/(m^2 - m)$, so each rotation gives $$ \text{off}(B) \leq \left(1 - \frac{2}{m^2 - m}\right)\text{off}(A), $$ and $\text{off}(\Omega_n) \to 0$ at least at a geometric (linear) rate.






Parallel Jacobi:

  • Distribute rows of $A$ to multiple processors.

  • Perform computation based on the owner-computes rule.

  • Perform all-to-all broadcasting after each iteration (see the sketch below).
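Rotations whose pivot pairs share no row or column commute, which is what makes a parallel sweep possible. Below is a serial sketch of one such round (the pair schedule is made up for illustration; in a real implementation each pair would be handled by a different processor):

In [ ]:
import numpy as np

def rotate(A, V, i, j):
    # one Jacobi rotation zeroing A[i, j], same convention as jacobi_eigenvalue below
    if A[i, i] == A[j, j]:
        theta = np.pi / 4
    else:
        theta = 0.5 * np.arctan(2 * A[i, j] / (A[i, i] - A[j, j]))
    S = np.eye(A.shape[0])
    S[i, i] = S[j, j] = np.cos(theta)
    S[i, j], S[j, i] = -np.sin(theta), np.sin(theta)
    return S.T @ A @ S, V @ S

A = np.array([[6., 2., 7., 4.],
              [2., 8., 5., 7.],
              [7., 5., 9., 8.],
              [4., 7., 8., 3.]])
V = np.eye(4)

# the pairs (0, 1) and (2, 3) touch disjoint rows/columns, so both rotations
# could run concurrently; both pivots stay zero after the round
for i, j in [(0, 1), (2, 3)]:
    A, V = rotate(A, V, i, j)
print(A[0, 1], A[2, 3])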


In [1]:
import numpy as np

def jacobi_eigenvalue(A, tol=1e-8, max_iter=1000):
    # work on a float copy so integer inputs and the caller's matrix are safe
    A = np.asarray(A, dtype=float).copy()
    # initialize the eigenvector matrix V as the identity
    V = np.eye(A.shape[0])
    for _ in range(max_iter):
        # stop once all off-diagonal elements are below tolerance
        # (checking first also handles an already-diagonal input)
        if np.max(np.abs(np.triu(A, 1))) < tol:
            break

        # index of the largest off-diagonal element (flattened upper triangle)
        max_idx = np.argmax(np.abs(np.triu(A, 1)))
        i, j = divmod(max_idx, A.shape[1])

        # rotation angle that zeroes A[i, j]
        if A[i, i] == A[j, j]:
            theta = np.pi / 4
        else:
            theta = 0.5 * np.arctan(2 * A[i, j] / (A[i, i] - A[j, j]))

        # build the rotation matrix
        S = np.eye(A.shape[0])
        S[i, i] = S[j, j] = np.cos(theta)
        S[i, j] = -np.sin(theta)
        S[j, i] = np.sin(theta)

        # similarity transform A and accumulate the rotations in V
        A = S.T @ A @ S
        V = V @ S

    # the diagonal elements of A are the eigenvalues
    eigenvalues = np.diag(A)

    # sort eigenvalues (descending) and reorder the eigenvectors to match
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = V[:, idx]

    return eigenvalues, eigenvectors
In [2]:
# generate a random matrix
A = np.random.randint(1, 10, size=(4, 4))

# Make the matrix symmetric
A = np.tril(A) + np.tril(A, -1).T

print(A)
[[6 2 7 4]
 [2 8 5 7]
 [7 5 9 8]
 [4 7 8 3]]
In [4]:
w, v = jacobi_eigenvalue(A)

print(w)
print(v)
[23.57512911  5.93479954  0.09322972 -3.60315837]
[[ 0.40923118 -0.58156807  0.70278377  0.02008474]
 [ 0.46710256  0.73985457  0.34981776 -0.33475059]
 [ 0.61999028 -0.29769955 -0.59550827 -0.41515892]
 [ 0.47953842  0.16052656 -0.17056497  0.84568417]]
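As a sanity check (using `A`, `w`, and `v` from the cells above), the results agree with numpy's built-in symmetric eigensolver up to ordering and column signs:

In [ ]:
# np.linalg.eigh returns eigenvalues in ascending order, so reverse to compare
w_np, v_np = np.linalg.eigh(A)
print(np.allclose(w, w_np[::-1]))
print(np.allclose(np.abs(v), np.abs(v_np[:, ::-1])))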

The Rayleigh Quotient¶

Definition: $$ R(x) = \frac{x^TAx}{x^Tx} $$ for a symmetric matrix $A$ and any $x \ne 0$.

Let $A$ have ordered eigenvalues $\lambda_1 \leq \cdots \leq \lambda_m$ and corresponding orthonormal eigenvectors $u_1,\ldots,u_m$.

With the unique representation $x = \sum_{i=1}^m c_i u_i$, $$ R(x) = \frac{\sum_{i=1}^m \lambda_i c_i^2}{\sum_{i=1}^m c_i^2}. $$

Therefore, $\lambda_1 \leq R(x) \leq \lambda_m$, with equality $R(u_1) = \lambda_1$ and $R(u_m) = \lambda_m$.

Hence, $R(x)$ is maximized by $x = u_m$ and correspondingly minimized by $x = u_1$. The following generalizes this result.
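A quick numerical check of these bounds, as a minimal sketch with a made-up symmetric matrix and random test vectors:

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
A = (A + A.T) / 2                       # symmetrize
lam, U = np.linalg.eigh(A)              # lam[0] <= ... <= lam[-1]

R = lambda x: (x @ A @ x) / (x @ x)
xs = rng.normal(size=(1000, 5))
print(all(lam[0] <= R(x) <= lam[-1] for x in xs))                         # True
print(np.isclose(R(U[:, 0]), lam[0]), np.isclose(R(U[:, -1]), lam[-1]))   # True True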

NAS Proposition 8.3.1 (Courant-Fischer) Let $V_k$ be a $k$-dimensional subspace of $\mathbb{R}^m$. Then $$ \begin{aligned} \lambda_k &= \min_{V_k} \max_{x\in V_k,\ x\neq 0} R(x) \\ &= \max_{V_{m-k+1}} \min_{y\in V_{m-k+1},\ y\neq 0} R(y). \end{aligned} $$

The next proposition shows how much the eigenvalues of a symmetric matrix change under a symmetric perturbation of the matrix.

NAS Proposition 8.3.2 Let the $m \times m$ symmetric matrices $A$ and $B = A + \Delta A$ have ordered eigenvalues $\lambda_1, \ldots, \lambda_m$ and $\mu_1, \ldots, \mu_m$, respectively. Then the inequality $$|\lambda_k - \mu_k| \leq \|\Delta A\|_2$$ holds for all $k \in \{1, \ldots, m\}$.
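A numerical illustration of the bound with a made-up perturbation:

In [ ]:
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 6)); A = (A + A.T) / 2
dA = 1e-2 * rng.normal(size=(6, 6)); dA = (dA + dA.T) / 2

lam = np.linalg.eigvalsh(A)        # ordered eigenvalues of A
mu = np.linalg.eigvalsh(A + dA)    # ordered eigenvalues of B = A + dA
print(np.max(np.abs(lam - mu)) <= np.linalg.norm(dA, 2))   # True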

Finding a Single Eigenvalue¶

The power method iterates $$ u_n = \frac{1}{\|A u_{n-1}\|_2} A u_{n-1} $$ and converges to the dominant eigenvector whenever $A$ is diagonalizable, has a unique eigenvalue of largest absolute value, and $u_0$ has a nonzero component along the dominant eigenvector.

  • To find the eigenvalue with smallest absolute value, use the inverse power method with $A^{-1}$ instead of $A$.

  • To find any eigenvalue $\lambda_i$, pick a shift $\mu$ close to $\lambda_i$ and update using $(A - \mu I)^{-1}$ (the shifted inverse power method).

The Rayleigh quotient iteration updates the shift at every step to accelerate convergence: with $\mu_n = u_n^TAu_n$, $$ u_n = \frac{1}{\|(A - \mu_{n-1}I)^{-1}u_{n-1}\|_2}(A - \mu_{n-1}I)^{-1}u_{n-1}, $$ which finds the dominant eigenvalue $\lambda_m$ when started near $u_m$.

  • It converges at a cubic rate;
  • it also works for the smallest eigenvalue when started near $u_1$ (see the sketch below).
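Below is a minimal sketch of both iterations (the function names are made up; shifted systems are solved with `np.linalg.solve` rather than forming an explicit inverse):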
In [ ]:
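import numpy as np

def power_method(A, tol=1e-10, max_iter=1000):
    # power iteration for the dominant eigenpair of a symmetric matrix
    u = np.random.default_rng(0).normal(size=A.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(max_iter):
        v = A @ u
        v /= np.linalg.norm(v)
        # sign-insensitive convergence check (iterates may flip sign)
        if np.linalg.norm(v - np.sign(v @ u) * u) < tol:
            u = v
            break
        u = v
    return u @ A @ u, u   # Rayleigh quotient and eigenvector

def rayleigh_quotient_iteration(A, u, tol=1e-12, max_iter=50):
    # Rayleigh quotient iteration; converges cubically near an eigenpair
    mu = u @ A @ u
    for _ in range(max_iter):
        try:
            u = np.linalg.solve(A - mu * np.eye(A.shape[0]), u)
        except np.linalg.LinAlgError:   # shift hit an eigenvalue exactly
            break
        u /= np.linalg.norm(u)
        mu_new = u @ A @ u
        if abs(mu_new - mu) < tol:
            mu = mu_new
            break
        mu = mu_new
    return mu, u

A = np.array([[6., 2., 7., 4.],
              [2., 8., 5., 7.],
              [7., 5., 9., 8.],
              [4., 7., 8., 3.]])
print(power_method(A)[0])                                  # ~23.575, matches Jacobi
print(rayleigh_quotient_iteration(A, np.ones(4) / 2)[0])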