QR Method to Compute Eigenvalues


Eigenvalues

In Linear Algebra, and consequently in Data Science, the concept of eigenvalues is important and useful to know given how often it comes up. You may encounter eigenvalues when going over the estimation of regression coefficients, unsupervised machine learning and dimensionality reduction, as well as all over mathematics and engineering. Outside of statistics, I remember eigenvectors being important in solving differential equations.

Knowing the mathematical definition and having a conceptual understanding is probably all you really need, but one thing that bothered me was not knowing how these eigenvalues are actually computed for matrices with more than two rows and columns.

What I discovered was the QR method, which I read about here.

Here I explore the QR method, which I think might be the simplest algorithm for finding the eigenvalues. I also try to implement this algorithm using R, C++ and Rcpp.

A Note for the Inner Math Geek

With a 2 by 2 matrix we can solve for the eigenvalues by hand. That works because the characteristic polynomial \(\det(X-\lambda I)\) of a 2 by 2 matrix has degree 2 in \(\lambda\), so we can factorize and solve it using regular algebra.
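To make that concrete, here is the 2 by 2 case written out, with generic entries \(a, b, c, d\) that I am using just for illustration:

\[\det(X-\lambda I)=\det\begin{pmatrix} a-\lambda & b \\ c & d-\lambda \end{pmatrix}=\lambda^2-(a+d)\lambda+(ad-bc)=0\]

The quadratic formula solves this directly. For larger matrices the characteristic polynomial has higher degree, and beyond degree four there is no general formula in radicals, which is why an iterative method is needed.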

But HOW do you compute eigenvalues for large matrices? It turns out that when you can’t factorize and solve polynomials, what you really should be doing is factorizing matrices.

The Numberphile YouTube channel posted a video on this idea here. It is worth checking out if you are into this type of content!

Quick Refresher: What are they?

First, just a quick refresher on what eigenvalues and eigenvectors are. Take a square matrix \(X\). If there is a vector \(v\) and a scalar \(λ\) such that

\[Xv=λv\]

then \(v\) is an eigenvector of \(X\) and \(λ\) is the corresponding eigenvalue of \(X\).

I don’t know German, but apparently “eigen” means “own.” I think of it more as “same,” because multiplying the eigenvector by the eigenvalue is the same as applying the matrix to it, but I guess the pair somehow belongs to \(X\). Anyway.

What this means in words is that, if you think of multiplying \(v\) by \(X\) as applying a function to \(v\), then for this particular vector \(v\) the function is nothing but a stretching scalar multiplication. Usually multiplying a vector by a matrix equates to taking linear combinations of the components of the vector. But if you take an eigenvector, you don’t need to do all that computation. Just multiply by the eigenvalue and you are all good.
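As a quick sanity check of the definition, here is a small illustration using the built-in eigen function and a matrix I made up for this purpose:

# A small symmetric matrix, chosen only for illustration
X <- matrix(c(2, 1,
              1, 2), ncol = 2)

e <- eigen(X)
v <- e$vectors[, 1] # first eigenvector
l <- e$values[1]    # corresponding eigenvalue

X %*% v # applying the matrix ...
l * v   # ... gives the same result as scaling by the eigenvalue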

The QR Method

The QR method for computing eigenvalues begins with the \(QR\) matrix decomposition. This decomposition expresses a matrix \(X=QR\) as a product of an orthogonal matrix \(Q\) and an upper triangular matrix \(R\). The fact that \(Q\) is orthogonal is important because it implies that \(Q^T\) is its inverse.
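To see what this decomposition gives us, here is a quick check using base R's qr() function (the implementation later in the post uses myc_qr; this snippet is just an illustration with a matrix I made up):

# Decompose a small matrix and check the two defining properties
X <- matrix(c(1, 4, 7,
              2, 5, 8,
              3, 6, 10), ncol = 3)

decomp <- qr(X)
Q <- qr.Q(decomp) # orthogonal factor
R <- qr.R(decomp) # upper triangular factor

round(t(Q) %*% Q, 10)  # identity matrix, so t(Q) is the inverse of Q
round(Q %*% R - X, 10) # zero matrix, so X = QR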

The central idea of the QR method for finding the eigenvalues is iteratively applying the \(QR\) matrix decomposition to the original matrix \(X\).

Important Definition - Similar Matrices

Here I need to mention one mathematical property of eigenvalues of \(X\).

If we take any invertible matrix \(M\), then the matrix

\[ E=M^{−1}XM \]

will have the same eigenvalues as \(X\). Such matrices \(E\) and \(X\) are formally called similar matrices, and similar matrices always have the same eigenvalues. Their eigenvectors, however, will in general be different.
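Here is a quick numerical illustration of this property, using random matrices and the built-in eigen function (my own example, not part of the original derivation):

set.seed(42)
X <- crossprod(matrix(rnorm(9), ncol = 3)) # symmetric, so eigenvalues are real
M <- matrix(rnorm(9), ncol = 3)            # almost surely invertible
E <- solve(M) %*% X %*% M                  # similar to X

sort(eigen(X)$values)
sort(Re(eigen(E)$values)) # same eigenvalues as X, up to floating point error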

QR Algorithm

So, the idea of the QR method is to iterate the following steps:

  1. Apply the QR decomposition to \(X\) so that \(X=Q_0R_0\)

  2. Let \(V_0=Q_0\) and compute

\[E_0=Q_0^{-1}XQ_0=Q^T_0XQ_0\]

Note that \(E_0\) is similar to \(X\), so their eigenvalues are the same.

  3. Then decompose

\[E_0=Q_1R_1\]

  4. And so \(V_1=V_0Q_1\) and

\[E_1=Q^T_1E_0Q_1\]

  5. Iterate until \(E_i\) is essentially diagonal.

When \(E_i\) is diagonal, the desired eigenvalues are on its diagonal and the columns of \(V_i\) are the eigenvectors. In most cases this iterated \(E\) does converge to a diagonal matrix.

This is the simplest version of the QR method. There are a couple of improved versions that I will not go into here. Next, I just want to write a function in R and C++ to implement the method and verify for myself that it works.
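Before getting to the R and C++ versions, here is a minimal sketch of the iteration above using base R's qr() (a fixed number of iterations and no convergence check; qr_eigen_sketch is just an illustrative name I am using here, not part of the myc package):

qr_eigen_sketch <- function(A, iters = 50) {
  E <- A
  V <- diag(nrow(A))      # accumulates the product Q_0 Q_1 ... Q_i
  for (i in seq_len(iters)) {
    Q <- qr.Q(qr(E))      # orthogonal factor of the current E
    E <- t(Q) %*% E %*% Q # similar to the previous E, so same eigenvalues
    V <- V %*% Q
  }
  list(values = diag(E), vectors = V)
}

# For a symmetric matrix the diagonal of E approaches the eigenvalues
qr_eigen_sketch(matrix(c(2, 1, 1, 2), ncol = 2))$values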

Simple Implementation

  1. I will first show a function written in R that explicitly implements the QR method for finding eigenvalues.

  2. I will verify that it works by comparing the results to the built-in eigen function.

  3. I will implement the same function in C++ and make it available through the myc package. Then I will verify that it works correctly as well.

  4. Lastly I will compare their performance.

library(tidyverse)
library(microbenchmark)
library(plotly)
library(myc)

The R Function

my_eigen is a pretty simple function. The first few lines perform the initial iteration of the QR method. Then a while loop iterates until the diagonal of the \(E\) matrix stabilizes and no longer changes beyond a small margin. The returned list with values and vectors should be identical to what the eigen function returns. For the QR decomposition step I am using myc_qr from the myc package.

my_eigen <- function(A, margin = 1e-20) {
  # Initial step: E = t(Q) %*% A %*% Q is similar to A
  Q <- myc_qr(A)$Q
  E <- t(Q) %*% A %*% Q
  U <- Q
  res <- diag(E)
  init <- diag(A)
  # Iterate until the diagonal of E stops changing beyond the margin
  while (sum((init - res)^2) > margin) {
    init <- res
    Q <- myc_qr(E)$Q
    E <- t(Q) %*% E %*% Q
    U <- U %*% Q # accumulate the orthogonal transformations
    res <- diag(E)
  }
  return(list(values = round(diag(E), 6), vectors = U))
}

Verifying that it works

Let’s check that it works. I’ll use a symmetric matrix because I want to make sure eigenvalues are not complex numbers.

X <- matrix(c(
  3, 2, 3, 2,
  5, 1, 5, 4,
  9, 3, 2, 1,
  4, 5, 6, 7
), ncol = 4)

A <- t(X) %*% X

Here is the matrix that I’ll use to verify that my function works as expected.

A
##      [,1] [,2] [,3] [,4]
## [1,]   26   40   41   54
## [2,]   40   67   62   83
## [3,]   41   62   95   70
## [4,]   54   83   70  126

Here is what the built-in eigen function returns.

eigen(A)
## eigen() decomposition
## $values
## [1] 268.6301739  39.1116701   5.8239493   0.4342066
## 
## $vectors
##            [,1]        [,2]         [,3]       [,4]
## [1,] -0.3085888  0.02606027 -0.001691293  0.9508370
## [2,] -0.4823478  0.07140554 -0.858273366 -0.1600270
## [3,] -0.5053596 -0.81601931  0.242450805 -0.1412152
## [4,] -0.6455425  0.57300488  0.452306949 -0.2244074

As you can see below, when calling the my_eigen function the results match: the eigenvalues are the same, and the eigenvectors agree up to sign (an eigenvector multiplied by −1 is still an eigenvector).

my_eigen(A)
## $values
## [1] 268.630174  39.111670   5.823949   0.434207
## 
## $vectors
##           [,1]        [,2]         [,3]       [,4]
## [1,] 0.3085888  0.02606027  0.001691289 -0.9508370
## [2,] 0.4823478  0.07140567  0.858273355  0.1600270
## [3,] 0.5053596 -0.81601935 -0.242450683  0.1412152
## [4,] 0.6455425  0.57300481 -0.452307035  0.2244074

Calling the myc_eigen function, which is the Rcpp version, also gives the same result.

myc_eigen(A)
## $values
## [1] 268.6301739  39.1116701   5.8239493   0.4342066
## 
## $vectors
##           [,1]        [,2]         [,3]       [,4]
## [1,] 0.3085888  0.02606027  0.001691289 -0.9508370
## [2,] 0.4823478  0.07140567  0.858273355  0.1600270
## [3,] 0.5053596 -0.81601935 -0.242450683  0.1412152
## [4,] 0.6455425  0.57300481 -0.452307035  0.2244074

Additionally, we can verify that the computed decomposition is correct by applying \(A\) to an eigenvector, scaling the same eigenvector by \(λ\), and checking that the two results are equal.

eigen_decomp <- myc_eigen(A)

v <- eigen_decomp$vectors[, 2]
l <- eigen_decomp$values[2]

Here we apply the matrix.

as.numeric(A %*% v)
## [1]   1.019261   2.792791 -31.915878  22.411177

And as expected, scaling \(v\) by \(λ\) gives the same vector. The values do diverge a bit around the 7th significant figure, but the solution is acceptably close.

(l * v)
## [1]   1.019261   2.792795 -31.915880  22.411175

Benchmarking

Just out of interest I wanted to compare how fast these three functions are at computing the eigenvalues. It looks like on a small matrix, like the 4 by 4 one we used above, the built-in eigen function is a little slower.

microbenchmark(eigen(A), my_eigen(A), myc_eigen(A))
## Unit: microseconds
##          expr     min       lq     mean   median       uq     max neval
##      eigen(A) 154.751 159.3550 167.5798 163.3545 169.6255 399.668   100
##   my_eigen(A) 110.584 127.9590 133.0185 131.0840 135.3960 281.418   100
##  myc_eigen(A)  40.001  47.2715  51.3557  49.3970  52.2300 180.543   100

If we take a larger matrix then of course eigen is much faster. The Rcpp version of my function is only a bit faster than the pure R function, probably because it uses a matrix multiplication function myc_matmult that is notably slower than the %*% operator.

X <- matrix(rnorm(100), ncol = 10)
A <- t(X) %*% X

microbenchmark(eigen(A), my_eigen(A), myc_eigen(A))
## Unit: microseconds
##          expr      min       lq      mean   median       uq      max neval
##      eigen(A)  168.542  182.626  229.4819  205.959  276.355  397.917   100
##   my_eigen(A) 1996.584 2188.272 2617.3944 2298.354 2481.876 9857.500   100
##  myc_eigen(A) 1148.375 1233.063 1545.6636 1329.083 1485.480 7641.000   100

Visualization and Animation

Obviously, implementing the QR method yourself is not motivated by better performance. What we can do, however, is build a function that keeps track of how the values are computed and maybe get a better feeling for how the method works.

Below is a function my_eigen2 which does the same computation as the my_eigen function, except that instead of using a while loop it iterates up to a given maximum number of times and keeps a record of the updated eigenvalues and eigenvectors. If the computation converges, the iterations stop before reaching the given maximum.

my_eigen2 <- function(A, margin = 1e-10, itrs = 40) {
  # Initial step: E = t(Q) %*% A %*% Q is similar to A
  Q <- myc_qr(A)$Q
  E <- t(Q) %*% A %*% Q
  U <- Q

  # Keep a record of E and U at every iteration
  Elist <- vector("list", length = itrs + 1)
  Ulist <- vector("list", length = itrs + 1)
  Elist[[1]] <- E
  Ulist[[1]] <- U

  res <- diag(E)
  for (i in 1:itrs) {
    init <- res
    Q <- myc_qr(E)$Q
    E <- t(Q) %*% E %*% Q
    U <- U %*% Q
    Elist[[i + 1]] <- E
    Ulist[[i + 1]] <- U
    res <- diag(E)
    print(sum((init - res)^2))
    if (sum((init - res)^2) < margin) {
      break
    }
  }

  return(list(
    values = round(diag(E), 6),
    vectors = U,
    Elist = Elist[1:i],
    Ulist = Ulist[1:i]
  ))
}

Visualizing a 2 by 2 Case

To visualize how the QR method works I decided to use a 2 by 2 covariance matrix. This is just to make it relevant to Principal Component Analysis, and to make it easy to actually visualize.

A <- matrix(c(
  1.0, 0.3,
  0.3, 0.8
), nrow = 2)

A
##      [,1] [,2]
## [1,]  1.0  0.3
## [2,]  0.3  0.8

First we let my_eigen2 do the computation and build up the lists of matrices that keep track of how the QR method converges to the answer. After 9 steps the computation converges.

eigen_decomps <- my_eigen2(A, itrs = 30)
## [1] 0.005127351
## [1] 0.0003368977
## [1] 1.882642e-05
## [1] 1.011322e-06
## [1] 5.382776e-08
## [1] 2.858891e-09
## [1] 1.517664e-10
## [1] 8.055721e-12

These are the actual eigenvalues and vectors:

eigen_decomps$values
## [1] 1.216227 0.583773
eigen_decomps$vectors
##           [,1]       [,2]
## [1,] 0.8118117 -0.5839193
## [2,] 0.5839193  0.8118117

The following code extracts the directions of the two eigenvectors at each step of the computation and stacks them into a data frame. We can use these to compute the slopes and then visualize them on top of a density plot.

The correct slopes are

\[s_1= -\frac{0.8118117}{0.5839193}=-1.390281\] and

\[s_2=-1/s_1=0.7192791\]

animation_data <-
  map2(eigen_decomps$Elist, eigen_decomps$Ulist, ~ as.data.frame(1 / .x %*% .y)) %>%
  bind_rows(.id = "frame") %>%
  rename(x1 = "V1", y1 = "V2") %>%
  bind_cols(d = rep(c("d1", "d2"), length(eigen_decomps$Elist))) %>%
  pivot_wider(names_from = d, values_from = c(x1, y1)) %>%
  bind_cols(
    x0 = rep(0, length(eigen_decomps$Elist)),
    y0 = rep(0, length(eigen_decomps$Elist))
  ) %>%
  mutate(frame = as.numeric(frame)) %>%
  mutate(
    slope1 = y1_d1 / x1_d1,
    slope2 = y1_d2 / x1_d2
  )

animation_data
## # A tibble: 8 x 9
##   frame x1_d1 x1_d2 y1_d1 y1_d2    x0    y0 slope1 slope2
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
## 1     1 0.865  2.67 -7.00  1.76     0     0  -8.09  0.659
## 2     2 0.893  2.79 -2.27  2.03     0     0  -2.54  0.727
## 3     3 0.941  2.88 -1.71  2.10     0     0  -1.81  0.729
## 4     4 0.975  2.91 -1.53  2.11     0     0  -1.57  0.725
## 5     5 0.994  2.92 -1.46  2.11     0     0  -1.47  0.723
## 6     6 1.00   2.93 -1.43  2.11     0     0  -1.43  0.722
## 7     7 1.01   2.93 -1.42  2.11     0     0  -1.41  0.721
## 8     8 1.01   2.93 -1.41  2.11     0     0  -1.40  0.721

Using ggplot and plotly we can produce the following animated visualization.

data.grid <- expand.grid(x = seq(-5, 5, length.out = 200), y = seq(-5, 5, length.out = 200))
dens <- cbind(data.grid,
  z = mvtnorm::dmvnorm(data.grid,
    mean = c(0, 0),
    sigma = A
  )
)

ggplotly(
  ggplot(dens, aes(x = x, y = y, z = z)) +
    geom_contour(color = "blue", alpha = 0.3) +
    coord_fixed(xlim = c(-2, 2), ylim = c(-2, 2), ratio = 1) +
    geom_abline(aes(slope = slope1, intercept = 0, frame = frame),
      color = "red", data = animation_data
    ) +
    geom_abline(aes(slope = slope2, intercept = 0, frame = frame),
      color = "red", data = animation_data
    ) +
    theme_void() +
    theme(
      legend.position = "none",
      axis.line = element_blank(),
      panel.grid.major = element_blank()
    ) +
    geom_raster(aes(fill = z)) +
    scale_fill_gradientn(colours = RColorBrewer::brewer.pal(n = 5, name = "BuGn"))
) %>%
  animation_opts(frame = 500, redraw = TRUE) %>%
  style(hoverinfo = "skip")
## Warning: Ignoring unknown aesthetics: frame

## Warning: Ignoring unknown aesthetics: frame

Based on the above we can see that the first eigenvector is found almost instantly, while the second vector iteratively gets closer and closer to being perpendicular to the first one. With the density plot and the level curves plotted as well, we can see that the first eigenvector points in the direction of greatest variance. On a more intuitive level, if you have an asymmetric bowl and you want to find the “largest radius,” or the two points that are furthest away from each other, then the first eigenvector will tell you the direction. Of course we have a bowl here because of the matrix I chose, but even if you have a saddle the intuition is similar.
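As a small check of that "direction of greatest variance" claim, here is a quick simulation I added for illustration: sample from the distribution with covariance \(A\) and compare the variance of the data projected onto each eigenvector.

set.seed(1)
Y <- mvtnorm::rmvnorm(10000, mean = c(0, 0), sigma = A) # A is the 2 by 2 covariance matrix above
V <- eigen_decomps$vectors

var(Y %*% V[, 1]) # roughly the first eigenvalue, 1.216227
var(Y %*% V[, 2]) # roughly the second eigenvalue, 0.583773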

My Takeaways

  1. I am really grateful for numerical linear algebra libraries like LAPACK and Armadillo. Having gone through the simplest possible method for eigenvalue computation, I can better appreciate all the work that went into creating them.

  2. Going through this exercise forced me to become really intimate with the details behind computing eigenvalues. While at times it was frustrating, I really feel like I learned a lot, and in my opinion the feeling of having learned something really thoroughly is one of the best feelings there is.