Sources

  • Matrix in R tutorial by Prof. Giovanni Petris.
  • doParallel guide by Weston and Calaway.
  • Miscellaneous internet resources.
  • Before you run a module or chunk, make sure that you have all the required packages installed, as in the snippet below.
  • We need the R packages ‘Matrix’ and ‘doParallel’.
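  • A minimal setup chunk (a sketch; install.packages() only runs if a package is missing):
if (!requireNamespace("Matrix", quietly = TRUE)) install.packages("Matrix")
if (!requireNamespace("doParallel", quietly = TRUE)) install.packages("doParallel")
library(Matrix)
library(doParallel)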

Matrices

  • To arrange values into a matrix, we use the matrix() function:
a <- matrix(1 : 6, nrow = 2, ncol = 3)
a
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
  • Individual entries can be referred to using a pair of indices. For example, the element in the second row, third column can be printed as:
a[2, 3]
## [1] 6

Rows, Columns

  • An entire row or column can be retrieved by specifying its index and leaving the other index empty:
a[2, ]
## [1] 2 4 6
  • Similarly to the way R works with vectors, you can use negative indices to exclude rows or columns of a matrix:
a[-2,]
## [1] 1 3 5

Matrix indices

  • Note how the values 1,2,3,4,5,6 are used to create the matrix: the first two are used to fill the first column, then the next two to fill the second column, and so on.

  • R allows matrices to be indexed by a single number, e.g.,

a[5]
## [1] 5
  • The fifth element of a is the fifth element of the vector obtained by “unrolling” the matrix, column by column.
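  • A quick way to check this equivalence (a sketch): unroll the matrix with as.vector() and compare.
as.vector(a)[5] == a[5]
## [1] TRUE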

Fill by row

  • Sometimes you need to fill a matrix row by row, instead of column by column.
  • You can change the default behavior as follows:
a <- matrix(1 : 6, nrow = 2, ncol = 3, byrow = TRUE)
a
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

Binding rows and columns

  • We have seen rbind() and cbind() before.
  • For example, we can use cbind() to construct a 3 x 3 Hilbert matrix, whose (i, j) entry is 1/(i + j - 1).
H3 <- 1 / cbind(1 : 3, 2 : 4, 3 : 5)
H3
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.5000000 0.3333333
## [2,] 0.5000000 0.3333333 0.2500000
## [3,] 0.3333333 0.2500000 0.2000000
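  • Equivalently (a sketch, not part of the original construction), outer() builds the same matrix directly from the formula for the (i, j) entry:
H3alt <- outer(1:3, 1:3, function(i, j) 1 / (i + j - 1))
all.equal(H3alt, H3)
## [1] TRUE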

Diagonals and Determinants

  • The diagonal of a matrix can be extracted using diag().
diag(H3)
## [1] 1.0000000 0.3333333 0.2000000
  • The determinant of a square matrix can be obtained with det().
det(H3)
## [1] 0.000462963

Triangular

  • The functions lower.tri() and upper.tri() can be used to obtain the lower and upper triangular parts of matrices. The output of the functions is a matrix of logical elements, with TRUE representing the relevant triangular elements. For example,
lower.tri(H3)
##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE FALSE
## [2,]  TRUE FALSE FALSE
## [3,]  TRUE  TRUE FALSE

Triangular

  • A typical use of these functions is to set the upper or lower triangular part of a matrix to zero, thus constructing a triangular matrix.
Htri <- H3
Htri[lower.tri(Htri)] <- 0
Htri
##      [,1]      [,2]      [,3]
## [1,]    1 0.5000000 0.3333333
## [2,]    0 0.3333333 0.2500000
## [3,]    0 0.0000000 0.2000000
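  • Analogously (a sketch), zeroing the entries above the diagonal yields a lower triangular matrix:
Ltri <- H3
Ltri[upper.tri(Ltri)] <- 0  # keep only the diagonal and the lower triangle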

Multiplication

  • Matrix multiplication in R is performed by the operator %*%. For example,
A <- matrix(c(1, 2, 0, 1), 2)
B <- matrix(c(1, 3, 3, 1, 4, 5), nr = 2)
A
##      [,1] [,2]
## [1,]    1    0
## [2,]    2    1
B
##      [,1] [,2] [,3]
## [1,]    1    3    4
## [2,]    3    1    5
  • Matrices must be conformable, i.e. the number of columns of \(A\) must equal the number of rows of \(B\).

Multiplication

A %*% B
##      [,1] [,2] [,3]
## [1,]    1    3    4
## [2,]    5    7   13
dim(A)
## [1] 2 2
ncol(A)
## [1] 2
nrow(B)
## [1] 2

Multiplication

  • With \(A\) and \(B\) defined as above, the matrix product \(BA\) is not defined, but the product \(B^T A\) is (\(B^T\) denotes the transpose of \(B\)).
  • The transpose of a matrix can be obtained with the function t().

t(B) %*% A
##      [,1] [,2]
## [1,]    7    3
## [2,]    5    1
## [3,]   14    5

Crossprod

  • A much more efficient way of computing the same matrix product is via the function crossprod().
all.equal(crossprod(B, A), t(B) %*% A)
## [1] TRUE
  • A similar function, tcrossprod(), computes the matrix product \(AB^T\) whenever the product is well defined.
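  • For example, with a single argument tcrossprod(A) computes \(A A^T\) (a quick check):
all.equal(tcrossprod(A), A %*% t(A))
## [1] TRUE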

Exercise (Try at home)

  • Consider the matrix
X <- cbind(1, seq(-1, 1, length = 11))
X
##       [,1] [,2]
##  [1,]    1 -1.0
##  [2,]    1 -0.8
##  [3,]    1 -0.6
##  [4,]    1 -0.4
##  [5,]    1 -0.2
##  [6,]    1  0.0
##  [7,]    1  0.2
##  [8,]    1  0.4
##  [9,]    1  0.6
## [10,]    1  0.8
## [11,]    1  1.0
  • Use crossprod() and tcrossprod() to find \(X^T X\) and \(X X^T\).
  • Note the dimensions of the resulting matrices.
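  • If you want to verify your answer afterwards, here is a sketch of one solution:
XtX <- crossprod(X)   # t(X) %*% X, a 2 x 2 matrix
XXt <- tcrossprod(X)  # X %*% t(X), an 11 x 11 matrix
dim(XtX)
## [1] 2 2
dim(XXt)
## [1] 11 11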

Matrix inversion

  • The inverse of a matrix can be found using solve():
Ainv <- solve(A)
Ainv
##      [,1] [,2]
## [1,]    1    0
## [2,]   -2    1
all.equal(Ainv %*% A, diag(2))
## [1] TRUE

Solve linear system

  • Matrix inversion in R uses a QR decomposition, which we might briefly discuss at the very end (if time permits)!

  • For linear systems we don’t need to compute inverses. We can use Gaussian Elimination.

b <- c(5, 3)
solve(A, b) # use this: solves the system directly
## [1]  5 -7
Ainv %*% b # don't use this: forming the inverse first is slower and less accurate
##      [,1]
## [1,]    5
## [2,]   -7

Time comparison

  • Compare execution times for the different methods of solving the Least Squares problem when the number of observations is large.

  • Let us start by making up our own random data: an \(n \times p\) design matrix \(X\) and an \(n \times 1\) response vector \(y\).

  • Take \(n = 5 \times 10^5\).

set.seed(123)
n <- 5e5; p <- 20
X <- matrix(rnorm(n * p, mean = 1 : p, sd = 10), nr = n, nc = p, byrow = TRUE)
y <- rowSums(X) + rnorm(n)

OLS

  • The naive approach to solve the LS problem is to compute: \(\hat{\beta} = (X^T X)^{-1}X^T y\)
system.time(bHat1 <- solve(t(X) %*% X) %*% t(X) %*% y)
##    user  system elapsed 
##    0.44    0.03    0.47
  • Not so naive: use crossprod.
system.time(bHat2 <- solve(crossprod(X), crossprod(X, y)))
##    user  system elapsed 
##     0.2     0.0     0.2
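  • As a sanity check (a sketch), the two estimates should agree up to numerical error; a QR-based solver such as qr.solve() is another alternative worth timing:
all.equal(c(bHat1), c(bHat2))       # expected: TRUE
system.time(bHat3 <- qr.solve(X, y))  # least squares via the QR decomposition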

More advanced topics

Inverting a matrix using Cholesky helps

  • A symmetric positive definite matrix can be factorized as: \[ A = L L^T \]
p <- 1000; A <- matrix(rnorm(p * p), p, p)
A <- crossprod(A)  # t(A) %*% A is symmetric positive definite
ptm <- proc.time(); B1 <- solve(A); proc.time() - ptm
##    user  system elapsed 
##    0.81    0.00    0.81
ptm <- proc.time(); B2 <- chol2inv(chol(A)); proc.time() - ptm
##    user  system elapsed 
##    0.22    0.00    0.22
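  • A quick correctness check (a sketch); note that chol2inv() must be given the Cholesky factor chol(A), not A itself:
all.equal(B1, B2)  # both are the inverse of A; expected TRUE up to numerical error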

Storage of sparse matrices

  • Sparse matrices appear in many areas of statistics and machine learning.
    1. For large Markov chains, the transition matrix becomes sparse.
    2. Large design matrices / contingency tables.
    3. Modern datasets with 0 imputed for missing values.
  • If the matrix \(A_{n\times n}\) has on average \(k\) non-zero entries per row and \(k \ll n\), then storage reduces from \(O(n^2)\) to \(O(n)\).

Storage of unstructured sparse matrices

# Demo 2: storing a matrix in sparse format can save a lot of memory
library(Matrix)
m1 <- matrix(0, nrow = 1000, ncol = 1000)                 # dense: stores all 10^6 entries
m2 <- Matrix(0, nrow = 1000, ncol = 1000, sparse = TRUE)  # sparse: stores only non-zeros
object.size(m1)
## 8000136 bytes
object.size(m2)
## 5072 bytes
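  • A sketch of constructing an unstructured sparse matrix with sparseMatrix(), specifying only the non-zero entries (the indices and values below are made up for illustration):
m3 <- sparseMatrix(i = c(1, 500, 1000), j = c(2, 500, 999),
                   x = c(1.5, -2, 3), dims = c(1000, 1000))
object.size(m3)  # still only a few KB: just the non-zero entries are stored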