# Software tutorial/Vectors and matrices

# Creating matrices and vectors

An vector of zeros is created using the `numeric`

function. This is used, for example, to initialize a vector of zeros before calling a for loop to fill the entries in that vector. This is called *preallocation*, and is good programming practice in any language: rather preallocate the matrix for storage than incrementally add entries to an existing matrix.

```
numeric(6)
[1] 0 0 0 0 0 0
```

A matrix or vector can be created using the `matrix`

function. The advantage of the `matrix`

function is that it can set the values to a non-zero, or missing values (`NA`

):

```
# Matrix of zeros:
matrix(data=0, nrow=2, ncol=3)
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
# Column vector of missing values:
matrix(nrow=4)
[,1]
[1,] NA
[2,] NA
[3,] NA
[4,] NA
# Row vector of seven 7's
matrix(data=7, ncol=7)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 7 7 7 7 7 7 7
```

I prefer to allocate matrices with missing entries, `NA`

, because it is easy to check after the loop if you have missed any entries. For example, for matrix `X`

, we can check: `any(is.na(X))`

There is yet another way to create a matrix or vector of only *zero entries*:

```
# 3x1 vector of zeros
mat.or.vec(3, 1)
# 2x3 matrix of zeros
mat.or.vec(2, 3)
```

# Creating a matrix from several vectors

If you have a some equal-length vectors you might want to stack them together to create a matrix.

## Stacking vectors side-by-side: `cbind`

The `cbind`

function will put each vector in a column and return the matrix. Let's say we wanted to use only a subset of the columns from a given data frame:

```
bio <- read.csv('http://openmv.net/file/bioreactor-yields.csv')
bio
temperature duration speed baffles yield
1 82 260 4300 No 51
2 90 260 3700 Yes 30
.. ... ... .... ... ..
13 101 260 4400 No 42
14 92 260 4900 Yes 38
data <- cbind(bio$temperature, bio$ speed)
data
[,1] [,2]
[1,] 82 4300
[2,] 90 3700
..... ... ....
[13,] 101 4400
[14,] 92 4900
```

## Stacking top-to-bottom: `rbind`

Use the `rbind`

function to join existing matrices or vectors, one on top of another:

Create a matrix with 2 rows and 4 columns filled with random numbers, and another matrix containing 3 rows and 4 columns, filled with the number 2.6. Then to stack them:

```
X <- matrix(rnorm(2*4), 2, 4)
Y <- matrix(data=2.6, nrow=3, ncol=4)
rbind(X,Y)
[,1] [,2] [,3] [,4]
[1,] 0.6096289 -0.3730975 -1.0376647 -0.01033531
[2,] 0.6305808 -0.6768284 0.4918309 -0.44430265
[3,] 2.6000000 2.6000000 2.6000000 2.60000000
[4,] 2.6000000 2.6000000 2.6000000 2.60000000
[5,] 2.6000000 2.6000000 2.6000000 2.60000000
```

# Reshaping a vector into a matrix

There are occasions when we need to create subgroups of data from a vector. This happens, for example, when constructing control charts:

- we need to create the subgroups from
*non-overlapping*segments in the vector - we need to do calculations on these subgroups

One way to construct the subgroups is to use 2 nested for-loops to extract the necessary data. This works, but can get messy. There is an alternative way: by rearranging your vector into a matrix and then doing your calculations on each column of the matrix.

Let the raw data be given by this random vector:

```
N.raw = 5000
raw <- rnorm(N.raw, mean=220, sd=50)
plot(raw)
```

Let's say our subgroup size is 9. Now 5000 is not neatly divisible by 9, but this doesn't matter. We create a matrix with 9 rows and as many columns as we can (roughly 5000/9).

```
N.sub = 9
subgroups <- matrix(raw, N.sub, N.raw/N.sub)
# Warning message:
# In matrix(raw, N.sub, N.raw/N.sub) :
# data length [5000] is not a sub-multiple or multiple of the number of rows [9]
```

Notice that R gives a warning, not an error message. You can see what size matrix it created: 9 rows and 555 columns, so there are 4995 elements in the array. That's usually good enough: we throw away the last 5 entries that don't form a complete subgroup.

```
dim(subgroups)
[1] 9 555
nrow(subgroups)
[1] 9
```

The next step is to calculate the mean and standard deviation of each subgroup, i.e. for each column in the new matrix. For that we will use the `apply(...)`

command, which applies any R function to a matrix along a particular direction.

`apply(X, 1, sd)`

will apply the`sd`

function to matrix`X`

across the row direction (that's what the`1`

is for: the first dimension)`apply(X, 2, sd)`

will apply the`sd`

function to matrix`X`

across the column direction (that's what the`2`

is for: the second array dimension)

In our example the subgroups appear in each column. So calculate the standard deviation and mean for each column:

```
subgroups.S <- apply(subgroups, 2, sd)
subgroups.xbar <- apply(subgroups, 2, mean)
length(subgroups.S)
[1] 555
# "subgroups.S" is not a matrix, so the "dim" function isn't valid
dim(subgroups.S)
NULL
```

**Note**: there are built-in R functions, called `colMeans(X)`

and `rowMeans(X)`

which will perform the specific task of applying the `mean`

function along the column or row direction. The `apply`

function is more general though.

# Matrix operations

Once you've got your matrices created it is time to work with them: addition and subtraction, multiplication, transposes, inverses and determinants can all be calculated in R.

## Matrix addition and subtraction

```
X <- matrix(data=56, nrow=4, ncol=5)
Y <- matrix(data=32, nrow=4, ncol=5)
X+Y
[,1] [,2] [,3] [,4] [,5]
[1,] 88 88 88 88 88
[2,] 88 88 88 88 88
[3,] 88 88 88 88 88
[4,] 88 88 88 88 88
X-Y
[,1] [,2] [,3] [,4] [,5]
[1,] 24 24 24 24 24
[2,] 24 24 24 24 24
[3,] 24 24 24 24 24
[4,] 24 24 24 24 24
```

## Matrix multiplication

**Element-by-element multiplication**

```
a <- matrix(data=5, nrow=4, ncol=5)
b <- matrix(data=6, nrow=4, ncol=5)
a*b
[,1] [,2] [,3] [,4] [,5]
[1,] 30 30 30 30 30
[2,] 30 30 30 30 30
[3,] 30 30 30 30 30
[4,] 30 30 30 30 30
```

**The usual algebraic matrix multiplication** - which is probably what you are expecting when multiplying matrices:

```
a <- matrix(data=5, nrow=4, ncol=5)
b <- matrix(data=6, nrow=5, ncol=3)
dim(a)
[1] 4 5
dim(b)
[1] 5 3
a %*% b
[,1] [,2] [,3]
[1,] 150 150 150
[2,] 150 150 150
[3,] 150 150 150
[4,] 150 150 150
```

## Matrix transpose: `t(X)`

```
X <- matrix(data=c(56,12), nrow=4, ncol=5)
X
[,1] [,2] [,3] [,4] [,5]
[1,] 56 56 56 56 56
[2,] 12 12 12 12 12
[3,] 56 56 56 56 56
[4,] 12 12 12 12 12
t(X)
[,1] [,2] [,3] [,4]
[1,] 56 12 56 12
[2,] 56 12 56 12
[3,] 56 12 56 12
[4,] 56 12 56 12
[5,] 56 12 56 12
```

## Solving a system of equations \(Ax = b\): `solve(A, b)`

Use the `solve(a, b, ...)`

function in R. The `solve()`

function solves the linear system of equations \(Ax = b\) and returns \(x\).

```
A <- matrix(data = c(2,4,6, 5, 2, 7, 6, 2, 2), 3, 3)
b <- matrix(data = c(6, 2, 2), nrow=3)
x <- solve(A, b)
x
[,1]
[1,] 0
[2,] 0
[3,] 1
# Should be a vector of zeros:
check <- A %*% x - b
```

## Matrix inverse: `solve(A)`

Recall that only square matrices, \(n \times n\), can be inverted. We use the `solve(...)`

function in R, which solves the system \(Ax = b\), but if we replace `b`

with successive columns from the identity matrix, we can successively build up a solution to \(AA^{-1} = I\).

Fortunately in R, all this work is done for you automatically. The `solve()`

function creates an identity matrix if you don't supply the `b`

argument. So to invert a matrix \(X\):

```
n = 4
A <- matrix(rnorm(n*n), n, n)
A.inv <- solve(A)
[,1] [,2] [,3] [,4]
[1,] 0.6564071 0.09397735 0.769666 -0.836499
[2,] 6.5235936 3.83526471 15.547712 -24.572593
[3,] 3.4061013 2.48309826 8.647189 -13.329604
[4,] 0.7537863 0.97065805 1.759342 -3.509522
# Should be the identity matrix
check <- A %*% A.inv
# which it is, within machine precision:
[,1] [,2] [,3] [,4]
[1,] 1.000000e+00 0.000000e+00 -1.110223e-16 -8.881784e-16
[2,] -9.714451e-17 1.000000e+00 1.387779e-17 5.551115e-17
[3,] -5.551115e-17 0.000000e+00 1.000000e+00 0.000000e+00
[4,] -2.775558e-17 -4.440892e-16 5.551115e-17 1.000000e+00
```

The generalized matrix inverse can also be calculated, after loading the built-in `MASS`

package, use the `ginv(...)`

function.

## Matrix determinant: `det(X)`

Recall that the determinant is only computable for a square, \(n \times n\) matrix.

```
N = 4
X <- diag(x=3, N, N)
det(X)
[1] 81
```

## Singular value decomposition: `svd(X)`

The singular value decomposition (SVD) is useful for solving certain principal component analysis models, amongst other uses. The SVD is given by:

where matrices \(\bf{U}\) and \(\bf{V}\) are orthogonal, and matrix \(\bf{D}\) is a diagonal matrix, containing the singular values. For efficiency, the entries in \(\bf{D}\) are returned as a vector.

Use the example below as a guide to using the `svd`

function in R:

```
X <- matrix(data=rnorm(10*5), 10, 5)
decomp <- svd(X)
d.vector <- decomp$d
U <- decomp$u
V <- decomp$v
```

## Other matrix operations

Other important matrix operations that can be performed efficiently in R are:

- Eigenvalue and eigenvector decomposition:
`eigen(...)`

- QR decomposition:
`qr(...)`

- Cholesky decomposition:
`chol(...)`

- LU decomposition:
`lu(...)`

, but it requires the`Matrix`

package.

The `Matrix`

package

Enhanced matrix capability is provided by the `Matrix`

package, which is not loaded by default. To start using it, load the library as usual: `library(Matrix)`

. Type `help(package="Matrix")`

for more details.