Software tutorial/Vectors and matrices

From Statistics for Engineering
Jump to: navigation, search
← Programming in R: loops and flow control (previous step) Tutorial index Next step: Building a least squares model in R →


Creating matrices and vectors

An vector of zeros is created using the numeric function. This is used, for example, to initialize a vector of zeros before calling a for loop to fill the entries in that vector. This is called preallocation, and is good programming practice in any language: rather preallocate the matrix for storage than incrementally add entries to an existing matrix.

numeric(6)
[1] 0 0 0 0 0 0

A matrix or vector can be created using the matrix function. The advantage of the matrix function is that it can set the values to a non-zero, or missing values (NA):

# Matrix of zeros:
matrix(data=0, nrow=2, ncol=3)
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0

# Column vector of missing values:
matrix(nrow=4)
     [,1]
[1,]   NA
[2,]   NA
[3,]   NA
[4,]   NA

# Row vector of seven 7's
matrix(data=7, ncol=7)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    7    7    7    7    7    7    7

I prefer to allocate matrices with missing entries, NA, because it is easy to check after the loop if you have missed any entries. For example, for matrix X, we can check: any(is.na(X))

There is yet another way to create a matrix or vector of only zero entries:

# 3x1 vector of zeros
mat.or.vec(3, 1)

# 2x3 matrix of zeros
mat.or.vec(2, 3)

Creating a matrix from several vectors

If you have a some equal-length vectors you might want to stack them together to create a matrix.

Stacking vectors side-by-side: cbind

The cbind function will put each vector in a column and return the matrix. Let's say we wanted to use only a subset of the columns from a given data frame:

bio <- read.csv('http://openmv.net/file/bioreactor-yields.csv')
bio
   temperature duration speed baffles yield
1           82      260  4300      No    51
2           90      260  3700     Yes    30
..         ...      ...  ....     ...    ..
13         101      260  4400      No    42
14          92      260  4900     Yes    38

data <- cbind(bio$temperature, bio$ speed)
data
      [,1] [,2]
 [1,]   82 4300
 [2,]   90 3700
.....  ... ....
[13,]  101 4400
[14,]   92 4900

Stacking top-to-bottom: rbind

Use the rbind function to join existing matrices or vectors, one on top of another:

Create a matrix with 2 rows and 4 columns filled with random numbers, and another matrix containing 3 rows and 4 columns, filled with the number 2.6. Then to stack them:

X <- matrix(rnorm(2*4), 2, 4)
Y <- matrix(data=2.6, nrow=3, ncol=4)

rbind(X,Y)

          [,1]       [,2]       [,3]        [,4]
[1,] 0.6096289 -0.3730975 -1.0376647 -0.01033531
[2,] 0.6305808 -0.6768284  0.4918309 -0.44430265
[3,] 2.6000000  2.6000000  2.6000000  2.60000000
[4,] 2.6000000  2.6000000  2.6000000  2.60000000
[5,] 2.6000000  2.6000000  2.6000000  2.60000000

Reshaping a vector into a matrix

There are occasions when we need to create subgroups of data from a vector. This happens, for example, when constructing control charts:

  • we need to create the subgroups from non-overlapping segments in the vector
  • we need to do calculations on these subgroups

One way to construct the subgroups is to use 2 nested for-loops to extract the necessary data. This works, but can get messy. There is an alternative way: by rearranging your vector into a matrix and then doing your calculations on each column of the matrix.

Let the raw data be given by this random vector:

N.raw = 5000
raw <- rnorm(N.raw, mean=220, sd=50)
plot(raw)

Let's say our subgroup size is 9. Now 5000 is not neatly divisible by 9, but this doesn't matter. We create a matrix with 9 rows and as many columns as we can (roughly 5000/9).

N.sub = 9
subgroups <- matrix(raw, N.sub, N.raw/N.sub)
# Warning message:
# In matrix(raw, N.sub, N.raw/N.sub) :
#   data length [5000] is not a sub-multiple or multiple of the number of rows [9]

Notice that R gives a warning, not an error message. You can see what size matrix it created: 9 rows and 555 columns, so there are 4995 elements in the array. That's usually good enough: we throw away the last 5 entries that don't form a complete subgroup.

dim(subgroups)
[1]   9 555

nrow(subgroups)
[1] 9

The next step is to calculate the mean and standard deviation of each subgroup, i.e. for each column in the new matrix. For that we will use the apply(...) command, which applies any R function to a matrix along a particular direction.

  • apply(X, 1, sd) will apply the sd function to matrix X across the row direction (that's what the 1 is for: the first dimension)
  • apply(X, 2, sd) will apply the sd function to matrix X across the column direction (that's what the 2 is for: the second array dimension)

In our example the subgroups appear in each column. So calculate the standard deviation and mean for each column:

subgroups.S <- apply(subgroups, 2, sd)
subgroups.xbar <- apply(subgroups, 2, mean)

length(subgroups.S)
[1] 555

# "subgroups.S" is not a matrix, so the "dim" function isn't valid
dim(subgroups.S)
NULL

Note: there are built-in R functions, called colMeans(X) and rowMeans(X) which will perform the specific task of applying the mean function along the column or row direction. The apply function is more general though.

Matrix operations

Once you've got your matrices created it is time to work with them: addition and subtraction, multiplication, transposes, inverses and determinants can all be calculated in R.

Matrix addition and subtraction

X <- matrix(data=56, nrow=4, ncol=5)
Y <- matrix(data=32, nrow=4, ncol=5)
X+Y
     [,1] [,2] [,3] [,4] [,5]
[1,]   88   88   88   88   88
[2,]   88   88   88   88   88
[3,]   88   88   88   88   88
[4,]   88   88   88   88   88

X-Y
     [,1] [,2] [,3] [,4] [,5]
[1,]   24   24   24   24   24
[2,]   24   24   24   24   24
[3,]   24   24   24   24   24
[4,]   24   24   24   24   24

Matrix multiplication

Element-by-element multiplication

a <- matrix(data=5, nrow=4, ncol=5)
b <- matrix(data=6, nrow=4, ncol=5)

a*b
     [,1] [,2] [,3] [,4] [,5]
[1,]   30   30   30   30   30
[2,]   30   30   30   30   30
[3,]   30   30   30   30   30
[4,]   30   30   30   30   30

The usual algebraic matrix multiplication - which is probably what you are expecting when multiplying matrices:

a <- matrix(data=5, nrow=4, ncol=5)
b <- matrix(data=6, nrow=5, ncol=3)
dim(a)
[1] 4 5
dim(b)
[1] 5 3

a %*% b
     [,1] [,2] [,3]
[1,]  150  150  150
[2,]  150  150  150
[3,]  150  150  150
[4,]  150  150  150

Matrix transpose: t(X)

X <- matrix(data=c(56,12), nrow=4, ncol=5)
X
     [,1] [,2] [,3] [,4] [,5]
[1,]   56   56   56   56   56
[2,]   12   12   12   12   12
[3,]   56   56   56   56   56
[4,]   12   12   12   12   12

t(X)
     [,1] [,2] [,3] [,4]
[1,]   56   12   56   12
[2,]   56   12   56   12
[3,]   56   12   56   12
[4,]   56   12   56   12
[5,]   56   12   56   12

Solving a system of equations \(Ax = b\): solve(A, b)

Use the solve(a, b, ...) function in R. The solve() function solves the linear system of equations \(Ax = b\) and returns \(x\).

A <- matrix(data = c(2,4,6, 5, 2, 7, 6, 2, 2), 3, 3)
b <- matrix(data = c(6, 2, 2), nrow=3)
x <- solve(A, b)

x
     [,1]
[1,]    0
[2,]    0
[3,]    1

# Should be a vector of zeros:
check <- A %*% x - b

Matrix inverse: solve(A)

Recall that only square matrices, \(n \times n\), can be inverted. We use the solve(...) function in R, which solves the system \(Ax = b\), but if we replace b with successive columns from the identity matrix, we can successively build up a solution to \(AA^{-1} = I\).

Fortunately in R, all this work is done for you automatically. The solve() function creates an identity matrix if you don't supply the b argument. So to invert a matrix \(X\):

n = 4
A <- matrix(rnorm(n*n), n, n)
A.inv <- solve(A)
          [,1]       [,2]      [,3]       [,4]
[1,] 0.6564071 0.09397735  0.769666  -0.836499
[2,] 6.5235936 3.83526471 15.547712 -24.572593
[3,] 3.4061013 2.48309826  8.647189 -13.329604
[4,] 0.7537863 0.97065805  1.759342  -3.509522

# Should be the identity matrix
check <- A %*% A.inv

# which it is, within machine precision:
              [,1]          [,2]          [,3]          [,4]
[1,]  1.000000e+00  0.000000e+00 -1.110223e-16 -8.881784e-16
[2,] -9.714451e-17  1.000000e+00  1.387779e-17  5.551115e-17
[3,] -5.551115e-17  0.000000e+00  1.000000e+00  0.000000e+00
[4,] -2.775558e-17 -4.440892e-16  5.551115e-17  1.000000e+00

The generalized matrix inverse can also be calculated, after loading the built-in MASS package, use the ginv(...) function.

Matrix determinant: det(X)

Recall that the determinant is only computable for a square, \(n \times n\) matrix.

N = 4
X <- diag(x=3, N, N)
det(X)
[1] 81

Singular value decomposition: svd(X)

The singular value decomposition (SVD) is useful for solving certain principal component analysis models, amongst other uses. The SVD is given by:

\[\bf X = U D V'\]

where matrices \(\bf{U}\) and \(\bf{V}\) are orthogonal, and matrix \(\bf{D}\) is a diagonal matrix, containing the singular values. For efficiency, the entries in \(\bf{D}\) are returned as a vector.

Use the example below as a guide to using the svd function in R:

X <- matrix(data=rnorm(10*5), 10, 5)
decomp <- svd(X)
d.vector <- decomp$d
U <- decomp$u
V <- decomp$v

Other matrix operations

Other important matrix operations that can be performed efficiently in R are:

The Matrix package

Enhanced matrix capability is provided by the Matrix package, which is not loaded by default. To start using it, load the library as usual: library(Matrix). Type help(package="Matrix") for more details.

← Programming in R: loops and flow control (previous step) Tutorial index Next step: Building a least squares model in R →