Least squares modelling

Learning outcomes

  • Understand the difference between correlation and covariance (see the short R sketch after this list).
  • Understand what the objective function of least squares does
  • Understand and use an analysis of variance table
  • Calculate and interpret the confidence intervals from a least squares model
  • Know about the assumptions required to interpret least squares model coefficients
  • Use the prediction error range from the model
  • Identify outlier points and classify them
  • Use the linear model when there are multiple predictor variables (this is what we are building up towards; we will use this extensively in the next topic)
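
As a quick aside on the first outcome: correlation is just covariance scaled by the standard deviations of the two variables, which makes it dimensionless and bounded by ±1. A minimal R sketch with made-up data to illustrate:

x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
cov(x, y)                    # covariance: carries the units of x times the units of y
cor(x, y)                    # correlation: dimensionless, between -1 and +1
cov(x, y) / (sd(x) * sd(y))  # identical to cor(x, y)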

Extended readings/practice

  • Run the code below to see how to build and use a linear model in R, but see step 16 and onwards in the R tutorial as well.
  • Try some practice problems.
  • Describe a linear regression model you have made for a lab report.
    • What was the \(R^2\) value?
    • How did you calculate the regression model values?
    • Use the same data from your report and instead calculate the standard error, \(S_E\). How do you interpret that \(S_E\) value now?
  • Do the YouTube challenge: find a video on YouTube that explains the central limit theorem, or the confidence interval, or least squares in a way that is different from how it was explained in class (hopefully you find better explanations than mine). Share the link with a friend in your class.
  • Does eating chocolate lead to winning a Nobel prize?

Resources

  Tasks to do first    | Quiz | Solution
  Watch videos AAA     | Quiz | Solution
  Watch videos BBB     | Quiz | Solution
  Watch videos CCC     | Quiz | Solution
  Watch videos DDD     | Quiz | Solution
  Watch videos EEE     | Quiz | Solution

Class videos from prior years

Videos from 2015

  1. Overview of this module [02:59]
  2. Covariance and correlation [09:46]
  3. Why least squares, and some other alternatives [covered in class]
  4. Some of the math behind the LS model [09:44]
  5. Understanding the analysis of variance (ANOVA) [11:58]
  6. Interpretation of standard error [covered in class]
  7. Assumptions to derive LS confidence intervals [05:45]
  8. Confidence intervals interpreted and used in 3 LS examples [11:54]
  9. Prediction intervals from the least squares model [04:24]
  10. Checking for violations of the least squares assumptions (1 of 2) [07:27]
  11. Checking for violations of the least squares assumptions (2 of 2) [11:46]
  12. Introducing multiple linear regression - why we need to use it [02:59]
  13. MLR - the matrix equation form and an example [11:25]
  14. Interpreting the MLR model coefficients and confidence intervals [04:58]
  15. Integer variables in the multiple linear regression model [09:51]


Videos from 2014

See the webpage from 2014

Videos from 2013

See the webpage from 2013

Software code for this section

Code to show how to build and plot a least squares model in R

Try this code in a web-browser

x <- c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5)
y <- c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68)
plot(x, y)              # always plot the raw data first
model.ls <- lm(y ~ x)   # build the least squares model
summary(model.ls)

coef(model.ls)          # the model coefficients
confint(model.ls)       # confidence intervals for the coefficients

names(model.ls)         # see what is stored inside the model object
model.ls$residuals      # the residuals, accessed directly ...
resid(model.ls)         # ... or via the extractor function

plot(x, y)
abline(model.ls)        # superimpose the regression line
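
Since the learning outcomes above include using an analysis of variance table, note that R will produce one directly from the fitted model. A minimal sketch, reusing the model.ls object just built:

anova(model.ls)          # ANOVA table: regression and residual sums of squares
summary(model.ls)$sigma  # the standard error of the model, \(S_E\)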

Thermocouple example

  • Compare the \(R^2\) and \(S_E\) values. What does each of those values tell you about the model?
  • How do you use a least squares model in R to make a prediction on new data?
  • Check that the linear regression assumptions are satisfied.
  • Try this code in a web-browser
V <- c(0.01, 0.12, 0.24, 0.38, 0.51, 0.67, 0.84, 1.01, 1.15, 1.31)
T <- c(273, 293, 313, 333, 353, 373, 393, 413, 433, 453)
plot(V, T)
model <- lm(T ~ V)
summary(model)
coef(model)
confint(model)   # get the coefficient confidence intervals
resid(model)     # model residuals
library(car)
qqPlot(resid(model)) # q-q plot of the residuals to check normality

plot(V, T)
v.new <- seq(0, 1.5, 0.1)
t.pred <- coef(model)[1] + coef(model)[2] * v.new
lines(v.new, t.pred, type="l", col="blue")

# Plot x against the residuals to check for non-linearity
plot(V, resid(model))
abline(h=0)

# Plot the raw data and the regression line in red
plot(V, T)
abline(model, col="red")
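
The manual prediction above (intercept plus slope times v.new) can also be done with R's predict() function, which returns interval estimates as well. A short sketch, reusing the model and v.new objects defined above:

predict(model, newdata=data.frame(V=v.new))                         # predicted temperatures at the new voltages
predict(model, newdata=data.frame(V=v.new), interval="prediction")  # adds lower and upper prediction limits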

Build the linear regression in R for video 4

  • Try this code in a web-browser: http://www.r-fiddle.org/#/fiddle?id=3Ex27Wkb
Taste <- c(12.3, 20.9, 25.9, 37.3, 5.5, 16.8, 38.9, 54.9, 57.2, 6.4)
H2S   <- c(3.14, 5.04, 7.60, 8.72, 4.79, 3.66, 9.06, 6.75, 7.91, 4.70)
 
# Always plot your data first
plot(H2S, Taste)
 
# Build a linear model
model <- lm(Taste~H2S)
 
# Create a short name for "y", the output variable, Taste
y <- Taste
n <- length(Taste)

y.hat <- predict(model)   # model predictions
errors <- y - y.hat       # error = observed - predicted
 
# Now calculate various sums of squares
TSS <-  sum((y - mean(y))**2)
RegSS <- sum((y.hat - mean(y))**2)
RSS <- sum(errors**2)
 
# The standard error will be our reference value to judge a model's performance
sqrt(RSS/(n-2))
 
#  Did you know you can do this in R?
plot(model)
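
With the sums of squares just computed you can verify the \(R^2\) value that summary(model) reports, since \(R^2 = \text{RegSS}/\text{TSS}\). A quick check, reusing the objects above:

RegSS / TSS               # R-squared, computed from the sums of squares
summary(model)$r.squared  # should match the value R reports
TSS - (RegSS + RSS)       # essentially zero, since TSS = RegSS + RSS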


Build a multiple linear regression in R (manually and automatically)

  • Compare the manual and automatic results to ensure the model coefficients are the same.
  • Also see the example after this one.
  • Try this code in a web-browser: http://www.r-fiddle.org/#/fiddle?id=Wrn7Z8BC
# Calculate the model manually
x1.raw <- c(1,3,4,7,9,9)
x2.raw <- c(9,9,6,3,1,2)
y.raw  <- c(3,5,6,8,7,10)
n <- length(x1.raw)

x1 <- x1.raw - mean(x1.raw)
x2 <- x2.raw - mean(x2.raw)
y <- y.raw - mean(y.raw)

X <- cbind(x1, x2)

# Calculate:  b = inv(X'X) X'y
XTX <- t(X) %*% X    # compare this to cov(X)*(n-1)
XTY <- t(X) %*% y
XTX.inv <- solve(XTX)
b <- XTX.inv %*% XTY
b

# Let R calculate the model:
model <- lm(y.raw ~ x1.raw + x2.raw)
summary(model)
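
Because the manual calculation works on centered variables, it produces only the two slope coefficients; the intercept can be recovered from the variable means. A short sketch, reusing the objects above:

b0 <- mean(y.raw) - b[1]*mean(x1.raw) - b[2]*mean(x2.raw)
b0           # compare this intercept ...
coef(model)  # ... to the one R reports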

Build the multiple linear regression in R for video 13

  • This is the same as the prior example, except we use the centered values directly.
  • Try this code in a web-browser: http://www.r-fiddle.org/#/fiddle?id=Wrn7Z8BC
x1 <- c(-4.5,  -2.5,  -1.5,  1.5,  3.5,  3.5)
x2 <- c(   4,     4,     1,   -2,   -4,   -3)
y <- matrix(c(-3.5,  -1.5,  -0.5,  1.5,  0.5,  3.5), nrow=6, ncol=1)
X <- cbind(x1, x2)

XtX <- t(X) %*% X
Xty <- t(X) %*% y

XtX.inv <- solve(XtX)

b <- XtX.inv %*% Xty

summary(lm(y ~ x1 + x2))
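
One implementation note: explicitly inverting \(X^TX\) is fine for an example this size, but solve() can compute the same coefficients in a single, numerically preferable step. A sketch, reusing the XtX and Xty objects from above:

b.direct <- solve(XtX, Xty)  # solves (X'X) b = X'y without forming the inverse
b.direct                     # same coefficients as b above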