Software tutorial/Linear models with multiple X-variables (MLR)

From Statistics for Engineering
← Investigating outliers, discrepancies and other influential points (previous step) Tutorial index Next step: Linear models with integer variables →


Including multiple variables in a linear model in R is straightforward. This case is called multiple linear regression, MLR.

Just extend the formula you normally provide to the lm(...) function with extra terms. For example:

  • The standard, univariate model, \(y = b_0 + b_1 x\), is represented as: y ~ x
  • A model with extra explanatory variables, for example \(y = b_0 + b_1 x_1 + b_2 x_2\), is represented as: y ~ x1 + x2
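To see the two formulas side by side, here is a minimal sketch on simulated data. The data frame sim, the sample size and the "true" coefficients (2, 0.5, -1.5) are assumptions made up for illustration; they are not part of this tutorial's data sets. Note that R adds the intercept \(b_0\) automatically, so it never appears in the formula.

```r
# Simulated data: the coefficients 2, 0.5 and -1.5 are arbitrary choices
set.seed(42)
n  <- 50
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 2 + 0.5 * x1 - 1.5 * x2 + rnorm(n, sd = 0.3)
sim <- data.frame(y = y, x1 = x1, x2 = x2)

# Univariate model: y = b0 + b1*x1
fit.one <- lm(y ~ x1, data = sim)

# Multiple linear regression: y = b0 + b1*x1 + b2*x2
fit.two <- lm(y ~ x1 + x2, data = sim)

coef(fit.one)   # intercept and one slope
coef(fit.two)   # intercept and two slopes
```

Passing data = sim tells lm(...) where to look up the variable names, which avoids having to attach the data frame first.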

Using the stackloss example from earlier:

attach(stackloss)
colnames(stackloss)
# [1] "Air.Flow"   "Water.Temp" "Acid.Conc." "stack.loss"

model <- lm(stack.loss ~ Air.Flow +  Acid.Conc. + Water.Temp)
summary(model)

Call:
lm(formula = stack.loss ~ Air.Flow + Acid.Conc. + Water.Temp)

Residuals:
    Min      1Q  Median      3Q     Max
-7.2377 -1.7117 -0.4551  2.3614  5.6978

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -39.9197    11.8960  -3.356  0.00375 **
Air.Flow      0.7156     0.1349   5.307  5.8e-05 ***
Acid.Conc.   -0.1521     0.1563  -0.973  0.34405
Water.Temp    1.2953     0.3680   3.520  0.00263 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.243 on 17 degrees of freedom
Multiple R-squared: 0.9136,     Adjusted R-squared: 0.8983
F-statistic:  59.9 on 3 and 17 DF,  p-value: 3.016e-09
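The numbers in this printout can also be extracted programmatically rather than read off the screen. The following is a sketch, not the only way to do it: it refits the same model with the data argument instead of attach(...), and the name s is just an illustrative choice.

```r
# Refit the stackloss model, passing the data frame explicitly
model <- lm(stack.loss ~ Air.Flow + Acid.Conc. + Water.Temp, data = stackloss)
s <- summary(model)

coef(s)             # matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
s$sigma             # residual standard error
s$r.squared         # multiple R-squared
s$adj.r.squared     # adjusted R-squared
s$fstatistic        # F value with its numerator and denominator degrees of freedom
df.residual(model)  # 21 observations minus 4 estimated coefficients

# Example: pick out the p-value for a single term
coef(s)["Acid.Conc.", "Pr(>|t|)"]
```

In the printout, Acid.Conc. is the only term whose p-value (0.344) lies above the usual 0.05 threshold.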

We can interrogate this model object in the same way as we did for the single \(x\)-variable case.

  • resid(model): returns the residuals, one per observation
  • fitted(model): returns the fitted (predicted) values for the data used to build the model
  • coef(model): returns the model coefficients
  • confint(model): returns the marginal confidence intervals for the coefficients (recall there are joint and marginal confidence intervals)
  • predict(model): can be used to get new predictions. For example, create a new data frame with 2 observations:
x.new = data.frame(Air.Flow = c(56, 62),
                   Water.Temp = c(18, 24),
                   Acid.Conc. = c(82, 89))
x.new
#   Air.Flow Water.Temp Acid.Conc.
# 1       56         18         82
# 2       62         24         89
y.new = predict(model, newdata=x.new)
y.new
#        1        2
# 10.99728 21.99798
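A short sketch of how the functions listed above fit together: it computes the marginal confidence intervals, asks predict(...) for prediction intervals as well as point predictions, and then reproduces the point predictions by hand from the coefficients. The names ci, pred, X.new and by.hand are illustrative, and the 95% level is an assumed choice (it is also R's default).

```r
# Refit the model and recreate the two new observations
model <- lm(stack.loss ~ Air.Flow + Acid.Conc. + Water.Temp, data = stackloss)
x.new <- data.frame(Air.Flow   = c(56, 62),
                    Water.Temp = c(18, 24),
                    Acid.Conc. = c(82, 89))

# Marginal 95% confidence interval for each coefficient
ci <- confint(model, level = 0.95)
ci

# Point predictions plus 95% prediction intervals (columns: fit, lwr, upr)
pred <- predict(model, newdata = x.new, interval = "prediction", level = 0.95)
pred

# The same point predictions by hand: y = b0 + b1*x1 + b2*x2 + b3*x3.
# Columns are reordered to match the order of the coefficients.
X.new   <- cbind(1, as.matrix(x.new[, names(coef(model))[-1]]))
by.hand <- drop(X.new %*% coef(model))
by.hand
```

The fit column of pred agrees with y.new above. The interval for Acid.Conc. in ci spans zero, consistent with its large p-value in the model summary.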
