Software tutorial/Linear models with multiple X-variables (MLR)

Revision as of 18:55, 14 February 2013 by Kevin Dunn


<rst>
<rst-options: 'toc' = False/>
<rst-options: 'reset-figures' = False/>
Including multiple variables in a linear model in R is straightforward. This case is called multiple linear regression (MLR).

Just extend the formula you normally provide to the ``lm(...)`` function with extra terms. For example:

  • The standard univariate model, :math:`y = b_0 + b_1 x`, is represented as: ``y ~ x``
  • A model with extra explanatory variables, for example :math:`y = b_0 + b_1 x_1 + b_2 x_2`, is represented as: ``y ~ x1 + x2``
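
To see that the two-variable formula behaves as described, here is a minimal sketch on simulated data. The data frame ``d``, the "true" coefficients (3, 2 and -1.5) and the noise level are made up for this illustration and do not come from the tutorial's data sets.

.. code-block:: s

    set.seed(42)                       # make the simulated data reproducible
    n  <- 200
    x1 <- runif(n, 0, 10)              # first explanatory variable
    x2 <- runif(n, 0, 5)               # second explanatory variable
    # Hypothetical true model: y = 3 + 2*x1 - 1.5*x2 + noise
    y  <- 3 + 2 * x1 - 1.5 * x2 + rnorm(n, sd = 0.5)
    d  <- data.frame(y = y, x1 = x1, x2 = x2)

    # Same formula syntax as in the list above; data= tells lm where to find the columns
    fit <- lm(y ~ x1 + x2, data = d)
    coef(fit)                          # estimates should lie close to 3, 2 and -1.5

Passing ``data = d`` is one way to supply the variables; it avoids having to ``attach`` the data frame first.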

Using the stackloss example from earlier:

.. code-block:: s

    attach(stackloss)
    colnames(stackloss)
    # [1] "Air.Flow"   "Water.Temp" "Acid.Conc." "stack.loss"

    model <- lm(stack.loss ~ Air.Flow + Acid.Conc. + Water.Temp)
    summary(model)

    # Call:
    # lm(formula = stack.loss ~ Air.Flow + Acid.Conc. + Water.Temp)
    #
    # Residuals:
    #     Min      1Q  Median      3Q     Max
    # -7.2377 -1.7117 -0.4551  2.3614  5.6978
    #
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept) -39.9197    11.8960  -3.356  0.00375 **
    # Air.Flow      0.7156     0.1349   5.307  5.8e-05 ***
    # Acid.Conc.   -0.1521     0.1563  -0.973  0.34405
    # Water.Temp    1.2953     0.3680   3.520  0.00263 **
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    #
    # Residual standard error: 3.243 on 17 degrees of freedom
    # Multiple R-squared:  0.9136,  Adjusted R-squared:  0.8983
    # F-statistic:  59.9 on 3 and 17 DF,  p-value: 3.016e-09
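
The quantities printed in the summary can also be pulled out as numbers for later calculations. The sketch below refits the same model using the ``data`` argument, so it runs on its own; the variable name ``s`` is chosen only for this illustration.

.. code-block:: s

    # stackloss ships with R in the datasets package
    model <- lm(stack.loss ~ Air.Flow + Acid.Conc. + Water.Temp, data = stackloss)
    s <- summary(model)

    s$coefficients      # matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
    s$sigma             # residual standard error
    s$r.squared         # multiple R-squared
    s$adj.r.squared     # adjusted R-squared
    s$df[2]             # residual degrees of freedom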

We can interrogate this ``model`` object in the same way as we did for the single :math:`x`-variable case.

  • ``resid(model)``: the residuals, returned as a numeric vector
  • ``fitted(model)``: predicted values for the model-building data
  • ``coef(model)``: the model coefficients
  • ``confint(model)``: the marginal confidence intervals (recall there are joint and marginal confidence intervals)
  • ``predict(model, newdata=...)``: predictions for new observations (without ``newdata`` it returns the fitted values). For example, create a new data frame with 2 observations:

.. code-block:: s

    x.new <- data.frame(Air.Flow = c(56, 62), Water.Temp = c(18, 24), Acid.Conc. = c(82, 89))
    x.new
    #   Air.Flow Water.Temp Acid.Conc.
    # 1       56         18         82
    # 2       62         24         89
    y.new <- predict(model, newdata = x.new)
    y.new
    #        1        2
    # 10.99728 21.99798

</rst>