Software tutorial/Building a least squares model in R

From Statistics for Engineering
← Vectors and matrices (previous step) Tutorial index Next step: Extracting information from a linear model in R →


<rst>
<rst-options: 'toc' = False/>
<rst-options: 'reset-figures' = False/>
.. note:: Particularly useful references for the theory of least squares are Chapters 5, 9 and 10 of the book `Introductory Statistics with R <http://link.springer.com/book/10.1007/978-0-387-79054-1/>`_ by Dalgaard. You might be able to access the PDF version through your company's or university's subscription.


The ``lm(...)`` function is the primary tool to build a linear model in R. The input for this function must be a formula object (type ``help(formula)`` for further info). In the example below the formula is ``y ~ x``. This says: "calculate for me the linear model that relates :math:`x` to :math:`y`"; or alternatively and equivalently: "build the linear model where :math:`y` is regressed on :math:`x`".

.. code-block:: s

    x <- c(1, 2, 3, 4, 5)
    y <- c(2, 3, 4, 4, 5)
    model <- lm(y ~ x)
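
The same formula also works when the variables are columns of a data frame rather than free-standing vectors. The following is a minimal sketch of that variation, not part of the original example: the names ``df`` and ``model.df`` are hypothetical, and the ``data`` argument tells ``lm`` where to look up the names used in the formula.

.. code-block:: s

    # Hypothetical data frame holding the same values as the vectors above
    df <- data.frame(x = c(1, 2, 3, 4, 5),
                     y = c(2, 3, 4, 4, 5))

    # Same formula; the names y and x are now looked up in the columns of df
    model.df <- lm(y ~ x, data = df)

    # Estimated intercept and slope
    coef(model.df)

Keeping the variables in a data frame avoids relying on vectors that happen to exist in the workspace, which can help when several data sets are in use at once.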

The output from ``lm`` is a linear model *object*, also called an ``lm`` object. In R you can get a description of most objects when using the ``summary(...)`` command.

.. code-block:: s

    summary(model)

    Call:
    lm(formula = y ~ x)

    Residuals:
            1         2         3         4         5
    -2.00e-01  1.00e-01  4.00e-01 -3.00e-01  2.29e-16

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   1.5000     0.3317   4.523  0.02022 *
    x             0.7000     0.1000   7.000  0.00599 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.3162 on 3 degrees of freedom
    Multiple R-squared:  0.9423,    Adjusted R-squared:  0.9231
    F-statistic:    49 on 1 and 3 DF,  p-value: 0.005986
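
As a cross-check, the main numbers in the summary can be reproduced by hand. The sketch below assumes the standard textbook formulas for a least squares model with a single predictor. The variable names (``b0``, ``b1``, ``e``, ``SE``, ``R2``) are chosen here for illustration and are not produced by ``lm``.

.. code-block:: s

    x <- c(1, 2, 3, 4, 5)
    y <- c(2, 3, 4, 4, 5)
    n <- length(x)

    # Slope and intercept from the least squares formulas
    b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
    b0 <- mean(y) - b1 * mean(x)

    # Residuals, residual standard error (n - 2 degrees of freedom), and R^2
    e  <- y - (b0 + b1 * x)
    SE <- sqrt(sum(e^2) / (n - 2))
    R2 <- 1 - sum(e^2) / sum((y - mean(y))^2)

    c(b0 = b0, b1 = b1, SE = SE, R2 = R2)

These values should agree, up to rounding, with the intercept, slope, residual standard error and multiple R-squared reported by ``summary(model)``.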

The output from ``summary(model)`` gives the intercept and slope for the equation :math:`y = b_0 + b_1 x`; in this case the fitted line is :math:`y = 1.5 + 0.7x`. The residual standard error is :math:`S_E = 0.3162`, and :math:`R^2 = 0.9423`. </rst>
