Software tutorial/Dealing with distributions

From Statistics for Engineering
Revision as of 09:42, 13 January 2016 by Kevin Dunn (talk | contribs)
← Calculating statistics from a data sample (previous step) Tutorial index Next step: Extending R with packages →

Values from various distribution functions are easily calculated in R.

Direct probability from a distribution


To calculate the probability value (the height of the distribution curve) directly from *any* distribution in R, use a function whose name combines ``d`` with the name of the distribution. This is what ``dDIST`` means in the illustration here:

Show-dDIST.jpg

<rst> <rst-options: 'toc' = False/> <rst-options: 'reset-figures' = False/>

For the *normal* distribution: ``dnorm(x=...)``

For example, ``dnorm(1)`` returns 0.2419707, the height of the standard normal curve at :math:`x = 1`, which is one of its points of inflection.

For the :math:`t`-distribution: ``dt(x=..., df=...)``, where ``df`` is the number of degrees of freedom.

For the :math:`F`-distribution: ``df(x=..., df1=..., df2=...)`` given the ``df1`` (numerator) and ``df2`` (denominator) degrees of freedom.

For the chi-squared distribution: ``dchisq(x=..., df=...)`` given the ``df`` degrees of freedom.
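
As a minimal sketch of the calls listed above, the following evaluates each density at a single point. The evaluation points and degrees of freedom are arbitrary choices made only for illustration.

.. code-block:: s

    # Height of each distribution curve at a chosen point.
    # The x values and degrees of freedom are arbitrary illustrations.
    d.normal <- dnorm(x=0)                # standard normal, equals 1/sqrt(2*pi)
    d.t      <- dt(x=0, df=10)            # t-distribution with 10 degrees of freedom
    d.F      <- df(x=1, df1=5, df2=20)    # F-distribution with 5 and 20 degrees of freedom
    d.chisq  <- dchisq(x=2, df=2)         # chi-squared with 2 df, equals 0.5*exp(-1)

    print(c(d.normal, d.t, d.F, d.chisq))

Note that ``df`` is both the name of the :math:`F`-distribution density function and the name of the degrees-of-freedom argument in ``dt`` and ``dchisq``; R tells them apart from context.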


Values from the cumulative and inverse cumulative distribution


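As above, the function name is formed by prefixing the name of the distribution with a letter: ``p`` gives the cumulative area under the distribution to the left of a given value (a fraction between 0 and 1), and ``q`` gives the quantile, that is, the value below which a given fraction of the area lies. </rst>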

Show-pDIST-and-qDIST.jpg

<rst> <rst-options: 'toc' = False/> <rst-options: 'reset-figures' = False/>

  • For the *normal* distribution: ``pnorm(...)`` and ``qnorm(...)``
  • For the :math:`t`-distribution: ``pt(...)`` and ``qt(...)``
  • For the :math:`F`-distribution: ``pf(...)`` and ``qf(...)``
  • For the chi-squared distribution: ``pchisq(...)`` and ``qchisq(...)``
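
The ``p`` and ``q`` functions are inverses of each other, which the short sketch below tries to illustrate. The particular probabilities and degrees of freedom are assumptions chosen only for the example.

.. code-block:: s

    # Cumulative area to the left of z = 1.96 under the standard normal curve
    area <- pnorm(1.96)             # approximately 0.975

    # Quantile: the z-value with 97.5% of the area to its left
    z <- qnorm(0.975)               # approximately 1.96

    # The same quantile for a t-distribution with 10 degrees of freedom;
    # it is larger than z because the t-distribution has heavier tails
    t.crit <- qt(0.975, df=10)

    # Feeding a quantile back into the matching p-function recovers the area
    back.t     <- pt(t.crit, df=10)
    back.chisq <- pchisq(qchisq(0.95, df=3), df=3)
    back.F     <- pf(qf(0.95, df1=5, df2=20), df1=5, df2=20)

    print(c(area, z, t.crit, back.t, back.chisq, back.F))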

Obtaining random numbers from a particular distribution


To obtain a single random number from the normal distribution with mean of 0 and standard deviation of 1.0:

.. code-block:: s

    rnorm(1)
    # [1] -0.3451397

For example, to obtain 10 random, normally distributed values:

.. code-block:: s

    rnorm(10)
    # [1]  0.4604076 -0.9670948 -0.2624246 -0.2223866  0.2492692
    # [6]  0.7160273 -0.2734768  2.4437870  0.4269511 -0.4831478

where the ``r`` prefix indicates we want random numbers.

Notice that R has used its default values of ``mean=0`` and *standard deviation* ``sd=1``. If you'd like your random numbers centred about a different mean, with a different level of spread, then specify both arguments:

.. code-block:: s

    rnorm(n=10, mean=30, sd=4)
    # [1] 31.62686 37.83101 28.07470 20.95000 30.47500
    # [6] 28.21797 35.81518 28.61481 30.59083 32.94051

Note that this function accepts the *standard deviation*, not the variance. The usual notation in statistics for the previous example is :math:`x \sim \mathcal{N}(30, 16)`, which specifies the variance, but the random number generator requires the standard deviation, ``sd=4``.
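
When a distribution is given in terms of its variance, one way to avoid mistakes is to take the square root inside the call. A minimal sketch follows; the ``set.seed`` call and the sample size are assumptions added here only so that the illustration is repeatable.

.. code-block:: s

    set.seed(42)                    # only for repeatability of this illustration
    variance <- 16

    # rnorm expects the standard deviation, so pass sqrt(variance), which is 4
    x <- rnorm(n=10000, mean=30, sd=sqrt(variance))

    mean(x)    # close to 30
    sd(x)      # close to 4
    var(x)     # close to 16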

Random numbers from the other distributions are obtained in the same way:

  • For the :math:`t`-distribution: ``rt(...)``
  • For the :math:`F`-distribution: ``rf(...)``
  • For the chi-squared distribution: ``rchisq(...)``
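
These functions follow the same pattern as ``rnorm``: the first argument is the number of values required, followed by the degrees of freedom. A small sketch, in which the seed, sample sizes and degrees of freedom are arbitrary choices for illustration:

.. code-block:: s

    set.seed(1)                          # only for repeatability of this illustration

    r.t     <- rt(n=5, df=10)            # 5 values from a t-distribution, 10 df
    r.F     <- rf(n=5, df1=5, df2=20)    # 5 values from an F-distribution
    r.chisq <- rchisq(n=5, df=3)         # 5 values from a chi-squared distribution, 3 df

    # F and chi-squared values are always positive; t values may be negative
    print(r.t)
    print(r.F)
    print(r.chisq)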

</rst>
