Software tutorial/Dealing with distributions

From Statistics for Engineering
← Calculating statistics from a data sample (previous step) Tutorial index Next step: Extending R with packages →

Values from various distribution functions are easily calculated in R.

Direct probability from a distribution


To calculate the probability density directly from *any* distribution in R, use a function whose name combines ``d`` with the name of the distribution; this is what ``dDIST`` means in the illustration here:

Show-dDIST.jpg
For the normal distribution:

dnorm(x=...)

For example, dnorm(1) returns 0.2419707, the height of the standard normal density at x = 1, which is a point of inflection of the normal distribution curve.

  • For the \(t\)-distribution: dt(x=..., df=...), where df is the number of degrees of freedom.
  • For the \(F\)-distribution: df(x=..., df1=..., df2=...), where df1 and df2 are the numerator and denominator degrees of freedom.
  • For the chi-squared distribution: dchisq(x=..., df=...), where df is the number of degrees of freedom.
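
As a minimal sketch of the ``d`` functions listed above, the following evaluates each density at the same point. The value x = 1 and the degrees of freedom are arbitrary choices made only for illustration; they do not come from the tutorial.

```r
# Density (height of the curve) at x for four distributions.
# x and all degrees of freedom below are illustrative assumptions.
x <- 1

d_norm  <- dnorm(x)                   # standard normal: mean = 0, sd = 1
d_t     <- dt(x, df = 5)              # t-distribution, 5 degrees of freedom
d_f     <- df(x, df1 = 3, df2 = 12)   # F-distribution, 3 and 12 degrees of freedom
d_chisq <- dchisq(x, df = 4)          # chi-squared, 4 degrees of freedom

print(c(normal = d_norm, t = d_t, F = d_f, chisq = d_chisq))

# The normal density can be checked against its formula directly.
manual <- exp(-x^2 / 2) / sqrt(2 * pi)
print(all.equal(d_norm, manual))      # TRUE
```

These values are heights of the density curves, not probabilities; for a continuous distribution a probability is an area under the curve.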

Values from the cumulative and inverse cumulative distribution

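Similar to the above, combine the prefix p with the name of the distribution to get the cumulative area under the distribution (the probability of a value at or below a given point), and the prefix q to get the quantile (the value below which a given cumulative area lies).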

Show-pDIST-and-qDIST.jpg
  • For the normal distribution: pnorm(...) and qnorm(...)
  • For the \(t\)-distribution: pt(...) and qt(...)
  • For the \(F\)-distribution: pf(...) and qf(...)
  • For the chi-squared distribution: pchisq(...) and qchisq(...)
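
The short sketch below shows how the p and q functions relate to each other. The particular cut-off values and degrees of freedom are assumptions chosen for illustration.

```r
# p-functions: cumulative area to the left of a value.
p <- pnorm(1.96)                  # about 0.975 for the standard normal

# q-functions: the value below which a given cumulative area lies.
z <- qnorm(0.975)                 # about 1.96

# p and q are inverses of each other.
round_trip <- qnorm(pnorm(1.5))   # recovers 1.5

# The same pattern applies to the other distributions
# (degrees of freedom here are illustrative).
t_crit     <- qt(0.975, df = 10)
f_crit     <- qf(0.95, df1 = 1, df2 = 10)
chisq_crit <- qchisq(0.95, df = 1)

print(c(p = p, z = z, round_trip = round_trip))
print(c(t = t_crit, F = f_crit, chisq = chisq_crit))
```

By default these functions work with the area to the left of the given value, so a two-sided 95% limit uses a cumulative area of 0.975 rather than 0.95.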

Obtaining random numbers from a particular distribution

To obtain a single random number from the normal distribution with mean of 0 and standard deviation of 1.0:

rnorm(1)
[1]  -0.3451397

To obtain 10 random, normally distributed values:

rnorm(10)
[1]  0.4604076 -0.9670948 -0.2624246 -0.2223866  0.2492692
[6]  0.7160273 -0.2734768  2.4437870  0.4269511 -0.4831478

The r prefix indicates that we want random numbers.

Notice that R has used a default value of mean=0 and standard deviation sd=1. If you'd like your random numbers centred about a different mean, with a different level of spread, then:

rnorm(n=10, mean=30, sd=4)
[1] 31.62686 37.83101 28.07470 20.95000 30.47500
[6] 28.21797 35.81518 28.61481 30.59083 32.94051

Note that this function accepts the standard deviation, not the variance. The usual statistical notation for the previous example is \(x \sim \mathcal{N}(30, 16)\), which specifies the variance, \(\sigma^2 = 16\); the random number generator instead requires the standard deviation, sd=4.
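
One way to avoid mixing up the two is to start from the variance and take its square root explicitly. The sketch below does this and checks the result on a large sample; the seed, the sample size and the variable names are assumptions for illustration, and set.seed() is used only so the output is reproducible.

```r
# Draw from N(30, 16), where 16 is the variance.
set.seed(42)                      # arbitrary seed, for reproducibility only
variance <- 16

x <- rnorm(n = 100000, mean = 30, sd = sqrt(variance))

print(mean(x))                    # close to 30
print(var(x))                     # close to 16
print(sd(x))                      # close to 4
```

Passing sd=16 by mistake would give a sample variance near 256, which is an easy check to make when results look too spread out.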

Random numbers from the other distributions are obtained in the same way:

  • For the \(t\)-distribution: rt(...)
  • For the \(F\)-distribution: rf(...)
  • For the chi-squared distribution: rchisq(...)
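
A minimal sketch of these three generators follows. The seed, the number of values and the degrees of freedom are illustrative assumptions; each function takes the same degrees-of-freedom arguments as its d, p and q counterparts.

```r
# Five random values from each distribution.
set.seed(1)                       # arbitrary seed, for reproducibility only

t_values     <- rt(n = 5, df = 10)
f_values     <- rf(n = 5, df1 = 3, df2 = 12)
chisq_values <- rchisq(n = 5, df = 4)

print(t_values)
print(f_values)                   # F values are always positive
print(chisq_values)               # chi-squared values are always positive
```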
