Difference between revisions of "Assignment 2 - 2011 - Solution"

From Statistics for Engineering
- Deal with issues that are prevalent in real data sets.
- Improve your skills with R (if you are using R for the course).   
**Notes**:
- I would normally expect you to spend between 3 and 5 hours outside of class on assignments.  This assignment should take about that long.  Answer with bullet points, not in full paragraphs.
- **Numbers in bold** next to the question are the grading points.  Read more about the `assignment grading system <http://stats4eng.connectmv.com/wiki/Assignment_grading_system>`_.
- 600-level students must complete all the questions; 400-level students may attempt the 600-level question for extra credit.  Also, 600-level students must read the paper by PJ Rousseeuw, "`Tutorial to Robust Statistics <http://dx.doi.org/10.1002/cem.1180050103>`_".
Question 1 [1]
=================
Recall from class that :math:`\mu = \mathcal{E}(x) = \frac{1}{N}\sum{x}` and :math:`\mathcal{V}\left\{x\right\} = \mathcal{E}\left\{ (x - \mu )^2\right\} = \sigma^2 = \frac{1}{N}\sum{(x-\mu)^2}`.
#. What is the expected value of a throw of a fair 12-sided die?
#. What is the expected variance of a throw of a fair 12-sided die?
#. Simulate 10,000 throws of this die in R, MATLAB, or Python and see if your answers match those above.  Record the average value from the 10,000 throws.
#. Repeat the simulation for the average value of the die a total of 10 times.  Calculate and report the mean and standard deviation of these 10 simulations and *comment* on the results.
Solution
--------
The objective of this question is to recall basic probability rules.
#.  Let :math:`X` represent a discrete random variable for the event of throwing a fair die. Let :math:`x_{i}` for :math:`i=1,\ldots,12` represent the numerical or realized values of the outcome of the random event given by :math:`X`. Now we can define the expected value of :math:`X` as,
    .. math::
        \mathcal{E}(X)=\sum_{i=1}^{12}x_{i}P(x_{i})
    where the probability of obtaining a value of :math:`1,\ldots,12` is :math:`P(x_{i})=1/N=1/12 \;\forall\; i=1,\ldots,12`. So, we have,
    .. math::
        \mathcal{E}(X)=\frac{1}{N}\sum_{i=1}^{12}x_{i}=\frac{1}{12}\left(1+2+\cdots+12\right)=\mathbf{6.5}
#.  Continuing the notation from the above question we can derive the expected variance as,
    .. math::
        \mathcal{V}(X)&=\mathcal{E}\left\{[X-\mathcal{E}(X)]^{2}\right\}\\
        &=\mathcal{E}(X^{2})-[\mathcal{E}(X)]^{2}
     
    where :math:`\mathcal{E}(X^{2})=\sum_{i}x_{i}^{2}P(x_{i})`. So we can now calculate :math:`\mathcal{V}(X)` as,
    .. math::
        \mathcal{V}(X)&=\sum_{i=1}^{12}x_{i}^{2}P(x_{i})-\left[\sum_{i=1}^{12}x_{i}P(x_{i})\right]^{2}\\
        &=\frac{1}{12}(1^{2}+2^{2}+\cdots+12^{2}) - [6.5]^{2}\approx \mathbf{11.9167}
#. Simulating 10,000 throws corresponds to 10,000 independent random events, each with one of the mutually exclusive outcomes in the set :math:`\mathcal{S}=\{1,2,\ldots,12\}`. The sample mean and variance from my sample were:

    .. math::

        \overline{x} &= 6.4925\\
        s^2 &= 11.77915

.. twocolumncode::
    :code1: ../che4c3/Assignments/Assignment-2/code/q1c.R
    :language1: s
    :header1: R code
    :code2: ../che4c3/Assignments/Assignment-2/code/q1c.m
    :language2: matlab
    :header2: MATLAB code
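
The question also allows Python. The following is a minimal sketch of such a simulation using only the standard library. It is not a translation of the R or MATLAB files above; the function names and the seed value are illustrative assumptions. It also recomputes the theoretical mean and variance exactly, so the simulated values can be compared against them.

.. code-block:: python

    import random
    from fractions import Fraction
    from statistics import mean, variance

    SIDES = 12          # fair 12-sided die
    N_THROWS = 10_000   # number of simulated throws


    def theoretical_moments(sides=SIDES):
        """Exact mean and variance of a fair die with faces 1..sides."""
        p = Fraction(1, sides)  # each face is equally likely
        mu = sum(x * p for x in range(1, sides + 1))
        var = sum(x * x * p for x in range(1, sides + 1)) - mu ** 2
        return mu, var


    def simulate_throws(n=N_THROWS, sides=SIDES, seed=None):
        """Return n simulated throws; the seed is only for reproducibility."""
        rng = random.Random(seed)
        return [rng.randint(1, sides) for _ in range(n)]


    if __name__ == "__main__":
        mu, var = theoretical_moments()
        throws = simulate_throws(seed=2011)  # hypothetical seed value
        print("theoretical mean:", float(mu), "variance:", float(var))
        print("sample mean:", mean(throws), "sample variance:", variance(throws))

Here ``variance`` is the sample variance (divisor :math:`n-1`); with 10,000 throws the difference from the population formula is negligible.
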
#. Repeating the above simulation 10 times (i.e., 10 independent experiments) produces 10 different estimates of :math:`\mu` and :math:`\sigma^2`. Note, everyone's answer should be slightly different, and different each time you run the simulation.
.. twocolumncode::
    :code1: ../che4c3/Assignments/Assignment-2/code/q1d.R
    :language1: s
    :header1: R code
    :code2: ../che4c3/Assignments/Assignment-2/code/q1d.m
    :language2: matlab
    :header2: MATLAB code
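
A corresponding Python sketch for the repeated experiment is shown below, again under the assumption that only the standard library is available. The helper names ``mean_of_throws`` and ``repeat_experiment`` and the seed are hypothetical, chosen for illustration.

.. code-block:: python

    import random
    from statistics import mean, stdev


    def mean_of_throws(rng, n=10_000, sides=12):
        """Average value of n throws of a fair die with faces 1..sides."""
        return mean(rng.randint(1, sides) for _ in range(n))


    def repeat_experiment(repeats=10, n=10_000, sides=12, seed=None):
        """Return the sample means from `repeats` independent simulations."""
        rng = random.Random(seed)  # one generator shared across all repeats
        return [mean_of_throws(rng, n, sides) for _ in range(repeats)]


    if __name__ == "__main__":
        means = repeat_experiment(seed=2011)  # hypothetical seed value
        print("mean of the 10 means:", mean(means))
        print("std. dev. of the 10 means:", stdev(means))

Leaving ``seed`` as ``None`` gives a different set of 10 means on every run, which is the behaviour the question asks you to comment on.
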
Note that each :math:`\overline{x} \sim \mathcal{N}\left(\mu, \sigma^2/n \right)`, where :math:`n = 10000`.  We know what :math:`\sigma^2` is in this case: it is our theoretical value of **11.92**, calculated earlier, and for :math:`n=10000` samples, our :math:`\overline{x} \sim \mathcal{N}\left(6.5, 0.00119167\right)`.
Calculating the average of those 10 means, let's call that :math:`\overline{\overline{x}}`, shows values around 6.5, the theoretical mean.
Calculating the variance of those 10 means gives values around 0.00119167, as expected.
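
Since the question asks for the standard deviation of the 10 means, the theoretical value it should be compared against can be worked out from the quantities already given:

.. math::

    \mathcal{V}(\overline{x}) = \frac{\sigma^2}{n} = \frac{11.9167}{10000} \approx 0.00119167
    \qquad\Longrightarrow\qquad
    \sqrt{\mathcal{V}(\overline{x})} \approx 0.0345

With only 10 means, the estimated standard deviation is itself noisy, so values in the neighbourhood of 0.035 rather than an exact match are to be expected.
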
</rst>
</rst>

Revision as of 13:24, 22 September 2018

<rst> <rst-options: 'toc' = False/> <rst-options: 'reset-figures' = False/>

.. rubric:: Assignment objectives

- A review of basic probability, histograms and sample statistics.
- Collect data from multiple sources, consolidate it, and analyze it.
- Deal with issues that are prevalent in real data sets.
- Improve your skills with R (if you are using R for the course).
</rst>