Assignment 2 - 2012
Due date(s): 23 January 2012, noon
Assignment questions (PDF)
Assignment solutions (PDF)
<rst> <rst-options: 'toc' = False/> <rst-options: 'reset-figures' = False/>
Assignment objectives
=====================
.. rubric:: Assignment objectives:
- Use a table of normal distributions to calculate probabilities
- Summarize data by means and standard deviations, and their robust equivalents
- Download data and analyze it
Question 1 [2]
==============
Estimate the following:
#. Without using tables or a computer: the cumulative area under the normal distribution between 15 and 35, with a mean of 25 and a standard deviation of 5.
#. The same as part 1, but using a table of normal distributions from the course notes (or another statistics textbook).
#. Between which lower and upper bounds will we find 60% probability of an event occurring, using the standardized (:math:`z`) normal distribution? Calculate your answer using a printed table, ensuring that the two bounds are symmetrical about zero.
#. Convert these dimensionless :math:`z`-bounds to real-world bounds for a process with a mean of 100 kg and a standard deviation of 25 kg.
#. Verify your previous two answers using R, or other computer software.
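For the software check in the last part, one possible sketch is shown here in Python rather than R, using only the standard library's ``statistics.NormalDist``. The variable names are illustrative assumptions; the numbers come from the question itself.

```python
from statistics import NormalDist

# Parts 1 and 2: area between 15 and 35 for a normal distribution with
# mean 25 and standard deviation 5 (two standard deviations either side).
dist = NormalDist(mu=25, sigma=5)
area = dist.cdf(35) - dist.cdf(15)

# Part 3: symmetric z-bounds enclosing 60% probability.
# That leaves 20% in each tail, so the upper bound is the 80th percentile.
z = NormalDist()  # standard normal: mean 0, standard deviation 1
z_upper = z.inv_cdf(0.80)
z_lower = -z_upper

# Part 4: convert to real-world units with x = mean + z * sd.
mean_kg, sd_kg = 100, 25
x_lower = mean_kg + z_lower * sd_kg
x_upper = mean_kg + z_upper * sd_kg

print(f"area between 15 and 35: {area:.4f}")
print(f"z bounds: {z_lower:.3f} to {z_upper:.3f}")
print(f"real-world bounds: {x_lower:.1f} kg to {x_upper:.1f} kg")
```

The printed values should agree with the table-based answers to within the rounding of the printed table.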
Question 2 [3]
==============
A chicken facility produces bags filled with breaded chicken strips. The advertised weight for each package is 750 grams. Each bag contains between 8 and 15 strips, given that each chicken strip weighs between 40 and 80 grams, drawn from a uniform distribution. The company sets its target fill weight at 790 grams to avoid breaking regulations that require accurate package labelling.
#. If we take a large sample of bagged chicken strips and weigh each bag, from which distribution would we expect these weights to come?
#. Clearly explain why.
#. If the standard deviation of this large sample of bag weights is 12 grams, out of 10,000 customers, how many will purchase bags below the advertised 750 g weight?
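The last part reduces to a tail-area calculation. The sketch below shows one way to set it up in Python; it assumes the bag weights are normally distributed and that their mean equals the 790 gram target fill weight, neither of which the question states outright.

```python
from statistics import NormalDist

# Assumption: bag weights are normal, centred on the 790 g target fill weight.
bags = NormalDist(mu=790, sigma=12)

# z-value of the advertised weight: how many standard deviations below target.
z_value = (750 - 790) / 12

# Fraction of bags lighter than 750 g, scaled up to 10,000 customers.
fraction_below = bags.cdf(750)
expected_customers = 10_000 * fraction_below

print(f"z-value: {z_value:.2f}")
print(f"fraction below 750 g: {fraction_below:.6f}")
print(f"expected customers out of 10,000: {expected_customers:.1f}")
```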
Question 3 [3]
==============
#. Compute the mean, median, standard deviation and MAD for the salt content of various potato chips `in this report <http://beta.images.theglobeandmail.com/archive/00245/Read_the_report_245543a.pdf>`_ (page 22) as described in the article from the `Globe and Mail <http://www.theglobeandmail.com/life/health/salt-variation-between-brands-raises-call-for-cuts/article1299117/>`_ on 24 September 2009.
#. Plot a boxplot of the data and report the interquartile range (IQR). Comment on the 3 measures of spread you have calculated: standard deviation, MAD, and interquartile range.
#. Comment on the effectiveness of the visualization plots used in the PDF report.
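The location and spread measures can be computed as sketched below. The ``salt_mg`` values are hypothetical placeholders, not the values from the report, and include one deliberately large value to show how the robust measures react. The 1.4826 scaling is the factor R's ``mad`` function applies by default so that the MAD is comparable to the standard deviation for normal data.

```python
from statistics import mean, median, stdev, quantiles

# Hypothetical placeholder values (mg of salt); replace with the report's data.
salt_mg = [120, 150, 160, 170, 180, 190, 200, 210, 230, 480]

avg = mean(salt_mg)
med = median(salt_mg)
sd = stdev(salt_mg)  # sample standard deviation (n - 1 divisor)

# MAD: median of the absolute deviations from the median.
raw_mad = median(abs(x - med) for x in salt_mg)
scaled_mad = 1.4826 * raw_mad  # consistent with sd for normal data

# Quartiles by linear interpolation between the sample extremes.
q1, q2, q3 = quantiles(salt_mg, n=4, method="inclusive")
iqr = q3 - q1

print(f"mean={avg:.1f} median={med:.1f} sd={sd:.1f}")
print(f"MAD={raw_mad:.1f} scaled MAD={scaled_mad:.1f} IQR={iqr:.1f}")
```

With an outlier present, the mean and standard deviation shift noticeably, while the median, MAD and IQR barely move.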
Question 4 [4]
==============
Data `characterizing 200 commuting trips of your instructor <http://openmv.net/info/travel-times>`_ was visualized in the previous assignment.
#. Plot a histogram of the ``TotalTime`` variable (the total time for the commute) to confirm the variable is not normally distributed.
#. How would you characterize the distribution of the ``TotalTime`` variable? Give reasons *why* the variable is not normally distributed.
#. Confirm the variable is not normally distributed by using a suitable, visual statistical test.
#. The 407 highway speeds are almost always much faster than the 403. Does the ``MaxSpeed`` variable (the maximum speed recorded during the entire trip, usually while travelling the 407) follow a normal distribution? Plot both a histogram and a q-q plot to check.
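To show what a q-q plot actually compares, the sketch below builds the plotted points by hand: the sorted observations are paired with standard normal quantiles. It uses simulated samples, not the travel-time data, and the plotting positions :math:`(i - 0.5)/n` are one common convention chosen here as an assumption. The correlation of the points is only a rough numerical summary of how straight the plot is; the visual check remains the point of the question.

```python
import random
from statistics import NormalDist, mean

def qq_points(sample):
    """Return (theoretical, observed) quantiles for a normal q-q plot."""
    n = len(sample)
    observed = sorted(sample)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n for i = 1..n
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed

def qq_correlation(sample):
    """Pearson correlation of the q-q points; near 1 means a straight line."""
    xs, ys = qq_points(sample)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Simulated stand-ins: one normal sample, one right-skewed sample.
rng = random.Random(2012)
normal_sample = [rng.gauss(100, 10) for _ in range(200)]
skewed_sample = [20 + rng.expovariate(1 / 10) for _ in range(200)]

r_normal = qq_correlation(normal_sample)
r_skewed = qq_correlation(skewed_sample)
print(f"q-q correlation, normal sample: {r_normal:.3f}")
print(f"q-q correlation, skewed sample: {r_skewed:.3f}")
```

Plotting ``theoretical`` against ``observed`` gives the q-q plot itself; a right-skewed variable bends away from the straight line in the upper tail.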
Question 5 [3]
==============
In this question we investigate the stock prices for the Canadian National Railway Company (ticker ``CNR`` on the Toronto Stock Exchange).
- Visit https://finance.yahoo.com/
- Type in ``CNR.TO`` in the symbol (ticker) box
- Click **Historical Prices** in the left column
- Change the date range from 01 March 2011 to 01 January 2012
- Click **Get Prices** to get the "Daily" prices of the stock
- Scroll to the bottom of the page and click "Download to spreadsheet" to download a CSV file
Once you have loaded the CSV file into R, answer the following questions regarding the ``Adj.Close`` column (the price at which the stock closes at the end of the trading day, after adjusting it for stock splits and dividends paid):
#. Are these closing prices from a normal distribution? Test your answer with a q-q plot.
#. Estimate the distribution's location and spread, assuming the data are from a normal distribution. 600-level students must use the ``fitdistr`` function in R from the MASS package.
#. Are these data points independent?
#. What is the probability of observing a stock value above $77.00?
**Note**: the purpose of this exercise is more for you to become comfortable with web-based data retrieval, which is common in most companies.
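The fitting and probability steps can be sketched as follows, in Python rather than R. The ``prices`` list is a short hypothetical stand-in for the downloaded ``Adj.Close`` column, not real CNR data. ``NormalDist.from_samples`` estimates the spread with the sample standard deviation (n - 1 divisor), which differs slightly from a maximum-likelihood fit, and the lag-1 autocorrelation is just one simple way to look at independence.

```python
from statistics import NormalDist, mean

# Hypothetical stand-in for the Adj.Close column, in date order.
prices = [70.1, 70.8, 71.5, 72.3, 71.9, 73.0, 74.2, 73.8, 75.1, 76.0]

# Location and spread, assuming the prices are normally distributed.
fitted = NormalDist.from_samples(prices)

# Probability of a value above $77.00 under the fitted distribution.
p_above = 1 - fitted.cdf(77.00)

# Lag-1 autocorrelation: relates each price to the previous day's price.
m = mean(prices)
numerator = sum((a - m) * (b - m) for a, b in zip(prices, prices[1:]))
denominator = sum((x - m) ** 2 for x in prices)
lag1 = numerator / denominator

print(f"location={fitted.mean:.2f} spread={fitted.stdev:.2f}")
print(f"P(price > 77.00) = {p_above:.4f}")
print(f"lag-1 autocorrelation = {lag1:.2f}")
```

A lag-1 autocorrelation well away from zero suggests consecutive prices are not independent, which in turn weakens the probability estimate from the fitted distribution.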
.. raw:: latex

    \vspace{0.5cm} \hrule \begin{center}END\end{center}
</rst>