Assignment 2 - 2012

From Statistics for Engineering
Due date(s): 23 January 2012, noon
(PDF) Assignment questions
(PDF) Assignment solutions

Assignment objectives

  • Use a table of normal distributions to calculate probabilities
  • Summarize data by means and standard deviations, and their robust equivalents
  • Download data and analyze it

Question 1 [2]

Estimate the following:

  1. Without using tables or a computer: the cumulative area under the normal distribution between 15 and 35, with mean of 25 and standard deviation of 5.
  2. The same as part 1, but using a table of normal distributions from the course notes (or another statistics textbook).
  3. Between which lower and upper bounds will we find 60% probability of an event occurring, using the standardized (\(z\)) normal distribution? Calculate your answer using a printed table, ensuring that the two bounds are symmetrical about zero.
  4. Convert these dimensionless \(z\)-bounds to real-world bounds for a process with mean of 100 kg and a standard deviation of 25 kg.
  5. Verify your previous two answers using R, or other computer software.
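
For part 5, the sketch below shows one way the checks might be done in software. It uses Python's standard-library `statistics.NormalDist` as an example of "other computer software"; in R, `pnorm` and `qnorm` play the same roles. The variable names are illustrative only. As a sanity check for part 1, recall that roughly 95% of a normal distribution lies within two standard deviations of the mean.

```python
from statistics import NormalDist

# Parts 1 and 2: area between 15 and 35 for a normal with mean 25, sd 5.
dist = NormalDist(mu=25, sigma=5)
area = dist.cdf(35) - dist.cdf(15)

# Part 3: symmetric bounds enclosing 60% leave 20% in each tail,
# so the upper bound is the 80th percentile of the standard normal.
z_upper = NormalDist().inv_cdf(0.80)
z_lower = -z_upper

# Part 4: convert z-values to real-world units with x = mean + z * sd.
mean_kg, sd_kg = 100, 25
lower_kg = mean_kg + z_lower * sd_kg
upper_kg = mean_kg + z_upper * sd_kg

print(f"area between 15 and 35: {area:.4f}")
print(f"z-bounds: {z_lower:.4f} to {z_upper:.4f}")
print(f"real-world bounds: {lower_kg:.2f} kg to {upper_kg:.2f} kg")
```

Values read from a printed table will usually agree with the software to only two or three decimal places, because tables are tabulated in steps of 0.01 in \(z\).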

Question 2 [3]

A chicken facility produces bags filled with breaded chicken strips. The advertised weight for each package is 750 grams. Each bag contains between 8 and 15 strips, given that each chicken strip weighs between 40 and 80 grams, drawn from a uniform distribution. The company sets its target fill weight at 790 grams to avoid breaking regulations that require accurate package labelling.

  1. If we take a large sample of bagged chicken strips and weigh each bag, from which distribution would we expect these weights to come?
  2. Clearly explain why.
  3. If the standard deviation of this large sample of bag weights is 12 grams, then out of 10,000 customers, how many will purchase bags below the advertised 750 g weight?
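
The following is a minimal sketch of how the ideas in this question could be explored numerically. It makes two simplifying assumptions that are not in the question: every simulated bag holds exactly 12 strips (the hypothetical `strips_per_bag` parameter), and strips are added without aiming for the 790 gram target. The simulation therefore only illustrates the shape of a sum of uniform values, not the actual filling process. The second function treats bag weights as normal with the stated target and standard deviation.

```python
import random
from statistics import NormalDist, mean, stdev


def simulate_bag_weights(n_bags, strips_per_bag=12, seed=0):
    """Sum independent uniform(40, 80) strip weights for each bag.

    strips_per_bag is fixed here purely for illustration (an assumption).
    """
    rng = random.Random(seed)
    return [
        sum(rng.uniform(40, 80) for _ in range(strips_per_bag))
        for _ in range(n_bags)
    ]


def expected_underweight(target=790, sd=12, label=750, customers=10_000):
    """Expected number of customers receiving a bag below the label weight,
    assuming bag weights are normal with the given mean and sd."""
    return customers * NormalDist(target, sd).cdf(label)


weights = simulate_bag_weights(20_000)
print(f"simulated mean = {mean(weights):.1f} g, sd = {stdev(weights):.1f} g")
print(f"expected underweight bags: {expected_underweight():.1f}")
```

A histogram of `weights` should look close to bell-shaped even though each individual strip weight is uniform, which is the behaviour parts 1 and 2 ask about.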

Question 3 [3]

  1. Compute the mean, median, standard deviation and MAD for the salt content of various potato chips in this report (page 22), as described in the article from the Globe and Mail on 24 September 2009.
  2. Plot a boxplot of the data and report the interquartile range (IQR). Comment on the 3 measures of spread you have calculated: standard deviation, MAD, and interquartile range.
  3. Comment on the effectiveness of the visualization plots used in the PDF report.
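
The sketch below shows how the summary statistics in parts 1 and 2 might be computed. The numbers in `sodium_mg` are made-up placeholder values with one deliberate outlier; they are not taken from the report. The MAD is scaled by 1.4826, a common convention that makes it comparable to the standard deviation for normally distributed data, and the quartiles use Python's "inclusive" interpolation method. Other software may default to a different quartile definition.

```python
from statistics import mean, median, stdev, quantiles

MAD_SCALE = 1.4826  # makes the MAD comparable to the sd for normal data


def mad(values, scale=MAD_SCALE):
    """Scaled median absolute deviation from the median."""
    centre = median(values)
    return scale * median(abs(v - centre) for v in values)


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical placeholder data (NOT from the report); 550 is an outlier.
sodium_mg = [100, 120, 130, 150, 160, 170, 180, 200, 550]

print(f"mean   = {mean(sodium_mg):.1f}")
print(f"median = {median(sodium_mg):.1f}")
print(f"sd     = {stdev(sodium_mg):.1f}")
print(f"MAD    = {mad(sodium_mg):.1f}")
print(f"IQR    = {iqr(sodium_mg):.1f}")
```

With an outlier present, the standard deviation is inflated far more than the MAD or the IQR, which is the kind of contrast part 2 asks you to comment on.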

Question 4 [4]

Data characterizing 200 commuting trips of your instructor was visualized in the previous assignment.

  1. Plot a histogram of the TotalTime variable (the total time for the commute) to confirm the variable is not normally distributed.
  2. How would you characterize the distribution of the TotalTime variable? Give reasons why the variable is not normally distributed.
  3. Confirm the variable is not normally distributed by using a suitable, visual statistical test.
  4. The 407 highway speeds are almost always much faster than the 403. Does the MaxSpeed variable (the maximum speed recorded during the entire trip, usually while travelling the 407) follow a normal distribution? Plot both a histogram and a q-q plot to check.
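
A q-q plot pairs each sorted observation with the quantile expected from a standard normal distribution; points from normal data fall close to a straight line. The sketch below computes those pairs by hand, using the plotting positions \((i - 0.5)/n\), which is one common convention among several. The two samples are randomly generated stand-ins, not the commute data, and the correlation of the q-q points is used only as a rough numerical indication of straightness. In R, `qqnorm` draws the plot directly.

```python
import random
from statistics import NormalDist, mean


def qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a normal q-q plot."""
    n = len(sample)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(sample)))


def qq_correlation(sample):
    """Pearson correlation of the q-q points; closer to 1 means a straighter line."""
    xs, ys = zip(*qq_points(sample))
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Synthetic stand-in samples (assumptions, not the real TotalTime or MaxSpeed).
rng = random.Random(1)
normal_like = [rng.gauss(30, 4) for _ in range(200)]
right_skewed = [20 + rng.expovariate(1 / 10) for _ in range(200)]

print(f"normal-like sample : r = {qq_correlation(normal_like):.3f}")
print(f"right-skewed sample: r = {qq_correlation(right_skewed):.3f}")
```

The pairs returned by `qq_points` can be passed to any plotting tool; a right-skewed variable typically shows its upper tail curving away from the straight line.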

Question 5 [3]

In this question we investigate the stock prices for the Canadian National Railway Company (ticker CNR on the Toronto Stock Exchange).

  • Visit
  • Type in CNR.TO in the symbol (ticker) box
  • Click Historical Prices in the left column
  • Change the date range from 01 March 2011 to 01 January 2012
  • Click Get Prices to get the "Daily" prices of the stock
  • Scroll to the bottom of the page and click "Download to spreadsheet" to download a CSV file

Once you have loaded the CSV file into R, answer the following questions regarding the Adj.Close column (the price at which the stock closes at the end of the trading day, after adjusting it for stock splits and dividends paid):

  1. Are these closing prices from a normal distribution? Test your answer with a q-q plot.
  2. Estimate the distribution's location and spread, assuming the data are from a normal distribution. 600-level students must use the fitdistr function in R from the MASS package.
  3. Are these data points independent?
  4. What is the probability of observing a stock value above $77.00?
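
The calculations in parts 2 to 4 can be sketched as follows, here in Python rather than R. The CSV text is a made-up stand-in with invented prices, not real CNR data, and the column header "Adj Close" is an assumption about the downloaded file (R would typically read such a header as Adj.Close). The location and spread are the maximum-likelihood estimates for a normal distribution, so the standard deviation divides by \(n\) rather than \(n - 1\). The lag-1 autocorrelation is one simple, informal indicator of whether consecutive values are independent.

```python
import csv
import io
from statistics import NormalDist, fmean, pstdev

# Hypothetical stand-in for the downloaded file (invented prices, oldest first).
SAMPLE_CSV = """Date,Adj Close
2011-03-01,70.0
2011-03-02,70.5
2011-03-03,71.2
2011-03-04,71.0
2011-03-07,72.3
2011-03-08,73.1
2011-03-09,72.8
2011-03-10,74.0
2011-03-11,75.2
2011-03-14,76.0
"""


def load_prices(text, column="Adj Close"):
    """Read one numeric column from CSV text (the column name is an assumption)."""
    return [float(row[column]) for row in csv.DictReader(io.StringIO(text))]


def lag1_autocorrelation(values):
    """Correlation between each value and the next; near 0 suggests independence."""
    m = fmean(values)
    num = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    den = sum((v - m) ** 2 for v in values)
    return num / den


prices = load_prices(SAMPLE_CSV)
mu = fmean(prices)        # location estimate
sigma = pstdev(prices)    # spread estimate (divides by n)
p_above = 1 - NormalDist(mu, sigma).cdf(77.00)
r1 = lag1_autocorrelation(prices)

print(f"location = {mu:.2f}, spread = {sigma:.2f}")
print(f"P(price > 77.00) = {p_above:.4f}")
print(f"lag-1 autocorrelation = {r1:.2f}")
```

A probability computed this way is only meaningful if the normality and independence assumptions examined in parts 1 and 3 hold; a lag-1 autocorrelation far from zero is a warning sign that they do not.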

Note: the purpose of this exercise is more for you to become comfortable with web-based data retrieval, which is common in most companies.