# Assignment 3 - 2014

Due date(s): 05 February 2014, in class. Assignment questions (PDF). Assignment solutions (PDF, full solutions).

Question 1 [8]

The final exam in 4M3 included an open-ended question. The data values are the grades achieved for the answer to that question, broken down by whether or not the student used a systematic method. No grades were given for using a systematic method; grades were awarded only for answering the question.

Use a statistical test, at the 95% confidence level, to check whether the difference between the two groups is significant. Interpret your answer carefully and clearly.

Question 2 [6]

Your company is creating a new product line that produces a plastic. A measure of the plastic's strength, $$q$$, is available. How many lab samples must you take to be sure the true strength lies in a range of 5 units, centered about the sample average [use typical levels of confidence]? The device that takes the measurements has an error characterized by a standard deviation of $$3.3$$ units.
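One way to set up the calculation is sketched below. It assumes (these are assumptions, not statements from the question) that the measurement standard deviation can be treated as a known $$\sigma$$, that "a range of 5 units, centered about the sample average" means a half-width of 2.5 units, and that the confidence level is 95%. The function name `samples_needed` is hypothetical.

```python
import math
from statistics import NormalDist


def samples_needed(sigma, half_width, confidence=0.95):
    """Smallest n for which c_n * sigma / sqrt(n) <= half_width.

    Assumes sigma is known, so the normal distribution supplies c_n.
    """
    # Two-sided critical value, e.g. about 1.96 at 95% confidence
    c_n = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((c_n * sigma / half_width) ** 2)


# Assumed reading of the question: sigma = 3.3, half-width = 2.5 units
print(samples_needed(sigma=3.3, half_width=2.5))
print(samples_needed(sigma=3.3, half_width=2.5, confidence=0.99))
```

If the 5 units were instead read as the half-width, pass `half_width=5.0`; the required number of samples changes accordingly.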

Question 3 [6]

Consider the BOD data set discussed in class, 27 January. We showed in class that there is no difference between the two methods. However, we felt uncertain about that result as it went against our expectations. Repeat the example, to discover any underlying problems with the data, and proceed to show a more careful analysis of the data.

Question 4 [12]

The ammonia concentration in your wastewater treatment plant is measured every 6 hours. The data for one year are available from the dataset website.

1. Use a visualization plot to hypothesize from which distribution the data might come. Which distribution do you think is most likely? Once you've decided on a distribution, use a qq-plot to test your decision.

2. Estimate location and spread statistics assuming the data are from a normal distribution. You can investigate using the fitdistr function in R, in the MASS package, or any other appropriate method of assessing the distribution's parameters.

3. What if you were told the measured values are not independent? How does that affect your answer?

4. What is the probability of having an ammonia concentration greater than 40 mg/L when:

• you may use only the data (do not use any estimated statistics)
• you use the estimated statistics for the distribution?

Note: Answer this entire question using computer software to calculate values from the normal distribution. Also make sure you can answer the last part of the question by hand (when given the mean and variance), using a table of normal distributions.
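The two approaches in part 4 can be contrasted with a short sketch. The function names are hypothetical, and the `demo` readings are made-up numbers used only to exercise the code; they are not the ammonia dataset, which should be loaded from the dataset website instead. Note that `statistics.stdev` divides by $$n-1$$, whereas a maximum-likelihood fit divides by $$n$$, so the spread estimate can differ slightly between tools.

```python
from statistics import NormalDist, mean, stdev


def empirical_exceedance(values, threshold):
    """Fraction of observations above the threshold (uses only the data)."""
    return sum(v > threshold for v in values) / len(values)


def normal_exceedance(values, threshold):
    """P(X > threshold) for a normal distribution fitted to the data."""
    fitted = NormalDist(mu=mean(values), sigma=stdev(values))
    return 1.0 - fitted.cdf(threshold)


# Made-up illustrative readings in mg/L, NOT the real ammonia data
demo = [30.0, 35.0, 38.0, 41.0, 36.0, 44.0, 33.0, 39.0]

print(empirical_exceedance(demo, 40.0))
print(normal_exceedance(demo, 40.0))
```

The two answers generally differ: the first depends only on how many readings exceed the threshold, while the second depends on how well the normal assumption describes the data.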

Question 5 [5 (600-level students; extra credit for 400-level students)]

The confidence interval for the population mean takes one of the two forms below, depending on whether we know the variance or not. At the 90% confidence level, for a sample size of 13, compare and comment on the upper and lower bounds for the two cases. Assume that $$s = \sigma = 3.72$$.

$\begin{split}\begin{array}{rcccl} - c_n &\leq& \displaystyle \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} &\leq & c_n\\ \\ - c_t &\leq& \displaystyle \frac{\bar{x} - \mu}{s/\sqrt{n}} &\leq & c_t \end{array}\end{split}$
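A minimal numerical sketch of the comparison follows. The normal critical value $$c_n$$ is computed with Python's standard library; the $$t$$ critical value $$c_t$$ is entered by hand from a table for 12 degrees of freedom with 5% in each tail (a statistics package, for example R's `qt(0.95, 12)`, could be used instead). Variable names are illustrative only.

```python
import math
from statistics import NormalDist

n = 13
s = sigma = 3.72
confidence = 0.90

# Known variance: critical value from the normal distribution
c_n = NormalDist().inv_cdf(0.5 + confidence / 2)

# Unknown variance: critical value from the t-distribution with n - 1 = 12
# degrees of freedom, taken from a t-table (assumed value, entered by hand)
c_t = 1.782

std_err = sigma / math.sqrt(n)
half_width_known = c_n * std_err       # interval is x_bar +/- this value
half_width_estimated = c_t * std_err   # interval is x_bar +/- this value

print(round(c_n, 3), round(half_width_known, 3))
print(c_t, round(half_width_estimated, 3))
```

Both intervals are centered on $$\bar{x}$$; only the half-widths differ, which is the quantity to comment on.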

Question 6 [0 (for practice)]

From the 2011 midterm

Sulphur dioxide is a byproduct from ore smelting, coal-fired power stations, and other sources.

These 11 samples of sulphur dioxide, SO2, measured in parts per billion [ppb], were taken from our plant. Environmental regulations require us to report the 90% confidence interval for the mean SO2 value.

$180, \,\, 340, \,\,220, \,\,410, \,\,101, \,\,89, \,\,210, \,\,99, \,\,128, \,\,113, \,\,111$

1. What is the confidence interval that must be reported, given that the sample average of these 11 points is 181.9 ppb and the sample standard deviation is 106.8 ppb?
2. Why might Environment Canada require you to report the confidence interval instead of the mean?
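For part 1, the summary statistics and the interval can be checked with a short sketch. It assumes the population variance is unknown, so the $$t$$-distribution with 10 degrees of freedom applies; the critical value is entered by hand from a $$t$$-table (5% in each tail) rather than computed.

```python
import math
from statistics import mean, stdev

so2_ppb = [180, 340, 220, 410, 101, 89, 210, 99, 128, 113, 111]

n = len(so2_ppb)
x_bar = mean(so2_ppb)
s = stdev(so2_ppb)  # sample standard deviation (n - 1 divisor)

# t critical value for a 90% interval with n - 1 = 10 degrees of freedom,
# taken from a t-table (assumed value, entered by hand)
c_t = 1.812

half_width = c_t * s / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(round(x_bar, 1), round(s, 1))
print(round(lower, 1), round(upper, 1))
```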

Question 7 [0 (for practice)]

From the 2011 midterm

A concrete slump test is used to test for the fluidity, or workability, of concrete. It's a crude, but quick test often used to measure the effect of polymer additives that are mixed with the concrete to improve workability.

The concrete mixture is prepared with a polymer additive. The mixture is placed in a mold and filled to the top. The mold is inverted and removed. The height of the mold minus the height of the remaining concrete pile is called the "slump".

[Figure: illustration of the slump test, from Wikipedia]

Your company provides the polymer additive, and you are developing an improved polymer formulation, call it B, that hopefully provides the same slump values as your existing polymer, call it A. Formulation B costs less money than A, but you don't want to upset, or lose, customers by varying the slump value too much.

1. You have a single day to run your tests (experiments). Preparation, mixing, measurement and cleanup take 1 hour per experiment, allowing you to run only 10 experiments. Describe all precautions, and why you take these precautions, when planning and executing your experiment. Be very specific in your answer (use bullet points).

2. The following slump values were recorded over the course of the day:

A 5.2
A 3.3
B 5.8
A 4.6
B 6.3
A 5.8
A 4.1
B 6.0
B 5.5
B 4.5

What is your conclusion on the performance of the new polymer formulation (system B)? Your conclusion must either be "send the polymer engineers back to the lab" or "let's start making formulation B for our customers". Explain your choice clearly.

To help you, $$\overline{x}_A = 4.6$$ and $$s_A = 0.97$$. For system B: $$\overline{x}_B = 5.62$$ and $$s_B = 0.69$$.

Note: In your answer you must be clear on which assumptions you are using and, where necessary, why you need to make those assumptions.

3. Describe the circumstances under which you would rather use a paired test for differences between polymer A and B.

4. What are the advantage(s) of the paired test over the unpaired test?
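For part 2, one possible unpaired analysis is sketched below. It assumes (these choices are not given in the question) that the slump values are independent, that both formulations share a common variance so the two sample variances can be pooled, and that a 95% confidence level is used. The $$t$$ critical value for 8 degrees of freedom is entered by hand from a $$t$$-table.

```python
import math
from statistics import mean, variance

# Slump values from the table in part 2, split by formulation
slump_a = [5.2, 3.3, 4.6, 5.8, 4.1]
slump_b = [5.8, 6.3, 6.0, 5.5, 4.5]

n_a, n_b = len(slump_a), len(slump_b)
dof = n_a + n_b - 2

# Pooled variance: assumes A and B have the same underlying variance
pooled_var = ((n_a - 1) * variance(slump_a)
              + (n_b - 1) * variance(slump_b)) / dof
std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

diff = mean(slump_b) - mean(slump_a)
z = diff / std_err

# t critical value, 95% two-sided, 8 degrees of freedom, from a t-table
# (assumed value, entered by hand)
c_t = 2.306
lower, upper = diff - c_t * std_err, diff + c_t * std_err

print(round(diff, 2), round(z, 2))
print(round(lower, 2), round(upper, 2))
```

Whether the interval for $$\mu_B - \mu_A$$ spans zero is one input to the decision; a written answer should also weigh how large a slump difference customers would actually notice.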