Univariate data analysis

Learning outcomes

The study of variability is important to help answer: "what happened?"
Univariate tools such as the histogram, median, MAD, standard deviation, and quartiles will be reviewed from prior courses (as a refresher).
The normal and t-distributions will be important in our work: what they are, how to interpret them, and how to use tables of these distributions.
The central limit theorem will be explained conceptually: you cannot finish a course on stats without knowing the key result from this theorem.
Using and interpreting confidence intervals will be crucial in all the modules that follow.

Tasks to do first	Quiz	Solution
Complete steps 10, 11, 12 and 13 of the software tutorial (also steps 1 through 9)	Quiz 1	Solution 1
Watch videos 1, 2, 3, 4, and 5	Quiz 2 Quiz 3	Solution 2 Solution 3
Watch videos 6, 7, and 8	Quiz 4 Quiz 5	Solution 4 Solution 5
Watch videos 9, 10, 11 and 12	Quiz 6 Quiz 7	Solution 6 Solution 7
Watch videos 13, 14 and 15	Quiz 8	Solution 8
Watch video 16	Quiz 9 Quiz 10	Solution 9 Solution 10

New Boeing planes will generate 0.5 TB of data per flight. Read about this, and other sources of data: "every piece of that plane has an internet connection, from the engines to the flaps to the landing gear".
A shift that began in academic publishing a few years ago is now accelerating: some journals are disallowing the use of p-values. This editorial in Basic and Applied Social Psychology explains why: http://dx.doi.org/10.1080/01973533.2015.1012991. I intentionally don't cover p-values in the course, because they can be confusing and counterintuitive for engineers. You will still see p-values listed in the R output for linear models, though, and they are very closely related to confidence intervals. Because of this close relationship, future courses will start to de-emphasize confidence intervals and look at the alternatives suggested in the link above. Confidence intervals still have their place, though: they are widely used in the existing literature and remain a valid way of interpreting results, as long as you are aware of exactly what their interpretation is. This is important to note for those of you going on to grad school and graduate research.
All students, but especially the 600-level students, should read the article by Peter J. Rousseeuw, "Tutorial to Robust Statistics": it is easy to read and contains a great deal of useful content.
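
To see why the robust tools reviewed in this module (the median and MAD) are worth knowing, here is a minimal sketch comparing them with the mean and standard deviation when one value is mistyped. It is written in Python using only the standard library, rather than in the R used in the course. The data values, the function names `mad` and `summarize`, and the 1.4826 scaling constant (the usual choice that makes the MAD comparable to the standard deviation for normally distributed data) are assumptions made for illustration.

```python
import statistics


def mad(values, scale=1.4826):
    """Median absolute deviation about the median, multiplied by `scale`.

    The default scale of 1.4826 is an assumed, conventional constant that
    makes the MAD comparable to the standard deviation for normal data.
    """
    center = statistics.median(values)
    return scale * statistics.median([abs(v - center) for v in values])


def summarize(values):
    """Return classical and robust measures of location and spread."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample standard deviation
        "median": statistics.median(values),
        "mad": mad(values),
    }


# Hypothetical measurements; in the second list the last value, 10.3,
# has been mistyped as 103.0 to create a single outlier.
clean = [9.7, 9.8, 9.9, 10.0, 10.1, 10.2, 10.3]
contaminated = clean[:-1] + [103.0]

for label, data in (("clean", clean), ("contaminated", contaminated)):
    stats = summarize(data)
    print(label, {name: round(value, 3) for name, value in stats.items()})
```

The single bad value pulls the mean and standard deviation far away from the bulk of the data, while the median and MAD are essentially unchanged.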

Watch all of these videos in this YouTube playlist:

Introduction [05:59]
Histograms [04:50]
Basic terminology [06:41]
Outliers, medians and MAD [04:42]
The central limit theorem [06:56]
The normal distribution, and standardizing variables [05:54]
Normal distribution notation and using tables and R [05:48]
Checking if data are normally distributed [05:57]
Introducing the idea of a confidence interval [covered in class]
Confidence intervals when we don't know the variance [07:59]
Interpreting the confidence interval [07:52]
A worked example: calculating and interpreting the CI [03:37]
A motivating example to see why tests for differences are important [08:29]
The mathematical derivation for a confidence interval for differences [covered in class]
Using the confidence interval to test for differences to solve the motivating example [covered in class]
Confidence intervals for paired tests: theory and an example [11:59]