Latest revision as of 08:14, 15 January 2019
Learning outcomes
- The study of variability is important to help answer: "what happened?"
- Univariate tools such as the histogram, median, MAD, standard deviation, and quartiles will be reviewed from prior courses (as a refresher)
- The normal and t-distributions will be important in our work: what they are, how to interpret them, and how to use tables of these distributions
- The central limit theorem will be explained conceptually: you cannot finish a course on stats without knowing the key result from this theorem.
- Using and interpreting confidence intervals will be crucial in all the modules that follow.
Resources
- Class notes 2015
- Class notes 2014
- Textbook, chapter 2
- Quizzes (with solutions): attempt these after you have watched the videos
Tasks to do first | Quiz | Solution
Complete steps 10, 11, 12 and 13 of the software tutorial (also steps 1 through 9) | Quiz 1 | Solution 1
Watch videos 1, 2, 3, 4, and 5 | Quiz 2 | Solution 2
Watch videos 6, 7, and 8 | Quiz 4 | Solution 4
Watch videos 9, 10, 11 and 12 | Quiz 6 | Solution 6
Watch videos 13, 14 and 15 | Quiz 8 | Solution 8
Watch video 16 | Quiz 9 | Solution 9
Extended readings
- New Boeing planes will generate 0.5 TB of data per flight. Read about this, and other sources of data: "every piece of that plane has an internet connection, from the engines to the flaps to the landing gear".
- An interesting move has been under way in academic publishing over the last few years, and it is now accelerating: journals are disallowing the use of "p-values". The reasons are described in this editorial in Basic and Applied Social Psychology: http://dx.doi.org/10.1080/01973533.2015.1012991. I intentionally don't cover p-values in the course, because they can be confusing and counterintuitive for engineers. You will still see p-values listed in the R output for linear models, though, and they are very closely related to confidence intervals. This means that future courses will start to de-emphasize confidence intervals and look at the alternatives suggested in the link above. Confidence intervals still have their place: they are widely used in the existing literature and remain a valid way of interpreting results, as long as you are aware of exactly what their interpretation is. This is important to note for those of you going to grad school and looking at graduate research.
- All students, but especially the 600-level students, should read the article by Peter J. Rousseeuw, Tutorial to Robust Statistics: it is easy to read and contains a great deal of useful content.
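The close link between p-values and confidence intervals mentioned above can be seen in a small sketch. The data values below are hypothetical and chosen only for illustration, and the helper names p.value and outside.ci are not from the course notes. For a one-sample t-test, a candidate mean gets a p-value below 0.05 exactly when it lies outside the 95% confidence interval.

```r
# Illustrative measurements (hypothetical values, not course data)
x <- c(79.1, 82.4, 80.3, 77.8, 81.6, 83.0, 78.5, 80.9)

# 95% confidence interval for the population mean, from a one-sample t-test
ci <- t.test(x, conf.level = 0.95)$conf.int

# p-value for the hypothesis that the true mean equals mu0
p.value <- function(mu0) t.test(x, mu = mu0)$p.value

# Is the candidate mean mu0 outside the 95% confidence interval?
outside.ci <- function(mu0) mu0 < ci[1] || mu0 > ci[2]

# Try a few candidate means and compare the two answers
for (mu0 in c(76, 79, 80, 81, 84)) {
  cat(sprintf("mu0 = %4.1f   p-value = %.4f   outside CI: %s\n",
              mu0, p.value(mu0), outside.ci(mu0)))
}
```

The confidence interval carries the same accept/reject information as the p-value, but it also shows the range of plausible values in the units of the measurement.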
Class videos from prior years
Videos from 2015
Watch all these videos in this YouTube playlist
- Introduction [05:59]
- Histograms [04:50]
- Basic terminology [06:41]
- Outliers, medians and MAD [04:42]
- The central limit theorem [06:56]
- The normal distribution, and standardizing variables [05:54]
- Normal distribution notation and using tables and R [05:48]
- Checking if data are normally distributed [05:57]
- Introducing the idea of a confidence interval [covered in class]
- Confidence intervals when we don't know the variance [07:59]
- Interpreting the confidence interval [07:52]
- A worked example: calculating and interpreting the CI [03:37]
- A motivating example to see why tests for differences are important [08:29]
- The mathematical derivation for a confidence interval for differences [covered in class]
- Using the confidence interval to test for differences to solve the motivating example [covered in class]
- Confidence intervals for paired tests: theory and an example [11:59]
Videos from 2014
Videos from 2013
Software codes for this section
Code to show how to deal with missing values
Try this code in a web-browser
f <- 'http://openmv.net/file/raw-material-properties.csv'
data <- read.csv(f)
# Notice the NAs in the columns: these indicate
# missing values (Not Available)
summary(data)
sd(data$density1) # why NA as the answer?
help(sd)
# no NA as the answer anymore!
sd(data$density1, na.rm=TRUE)
help(mad)
help(IQR) # etc: all these functions accept an na.rm input
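The same na.rm idea can be tried without downloading a file. This is a minimal sketch on a small hypothetical vector (the numbers are made up for illustration) that contains one missing value and one outlier, so it also shows why the median and MAD are called robust.

```r
# Hypothetical measurements with one missing value and one outlier
x <- c(12.1, 11.8, NA, 12.4, 11.9, 25.0, 12.0)

mean(x)                  # NA: the missing value propagates
mean(x, na.rm = TRUE)    # pulled upwards by the outlier
median(x, na.rm = TRUE)  # robust estimate of location
sd(x, na.rm = TRUE)      # inflated by the outlier
mad(x, na.rm = TRUE)     # robust estimate of spread
IQR(x, na.rm = TRUE)     # another spread measure that accepts na.rm

# Alternative: drop the missing values once, up front
x.clean <- x[!is.na(x)]
length(x.clean)
```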
Understanding the central limit theorem with the rolling dice example
Try this code in a web-browser
# Number of times each die is thrown
N = 500
# Arrange the 6 histograms in a grid of 2 rows by 3 columns
m <- t(matrix(seq(1,6), 3, 2))
layout(m)
# Simulate 10 dice: runif gives values between 1 and 7, and
# as.integer truncates these to the integers 1 to 6
s1 <- as.integer(runif(N, 1, 7))
s2 <- as.integer(runif(N, 1, 7))
s3 <- as.integer(runif(N, 1, 7))
s4 <- as.integer(runif(N, 1, 7))
s5 <- as.integer(runif(N, 1, 7))
s6 <- as.integer(runif(N, 1, 7))
s7 <- as.integer(runif(N, 1, 7))
s8 <- as.integer(runif(N, 1, 7))
s9 <- as.integer(runif(N, 1, 7))
s10 <- as.integer(runif(N, 1, 7))
# A single throw: roughly uniform over the values 1 to 6
hist(s1, main="", xlab="One throw",
  breaks=seq(0,6)+0.5)
# Averages of several throws: increasingly bell-shaped
bins = 8
hist((s1+s2)/2, breaks=bins,
  main="", xlab="Average of two throws")
hist((s1+s2+s3+s4)/4, breaks=bins, main="",
  xlab="Average of 4 throws")
hist((s1+s2+s3+s4+s5+s6)/6, breaks=bins,
  main="", xlab="Average of 6 throws")
# Use more bins for the narrower distributions
bins=12
hist((s1+s2+s3+s4+s5+s6+s7+s8)/8,
  breaks=bins, main="",
  xlab="Average of 8 throws")
hist((s1+s2+s3+s4+s5+s6+s7+s8+s9+s10)/10,
  breaks=bins, main="",
  xlab="Average of 10 throws")
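The histograms show the shape changing; the spread can also be checked numerically. The sketch below is an addition, not part of the original listing. It uses sample() instead of runif(), with a fixed seed and a larger number of experiments as assumptions made for reproducibility. A single fair die has variance (6^2 - 1)/12 = 35/12, so the average of n throws should have a variance close to 35/(12n).

```r
set.seed(42)          # only so that the run is reproducible
N <- 10000            # number of repeated experiments
n.values <- c(1, 2, 4, 10)

# Variance of a single fair die: (6^2 - 1)/12 = 35/12
var.one <- 35 / 12

for (n in n.values) {
  # Each row is one experiment of n throws; average across the row
  throws <- matrix(sample(1:6, N * n, replace = TRUE), nrow = N)
  averages <- rowMeans(throws)
  cat(sprintf("n = %2d   observed variance = %.3f   theory = %.3f\n",
              n, var(averages), var.one / n))
}
```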
Code used to illustrate how the q-q plot is constructed
Try this code in a web-browser
N <- 10
# What are the quantiles from the theoretical
# normal distribution?
index <- seq(1, N)
P <- (index - 0.5) / N
theoretical.quantity <- qnorm(P)
# Our sampled data:
yields <- c(86.2, 85.7, 71.9, 95.3, 77.1,
71.4, 68.9, 78.9, 86.9, 78.4)
mean.yield <- mean(yields) # 80.07
sd.yield <- sd(yields) # 8.35
# What are the quantiles for the sampled data?
yields.z <- (yields - mean.yield)/sd.yield
yields.z
yields.z.sorted <- sort(yields.z)
# Compare the values in text:
yields.z.sorted
theoretical.quantity
# Compare them graphically:
plot(theoretical.quantity, yields.z.sorted, asp=1)
abline(a=0, b=1)
# Built-in R function to do all the above for you:
qqnorm(yields)
qqline(yields)
# A better function: see
# http://learnche.org/4C3/Software_tutorial/Extending_R_with_packages
library(car)
qqPlot(yields)
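One detail worth knowing when comparing the manual plot with qqnorm: the built-in function takes its cumulative fractions from ppoints(), which for samples of 10 or fewer points uses (i - 3/8)/(N + 1/4) rather than (i - 0.5)/N. The sketch below is an addition to the original listing and simply checks this; the names P.manual and P.builtin are introduced here for illustration.

```r
yields <- c(86.2, 85.7, 71.9, 95.3, 77.1,
            71.4, 68.9, 78.9, 86.9, 78.4)
N <- length(yields)
index <- seq(1, N)

P.manual <- (index - 0.5) / N   # fractions used in the listing above
P.builtin <- ppoints(N)         # fractions used internally by qqnorm

# For N <= 10, ppoints uses (i - 3/8)/(N + 1/4) ...
all.equal(P.builtin, (index - 3/8) / (N + 1/4))
# ... and (i - 1/2)/N for larger samples
all.equal(ppoints(20), (seq(1, 20) - 0.5) / 20)

# The x-values that qqnorm plots are qnorm() of these fractions
qq <- qqnorm(yields, plot.it = FALSE)
all.equal(sort(qq$x), qnorm(P.builtin))

# Largest difference between the two sets of theoretical quantiles
max(abs(qnorm(P.manual) - qnorm(P.builtin)))
```

The difference is small and affects mainly the points in the tails, so the manual construction and the built-in plot lead to the same visual conclusion.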
Code to illustrate the central limit theorem's reduction in variance
Try this code in a web-browser
# Show the 3 plots side by side
layout(matrix(c(1,2,3), 1, 3))
# Sample the population:
N <- 100
x <- rnorm(N, mean=80, sd=5)
mean(x)
sd(x)
# Plot the raw data
x.range <- range(x)
plot(x, ylim=x.range, main='Raw data')
# Subgroups of 2
subsize <- 2
x.2 <- numeric(N/subsize)
for (i in 1:(N/subsize))
{
x.2[i] <- mean(x[((i-1)*subsize+1):(i*subsize)])
}
plot(x.2, ylim=x.range, main='Subgroups of 2')
# Subgroups of 4
subsize <- 4
x.4 <- numeric(N/subsize)
for (i in 1:(N/subsize))
{
x.4[i] <- mean(x[((i-1)*subsize+1):(i*subsize)])
}
plot(x.4, ylim=x.range, main='Subgroups of 4')
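The plots show the scatter shrinking as the subgroup size grows. A numerical check is sketched below, as an addition to the original listing: the standard deviation of the subgroup averages should be close to sigma/sqrt(n). The helper subgroup.means, the fixed seed, and the larger N are assumptions made here so that the estimates are stable.

```r
set.seed(1)           # only so that the run is reproducible
N <- 10000
sigma <- 5
x <- rnorm(N, mean = 80, sd = sigma)

subgroup.means <- function(x, subsize) {
  # Each column of the matrix is one consecutive subgroup of
  # length subsize, matching the indexing in the for-loops above
  colMeans(matrix(x, nrow = subsize))
}

for (subsize in c(1, 2, 4)) {
  cat(sprintf("subgroup size = %d   observed sd = %.3f   sigma/sqrt(n) = %.3f\n",
              subsize, sd(subgroup.means(x, subsize)), sigma / sqrt(subsize)))
}
```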
Paired test example
Try this code in a web-browser
# BOD values measured on the same 11 samples by two methods
dilution <- c(11, 26, 18, 16, 20, 12, 8, 26, 12, 17, 14)
manometric <- c(25, 3, 27, 30, 33, 16, 28, 27, 12, 32, 16)
N <- length(dilution)
paste0('The average of the dilution values is = ', mean(dilution))
paste0('The average of the manometric values is = ', mean(manometric))
# Plot 1: the two groups side by side, ignoring the pairing
plot(c(dilution, manometric), ylab="BOD values", xaxt='n',
  main='Dilution and manometric values, side by side')
text(5.5,3, "Dilution")
text(18,3, "Manometric")
abline(v=11.5)
# Plot 2: both methods plotted against the sample number,
# which shows the pairing
plot(dilution, type="p", pch=4,
  cex=2, cex.lab=1.5, cex.main=1.8, cex.sub=1.8,
  cex.axis=1.8, ylab="BOD values", xlab="Sample number",
  ylim=c(0,35), xlim=c(0,11.5), col="darkgreen",
  main="Dilution and Manometric values as paired experiments")
lines(manometric, type="p", pch=16, cex=2, col="blue")
# Also show all values at x = 0, as an unpaired view
lines(rep(0, N), dilution, type="p", pch=4, cex=2,
  col="darkgreen")
lines(rep(0, N), manometric, type="p", pch=16,
  cex=2, col="blue")
grid()
abline(v=0.5)
legend(8, 5, pch=c(4, 16), c("Dilution", "Manometric"),
  col=c("darkgreen", "blue"), pt.cex=2)
# Plot 3: the paired differences, with a reference line at zero
plot(dilution-manometric, type="p",
  ylab="Dilution - Manometric", xlab="Sample number",
  cex.lab=1.5, cex.main=1.8, cex.sub=1.8,
  cex.axis=1.8, cex=2,
  main="Dilution minus Manometric differences")
abline(h=0, col="grey60")
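The listing above only plots the paired differences. A minimal sketch of the matching confidence interval calculation is given below, as an addition: it treats the differences as a single sample and uses the t-distribution with N - 1 degrees of freedom. The 95% level and the names w, c.t, LB and UB are choices made here for illustration.

```r
dilution <- c(11, 26, 18, 16, 20, 12, 8, 26, 12, 17, 14)
manometric <- c(25, 3, 27, 30, 33, 16, 28, 27, 12, 32, 16)

# Paired differences: one value per sample
w <- dilution - manometric
N <- length(w)
w.mean <- mean(w)
w.sd <- sd(w)

# 95% confidence interval for the mean difference,
# using the t-distribution with N-1 degrees of freedom
c.t <- qt(0.975, df = N - 1)
LB <- w.mean - c.t * w.sd / sqrt(N)
UB <- w.mean + c.t * w.sd / sqrt(N)
c(LB, UB)

# Cross-check with R's one-sample t-test on the differences
t.test(w, conf.level = 0.95)$conf.int
```

If the interval spans zero, the data do not show a difference between the two methods at this confidence level; if it lies entirely on one side of zero, they do.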