Is it fair to say that a monitoring chart is like an online version of a confidence interval? Explain your answer.
Question 2
Use the batch yields data and construct a monitoring chart using the 300 yield values. Use a subgroup of size 5. Report your target value, lower control limit and upper control limit, showing the calculations you made. I recommend that you write your code so that you can reuse it for other questions.
Please see the code below. The Shewhart chart’s parameters are as below, with plots generated from the R code.
Target = 80.4
Lower control limit at 3 standard deviations = 71.1
Upper control limit at 3 standard deviations = 89.6
Try it yourself:
data_file <- 'http://openmv.net/file/batch-yields.csv'
batch <- read.csv(data_file)
# make sure we have the expected data
summary(batch)
attach(batch)
# To get a feel for the data;
# looks pretty good; no unusual outliers
plot(Yield)
N = length(Yield)
N.sub = 5 # subgroup size
subgroup <- matrix(Yield, N.sub, N/N.sub)
N.groups <- ncol(subgroup)
dim(subgroup) # 5 by 60 matrix
subgroup.sd <- apply(subgroup, 2, sd)
subgroup.xbar <- apply(subgroup, 2, mean)
# Take a look at what these numbers mean
plot(subgroup.xbar,
type="b",
ylab="Subgroup average")
plot(subgroup.sd,
type="b",
ylab="Subgroup spread")
# Report your target value, lower control
# limit and upper control limit, showing
# the calculations you made.
target <- mean(subgroup.xbar)
Sbar <- mean(subgroup.sd)
# a_n value from the table for a subgroup
# size of 5 is approximately 0.94; the exact
# value is computed below and overrides it
an <- 0.94
an.num <- sqrt(2)*gamma(N.sub/2)
an.den <- sqrt(N.sub-1)*gamma((N.sub-1)/2)
an <- an.num/an.den
sigma.estimate <- Sbar / an
LCL <- target - 3 * sigma.estimate/sqrt(N.sub)
UCL <- target + 3 * sigma.estimate/sqrt(N.sub)
c(LCL, target, UCL)
plot(subgroup.xbar,
ylim=c(LCL-5, UCL+5),
ylab="Subgroup means",
main="Shewhart chart")
abline(h=target, col="green")
abline(h=UCL, col="red")
abline(h=LCL, col="red")
Question 3
The boards data on the website are from a line which cuts spruce, pine and fir (SPF) to produce general quality lumber that you could purchase at Rona, Home Depot, etc. The price that a saw mill receives for its lumber depends strongly on how accurately the cut is made. Use the data for the 2 by 6 boards (each row is one board) and develop a monitoring system using these steps.
Plot all the data.
Now assume that boards 1 to 500 are the phase 1 data; identify any boards in this subset that appear to be unusual (where the board thickness is not consistent with the rest of the operation).
Remove those unusual boards from the phase 1 data. Calculate the Shewhart monitoring limits and show the phase 1 data with these limits. Note: choose a subgroup size of 7 boards.
Test the Shewhart chart on boards 501 to 2000, the phase 2 data. Show the plot and calculate the type I error rate (\(\alpha\)) from the phase 2 data; assuming, of course, that all the phase 2 data are from in-control operation.
Calculate the ARL and look at the chart to see if the number looks about right. Use the time information in the raw data and your ARL value to calculate the number of minutes between false alarms. Will the operators be happy with this?
Describe how you might calculate the consumer’s risk (\(\beta\)).
How would you monitor if the saws are slowly going out of alignment?
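For the ARL step above, a quick theoretical check is possible, assuming independent, normally distributed subgroup means and \(\pm 3\sigma\) limits. This is only the theoretical in-control value (sketched here in Python); the rate you observe from your phase 2 data will differ:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Theoretical type I error for +/- 3 sigma Shewhart limits,
# assuming independent, normally distributed subgroup means
alpha = 2.0 * phi(-3.0)
ARL = 1.0 / alpha

print(round(alpha, 4))  # 0.0027
print(round(ARL))       # 370 subgroups between false alarms, on average
```

Multiply the ARL (in subgroups) by the subgroup size and the sampling interval from the boards data to get the average time between false alarms.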
Question 4
Your process with Cpk of 2.0 experiences a drift of \(1.5\sigma\) away from the current process operating point towards the closest specification limit. What is the new Cpk value; how many defects per million items did you have before the drift? And after the drift?
The new Cpk value is 1.5. The number of defects per million items at Cpk = 2.0 is 0.00098 (essentially no defects), while at Cpk = 1.5 it is 3.4 defects per million items. You only have to consider one side of the distribution, since Cpk is by definition for an uncentered process, and deals with the side closest to the specification limits.
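The defect counts quoted above follow from the one-sided normal tail area beyond \(3 C_{pk}\) standard deviations. A small Python check of the arithmetic, assuming a normally distributed process:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def defects_per_million(Cpk):
    # The nearest specification limit lies 3*Cpk standard
    # deviations from the process mean (one-sided tail area)
    return phi(-3.0 * Cpk) * 1e6

print(round(defects_per_million(2.0), 5))  # 0.00099 ppm before the drift
print(round(defects_per_million(1.5), 1))  # 3.4 ppm after the drift
```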
Question 5
A Shewhart chart has no memory, and is suited to detecting unusual spikes in your production. CUSUM and EWMA charts have memory, and while they would pick up such a spike, they would also create a long run of false alarms afterwards. So those charts are much less appropriate.
Question 6
A tank uses small air bubbles to keep solid particles in suspension. If too much air is blown into the tank, then excessive foaming and loss of valuable solid product occurs; if too little air is blown into the tank the particles sink and drop out of suspension.
Which monitoring chart would you use to ensure the airflow is always near target?
Use the aeration rate dataset from the website and plot the raw data (total litres of air added in a 1 minute period). Are you able to detect any problems?
Construct the chart you described in part 1, and show its performance on all the data. Make any necessary assumptions to construct the chart.
At what point in time are you able to detect the problem, using this chart?
Construct a Shewhart chart, choosing appropriate data for phase 1, and calculate the Shewhart limits. Then use the entire dataset as if it were phase 2 data.
Show this phase 2 Shewhart chart.
Compare the Shewhart chart’s performance to the chart in part 3 of this question.
Solution based on work by Ryan and Stuart (2011 class)
A CUSUM chart would be a suitable chart to monitor that the airflow is near target. While a Shewhart chart is also intended to monitor the location of a variable, it has a much larger run length for detecting small shifts. An EWMA chart with a small \(\lambda\) (long memory) would approximate a CUSUM chart, and so would also be suitable.
The aeration rate dataset is depicted below:
It is very difficult to assess problems from the raw data plot. There might be a slight upward shift around 300 and 500 minutes.
Assumptions for the CUSUM chart:
We will plot the CUSUM chart on raw data, though you could use subgroups if you wanted to.
The target value can be the mean (24.17) of all the data, or more robustly, use the median (24.1), especially if we expect problems with the raw data (true of almost every real data set).
The CUSUM chart, using the median as target value showed a problem starting to occur around \(t=300\). So we recalculated the median, using only data from 0 to \(t=200\), to avoid biasing the target value. Using this median instead, 23.95, we get the following CUSUM chart:
The revised CUSUM chart suggests that the error occurs around 275 min, as evidenced by the steep positive slope thereafter. It should be noted that the CUSUM chart begins to show a positive slope around 200 min, but this initial increase in the cumulative error would likely not be diagnosable (e.g. using a V-mask).
# Code by Ryan and Stuart (2011 class)
CUSUM <- function(x, target){
  N <- length(x)
  S <- numeric(N)
  S[1] <- x[1] - target
  for (t in 2:N){
    S[t] <- S[t-1] + (x[t] - target)
  }
  return(S)
}

# Import data and remove missing values (NA)
aeration.data <- read.csv('http://openmv.net/file/aeration-rate.csv')
aeration <- na.omit(aeration.data$Aeration)

# Plot raw data
bitmap('aeration-rate-raw-data.png', type="png256",
       width=10, height=4, res=300, pointsize=14)
plot(aeration, type="l", xlab="Time (min)",
     ylab="Aeration rate (L/min)")
grid()
dev.off()

# Plot CUSUM chart
target <- median(aeration[1:200])
bitmap('aeration-CUSUM.png', type="png256",
       width=10, height=4, res=300, pointsize=14)
plot(CUSUM(aeration, target), type="l", xlab="Time (min)",
     ylab="CUSUM cumulative deviations")
grid()
dev.off()

# Plot the Shewhart chart: see code from the other
# question to calculate the control limits
LCL <- 22.1
UCL <- 25.8
N <- 5
subgroups <- matrix(aeration, N, length(aeration)/N)

# Calculate mean and sd of subgroups (see R-tutorial)
x.mean <- apply(subgroups, 2, mean)
x.sd <- apply(subgroups, 2, sd)
ylim <- range(x.mean) + c(-5, +5)
xdb <- target  # use the same CUSUM target!

bitmap('aeration-Shewhart-chart.png', type="png256",
       width=10, height=4, res=300, pointsize=14)
par(mar=c(4.2, 4.2, 0.5, 0.5))
par(cex.lab=1.3, cex.main=1.5, cex.sub=1.5, cex.axis=1.5)
plot(seq(1, length(x.mean)*N, N), x.mean, type="b", pch=".",
     cex=5, main="", ylab="Phase II subgroups",
     xlab="Time order", ylim=ylim)
abline(h=UCL, col="red")
abline(h=LCL, col="red")
abline(h=xdb, col="green")
lines(c(275, 275), ylim, col="blue")
text(280, 29, "CUSUM detected problem at t=275", adj=c(0, 0))
dev.off()
Using the iterative Shewhart code from the previous question, the phase 1 data were taken far enough away from the suspected problem: 0 to 200 minutes.
The Shewhart chart applied to the entire dataset is shown below. In contrast to the CUSUM chart, the Shewhart chart is unable to detect the problem in the aeration rate. Unlike the CUSUM chart, which has infinite memory, the Shewhart chart has no memory and cannot adequately assess the location of the monitored variable in relation to its specified target. Instead, the Shewhart chart merely monitors aeration rate with respect to the control limits for the process. Since the aeration rate does not exceed the control limits for the process (i.e. process remains in control), the Shewhart chart does not detect any abnormalities.
If you used the Western Electric rules, in addition to the Shewhart chart limits, you would have picked up a consecutive sequence of 8 points on one side of the target around \(t=350\).
Question 7
Do you think a Shewhart chart would be suitable for monitoring the closing price of a stock on the stock market? Please explain your answer if you agree, or describe an alternative if you disagree.
No, a Shewhart chart is not suitable for monitoring stock prices. Stock prices are volatile variables (not stable), so there is no sense in monitoring their location. Hopefully the stock is moving up, which it should on average, but the point is that stock prices are not stable. Nor are stock prices independent day-to-day.
So what aspect of a stock price is stable? The difference between the opening and closing price of a stock is remarkably stationary. Monitoring the day-to-day change in a stock price would work. Since you aren’t expected to know this fact, any reasonable answer that attempts to monitor a stable substitute for the price will be accepted. E.g. another alternative is to remove the linear up or down trend from a stock price and monitor the residuals.
There are many alternatives; if this sort of thing interests you, you might find the area called technical analysis worth investigating. An EWMA chart is widely used in this sort of analysis.
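To see why the day-to-day differences are the stable quantity, here is a minimal sketch in Python with synthetic (made-up) prices following a random walk; real stock prices are messier, but the idea is the same:

```python
import random

random.seed(1)

# Synthetic example: the price is a random walk (non-stationary),
# but its day-to-day changes are independent and stable
changes = [random.gauss(0.1, 1.0) for _ in range(1000)]
price = [100.0]
for c in changes:
    price.append(price[-1] + c)

# Differencing the price recovers the stable daily changes:
# these are what a monitoring chart could sensibly track
diffs = [price[i + 1] - price[i] for i in range(len(price) - 1)]
print(round(min(diffs), 1), round(max(diffs), 1))  # a narrow, stable band
print(round(max(price) - min(price), 1))           # the price wanders far more
```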
Question 8
Describe how a monitoring chart could be used to prevent over-control of a batch-to-batch process. (A batch-to-batch process is one where a batch of materials is processed, followed by another batch, and so on).
Over-control of any process takes place when too much corrective action is applied. Using the language of feedback control, your gain is the right sign, but the magnitude is too large. Batch processes are often subject to this phenomenon: e.g. the operator reduces the set-point temperature for the next batch, because the current batch produced product with a viscosity that was too high. But then the next batch has a viscosity that is too low, so the operator increases the temperature set-point for the following batch. This constant switching is known as over-control (the operator is the feedback controller and his/her gain is too high, i.e. they are over-reacting).
A monitoring chart such as a Shewhart chart would help the operator: if the previous batch was within the limits, then s/he should not take any corrective action. Only take action when the viscosity value is outside the limits. An EWMA chart would additionally provide a one-step ahead prediction, which is an advantage.
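The EWMA's one-step-ahead prediction mentioned above follows the recursion \(\hat{x}_{t+1} = \lambda x_t + (1-\lambda)\hat{x}_t\). A short Python sketch, where the viscosity numbers and starting value are made up for illustration:

```python
def ewma_predictions(x, lam, start):
    """One-step-ahead EWMA predictions: entry t+1 is the
    prediction available after observing batch t."""
    pred = [start]  # prediction for the first batch
    for xt in x:
        pred.append(lam * xt + (1.0 - lam) * pred[-1])
    return pred

# Hypothetical batch viscosity values, target around 50
viscosity = [52.0, 49.5, 50.5, 51.0, 48.8]
print(ewma_predictions(viscosity, lam=0.3, start=50.0))
```

A small \(\lambda\) gives long memory (CUSUM-like behaviour), while \(\lambda = 1\) reproduces the raw data, as a Shewhart chart would use.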
Question 9
You need to construct a Shewhart chart. You go to your company’s database and extract data from 10 periods of time lasting 6 hours each. Each time period is taken approximately 1 month apart so that you get a representative data set that covers roughly 1 year of process operation. You choose these time periods so that you are confident each one was from in control operation. Putting these 10 periods of data together, you get one long vector that now represents your phase 1 data.
There are 8900 samples of data in this phase 1 data vector.
You form subgroups: there are 4 samples per subgroup and 2225 subgroups.
You calculate the mean within each subgroup (i.e. 2225 means). The mean of those 2225 means is 714.
The standard deviation within each subgroup is calculated; the mean of those 2225 standard deviations is 98.
What is an unbiased estimate of the process standard deviation?
Calculate lower and upper control limits for operation at \(\pm 3\) of these standard deviations from target. These are called the action limits.
Operators like warning limits on their charts, so they don’t have to wait until an action limit alarm occurs. Discussions with the operators indicate that lines at 590 and 820 might be good warning limits. What percentage of in control operation will lie inside the proposed warning limit region?
Short answer: Unbiased estimate of the process standard deviation = 106.4; UCL = 874; LCL = 554.
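The short answer can be reproduced with the standard unbiasing constant \(a_n\) (often called \(c_4\)); a Python check of the arithmetic:

```python
from math import gamma, sqrt

n = 4         # samples per subgroup
xdb = 714.0   # mean of the 2225 subgroup means
Sbar = 98.0   # mean of the 2225 subgroup standard deviations

# Unbiasing constant a_n for subgroup size n
an = sqrt(2.0) * gamma(n / 2.0) / (sqrt(n - 1.0) * gamma((n - 1) / 2.0))

sigma = Sbar / an                  # unbiased estimate of process sigma
LCL = xdb - 3.0 * sigma / sqrt(n)  # action limits for the subgroup means
UCL = xdb + 3.0 * sigma / sqrt(n)

print(round(sigma, 1))         # 106.4
print(round(LCL), round(UCL))  # 554 874
```

For the warning-limit part, convert 590 and 820 into z-values using the subgroup-mean standard deviation \(\sigma/\sqrt{n}\), then take the normal probability between those two z-values.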
Question 10
If an exponentially weighted moving average (EWMA) chart can be made to approximate either a CUSUM or a Shewhart chart by adjusting the value of \(\lambda\), what is an advantage of the EWMA chart over the other two? Describe a specific situation where you can benefit from this.
Question 11
The most recent estimate of the process capability ratio for a key quality variable was 1.30, and the average quality value was 64.0. Your process operates closer to the lower specification limit of 56.0. The upper specification limit is 93.0.
What are the two parameters of the system you could adjust, and by how much, to achieve a capability ratio of 1.67, as required by recent safety regulations? Assume you can adjust these parameters independently.
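One way to check the arithmetic, assuming the usual definition \(C_{pk} = (\bar{x} - \text{LSL})/(3\sigma)\) for a process operating closest to its lower limit (a sketch in Python, not the only valid answer):

```python
xbar, LSL = 64.0, 56.0
Cpk_now, Cpk_target = 1.30, 1.67

# Back out the current process standard deviation
sigma = (xbar - LSL) / (3.0 * Cpk_now)

# Option 1: keep sigma and move the operating point away from the LSL
xbar_new = LSL + Cpk_target * 3.0 * sigma

# Option 2: keep the operating point and reduce the variability
sigma_new = (xbar - LSL) / (3.0 * Cpk_target)

print(round(sigma, 2))      # 2.05
print(round(xbar_new, 1))   # 66.3
print(round(sigma_new, 2))  # 1.6
```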
Question 12
A bagging system fills bags with a target weight of 37.4 grams and the lower specification limit is 35.0 grams. Assume the bagging system fills the bags with a standard deviation of 0.8 grams:
What is the current Cpk of the process?
To what target weight would you have to set the bagging system to obtain Cpk=1.3?
How can you adjust the Cpk to 1.3 without adjusting the target weight (i.e. keep the target weight at 37.4 grams)?
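A Python sketch of the arithmetic for the three parts, using only the lower specification (the side that matters for this filling process):

```python
target, LSL, sigma = 37.4, 35.0, 0.8

# Part 1: current capability relative to the lower limit
Cpk = (target - LSL) / (3.0 * sigma)
print(round(Cpk, 2))         # 1.0

# Part 2: raise the fill target until Cpk = 1.3 at the same sigma
target_new = LSL + 1.3 * 3.0 * sigma
print(round(target_new, 2))  # 38.12

# Part 3: keep the target and reduce the filling variability instead
sigma_new = (target - LSL) / (3.0 * 1.3)
print(round(sigma_new, 3))   # 0.615
```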
Question 13
Plastic sheets are manufactured on your blown film line. The Cp value is 1.7. You sell the plastic sheets to your customers with specification of 2 mm \(\pm\) 0.4 mm.
List three important assumptions you must make to interpret the Cp value.
What is the theoretical process standard deviation, \(\sigma\)?
What would be the Shewhart chart limits for this system using subgroups of size \(n=4\)?
Illustrate your answer from part 2 and 3 of this question on a diagram of the normal distribution.
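For parts 2 and 3, the arithmetic can be sketched in Python (assuming the Shewhart limits are centred on the 2 mm target):

```python
from math import sqrt

Cp = 1.7
USL, LSL = 2.4, 1.6   # specification: 2 mm +/- 0.4 mm
n = 4                 # subgroup size

# Part 2: Cp = (USL - LSL) / (6 sigma), so back out sigma
sigma = (USL - LSL) / (6.0 * Cp)
print(round(sigma, 4))               # 0.0784 mm

# Part 3: Shewhart limits for the subgroup means
target = 2.0
LCL = target - 3.0 * sigma / sqrt(n)
UCL = target + 3.0 * sigma / sqrt(n)
print(round(LCL, 3), round(UCL, 3))  # 1.882 2.118
```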
Question 14
The following charts show the weight of feed entering your reactor. The variation in product quality leaving the reactor was unacceptably high during this period of time.
What can your group of process engineers learn about the problem, using the time-series plot (100 consecutive measurements, taken 1 minute apart)?
Why is this variability not seen in the Shewhart chart?
Using concepts described elsewhere in this book, why might this sort of input to the reactor have an effect on the quality of the product leaving the reactor?
Question 15
You will come across these terms in the workplace. Investigate one of these topics, using the Wikipedia link below to kick-start your research. Write a paragraph that (a) describes what your topic is and (b) how it can be used when you start working in a company after you graduate, or how you can use it now if you are currently working.
In early 2010 Toyota experienced some of its worst press coverage on this very topic. Here is an article in case you missed it.
Question 16
The Kappa number is a widely used measurement in the pulp and paper industry. It can be measured on-line, and indicates the severity of chemical treatment that must be applied to a wood pulp to obtain a given level of whiteness (i.e. the pulp’s bleachability). Data on the website contain the Kappa values from a pulp mill. Use the first 2000 data points to construct a Shewhart monitoring chart for the Kappa number. You may use any subgroup size you like. Then use the remaining data as your phase 2 (testing) data. Does the chart perform as expected?
Short answer: The intention of this question is for you to experience the process of iteratively calculating limits from phase 1 data and applying them to phase 2 data.
Question 17
In this section we showed how one can monitor any variable in a process. Modern instrumentation, however, captures a wider variety of data. It is common to measure point values, e.g. temperature, pressure, concentration and other hard-to-measure values. But it is increasingly common to measure spectral data. These spectral data are a vector of numbers instead of a single number.
Below is an example from a pharmaceutical process: a complete spectrum can be acquired many times per minute, and it gives a complete chemical fingerprint or signature of the system. There are 460 spectra in the figure below; they could have come, for example, from a process where they are measured 5 seconds apart. It is common to find fibre optic probes embedded into pipelines and reactors to monitor the progress of a reaction or mixing.
Write a few bullet points describing how you might monitor a process where a spectrum (a vector) is your data source, rather than a “traditional” single-point measurement, like a temperature value.
Question 18
The carbon dioxide measurement is available from a gas-fired furnace. These data are from phase 1 operation.
Calculate the Shewhart chart upper and lower control limits that you would use during phase 2 with a subgroup size of \(n=6\).
Is this a useful monitoring chart? What is going on in these data?
The Shewhart chart using a subgroup of size 6 is not a useful monitoring chart. There are too many false alarms, which will cause the operators to simply ignore the chart. The problem is that the assumption of independent samples does not hold, which has a detrimental effect, as shown in a previous question.
One approach to fixing the problem is to subsample the data, i.e. only use every \(k^\text{th}\) data point as the raw data, e.g. \(k=10\), and then form subgroups from that sampled data.
Another approach is to use a larger subgroup size. Use the autocorrelation function, via the corresponding acf(...) function in R, to verify the degree of relationship. Using this function we see the raw data are unrelated after the 17th lag, so we could use subgroups of that size. However, even then the Shewhart chart shows frequent violations, though fewer than before.
Yet another alternative is to use an EWMA chart, which takes the autocorrelation into account. However, the EWMA chart limits are found from the assumption that the subgroup means (or raw data, if subgroup size is 1), are independent.
So we are finally left with the conclusion that perhaps these data really are not from in-control operation, or, if they are, we must manually adjust the limits to be wider.
file <- 'http://openmv.net/file/gas-furnace.csv'
data <- read.csv(file)
CO2 <- data$CO2
N.raw <- length(CO2)
N.sub <- 6
# Change ``N.sub`` to 10, 15, 20, etc
# At N.sub <- 17 we see the
# autocorrelation disappear
# Plot all the data
par(mar=c(4.2, 4.2, 0.5, 0.5))
par(cex.lab=1.3, cex.main=1.5,
cex.sub=1.5, cex.axis=1.5)
plot(CO2, type="p", pch=".", cex=2,
main="", ylab="CO2: raw data",
xlab="Sequence order")
# Create the subgroups on ALL the raw data.
# Form a matrix with `N.subgroup` rows by
# placing the vector of data down each row,
# then going across to form the columns.
# Calculate the mean and standard deviation
# within each subgroup (columns of the matrix)
subgroups <- matrix(CO2, N.sub, N.raw/N.sub)
subgroups.S <- apply(subgroups, 2, sd)
subgroups.xbar <- apply(subgroups, 2, mean)
ylim <- range(subgroups.xbar) + c(-3, +3)
# Keep adjusting N.sub until you don't see
# any autocorrelation between subgroups
acf(subgroups.xbar)
# Create a function to calculate
# Shewhart chart limits
shewhart_limits <- function(xbar, S,
sub.n, N.stdev=3){
# Give the xbar and S vector containing
# the subgroup means and standard
# deviations. Also give the subgroup
# size used. Returns the lower and upper
# control limits for the Shewhart chart
# (UCL and LCL) which are N.stdev away
# from the target.
# xdb = x.double.bar = mean of means
xdb <- mean(xbar)
s.bar <- mean(S)
num.an <- sqrt(2)*gamma(sub.n/2)
den.an <- sqrt(sub.n-1)*gamma((sub.n-1)/2)
an <- num.an / den.an
LCL <- xdb - 3*s.bar/(an*sqrt(sub.n))
UCL <- xdb + 3*s.bar/(an*sqrt(sub.n))
# Return a numeric vector, so the limits can
# be used directly in plotting functions
return(c(LCL, xdb, UCL))
}
limits <- shewhart_limits(subgroups.xbar,
subgroups.S, N.sub)
LCL <- limits[1]
xdb <- limits[2]
UCL <- limits[3]
c(LCL, xdb, UCL)
# Any points outside these limits?
par(mar=c(4.2, 4.2, 0.5, 0.5))
par(cex.lab=1.3, cex.main=1.5,
cex.sub=1.5, cex.axis=1.5)
plot(subgroups.xbar, type="b", pch=".",
cex=5, main="", ylim=ylim,
ylab="Phase I subgroups: round 1",
xlab="Sequence order")
abline(h=UCL, col="red")
abline(h=LCL, col="red")
abline(h=xdb, col="green")
lines(subgroups.xbar, type="b", pch=".",
cex=5)
Question 19
The percentage yield from a batch reactor, and the purity of the feedstock are available as the Batch yield and purity data set. Assume these data are from phase 1 operation and calculate the Shewhart chart upper and lower control limits that you would use during phase 2. Use a subgroup size of \(n=3\).
What is phase 1?
What is phase 2?
Show your calculations for the upper and lower control limits for the Shewhart chart on the yield value.
Show a plot of the Shewhart chart on these phase 1 data.
Solution based on work by Ryan McBride, Stuart Young, and Mudassir Rashid (2011 class)
Phase 1 is the period from which historical data is taken that is known to be “in control”. From this data, upper and lower control limits can be established for the monitored variable that contain a specified percent of all in control data.
Phase 2 is the period during which new, unseen data is collected by process monitoring in real-time. This data can be compared with the limits calculated from the “in control” data.
Assuming the dataset was derived from phase 1 operation, the batch yield data were grouped into subgroups of size 3. However, since the total number of data points (N=241) is not a multiple of three, the data set was truncated to the closest multiple of 3, i.e. \(N_{new} = 240\), by removing the last data point. Subsequently, the mean and standard deviation were calculated for each of the 80 subgroups, and from these statistics the first-round lower and upper control limits were calculated.
Noticing that the mean for subgroup 42, \(\overline{x}_{42}=63.3\), falls below this LCL, the control limits were recalculated excluding this subgroup from phase 1 data (see R-code). Following this adjustment, the new control limits were calculated to be:
LCL = 65.0
UCL = 85.8
Shewhart charts for both rounds of the yield data (before and after removing the outlier):
# Thanks to Mudassir for his source code to
# recursively calculate the limits. Some
# updates were made.
file <- 'http://openmv.net/file/batch-yield-and-purity.csv'
data <- read.csv(file)
y <- data$yield
variable <- "Yield"
N <- 3
# No further changes required. The code
# below will work for any new data set
subgroups <- matrix(y, N, length(y)/N)
x.mean <- numeric(length(y)/N)
x.sd <- numeric(length(y)/N)
# Calculate mean and sd of subgroups
# (see R-tutorial)
x.mean <- apply(subgroups, 2, mean)
x.sd <- apply(subgroups, 2, sd)
ylim <- range(x.mean) + c(-5, +5)
k <- 1
doloop <- TRUE
# Prevent infinite loops
while (doloop & k < 5){
num.an <- sqrt(2)*gamma(N/2)
den.an <- sqrt(N-1)*gamma((N-1)/2)
an <- num.an / den.an
S <- mean(x.sd)
xdb <- mean(x.mean) # x-double bar
LCL <- xdb - (3*S/(an*sqrt(N)))
UCL <- xdb + (3*S/(an*sqrt(N)))
print(c(LCL, UCL))
# Create a figure on every loop
par(mar=c(4.2, 4.2, 0.5, 0.5))
par(cex.lab=1.3, cex.main=1.5,
cex.sub=1.5, cex.axis=1.5)
plot(x.mean, type="b", pch=".",
cex=5, main="",
ylab=paste("Phase I subgroups: round", k),
xlab="Sequence order", ylim=ylim)
abline(h=UCL, col="red")
abline(h=LCL, col="red")
abline(h=xdb, col="green")
lines(x.mean, type="b", pch=".", cex=5)
if (!(any(x.mean < LCL) | any(x.mean > UCL))){
# Finally! No more points to exclude
doloop <- FALSE
}
k <- k + 1
# Retain in x.sd and x.mean only those
# entries that are within the control
# limits
x.sd <- x.sd[x.mean>=LCL]
x.mean <- x.mean[x.mean>=LCL]
x.sd <- x.sd[x.mean<=UCL]
x.mean <- x.mean[x.mean<=UCL]
} # end: while doloop
Question 20
You will hear about 6-sigma processes frequently in your career. What does it mean exactly that a process is “6-sigma capable”? Draw a diagram to help illustrate your answer.