Difference between revisions of "Assignment 4 - 2010 - Solution"

From Process Model Formulation and Solution: 3E4
Jump to navigation Jump to search
m (Created page with "{{OtherSidebar | due_dates = 08 November 2010 | dates_alt_text = Due date(s) | questions_PDF = Assignment-4b-2010.pdf | questions_text_alt = Assignment questions | solutions_PDF ...")
 
m
Line 2: Line 2:
| due_dates = 08 November 2010
| due_dates = 08 November 2010
| dates_alt_text = Due date(s)
| dates_alt_text = Due date(s)
| questions_PDF = Assignment-4b-2010.pdf
| questions_PDF =  
| questions_text_alt = Assignment questions
| questions_text_alt =  
| solutions_PDF = Assignment-4b-Solutions-2010.pdf
| solutions_PDF = Assignment-4-Solutions-2010.pdf
| solutions_text_alt = Solutions, prepared by Ali and Kevin
| solutions_text_alt = Solutions, prepared by Ali and Kevin
| other_instructions = Hand-in at class.
| other_instructions = Hand-in at class.
Line 14: Line 14:
.. |m3| replace:: m\ :sup:`3`
.. |m3| replace:: m\ :sup:`3`


.. rubric:: Questions 1, 2 and 3 will be posted at the end of the semester in assignment 4(a).
.. rubric:: Solutions by Kevin and Ali
 
Question 1 [2]
==============
 
A new type of `thermocouple <http://en.wikipedia.org/wiki/Thermocouple>`_ is being investigated by your group.  These devices produce an *almost* linear voltage  (millivolt) response at different temperatures. In practice though it is used the other way around: use the millivolt reading to predict the temperature.  The process of fitting this linear model is called *calibration*. 
 
#. Use the following data to calibrate a linear model:
 
================= ==== ==== ==== ==== ==== ==== ==== ==== ==== ====
Temperature [K]  273  293  313  333  353  373  393  413  433  453
----------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
Reading [mV]   0.01 0.12 0.24 0.38 0.51 0.67 0.84 1.01 1.15 1.31 
================= ==== ==== ==== ==== ==== ==== ==== ==== ==== ====
Show the linear model and provide the predicted temperature when reading 1.00 mV.
#. Are you satisfied with this model, based on the coefficient of determination (:math:`R^2`) value? 
 
#. What is the model's standard error?  Now, are you satisfied with the model's prediction ability, given that temperatures can usually be recorded to an accuracy of :math:`\pm 0.1` K with most thermocouples.
 
Solution
--------
 
#. The linear model is used to predict temperature given the reading in millivolts.  The reason is that in modelling, in general, we specify as :math:`X` the variable(s) we always have available, while :math:`y` is the variable we would like to predict from the :math:`X`'s.
 
The model has the form: :math:`T = a_0 + a_1V`, where :math:`T` is temperature and :math:`V` is the voltage reading.  Coefficients in the linear model are:
.. math::
T = 278.6 + 135.3 V
indicating that an increase in 0.1 mV leads to an increase, on average, of 13.53 K in the temperature prediction.
.. figure:: ../che3e4/Assignments/Assignment-4/images/voltage-linear-model.png
:scale: 60
:align: center
The following code was used to fit the model and draw the plot.
.. twocolumncode::
    :code1: ../che3e4/Assignments/Assignment-4/code/voltage_linear_model.m
    :language1: matlab
    :header1: MATLAB code
    :code2: ../che3e4/Assignments/Assignment-4/code/voltage_linear_model.py
    :language2: python
    :header2: Python code
If you used ``R`` to fit the model, you would written something like this::
> V <- c(0.01, 0.12, 0.24, 0.38, 0.51, 0.67, 0.84, 1.01, 1.15, 1.31)
> T <- c(273, 293, 313, 333, 353, 373, 393, 413, 433, 453)
> model <- lm(T ~ V)
> summary(model)
 
Call:
lm(formula = T ~ V)
 
Residuals:
    Min      1Q  Median      3Q    Max
-6.9272 -2.1212 -0.1954  2.7480  5.4239
 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  278.574      2.204  126.39 1.72e-14 ***
V            135.298      2.922  46.30 5.23e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 3.916 on 8 degrees of freedom
Multiple R-squared: 0.9963, Adjusted R-squared: 0.9958
F-statistic:  2144 on 1 and 8 DF,  p-value: 5.229e-11
 
#. The :math:`R^2` value from this linear fit is :math:`R^2 = 0.996`, which being so close to 1.0, implies the model fits the data very well - that's all.  However, one cannot be satisfied with only an :math:`R^2` value, as explained in class, it has nothing to do with whether the model's prediction accuracy is any good.  So we can't tell anything from this number.
 
#. The model's standard error is 3.9 K.  If we assume the prediction error is normally distributed around the linear fit, this corresponds to one standard deviation.  So 95% of our prediction error lies roughly within a range of :math:`\pm 2\times 3.92` or :math:`\pm 7.8` K.  These are the dashed red lines drawn on the figure.  (Please note: the true error intervals are not parallel to the regression line, they are curved; however the :math:`\pm 2S_E` limits are  good-enough for this course and most engineering applications.
 
This prediction ability of :math:`\pm 8` K is probably not satisfying for most engineering applications, since we can predict temperatures far more accurately, over the range from 273K to 453K, using off-the-shelf commercial thermocouples. 
The purpose of this question is to mainly point out the misleading nature of :math:`R^2` - this value looks really good: 99.6%, yet the actual purpose of the model, the ability to predict temperature from the millivolt reading has no relationship at all to this :math:`R^2` value.
.. sd(T) = 60.5
.. diff(range(T)) = 180
.. baseline_ratio = 60/180 = 0.3333
.. SE = 3.9
.. model_ratio = 3.9/180 = 0.0216
.. ratio = 1 - 0.0216/0.3333 = 0.935: seems pretty good
 
Question 2 [2]
==============
 
Batch reactors are often used to calculate reaction rate data, since the dynamic balances are a direct function of the reaction rate:
 
.. math::
 
\frac{dC_{\sf A}(t)}{dt} = r_{\sf A}
assuming constant volume and assuming all energy-related terms are negligible.  We are unsure about the order of our reaction rate, :math:`\alpha`, and we also don't know the reaction rate constant, :math:`k` in the expression :math:`r_A = - k C_{\sf A}^{\alpha}`.
 
Integrating the above differential equation from time :math:`t=0` to time :math:`t=t` gives:
 
.. math::
 
\int_{C_{\sf A,0}}^{C_{\sf A}}{ \frac{1}{C_{\sf A}^\alpha} dC_{\sf A}} &= -k \int_{0}^{t}{dt} \\
                          \left.          \frac{C_{\sf A}^{1-\alpha}}{1-\alpha} \right|_{C_{\sf A,0}}^{C_{\sf A}} &= -kt  \qquad \text{where}\,\, \alpha \neq 1
 
The following concentrations are collected as a function of time
 
====================== ==== ==== ==== ==== ==== ==== ==== ==== ====
Time [hours]          0    0.5  1    2    3    5    7.5  10  17
---------------------- ---- ---- ---- ---- ---- ---- ---- ---- ----
Concentration [mol/L]  3.5  1.5  0.92 0.47 0.29 0.16 0.09 0.06 0.03
====================== ==== ==== ==== ==== ==== ==== ==== ==== ====
 
#. Use these concentration-time data to determine the values of :math:`\alpha` and :math:`k`, using a least squares model.
 
#. Once you have the rate constants, simulate the batch reactor (use the ``ode45`` or Python integrator) and superimpose the data from the above table on your concentration-time trajectory.  Do they agree?  Which regions of the plot have the worst agreement (this is where you would collect more data in the future).
 
 
Solution
--------
 
#. There are to approaches to this problem.  One is to calculate finite differences :math:`\left(\frac{dC_{\sf A}(t)}{dt}\right)_k` and plot these against the assumed rate expression :math:`\left(- k C_{\sf A}^{\alpha}\right)_k`.  The other approach is to first integrate the differential equation, as partly shown in the question, and then plot some function of concentration against time. 
 
The former approach is usually less accurate, because we are using concentration measurement (already measured with error) to estimate finite differences (which will have error), and then fit a linear model (which will have error).  The latter approach, called the integral method in reactor design, is more accurate because it works directly with the concentration and time measurements.  In this case the integral method is really messy, so we will use the easier, finite-difference method.
 
The finite difference approximations are:
.. math::
\begin{array}{llll}
t    & \text{Lowest error method} & \text{Calculation} & \text{Result}\\ \hline
0.5  & \text{Central diff}  & (0.92-3.5)/1  & -2.58\\
1    & \text{Backward diff}  & (0.92-1.5)/0.5 & -1.16\\
2    & \text{Central diff}  & (0.29-0.92)/2  & -0.315 \\
3    & \text{Backward diff}  & (0.29 -0.47)/1 & -0.18  \\
5    & \text{Backward diff}  & (0.16-0.29)/2  & -0.065 \\
7.5  & \text{Central diff}  & (0.06-0.16)/5  & -0.02  \\
10  & \text{Backward diff}  & (0.06-0.09)/2.5& -0.012\\
17  & \text{Backward diff}  & (0.03-0.06)/7  & -.0043 \\\hline
\end{array}
The linear model can now be derived as follows:
.. math::
\frac{dC_{\sf A}(t)}{dt}                  &= - k C_{\sf A}^{\alpha} \\
\ln\left(-\frac{dC_{\sf A}(t)}{dt}\right) &= \ln(k) + \alpha \ln(C_{\sf A}) \\
                                              y_k &= a_0 + a_1 x_k
Then in ``R`` one can write::
> log_dCdt <- log(c(2.58, 1.16, 0.315, 0.18, 0.065, 0.02, 0.012, 0.0043))
> log_C <- log(c(1.5,  0.92, 0.47,  0.29, 0.16,  0.09, 0.06,  0.03))
> model <- lm(log_dCdt ~ log_C)
> summary(model)
Call:
lm(formula = log_dCdt ~ log_C)
 
Residuals:
    Min      1Q  Median      3Q      Max
-0.17327 -0.04749  0.04231  0.06375  0.10477
 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.23971    0.06295  3.808  0.00888 **
log_C        1.65222    0.03165  52.202 3.32e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 0.1139 on 6 degrees of freedom
Multiple R-squared: 0.9978, Adjusted R-squared: 0.9974
F-statistic:  2725 on 1 and 6 DF,  p-value: 3.316e-09
or in MATLAB and Python we can find the same result:
.. math::
\ln\left(-\frac{dC_{\sf A}(t)}{dt}\right) &= \ln(k) + \alpha \ln(C_{\sf A}) \\
\ln\left(-\frac{dC_{\sf A}(t)}{dt}\right) &= 0.2397 + 1.652 \ln(C_{\sf A}) \\
    k &= {\bf 1.271\,\,\text{hours}^{-1}}\\
  \alpha &= {\bf 1.65}
.. figure:: ../che3e4/Assignments/Assignment-4/images/rate-fitting-linear-model.png
:scale: 48
:align: center
 
#. We can simulate this and superimpose it on the given data to give the following plot. The agreement is good, probably OK for the purposes which we want this for: to know the rate constant and reaction rate. There is low agreement as the change in concentration is the greatest (probably due to error in our finite difference approximations). 
 
We should collect more data in the region between :math:`t=0` and :math:`t=5` hours, using equally spaced time steps, so we can accurately estimate the derivatives using the central difference approximation.
 
The code to calculate the linear model and simulate the system is also provided.
 
.. figure:: ../che3e4/Assignments/Assignment-4/images/rate-fitting-simulation.png
:scale: 60
:align: center
 
.. twocolumncode::
    :code1: ../che3e4/Assignments/Assignment-4/code/rate_fitting.m
    :language1: matlab
    :header1: MATLAB code
    :code2: ../che3e4/Assignments/Assignment-4/code/rate_fitting.py
    :language2: python
    :header2: Python code
Question 3 [3]
==============
 
The yield from your lab-scale bioreactor, :math:`y`, is a function of reactor temperature, impeller speed and reactor type (one with with baffles and one without).  You have collected these data from various experiments.
.. csv-table::
  :header: Temp = :math:`T` [°C], Speed = :math:`S` [RPM], Baffles = :math:`B` [Yes/No], Yield = :math:`y` [g]
  :widths: 30, 30, 30, 30
 
82,      4300,      No,      51
90,      3700,      Yes,    30
88,      4200,      Yes,    40
86,      3300,      Yes,    28
80,      4300,      No,      49
78,      4300,      Yes,    49
82,      3900,      Yes,    44
83,      4300,      No,      59
65,      4200,      No,      61
73,      4400,      No,      59
60,      4400,      No,      57
60,      4400,      No,      62
101,    4400,      No,      42
92,      4900,      Yes,    38
#. Use a built-in software function in MATLAB or Python (please don't use Excel) to fit a linear model that predicts the bioreactor yield from the above variables.  Your model should be of the form:
 
.. math::
\hat{y} = a_0 + a_\text{T} T + a_\text{S} S + a_\text{B} B
*Hint*: convert the No/Yes variables to integers with ``0=No`` and ``1=Yes``.
#. Interpret the meaning of each coefficient in the model and comment on the standard error.
 
#. Calculate and show the :math:`\mathbf{X}^T\mathbf{X}` and :math:`\mathbf{X}^T\mathbf{y}` matrices for this linear model in order to calculate the 4 coefficients in :math:`{\bf a} = \left[a_0, a_T, A_S, A_B\right]` yourself.  Your answer must include the MATLAB or Python code that computes these matrices and the :math:`{\bf a}` solution vector.
 
Solution
--------
 
.. note:: Initially, I assumed the lab computers had the Statistics Toolbox function in MATLAB so you could use the built-in ``regress.m`` function in MATLAB.  Python users should however use the built-in ``np.linalg.lstsq(...)`` function.  As a result all questions will be answered together below.
 
After converting the Yes/No variable to integers you can fit a linear model of the form :math:`\hat{y} = a_0 + a_\text{T} T + a_\text{S} S + a_\text{B} B`.  In R::
 
> y_yield <- c(51,30,40,28,49,49,44,59,61,59,57,62,42,38)
> x_temp <- c(82,90,88,86,80,78,82,83,65,73,60,60,101,92)
> x_speed <- c(4300,3700,4200,3300,4300,4300,3900,4300,4200,4400,4400,4400,4400,4900)
> x_baffles <- c(0,1,1,1,0,1,1,0,0,0,0,0,0,1)
> model = lm(y_yield ~ x_temp + x_speed + x_baffles)
> summary(model)
Call:
lm(formula = y_yield ~ x_temp + x_speed + x_baffles)
 
Residuals:
    Min      1Q  Median      3Q    Max
-6.0790 -3.4265 -0.7974  2.2026  7.9710
 
Coefficients:
            Estimate Std. Error t value Pr(>|t|) 
(Intercept) 54.804336  18.404567  2.978  0.01386 *
x_temp      -0.486872  0.121692  -4.001  0.00251 **
x_speed      0.008520  0.003784  2.252  0.04804 *
x_baffles  -9.271748  3.073260  -3.017  0.01296 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 4.708 on 10 degrees of freedom
Multiple R-squared: 0.8647, Adjusted R-squared: 0.8241
F-statistic:  21.3 on 3 and 10 DF,  p-value: 0.0001156
which indicates that the model is given by:
 
.. math::
 
\hat{y} = 54.8 -0.487 T + 0.00852 S + -9.27 B
The coefficients are interpreted as:
 
* :math:`a_\text{S} = 0.0085`: a 100rpm increase in impeller speed serves to increase yield by 0.85g on average, keeping all other variables constant
* :math:`a_\text{B} = -9.3`: the use of baffles decreases yield, on average, by 9.1g, keeping all other variables constant
* :math:`a_\text{T} = -0.49`: each one degree increase in temperature lowers yield by 0.49g on average, keeping all other variables constant
This information can be quite valuable: we can now decide, using process economics, which is the most cost-effective way to boost the yield from our bioreactor.
 
The standard error is 4.7 g.  Since we aren't given any information on the accuracy of the yield measurement, it is hard to quantify whether this is reasonable or not.  In practice one would need to run several experiments under the same conditions and measure the yield.  The standard deviation of those repeated experimental yields would tell us what the noise in our measurement is.
We can fit the model in MATLAB or Python.  Let the coefficient vector be :math:`\mathrm{a} = [a_0, a_\text{S},  a_\text{B}, a_\text{T}]`, then we can write down the following X matrix to estimate it:
.. math::
\mathrm{X} = \begin{bmatrix}
1 &  82 & 4300 & 0\\
1 &  90 & 3700 & 1\\
1 &  88 & 4200 & 1\\
1 &  86 & 3300 & 1\\
1 &  80 & 4300 & 0\\
1 &  78 & 4300 & 1\\
1 &  82 & 3900 & 1\\
1 &  83 & 4300 & 0\\
1 &  65 & 4200 & 0\\
1 &  73 & 4400 & 0\\
1 &  60 & 4400 & 0\\
1 &  60 & 4400 & 0\\
1 &  101 & 4400 & 0\\
1 &  92 & 4900 & 1
\end{bmatrix}
Then we can construct the :math:`\mathrm{X}^T\mathrm{X}` matrix (called the *correlation matrix*), and the :math:`\mathrm{X}^T\mathrm{y}` vector (called the *covariance vector*).
 
.. math::
\mathrm{X}^T\mathrm{X} = \begin{bmatrix}
14      & 1120    & 59000    & 6 \\
1120    & 91480    & 4712500  & 516 \\
59000  & 4712500  & 250480000 & 24300    \\
6      & 516      & 24300    &  6
\end{bmatrix} 
\qquad \text{and} \qquad
\mathrm{X}^T\mathrm{y} = \begin{bmatrix}
669    \\
52207  \\
2847800 \\
229
\end{bmatrix}
Using these matrices to solve for :math:`\mathrm{a}`
.. math::
\mathrm{a} = \left(\mathrm{X}^T\mathrm{X} \right)^{-1}\mathrm{X}^T\mathrm{y} =  \begin{bmatrix} 54.80 \\ -0.4869 \\ 0.00852 \\ -9.271 \end{bmatrix}
 
This result matches the results from ``R``.  The MATLAB and Python code is shown below.


.. rubric:: Solutions by Kevin and Ali


.. twocolumncode::
    :code1: ../che3e4/Assignments/Assignment-4/code/batch_yields.m
    :language1: matlab
    :header1: MATLAB code
    :code2: ../che3e4/Assignments/Assignment-4/code/batch_yields.py
    :language2: python
    :header2: Python code
Question 4 [2]
Question 4 [2]
==============
==============
Line 44: Line 386:
Solution
Solution
---------
---------
Redo the solution with these new values.  The solution below is for old values.


#. Recall that an :math:`n-1` order Lagrange interpolating polynomial, for a set of :math:`n` data points, is given by the following relation:
#. Recall that an :math:`n-1` order Lagrange interpolating polynomial, for a set of :math:`n` data points, is given by the following relation:

Revision as of 14:05, 15 December 2010

Due date(s): 08 November 2010
Nuvola mimetypes pdf.png (PDF) Solutions, prepared by Ali and Kevin
Other instructions Hand-in at class.

<rst> <rst-options: 'toc' = False/> <rst-options: 'reset-figures' = False/> .. |m3| replace:: m\ :sup:`3`

.. rubric:: Solutions by Kevin and Ali

Question 1 [2]

==

A new type of `thermocouple <http://en.wikipedia.org/wiki/Thermocouple>`_ is being investigated by your group. These devices produce an *almost* linear voltage (millivolt) response at different temperatures. In practice though it is used the other way around: use the millivolt reading to predict the temperature. The process of fitting this linear model is called *calibration*.

  1. . Use the following data to calibrate a linear model:

================= ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== Temperature [K] 273 293 313 333 353 373 393 413 433 453 ----------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- Reading [mV] 0.01 0.12 0.24 0.38 0.51 0.67 0.84 1.01 1.15 1.31 ================= ==== ==== ==== ==== ==== ==== ==== ==== ==== ====


Show the linear model and provide the predicted temperature when reading 1.00 mV.

  1. . Are you satisfied with this model, based on the coefficient of determination (:math:`R^2`) value?
  1. . What is the model's standard error? Now, are you satisfied with the model's prediction ability, given that temperatures can usually be recorded to an accuracy of :math:`\pm 0.1` K with most thermocouples.

Solution


  1. . The linear model is used to predict temperature given the reading in millivolts. The reason is that in modelling, in general, we specify as :math:`X` the variable(s) we always have available, while :math:`y` is the variable we would like to predict from the :math:`X`'s.

The model has the form: :math:`T = a_0 + a_1V`, where :math:`T` is temperature and :math:`V` is the voltage reading. Coefficients in the linear model are:

.. math::

T = 278.6 + 135.3 V

indicating that an increase in 0.1 mV leads to an increase, on average, of 13.53 K in the temperature prediction.

.. figure:: ../che3e4/Assignments/Assignment-4/images/voltage-linear-model.png :scale: 60 :align: center

The following code was used to fit the model and draw the plot.

.. twocolumncode:: :code1: ../che3e4/Assignments/Assignment-4/code/voltage_linear_model.m :language1: matlab :header1: MATLAB code :code2: ../che3e4/Assignments/Assignment-4/code/voltage_linear_model.py :language2: python :header2: Python code

If you used ``R`` to fit the model, you would written something like this::

> V <- c(0.01, 0.12, 0.24, 0.38, 0.51, 0.67, 0.84, 1.01, 1.15, 1.31) > T <- c(273, 293, 313, 333, 353, 373, 393, 413, 433, 453) > model <- lm(T ~ V) > summary(model)

Call: lm(formula = T ~ V)

Residuals: Min 1Q Median 3Q Max -6.9272 -2.1212 -0.1954 2.7480 5.4239

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 278.574 2.204 126.39 1.72e-14 *** V 135.298 2.922 46.30 5.23e-11 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.916 on 8 degrees of freedom Multiple R-squared: 0.9963, Adjusted R-squared: 0.9958 F-statistic: 2144 on 1 and 8 DF, p-value: 5.229e-11

  1. . The :math:`R^2` value from this linear fit is :math:`R^2 = 0.996`, which being so close to 1.0, implies the model fits the data very well - that's all. However, one cannot be satisfied with only an :math:`R^2` value, as explained in class, it has nothing to do with whether the model's prediction accuracy is any good. So we can't tell anything from this number.
  1. . The model's standard error is 3.9 K. If we assume the prediction error is normally distributed around the linear fit, this corresponds to one standard deviation. So 95% of our prediction error lies roughly within a range of :math:`\pm 2\times 3.92` or :math:`\pm 7.8` K. These are the dashed red lines drawn on the figure. (Please note: the true error intervals are not parallel to the regression line, they are curved; however the :math:`\pm 2S_E` limits are good-enough for this course and most engineering applications.

This prediction ability of :math:`\pm 8` K is probably not satisfying for most engineering applications, since we can predict temperatures far more accurately, over the range from 273K to 453K, using off-the-shelf commercial thermocouples.

The purpose of this question is to mainly point out the misleading nature of :math:`R^2` - this value looks really good: 99.6%, yet the actual purpose of the model, the ability to predict temperature from the millivolt reading has no relationship at all to this :math:`R^2` value.

.. sd(T) = 60.5 .. diff(range(T)) = 180 .. baseline_ratio = 60/180 = 0.3333 .. SE = 3.9 .. model_ratio = 3.9/180 = 0.0216 .. ratio = 1 - 0.0216/0.3333 = 0.935: seems pretty good

Question 2 [2]

==

Batch reactors are often used to calculate reaction rate data, since the dynamic balances are a direct function of the reaction rate:

.. math::

\frac{dC_{\sf A}(t)}{dt} = r_{\sf A}

assuming constant volume and assuming all energy-related terms are negligible. We are unsure about the order of our reaction rate, :math:`\alpha`, and we also don't know the reaction rate constant, :math:`k` in the expression :math:`r_A = - k C_{\sf A}^{\alpha}`.

Integrating the above differential equation from time :math:`t=0` to time :math:`t=t` gives:

.. math::

\int_{C_{\sf A,0}}^{C_{\sf A}}{ \frac{1}{C_{\sf A}^\alpha} dC_{\sf A}} &= -k \int_{0}^{t}{dt} \\

                          \left.          \frac{C_{\sf A}^{1-\alpha}}{1-\alpha} \right|_{C_{\sf A,0}}^{C_{\sf A}} &= -kt  \qquad \text{where}\,\, \alpha \neq 1

The following concentrations are collected as a function of time

================== ==== ==== ==== ==== ==== ==== ==== ====

Time [hours] 0 0.5 1 2 3 5 7.5 10 17


---- ---- ---- ---- ---- ---- ---- ---- ----

Concentration [mol/L] 3.5 1.5 0.92 0.47 0.29 0.16 0.09 0.06 0.03

================== ==== ==== ==== ==== ==== ==== ==== ====

  1. . Use these concentration-time data to determine the values of :math:`\alpha` and :math:`k`, using a least squares model.
  1. . Once you have the rate constants, simulate the batch reactor (use the ``ode45`` or Python integrator) and superimpose the data from the above table on your concentration-time trajectory. Do they agree? Which regions of the plot have the worst agreement (this is where you would collect more data in the future).


Solution


  1. . There are to approaches to this problem. One is to calculate finite differences :math:`\left(\frac{dC_{\sf A}(t)}{dt}\right)_k` and plot these against the assumed rate expression :math:`\left(- k C_{\sf A}^{\alpha}\right)_k`. The other approach is to first integrate the differential equation, as partly shown in the question, and then plot some function of concentration against time.

The former approach is usually less accurate, because we are using concentration measurement (already measured with error) to estimate finite differences (which will have error), and then fit a linear model (which will have error). The latter approach, called the integral method in reactor design, is more accurate because it works directly with the concentration and time measurements. In this case the integral method is really messy, so we will use the easier, finite-difference method.

The finite difference approximations are:

.. math::

\begin{array}{llll} t & \text{Lowest error method} & \text{Calculation} & \text{Result}\\ \hline 0.5 & \text{Central diff} & (0.92-3.5)/1 & -2.58\\ 1 & \text{Backward diff} & (0.92-1.5)/0.5 & -1.16\\ 2 & \text{Central diff} & (0.29-0.92)/2 & -0.315 \\ 3 & \text{Backward diff} & (0.29 -0.47)/1 & -0.18 \\ 5 & \text{Backward diff} & (0.16-0.29)/2 & -0.065 \\ 7.5 & \text{Central diff} & (0.06-0.16)/5 & -0.02 \\ 10 & \text{Backward diff} & (0.06-0.09)/2.5& -0.012\\ 17 & \text{Backward diff} & (0.03-0.06)/7 & -.0043 \\\hline \end{array}

The linear model can now be derived as follows:

.. math::

\frac{dC_{\sf A}(t)}{dt} &= - k C_{\sf A}^{\alpha} \\ \ln\left(-\frac{dC_{\sf A}(t)}{dt}\right) &= \ln(k) + \alpha \ln(C_{\sf A}) \\

                                             y_k &= a_0 + a_1 x_k

Then in ``R`` one can write::

> log_dCdt <- log(c(2.58, 1.16, 0.315, 0.18, 0.065, 0.02, 0.012, 0.0043)) > log_C <- log(c(1.5, 0.92, 0.47, 0.29, 0.16, 0.09, 0.06, 0.03)) > model <- lm(log_dCdt ~ log_C) > summary(model)

Call: lm(formula = log_dCdt ~ log_C)

Residuals: Min 1Q Median 3Q Max -0.17327 -0.04749 0.04231 0.06375 0.10477

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.23971 0.06295 3.808 0.00888 ** log_C 1.65222 0.03165 52.202 3.32e-09 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1139 on 6 degrees of freedom Multiple R-squared: 0.9978, Adjusted R-squared: 0.9974 F-statistic: 2725 on 1 and 6 DF, p-value: 3.316e-09

or in MATLAB and Python we can find the same result:

.. math::

\ln\left(-\frac{dC_{\sf A}(t)}{dt}\right) &= \ln(k) + \alpha \ln(C_{\sf A}) \\ \ln\left(-\frac{dC_{\sf A}(t)}{dt}\right) &= 0.2397 + 1.652 \ln(C_{\sf A}) \\ k &= {\bf 1.271\,\,\text{hours}^{-1}}\\ \alpha &= {\bf 1.65}

.. figure:: ../che3e4/Assignments/Assignment-4/images/rate-fitting-linear-model.png :scale: 48 :align: center

  1. . We can simulate this and superimpose it on the given data to give the following plot. The agreement is good, probably OK for the purposes which we want this for: to know the rate constant and reaction rate. There is low agreement as the change in concentration is the greatest (probably due to error in our finite difference approximations).

We should collect more data in the region between :math:`t=0` and :math:`t=5` hours, using equally spaced time steps, so we can accurately estimate the derivatives using the central difference approximation.

The code to calculate the linear model and simulate the system is also provided.

.. figure:: ../che3e4/Assignments/Assignment-4/images/rate-fitting-simulation.png :scale: 60 :align: center

.. twocolumncode:: :code1: ../che3e4/Assignments/Assignment-4/code/rate_fitting.m :language1: matlab :header1: MATLAB code :code2: ../che3e4/Assignments/Assignment-4/code/rate_fitting.py :language2: python :header2: Python code

Question 3 [3]

==

The yield from your lab-scale bioreactor, :math:`y`, is a function of reactor temperature, impeller speed and reactor type (one with with baffles and one without). You have collected these data from various experiments.

.. csv-table:: :header: Temp = :math:`T` [°C], Speed = :math:`S` [RPM], Baffles = :math:`B` [Yes/No], Yield = :math:`y` [g] :widths: 30, 30, 30, 30

82, 4300, No, 51 90, 3700, Yes, 30 88, 4200, Yes, 40 86, 3300, Yes, 28 80, 4300, No, 49 78, 4300, Yes, 49 82, 3900, Yes, 44 83, 4300, No, 59 65, 4200, No, 61 73, 4400, No, 59 60, 4400, No, 57 60, 4400, No, 62 101, 4400, No, 42 92, 4900, Yes, 38

  1. . Use a built-in software function in MATLAB or Python (please don't use Excel) to fit a linear model that predicts the bioreactor yield from the above variables. Your model should be of the form:

.. math::

\hat{y} = a_0 + a_\text{T} T + a_\text{S} S + a_\text{B} B

*Hint*: convert the No/Yes variables to integers with ``0=No`` and ``1=Yes``.

  1. . Interpret the meaning of each coefficient in the model and comment on the standard error.
  1. . Calculate and show the :math:`\mathbf{X}^T\mathbf{X}` and :math:`\mathbf{X}^T\mathbf{y}` matrices for this linear model in order to calculate the 4 coefficients in :math:`{\bf a} = \left[a_0, a_T, A_S, A_B\right]` yourself. Your answer must include the MATLAB or Python code that computes these matrices and the :math:`{\bf a}` solution vector.

Solution


.. note:: Initially, I assumed the lab computers had the Statistics Toolbox function in MATLAB so you could use the built-in ``regress.m`` function in MATLAB. Python users should however use the built-in ``np.linalg.lstsq(...)`` function. As a result all questions will be answered together below.

After converting the Yes/No variable to integers you can fit a linear model of the form :math:`\hat{y} = a_0 + a_\text{T} T + a_\text{S} S + a_\text{B} B`. In R::

> y_yield <- c(51,30,40,28,49,49,44,59,61,59,57,62,42,38) > x_temp <- c(82,90,88,86,80,78,82,83,65,73,60,60,101,92) > x_speed <- c(4300,3700,4200,3300,4300,4300,3900,4300,4200,4400,4400,4400,4400,4900) > x_baffles <- c(0,1,1,1,0,1,1,0,0,0,0,0,0,1) > model = lm(y_yield ~ x_temp + x_speed + x_baffles) > summary(model)

Call: lm(formula = y_yield ~ x_temp + x_speed + x_baffles)

Residuals: Min 1Q Median 3Q Max -6.0790 -3.4265 -0.7974 2.2026 7.9710

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 54.804336 18.404567 2.978 0.01386 * x_temp -0.486872 0.121692 -4.001 0.00251 ** x_speed 0.008520 0.003784 2.252 0.04804 * x_baffles -9.271748 3.073260 -3.017 0.01296 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.708 on 10 degrees of freedom Multiple R-squared: 0.8647, Adjusted R-squared: 0.8241 F-statistic: 21.3 on 3 and 10 DF, p-value: 0.0001156

which indicates that the model is given by:

.. math::

\hat{y} = 54.8 -0.487 T + 0.00852 S + -9.27 B

The coefficients are interpreted as:

  • :math:`a_\text{S} = 0.0085`: a 100rpm increase in impeller speed serves to increase yield by 0.85g on average, keeping all other variables constant
  • :math:`a_\text{B} = -9.3`: the use of baffles decreases yield, on average, by 9.1g, keeping all other variables constant
  • :math:`a_\text{T} = -0.49`: each one degree increase in temperature lowers yield by 0.49g on average, keeping all other variables constant

This information can be quite valuable: we can now decide, using process economics, which is the most cost-effective way to boost the yield from our bioreactor.

The standard error is 4.7 g. Since we aren't given any information on the accuracy of the yield measurement, it is hard to quantify whether this is reasonable or not. In practice one would need to run several experiments under the same conditions and measure the yield. The standard deviation of those repeated experimental yields would tell us what the noise in our measurement is.

We can fit the model in MATLAB or Python. Let the coefficient vector be :math:`\mathrm{a} = [a_0, a_\text{S}, a_\text{B}, a_\text{T}]`, then we can write down the following X matrix to estimate it:

.. math:: \mathrm{X} = \begin{bmatrix} 1 & 82 & 4300 & 0\\ 1 & 90 & 3700 & 1\\ 1 & 88 & 4200 & 1\\ 1 & 86 & 3300 & 1\\ 1 & 80 & 4300 & 0\\ 1 & 78 & 4300 & 1\\ 1 & 82 & 3900 & 1\\ 1 & 83 & 4300 & 0\\ 1 & 65 & 4200 & 0\\ 1 & 73 & 4400 & 0\\ 1 & 60 & 4400 & 0\\ 1 & 60 & 4400 & 0\\ 1 & 101 & 4400 & 0\\ 1 & 92 & 4900 & 1 \end{bmatrix}

Then we can construct the :math:`\mathrm{X}^T\mathrm{X}` matrix (called the *correlation matrix*), and the :math:`\mathrm{X}^T\mathrm{y}` vector (called the *covariance vector*).

.. math:: \mathrm{X}^T\mathrm{X} = \begin{bmatrix} 14 & 1120 & 59000 & 6 \\ 1120 & 91480 & 4712500 & 516 \\ 59000 & 4712500 & 250480000 & 24300 \\ 6 & 516 & 24300 & 6 \end{bmatrix} \qquad \text{and} \qquad \mathrm{X}^T\mathrm{y} = \begin{bmatrix} 669 \\ 52207 \\ 2847800 \\ 229 \end{bmatrix}

Using these matrices to solve for :math:`\mathrm{a}`

	.. math::

\mathrm{a} = \left(\mathrm{X}^T\mathrm{X} \right)^{-1}\mathrm{X}^T\mathrm{y} = \begin{bmatrix} 54.80 \\ -0.4869 \\ 0.00852 \\ -9.271 \end{bmatrix}

This result matches the results from ``R``. The MATLAB and Python code is shown below.


.. twocolumncode:: :code1: ../che3e4/Assignments/Assignment-4/code/batch_yields.m :language1: matlab :header1: MATLAB code :code2: ../che3e4/Assignments/Assignment-4/code/batch_yields.py :language2: python :header2: Python code

Question 4 [2]

==

.. Similar to Tutorial 8, 2009

The viscosity of sulphuric acid, :math:`\nu`, varies with purity, :math:`p` in the following manner:

======================== ===== =====
math:`p` [%] 20 60 80

----- ----- -----

math:`\nu` [millipascal] 1.40 5.37 17.4
======================== ===== =====
  1. . Express :math:`\nu(p)` as a quadratic function using Lagrange interpolating polynomials. Do not simplify the polynomial.
  2. . Express :math:`\nu(p)` as a quadratic function using Newton interpolating polynomials. Do not simplify the polynomial.
  3. . Fit a cubic spline through the data points: clearly show your :math:`{\bf Xa = y}` linear system of equations, then solve them using computer software; finally report your spline coefficients.
  4. . Use computer software to plot:
	*	the Newton interpolating polynomial 

* the cubic spline, * and the 3 data points on the same graph.

  1. . What is the estimated viscosity at :math:`p = 40\%` purity using linear interpolation?
  2. . Which of the estimation procedures that you used above has the closest estimate to the true value of 2.51 millipascal?

Solution


  1. . Recall that an :math:`n-1` order Lagrange interpolating polynomial, for a set of :math:`n` data points, is given by the following relation:

.. math::

P_{n-1}(x) = y_{1}\ell_{1}(x) + y_{2}\ell_{2}(x) + \ldots + y_{n}\ell_{n}(x) = \sum_{i=1}^{n}y_{i}\ell_{i}(x) \qquad \text{where} \qquad \ell_{i}(x) = \prod_{j=1, j\neq i}^{n} \frac{x - x_{i}}{x_{i} - x_{j}}

For our system the quadratic Lagrange interpolating polynomial is the following:

.. math::

P_{2}(x) &= y_{1}\ell_{1}(x) + y_{2}\ell_{2}(x) + \ldots + y_{n}\ell_{n}(x) \\ &= 1.40 \cdot \frac{(x-60)(x-80)}{(20-60)(20-80)} + 5.37 \cdot \frac{(x-20)(x-80)}{(60-20)(60-80)} + 17.4 \cdot \frac{(x-20)(x-60)}{(80-20)(80-60)}\\


  1. . Recall that an :math:`n-1` order Newton interpolating polynomial, for a set of :math:`n` data points, is given by the following relation:

.. math::

P_{n-1}(x) = b_{0}+b_{1}(x-x_{1})+b_{2}(x-x_{1})(x-x_{2})+ \ldots +b_{n-1}(x-x_{1})\cdots(x-x_{n-1})

The coefficients, :math:`b_i` are best calculated in table form, known as "Newton's divided differences":

.. math::

\begin{array}{cccc} \hline \hline p_{i} & \nu_{i} & \nu[p_{i},p_{i-1}] & \nu[p_{i}, p_{i-1}, p_{i-2}] \\ \hline \hline 20 & \textbf{1.40} & \frac{5.37-1.40}{60-20}=\textbf{0.09925} & \frac{0.6015-0.09925}{80-20}=\textbf{0.008371} \\ 60 & 5.37 & \frac{17.4-5.37}{80-60}=0.6015 & \\ 80 & 17.4 & & \\ \hline \hline \end{array}


So the Newton interpolating polynomial is:

.. math::

P_{2}(x) = 1.40 + 0.09925(x-x_{1}) + 0.008371(x-x_{1})(x-x_{2})

  1. . The cubic spline through 3 data points consists of two cubic polynomials -- one connecting points 1 and 2, and another connecting points 2 and 3.

.. math::

s_3(x) = \left\{ \begin{array}{l} a_1 + b_1x + c_1x^2 + d_1x^3 \qquad\,\,x_1 \leq x \leq x_2 \\ a_2 + b_2x + c_2x^2 + d_2x^3 \qquad\,\,x_2 \leq x \leq x_3 \end{array} \right.

So we have 8 unknown coefficients in these 2 polynomials. The 8 linear equations that can be used to solve for the coefficients are:

.. math::

\begin{array}{ll} \text{Starting knot} & s_3(x_1) = y_1 = a_1 + b_1x_1 + c_1x_1^2 + d_1x_1^3\\ \text{End knot} & s_3(x_3) = y_3 = a_2 + b_2x_3 + c_2x_3^2 + d_2x_3^3\\ \text{Continuity} & s_3(x_2) = \left\{ \begin{array}{l} y_2 = a_1 + b_1x_2 + c_1x_2^2 + d_1x_2^3 \\ y_2 = a_2 + b_2x_2 + c_2x_2^2 + d_2x_2^3 \end{array} \right.\\ \text{1st derivative continuity} & s'_3(x_2) = b_1 + 2c_1x_2 + 3d_1x_2^2 = b_2 + 2c_2x_2 + 3d_2x_2^2 \\ \text{2nd derivative continuity} & s_3(x_2) = 2c_1 + 6d_1x_2 = 2c_2 + 6d_2x_2 \\ \text{$s_3(x)=0$ at the ends} & s''_3(x_1) = 0 = 2c_1 + 6d_1x_1 \\ & s''_3(x_3) = 0 = 2c_2 + 6d_2x_3 \end{array} Expressing this as a linear system of equations, :math:`{\bf Xa = y}`: .. math:: {\bf Xa} &= {\bf y} \\ \left( \begin{array}{cccccccc} 0 & 0 & 2 & 6x_1 \\ 1 & x_1 & x_1^2 & x_1^3 \\ 1 & x_2 & x_2^2 & x_2^3 \\ 0 & 1 & 2x_2 & 3x_2^2 & 0 & -1 & -2x_2 & -3x_2^2 \\ 0 & 0 & 2 & 6x_2 & 0 & 0 & -2 & -6x_2 \\ & & & & 1 & x_2 & x_2^2 & x_2^3 \\ & & & & 1 & x_3 & x_3^2 & x_3^3 \\ & & & & & & 2 & 6x_3 \end{array}\right) \left( \begin{array}{c} a_1 \\ b_1 \\ c_1 \\ d_1 \\ a_2 \\ b_2 \\ c_2 \\ d_2 \end{array} \right) &= \left( \begin{array}{c} 0 \\ y_1 \\ y_2 \\ 0 \\ 0 \\ y_2 \\ y_3 \\ 0 \end{array} \right) \\ \left( \begin{array}{cccccccc} 0 & 0 & 2 & 120 \\ 1 & 20 & 400 & 8000 \\ 1 & 60 & 3600 & 216000 \\ 0 & 1 & 120 & 10800 & 0 & -1 & -120 & -10800 \\ 0 & 0 & 2 & 360 & 0 & 0 & -2 & -360 \\ & & & & 1 & 60 & 3600 & 216000 \\ & & & & 1 & 80 & 6400 & 512000 \\ & & & & 0 & 0 & 2 & 480 \end{array}\right) \left( \begin{array}{c} a_1 \\ b_1 \\ c_1 \\ d_1 \\ a_2 \\ b_2 \\ c_2 \\ d_2 \end{array} \right) &= \left( \begin{array}{c} 0 \\ 1.40 \\ 5.37 \\ 0 \\ 0 \\ 5.37 \\ 17.4 \\ 0 \end{array} \right) Solving this system of equations with computer software gives: .. math:: s_3(x) = \left\{ \begin{array}{l} 1.93 + 0.0579x - 6.28\times10^{-3} x^2 + 1.05\times10^{-4}x^3 \qquad\,\,x_1 \leq x \leq x_2 \\ 69.73 - 3.33x + 5.02\times10^{-2} x^2 - 2.09\times10^{-4}x^3 \qquad\,\,x_2 \leq x \leq x_3 \end{array} \right. #. A plot of the Lagrange (or Newton) polynomial, the spline, the linear interpolation (not required) superimposed on the 3 original data points is shown below. .. figure:: ../che3e4/Assignments/Assignment-4/images/q1_plot.png :scale: 45 :align: center The code used to generate this plot is similar to that shown on `course software tutorial <http://modelling3e4.connectmv.com/wiki/Software_tutorial/Creating_and_saving_plots>`_. #. The linear interpolation for :math:`\nu(p=40)` has slope = :math:`\displaystyle \frac{5.37 - 1.40}{60 - 20} = 0.09925` and intercept = :math:`1.40 - 0.09925 \times 20 = -0.585`. So :math:`\nu(p=40) = 0.09925 \times 40 - 0.585 = 3.385` millipascal. #. The question was a bit unclear; the intention was to compare all estimates of :math:`\nu(p=40)`: * Lagrange polynomial estimate = :math:`1.40 \cdot \frac{(40-60)(40-80)}{(20-60)(20-80)}+ 5.37 \cdot \frac{(40-20)(40-80)}{(60-20)(60-80)} + 17.4 \cdot \frac{(40-20)(40-60)}{(80-20)(80-60)} = 1.40(0.3333) + 5.37(1.0) + 17.4(-0.333) = 0.0366` millipascals. * Newton polynomial estimate = :math:`1.40 + 0.09925(40-20) + 0.008371(40-20)(40-60) = 0.0366` millipascals. * Cubic spline estimate = :math:`1.93 + 0.0579\times 40 - 6.28\times10^{-3} \times 40^2 + 1.05\times10^{-4} \times 40^3 = 0.918` millipascals. * Linear interpolation = 3.385 millipascal. All estimates had large error when compared to the true value of 2.51, because the viscosity is changing fairly rapidly in this neighbourhood. The estimates would be improved if there was another data point close by. The Lagrange/Newton estimates are especially poor, since they are negative in the range from approximately 30% to 40% purity. Question 5 [2] ======'"`UNIQ--h-8--QINU`"'========= The following data are collected from a bioreactor experiment, during the growth phase. ====='"`UNIQ--h-9--QINU`"'============= ===== ===== ===== ===== ===== Time [hours] 0 1.0 2.0 4.0 6.0 ------------------ ----- ----- ----- ----- ----- :math:`C` [g/L] 0.1 0.341 1.102 4.95 11.24 ====='"`UNIQ--h-10--QINU`"'============= ===== ===== ===== ===== ===== Fit a natural cubic spline for these data and use it to estimate the number of cells at time 3, 5, and 7 hours. Show your matrix derivation for the linear system of equations, and solve it using computer software. Plot the cubic spline over the time range 0 to 8 hours. Solution --------- The cubic spline through 5 data points consists of four cubic polynomials -- connecting points 1 and 2, 2 and 3, 3 and 4, and 4 and 5. .. math:: s_3(x) = \left\{ \begin{array}{l} a_1 + b_1x + c_1x^2 + d_1x^3 \qquad\,\,x_1 \leq x \leq x_2 \\ a_2 + b_2x + c_2x^2 + d_2x^3 \qquad\,\,x_2 \leq x \leq x_3\\ a_3 + b_3x + c_3x^2 + d_3x^3 \qquad\,\,x_3 \leq x \leq x_4\\ a_4 + b_4x + c_4x^2 + d_4x^3 \qquad\,\,x_4 \leq x \leq x_5 \end{array} \right. So we have 16 unknown coefficients in these 4 polynomials. The 16 linear equations that can be used to solve for the coefficients are: .. math:: \begin{array}{ll} \text{Starting knot} & s_3(x_1) = y_1 = a_1 + b_1x_1 + c_1x_1^2 + d_1x_1^3\\ \text{End knot} & s_3(x_5) = y_5 = a_2 + b_2x_5 + c_2x_5^2 + d_2x_5^3\\ \text{Continuity} & s_3(x_2) = \left\{ \begin{array}{l} y_2 = a_1 + b_1x_2 + c_1x_2^2 + d_1x_2^3 \\ y_2 = a_2 + b_2x_2 + c_2x_2^2 + d_2x_2^3 \end{array} \right.\\ \text{Continuity} & s_3(x_3) = \left\{ \begin{array}{l} y_3 = a_2 + b_2x_3 + c_2x_3^2 + d_2x_3^3 \\ y_3 = a_3 + b_3x_3 + c_3x_3^2 + d_3x_3^3 \end{array} \right.\\ \text{Continuity} & s_3(x_4) = \left\{ \begin{array}{l} y_4 = a_3 + b_3x_4 + c_3x_4^2 + d_3x_4^3 \\ y_4 = a_4 + b_4x_4 + c_4x_4^2 + d_4x_4^3 \end{array} \right.\\ \text{1st derivative continuity} & s'_3(x_2) = b_1 + 2c_1x_2 + 3d_1x_2^2 = b_2 + 2c_2x_2 + 3d_2x_2^2 \\ \text{1st derivative continuity} & s'_3(x_3) = b_2 + 2c_2x_3 + 3d_2x_3^2 = b_3 + 2c_3x_3 + 3d_3x_3^2 \\ \text{1st derivative continuity} & s'_3(x_4) = b_3 + 2c_3x_4 + 3d_3x_4^2 = b_4 + 2c_4x_4 + 3d_4x_4^2 \\ \text{2nd derivative continuity} & s''_3(x_2) = 2c_1 + 6d_1x_2 = 2c_2 + 6d_2x_2 \\ \text{2nd derivative continuity} & s''_3(x_3) = 2c_2 + 6d_2x_3 = 2c_3 + 6d_3x_3 \\ \text{2nd derivative continuity} & s''_3(x_4) = 2c_3 + 6d_3x_4 = 2c_4 + 6d_4x_4 \\ \text{$s_3(x)=0$ at the ends} & s_3(x_1) = 0 = 2c_1 + 6d_1x_1 \\ & s_3(x_5) = 0 = 2c_2 + 6d_2x_5 \end{array}

Expressing this as a linear system of equations, :math:`{\bf Xa = y}` (see the lecture notes for the matrix pattern):

.. math::

{\bf Xa} &= {\bf y} \\ \left( \begin{array}{cccccccccccccccc} 0 & 0 & 2 & 6x_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & x_1 & x_1^2 & x_1^3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & x_2 & x_2^2 & x_2^3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 2x_2 & 3x_2^2 & 0 & -1 & -2x_2 & -3x_2^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 6x_2 & 0 & 0 & -2 & -6x_2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & x_2 & x_2^2 & x_2^3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & x_3 & x_3^2 & x_3^3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 2x_3 & 3x_3^2 & 0 & -1 & -2x_3 & -3x_3^2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 2 & 6x_3 & 0 & 0 & -2 & -6x_3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & x_3 & x_3^2 & x_3^3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & x_4 & x_4^2 & x_4^3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2x_4 & 3x_4^2 & 0 & -1 & -2x_4 & -3x_4^2 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 6x_4 & 0 & 0 & -2 & -6x_4 \\ 0 & 0 & 0 & 0 &0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & x_4 & x_4^2 & x_4^3 \\ 0 & 0 & 0 & 0 &0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & x_5 & x_5^2 & x_5^3 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 6x_5 \end{array}\right) \left( \begin{array}{c} a_1 \\ b_1 \\ c_1 \\ d_1 \\ a_2 \\ b_2 \\ c_2 \\ d_2 \\ a_3 \\ b_3 \\ c_3 \\ d_3 \\ a_4 \\ b_4 \\ c_4 \\ d_4 \end{array} \right) &= \left( \begin{array}{c} 0 \\ y_1 \\ y_2 \\ 0 \\ 0 \\ y_2 \\ y_3 \\ 0 \\ 0 \\ y_3 \\ y_4 \\ 0 \\ 0 \\ y_4 \\ y_5 \\ 0 \\ \end{array} \right) \\ \left( \begin{array}{cccccccccccccccc} 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 2 & 3 & 0 & -1 & -2 & -3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 6 & 0 & 0 & -2 & -6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 2 & 4 & 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 4 & 12 & 0 & -1 & -4 & -12 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 2 & 12 & 0 & 0 & -2 & -12 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 4 & 8 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 4 & 16 & 64 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 8 & 48 & 0 & -1 & -8 & -48 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 24 & 0 & 0 & -2 & -24 \\ 0 & 0 & 0 & 0 &0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 4 & 16 & 64 \\ 0 & 0 & 0 & 0 &0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 6 & 36 & 216 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 36 \end{array}\right) \left( \begin{array}{c} a_1 \\ b_1 \\ c_1 \\ d_1 \\ a_2 \\ b_2 \\ c_2 \\ d_2 \\ a_3 \\ b_3 \\ c_3 \\ d_3 \\ a_4 \\ b_4 \\ c_4 \\ d_4 \end{array} \right) &= \left( \begin{array}{c} 0 \\ 0.1 \\ 0.341 \\ 0 \\ 0 \\ 0.341 \\ 1.102 \\ 0 \\ 0 \\ 1.102 \\ 4.95 \\ 0 \\ 0 \\ 4.95 \\ 11.24 \\ 0 \\ \end{array} \right) \\

Solving this system of equations with computer software gives:

.. math::

s_3(x) = \left\{ \begin{array}{l} 0.1000 + 0.1457x + 0\times x^2 + 0.0953x^3 \qquad\,\,0 \leq x \leq 1 \\ 0.1521 -0.0106x + 0.1562 x^2 + 0.0433x^3 \qquad\,\,1 \leq x \leq 2\\ 0.5809 -0.6537x + 0.4778x^2 -0.0103x^3 \qquad\,\,2 \leq x \leq 4\\ 3.6951 + -2.9895x + 1.0617x^2 + -0.0590x^3 \qquad\,\,4 \leq x \leq 6 \end{array} \right.

Now we can estimate the number of cells at time 3, 5, and 7 hours. All we need is to see which of the above cubic polynomials are appropriate for each point:

math:`2\leq x=3\leq 4\rightarrow` :math:`C(3)\approx 0.5809 -0.6537(3) + 0.4778(3)^2 -0.0103(3)^3=2.6411`
math:`4\leq x=5\leq 6\rightarrow` :math:`C(5)\approx 3.6951 + -2.9895(5) + 1.0617(5)^2 + -0.0590(5)^3=7.9180`

The last point 7 is not within the domain of interpolation for any of the intervals. So, we need to extrapolate the last polynomial (which is the closet to :math:`x=7`):

math:`C(7)\approx 3.6951 + -2.9895(7) + 1.0617(7)^2 + -0.0590(7)^3=14.5620`

The cubic spline is plotted below, followed by the code used to generate the estimate values.

.. figure:: ../che3e4/Assignments/Assignment-4/images/spline_q5.png :scale: 45 :align: center

    • Note**: Polynomial coefficients in MATLAB and Python are sorted from the highest order to the lowest one, which is in the reverse direction we wrote the above polynomials. So, make sure you flip the coefficients (in the decreasing order) when using these polynomials in software.

.. twocolumncode:: :code1: ../che3e4/Assignments/Assignment-4/code/cubic_spline_5pts.m :language1: matlab :header1: MATLAB code :code2: ../che3e4/Assignments/Assignment-4/code/cubic_spline_5pts_python.py :language2: python :header2: Python code

Bonus question [0.5]

========

Use the cubic spline from the previous question and find the time where the cell count was approximately 10.0 g/L. Do not solve the equation by hand, but investigate `MATLAB's <http://www.mathworks.com/help/techdoc/ref/roots.html>`_ or `Python's <http://docs.scipy.org/doc/numpy/reference/generated/numpy.roots.html>`_ polynomial root finding function: ``roots(...)``

Solution


The value of :math:`C=10` falls between 4.95 (at :math:`t=4`) and 11.24 (at :math:`t=6`) in the table of experimental data . So, we use the last spline polynomial that represents the points in this range:

math:`C(t)\approx 3.6951 + -2.9895(t) + 1.0617(t)^2 + -0.0590(t)^3=10.0 \rightarrow`
math:`C(t)\approx -6.3049 + -2.9895(t) + 1.0617(t)^2 + -0.0590(t)^3=0`

This is a simple polynomial root finding problem which can be solved using either

  • MATLAB: ``roots([-0.0590, 1.0617, -2.9895, -6.3049])``
  • Python: ``np.roots([-0.0590, 1.0617, -2.9895, -6.3049])``

Since it is a cubic function, three roots are returned: :math:`t_1 = 13.7417, t_2 = 5.6336, t_3 = -1.3804`.

In order to decide which one is the desired solution, we again look at the tabulated data. Since the :math:`C` value falls between 4.95 and 11.24, the corresponding :math:`t` value must also fall between 4.0 and 6.0. So, the desired root is :math:`t_2 = 5.6336`.

.. raw:: latex

\vspace{0.5cm} \hrule \begin{center}END\end{center}

</rst>