# Course outline

# Logistics

**Instructor**

- Kevin Dunn, kevin.dunn@mcmaster.ca (no office on campus)

**Class time and location**

- Friday afternoons from 14:00 to 17:00 (some classes will run from 16:00 to 19:00)
- We will meet in JHE342.
- First class starts on 9 September and will be a full 3-hour class.

# About the course

- Official description
- This course is based around multivariate latent variable models, which assume low-dimensional latent variable structures for the data. Multivariate statistical methods, including Principal Component Analysis (PCA) and Projection to Latent Structures (PLS), are used for the efficient extraction of information from large databases, typically collected by on-line process computers. These models are used for the analysis of process problems, for on-line process monitoring, and for process improvement.
- Prerequisites
A basic course in statistics is a *definite requirement*. You must be comfortable with univariate distributions, data visualization, linear regression and process monitoring. However, these topics are covered again in this course, focusing on their practical application to engineering problems. An excellent understanding of matrix methods is required. Programming skills of any type (MATLAB, Python, R) are extremely desirable, as we will be manipulating (largish) datasets to extract information.

- Course materials
The course website will be permanently available: http://latent.connectmv.com

Course materials, readings, assignments, and datasets will be available from the website. Course announcements will be posted to the main page of the website; students are expected to check the website several times per week.

- Required textbook
There is no official course textbook. We will use the instructor's slides for the course. You will supplement these slides with notes from the class. The definitive reference sources will be a variety of journal articles that are listed on the literature website. The instructor will point out which readings correspond to each section of the course.

Other reference texts are generally available in Thode Library.

- Course software
- Software is obviously critical when dealing with data analysis. However, this course will focus on the methods and, in particular, on *understanding* exactly what the methods are doing and *how to interpret* the results. This means you can use pretty much any software package in the future. We will use a variety of software packages in the course, but the main one will be ProSensus Multivariate, which has a 1-year academic license.

- Course outline
The course is divided into several sections, taught over 12 weeks. The exact outline is still to be announced, but these topics will be covered.

- Justification for using multivariate methods
- Large datasets and different ways to visualize them: sparklines, scatterplot matrices, histograms, box plots
- PCA: preprocessing, conceptual, geometric and algebraic interpretations
- Interpreting scores, loadings, SPE, \(T^2\) and contribution plots
- PCA: different ways to calculate the PCA model
- PCA: explaining variance and when to stop: scree plot, \(Q^2\), and other methods
- PCA applications: learning from data, troubleshooting, process improvement (e.g. early release of a manufactured product); incorporating first-principles models
- Regression modelling: OLS, MLR, PCR, introducing PLS
- PLS: calculating the model, interpreting weights, difference between loadings and weights; cautions regarding empirical models
- PLS applications: Monitoring with a PCA and PLS model; classification: PCA, PLS-DA, PLS
- Multiblock data sets (models from many data sources): process understanding, process monitoring
- Time series analysis (process dynamics via lagging) and batch data (how unfolding is just another form of lagging): feature extraction, alignment, missing data imputation
- Kernel methods for extremely large data sets; model updating with new data (adaptive modelling)
- The latent variable space: DOEs in the latent space, QSAR, principal properties
- Process control, product design and optimization in the latent variable space
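
To give a feel for the quantities named in the PCA topics above (scores, loadings, SPE and \(T^2\)), here is a minimal sketch in Python with NumPy. It is not taken from the course materials: the function name `pca_svd`, the simulated dataset, and the choice of mean-centring plus unit-variance scaling as the preprocessing step are all assumptions made for illustration. It fits the model by singular value decomposition, which is only one of the several ways to calculate a PCA model.

```python
import numpy as np

def pca_svd(X, n_components):
    """Fit a PCA model by SVD (hypothetical helper, for illustration only)."""
    # Preprocessing (assumed here): mean-centre and scale to unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std

    # Rows of Vt are the principal directions, ordered by explained variance.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T          # loadings, shape (K, A)
    T = Z @ P                        # scores, shape (N, A)

    # Residuals: the part of Z not explained by the A components.
    E = Z - T @ P.T
    spe = np.sum(E ** 2, axis=1)     # squared prediction error, one per row

    # Hotelling's T^2: squared scores scaled by each score's variance.
    score_var = T.var(axis=0, ddof=1)
    t2 = np.sum(T ** 2 / score_var, axis=1)

    # Fraction of the (scaled) data's variance explained by the model.
    r2 = 1.0 - np.sum(E ** 2) / np.sum(Z ** 2)
    return {"scores": T, "loadings": P, "spe": spe, "t2": t2, "r2": r2}

# Simulated data (an assumption): 6 measured variables driven by 2 latent ones.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(100, 6))

model = pca_svd(X, n_components=2)
print("R2 with 2 components:", round(model["r2"], 3))
print("Largest SPE:", round(model["spe"].max(), 3))
print("Largest T2:", round(model["t2"].max(), 3))
```

Observations with a large SPE lie far from the model plane, while observations with a large \(T^2\) lie within the plane but far from its centre; contribution plots then break either statistic down by variable.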

# Grading

To assess your understanding of the course materials, the grading for the course will be:

| Component | Fraction | Notes |
|---|---|---|
| Assignments | 20% | Expect around 4 to 6 assignments, to be completed individually |
| Class participation | 10% | Discussion of assignments and assigned readings, questions and overall participation is required from all students. |
| Final project | 70% | An in-class presentation and written project report. |

Policies regarding grading

- Readings will be assigned each week, and then discussed in class the following week.
- The final grades will be converted to letter grades using the Registrar's recommended procedure.
- Adjustments to the final grades may be made at the discretion of the instructor.

# Important notes

- Class participation
- Please bring a laptop to every class. Various software packages will be demonstrated during class time, and it will be to your advantage to follow along with the instructor. The instructor will present the solutions to the assignments in the software (written solutions are not provided), so again it will be to your advantage to follow along.
- Out-of-class access
- Since the course instructor does not have an office on campus, office hours will be held after class.
- Disclaimer
- The above outline **may be modified** as circumstances change.

# Academic integrity

You are expected to exhibit honesty and use ethical behaviour in all aspects of the learning process. Academic credentials you earn are rooted in principles of honesty and academic integrity.

Academic dishonesty is to knowingly act or fail to act in a way that results or could result in unearned academic credit or advantage. This behaviour can result in serious consequences, e.g. the grade of zero on an assignment, loss of credit with a notation on the transcript (notation reads: “Grade of F assigned for academic dishonesty”), and/or suspension or expulsion from the university.

It is your responsibility to understand what constitutes academic dishonesty. For information on the various types of academic dishonesty please refer to the Academic Integrity Policy, located at http://www.mcmaster.ca/academicintegrity

The following illustrates only three forms of academic dishonesty:

- Plagiarism, e.g. the submission of work that is not one’s own or for which other credit has been obtained.
- Improper collaboration in group work: this point is particularly important and will be strongly penalized in this course.
- Copying or using unauthorized aids in tests and examinations.