| 22. Quadratic forms | blank notes | annotated notes |
| 23. Mean, variance, covariance | blank notes | annotated notes |
| 24. Data matrices | blank notes | annotated notes |
| 25. PCA: the first principal component | blank notes | annotated notes |
Homework 5 problems are posted here.
Homework solutions need to be submitted through Gradescope.
Note. The following notebook, which explains some Python tools for linear algebra computations, may be helpful when computing solutions for this assignment: eigenvectors_hw5.ipynb
What fields are these topics most important in?
Data analysis, mathematical modeling, optimization, machine learning, etc.
- What other techniques besides PCA are there for dimensionality reduction?
- Is it common practice to use multiple dimensionality reduction techniques to enhance overall effectiveness?
- What's the advantage of finding the trace of a matrix?
- Could you give a practical example of how the first principal component helps us out?
How do we determine the definiteness of a quadratic form, and what implications does it have for the associated matrix?
Classification of a quadratic form depends on the eigenvalues of the symmetric matrix defining the form: if all eigenvalues are positive, the form is positive definite; if all are negative, it is negative definite; if both signs occur, it is indefinite (and zero eigenvalues give the semidefinite cases). This fact gives a relationship between properties of matrices and properties of their quadratic forms.
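For those who want to check this numerically, here is a minimal sketch in Python (assuming numpy is available, as in the eigenvectors_hw5.ipynb notebook); the function name and the example matrix are just illustrations, not part of the notes.

```python
import numpy as np

def classify_quadratic_form(S, tol=1e-10):
    """Classify the quadratic form q(x) = x^T S x from the eigenvalues of the symmetric matrix S."""
    eigenvalues = np.linalg.eigvalsh(S)  # eigvalsh is intended for symmetric matrices
    if np.all(eigenvalues > tol):
        return "positive definite"
    if np.all(eigenvalues < -tol):
        return "negative definite"
    if np.all(eigenvalues >= -tol):
        return "positive semidefinite"
    if np.all(eigenvalues <= tol):
        return "negative semidefinite"
    return "indefinite"

# Example: q(x1, x2) = 2*x1^2 + 2*x1*x2 + 3*x2^2
S = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(classify_quadratic_form(S))  # positive definite
```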
How will we decide the number of principal components to retain?
The answer will depend on the particular situation - the data being analyzed and what one wants to do with the data. There are some rules that people sometimes follow - e.g. that one should use enough components to capture at least 90% or 95% of the total variance. One can also plot the eigenvalues of the covariance matrix and see if at some point they start decreasing less rapidly. Sometimes it is a matter of experimentation, using different numbers of principal components with some training data, and checking which number works best.
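As an illustration of the 90%/95% rule of thumb, here is a small Python sketch; the eigenvalues below are made up for the example. It computes the cumulative fraction of the total variance and picks the smallest number of components reaching 95%.

```python
import numpy as np

# Hypothetical eigenvalues of a covariance matrix, sorted from largest to smallest
eigenvalues = np.array([4.1, 2.3, 0.9, 0.4, 0.2, 0.1])

# Fraction of the total variance captured by the first k components, for each k
explained = np.cumsum(eigenvalues) / eigenvalues.sum()
print(explained)

# Smallest k capturing at least 95% of the total variance
k = int(np.searchsorted(explained, 0.95) + 1)
print("components to retain:", k)
```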
What is a covariance matrix?
The definition is in lecture notes 24, page 102.
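As a rough illustration (not a substitute for the definition in the notes), here is a small Python sketch that mean-centers a made-up data matrix and forms the covariance matrix using the \(C_A = \frac{1}{N} A A^T\) convention that appears later in this thread; the notes may use a slightly different normalization, and the variable names here are arbitrary.

```python
import numpy as np

# Hypothetical data matrix X: each row is a variable, each column is one of N observations
X = np.array([[2.0, 4.0, 6.0, 8.0],
              [1.0, 3.0, 2.0, 6.0]])

m_X = X.mean(axis=1, keepdims=True)   # mean of each variable (the m_X vector)
A = X - m_X                           # mean-centered data matrix

N = X.shape[1]
C_A = (A @ A.T) / N                   # covariance matrix C_A = (1/N) A A^T
print(m_X)
print(C_A)
```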
- In lecture notes 24 a trace value is mentioned. Is the trace of any matrix the sum of the entries going down the main diagonal starting from the first entry, or does that vary depending on the matrix?
- Also, when finding the \(m_X\) value as shown in lecture notes 23, do we always round down the \(m_X\) value, or do we just follow the usual rounding rule?
How can PCA be combined with graphs and used in real life?
I am not very familiar with applications of PCA to graphs, but apparently there are some. See, for example, this paper.
- How does PCA determine the importance of each principal component in a dataset?
- Can PCA be used effectively on datasets where the variables have different units of measurement, and if so, what preprocessing steps might be necessary before applying PCA?
Is the value of the slope of the best fit line of covariance the actual value?
I am not sure if this is what you mean, but finding the first principal component of data is not the same as finding the best fit line in the linear regression sense.
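To illustrate the difference, here is a small Python sketch on synthetic data: the least-squares regression slope (which minimizes vertical distances) generally differs from the slope of the first principal component direction (the top eigenvector of the covariance matrix). The data and names here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
y = 0.5 * x + 0.3 * rng.standard_normal(100)

# Least-squares slope (regression of y on x): minimizes vertical distances
slope_ls = np.polyfit(x, y, 1)[0]

# First principal component direction: top eigenvector of the covariance matrix
X = np.vstack([x, y])
A = X - X.mean(axis=1, keepdims=True)          # mean-centered data matrix
C = (A @ A.T) / A.shape[1]
eigvals, eigvecs = np.linalg.eigh(C)
u1 = eigvecs[:, -1]                            # eigenvector for the largest eigenvalue
slope_pc = u1[1] / u1[0]

print("regression slope:", slope_ls)
print("first-PC slope:  ", slope_pc)           # generally different from the regression slope
```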
On page 107, notes 25, when \(\|u\| = 1\) and \(Var(Y) = u^T\left(\frac{1}{N} A A^T\right)u\), why is \(Var(Y)\) the largest if \(u\) is a unit eigenvector corresponding to the largest eigenvalue \(\lambda_1\) of \(C_A\)?
This follows from the constrained optimization of quadratic forms that I talked about previously. See lecture notes 22, page 94.
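If it helps to see this numerically, here is a small Python sketch (with a made-up data matrix) that compares \(Var(Y) = u^T C_A u\) at the top unit eigenvector with its value at random unit vectors; the random values never exceed the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 500))
A = A - A.mean(axis=1, keepdims=True)          # mean-centered data matrix
N = A.shape[1]
C_A = (A @ A.T) / N                            # covariance matrix

eigvals, eigvecs = np.linalg.eigh(C_A)         # eigenvalues in ascending order
lambda1 = eigvals[-1]
u1 = eigvecs[:, -1]                            # unit eigenvector for the largest eigenvalue

# Var(Y) = u^T C_A u for the projection Y = u^T A; compare u1 against random unit vectors
var_u1 = u1 @ C_A @ u1
random_vars = []
for _ in range(1000):
    u = rng.standard_normal(3)
    u = u / np.linalg.norm(u)                  # keep ||u|| = 1
    random_vars.append(u @ C_A @ u)

print("largest eigenvalue lambda_1:", lambda1)
print("Var(Y) at u_1:              ", var_u1)            # equals lambda_1
print("max Var(Y) over random u's: ", max(random_vars))  # never exceeds lambda_1
```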