STA 414/2104 Winter 2021:

Statistical Methods for Machine Learning II

This course introduces commonly used machine learning algorithms such as linear and logistic regression, decision trees, random forests, boosting, support vector machines, and neural networks. It also offers a broad view of model-building and optimization techniques based on probabilistic building blocks, which will serve as a foundation for more advanced machine learning courses.

The first half of the course focuses on supervised learning. We begin with nearest neighbours, decision trees, and ensembles. Then we introduce parametric models, including linear regression, logistic and softmax regression, and neural networks. We then move on to unsupervised learning, focusing in particular on probabilistic models, but also covering principal components analysis and k-means. Later we consider matrix factorization and reinforcement learning, and we conclude with algorithmic fairness. More details can be found in the syllabus and on Piazza.


Announcements:


Instructors:

Prof Murat A. Erdogdu
Email sta414-2021-prof@cs.toronto.edu
Office hours W 10-12 online

Teaching Assistants:

Yuehuan He, Mufan Li, Harsh Panchal, Lu Yu


Time & Location:

Section | Room | Lecture time
414 L0101 & 2104 L9101 | online | M 14-17
414 L5101 & 2104 L6101 | online | Tu 18-21

Zoom links for each lecture will be sent through Quercus every week.


Suggested Reading

There are no required textbooks. Suggested reading will be posted after each lecture (see the lecture timeline below).


Lectures and timeline

Week | Topics | Lectures | Suggested reading | Timeline
1 | Introduction to ML & Least Squares | slides | PRML 1.1-3, preliminaries |
2 | Probabilistic Models | slides | PRML 2, 3.1 |
3 | Regularization and Bayesian Methods | slides | PRML 3.1, 3.3 | hw1 out
4 | Linear Methods for Classification | slides | PRML 4.1-3 |
5 | Optimization in ML & Decision Theory | slides | PRML 1.5, 3.2 | hw1 due & hw2 out
6 | Reading week (no class) | | |
7 | Neural Networks & Backpropagation | slides | notes on NNs & article | hw2 due
8 | Midterm (in class) | | | midterm
9 | Decision Trees, Ensembles, Support Vector Machines | slides | PRML 7.1 & 14.4 | hw3 out
10 | Unsupervised Learning, Latent Variable Models, k-Means, EM Algorithm | slides | PRML 9 |
11 | PCA, Autoencoders, Recommender Systems | slides | PRML 12.1, 12.2 | hw3 due & hw4 out
12 | Reinforcement Learning | slides | RL 3, 4.1, 4.4, 6.1-6.5 |
13 | Algorithmic Fairness, Final Exam Review | slides | Zemel et al. & Hardt et al. | hw4 due

Homeworks

Homework # | Out | Due | Materials | TA Office Hours
Homework 1 - V0 | Jan 25, 00:30 | Feb 08, 13:59 | data | Th 12pm & F 1pm
Homework 2 - V1 | Feb 6, 21:00 | Feb 22, 13:59 | code | Th 2pm & F 4pm
Homework 3 - V1 | Mar 7, 21:00 | Mar 22, 13:59 | data | Th 12pm & F 9am
Homework 4 - V0 | Mar 21, 23:30 | Apr 5, 13:59 | code | Th 1pm & F 1pm

Computing Resources

For the homework assignments, we will use Python with libraries such as NumPy, SciPy, and scikit-learn. You have two options: