STA 414/2104 Winter 2021:

Statistical Methods for Machine Learning II

This course introduces commonly used machine learning algorithms such as linear and logistic regression, random forests, decision trees, neural networks, support vector machines, and boosting. It also offers a broad view of model-building and optimization techniques based on probabilistic building blocks, which will serve as a foundation for more advanced machine learning courses.

The first half of the course focuses on supervised learning. We begin with nearest neighbours, decision trees, and ensembles. Then we introduce parametric models, including linear regression, logistic and softmax regression, and neural networks. We then move on to unsupervised learning, focusing in particular on probabilistic models, but also covering principal components analysis and K-means. We will later consider matrix factorization and reinforcement learning, and conclude with algorithmic fairness. More details can be found in the syllabus and on Piazza.

Announcements:

The final exam will be held on April 20 at 9am EDT.
- Students taking the exam will be on the same Zoom call during the exam (link to be shared on Quercus). You will submit your work using Crowdmark.
- The exam will be 150 minutes long, which includes the time you need to scan and upload your work to Crowdmark. If you run into technical difficulties with Crowdmark, you may submit your solutions to sta414-2021-tas@cs.toronto.edu before the exam is officially over. Late submissions will be penalized 2 points per minute late (no exceptions).
- The exam covers all lectures (except week 13) and is closed book/internet. You may bring up to two double-sided A4 aid sheets (optional). You are not responsible for concepts introduced only in the suggested readings. However, practicing those and solving the practice midterm questions would give you a significant advantage in the exam.
- A representative practice final exam is here. Solutions will be posted.

Instructors:

Prof	Murat A. Erdogdu
Email	sta414-2021-prof@cs.toronto.edu
Office hours	W 10-12 online

Teaching Assistants:

Yuehuan He, Mufan Li, Harsh Panchal, Lu Yu

Email: sta414-2021-tas@cs.toronto.edu

Time & Location:

Section	Room	Lecture time
414 L0101 & 2104 L9101	online	M 14-17
414 L5101 & 2104 L6101	online	Tu 18-21

Zoom links for each lecture will be sent through Quercus every week.

Lectures and timeline

Week	Topics	Lectures	Suggested reading	Timeline
1	Introduction to ML & Least Squares	slides	PRML 1.1-3 preliminaries
2	Probabilistic Models	slides	PRML 2, 3.1
3	Regularization and Bayesian Methods	slides	PRML 3.1, 3.3	hw1 out
4	Linear Methods for Classification	slides	PRML 4.1-3
5	Optimization in ML & Decision Theory	slides	PRML 1.5, 3.2	hw1 due & hw2 out
6	Reading week (no class)
7	Neural Networks & Backpropagation	slides	notes on NNs & article	hw2 due
8	Midterm (in class)			midterm
9	Decision Trees, Ensembles, Support Vector Machines	slides	PRML 7.1 & 14.4	hw3 out
10	Unsupervised learning, Latent variable models, k-Means, EM algorithm	slides	PRML 9
11	PCA, Autoencoders, Recommender Systems	slides	PRML 12.1,2	hw3 due & hw4 out
12	Reinforcement Learning	slides	RL 3, 4.1, 4.4, 6.1-6.5
13	Algorithmic Fairness & Final Exam Review	slides	Zemel et al & Hardt et al	hw4 due

Homeworks

Homework #	Out	Due	Materials	TA Office Hours
Homework 1 - V0	Jan 25, 00:30	Feb 08, 13:59	data	Th 12pm & F 1pm
Homework 2 - V1	Feb 6, 21:00	Feb 22, 13:59	code	Th 2pm & F 4pm
Homework 3 - V1	Mar 7, 21:00	Mar 22, 13:59	data	Th 12pm & F 9am
Homework 4 - V0	Mar 21, 23:30	Apr 5, 13:59	code	Th 1pm & F 1pm

Computing Resources

For the homework assignments, we will use Python, and libraries such as NumPy, SciPy, and scikit-learn. You have two options:

The easiest option is probably to install everything yourself on your own machine.
- If you don’t already have Python, install it. We recommend using Anaconda. You can also install Python directly if you know how.
- Use pip to install the required packages:
  pip install scipy numpy autograd matplotlib jupyter scikit-learn
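
Once the packages are installed, a quick check like the following can confirm that each one imports. This is a minimal sketch and not part of the course materials: the helper check_packages and the list REQUIRED are hypothetical names, and the import names (for example, sklearn for scikit-learn) are assumed to correspond to the packages listed above.

```python
import importlib

# Assumed import names for the packages listed above.
# Note that scikit-learn is imported as "sklearn".
REQUIRED = ["scipy", "numpy", "autograd", "matplotlib", "jupyter", "sklearn"]


def check_packages(names):
    """Map each import name to its version string, to "unknown" if the
    module imports but exposes no __version__, or to None if the import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        status = version if version is not None else "MISSING"
        print(f"{name:12s} {status}")
```

Any package reported as MISSING can be installed with pip and the check run again.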