This course covers several topics in classical machine learning theory. We will try to answer questions like:

- What is the generalization error of a particular learning algorithm?
- How much data do you need to get good prediction results?
- What is the performance of your algorithm on test data?

Topics may include: Asymptotic statistics, Uniform Convergence, Generalization, Complexity measures, Kernel Methods, Online Learning, Sampling. More details can be found in the syllabus. Please also sign up for Piazza.

This class requires a good informal knowledge of probability theory, linear algebra, and real analysis (at least Masters level). Homework 0 is a good way to check your background.

- Final reports are due on April 9, 23:00.
- Project proposals are due on March 24, 23:00.
- The midterm exam will be held in class on March 15.
- Students will be on the same Zoom call during the exam (link to be shared on Quercus). You will submit your work using Crowdmark.
- The exam will be 110 minutes long, which includes the time you need to scan and upload your work to Crowdmark. If you run into technical difficulties with Crowdmark, you may submit your solutions to csc2532ta@cs.toronto.edu before the exam is over. Late policy: 2 points per late minute (no exceptions).
- The exam covers all lectures (including 3/08) and is closed book and closed internet. You may use an optional double-sided A4 aid sheet.

- HW3 is out and due on 3/12 23:00, to be submitted through Crowdmark. TA OHs will be held on 3/10 3pm-4pm.
- There will be no class on Family Day, 2/15.
- HW2 is out and due on 2/25 23:00, to be submitted through Crowdmark. TA OHs will be held on 2/24 3pm-4pm.
- HW1 is out and due on 2/09 23:00, to be submitted through Crowdmark. TA OHs will be held on 2/04 2pm-4pm.
- Zoom links for the lecture and office hours will be sent out through Quercus every week.

- Email: csc2532prof@cs.toronto.edu
- Office hours: Tu 10:00-11:00 online

- Email: csc2532ta@cs.toronto.edu

Section | Room | Lecture time |
---|---|---|
L0101 | online | M 10-12 |

There are no required textbooks. Suggested reading will be posted after each lecture (see the lectures below).

- (ESL) Hastie, Tibshirani, Friedman (2009) The Elements of Statistical Learning
- (ITIL) MacKay (2003) Information Theory, Inference, and Learning Algorithms
- (UML) Shalev-Shwartz, Ben-David (2014) Understanding Machine Learning: From Theory to Algorithms
- (HDP) Vershynin (2018) High-Dimensional Probability
- (HDS) Wainwright (2019) High-Dimensional Statistics

Week | Day | Topics and Lecture notes | Lectures | Timeline |
---|---|---|---|---|
1 | 1/11 | Introduction & Warm-up: Gaussian Mean Estimation | lecture 1 | syllabus |
2 | 1/18 | Exponential Families and Information Inequality | lecture 2 | - |
3 | 1/25 | Asymptotic statistics | lecture 3 | hw1 out |
4 | 2/01 | Uniform convergence & Generalization | lecture 4 | - |
5 | 2/08 | Covering with epsilon-nets | lecture 5 | hw1 due & hw2 out |
7 | 2/22 | Rademacher complexity I | lecture 6 | - |
8 | 3/01 | Rademacher complexity II | lecture 7 | hw2 due & hw3 out |
9 | 3/08 | Combinatorial Measures of Complexity | lecture 8 | - |
10 | 3/15 | Midterm (in class) | - | hw3 due |
11 | 3/22 | Chaining and Dudley’s theorem | lecture 9 | project proposal due |
12 | 3/29 | Kernel Methods I | lecture 11 | (lec10 is postponed) |
13 | 4/05 | Kernel Methods II | lecture 12 | final reports due |

Homework # | Out | Due | TA Office Hours |
---|---|---|---|
Homework 0 - V0 | 1/11 | - | - |
Homework 1 - V0 | 1/23 | 2/09 23:00 | 2/04 2pm-4pm |
Homework 2 - V2 | 2/09 | 2/25 23:00 | 2/24 3pm-4pm |
Homework 3 - V1 | 3/01 | 3/12 23:00 | 3/10 3pm-4pm |

The LaTeX template can be found here.

The goal of the project is to read a theoretical machine learning paper, understand its main building blocks, and write a comprehensive review of it.

**Project Inspiration:**
You can go through recent papers from COLT, NeurIPS, ICML, ICLR, and JMLR to get project ideas and pick a paper to review. Several research directions from last year can be found here, but the list is by no means comprehensive. If you have suggestions, let me know.

The LaTeX template for reports can be found here.

For the homework assignments, we will use Python and libraries such as NumPy, SciPy, and scikit-learn.

- If you don’t already have Python, install it. We recommend using Anaconda. You can also install Python directly if you know how.
- Use pip to install the required packages:
`pip install scipy numpy matplotlib jupyter scikit-learn`
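
One way to confirm that the installation worked is to ask Python which of the packages it can see. The sketch below is not part of the course materials: the helper name `installed_versions` is hypothetical, and it assumes Python 3.8 or later (for `importlib.metadata`). It looks packages up by their PyPI distribution names, so scikit-learn is listed as `scikit-learn` even though it is imported as `sklearn`.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names of the packages used in the homework.
# Assumption: scikit-learn is listed under its distribution name,
# which differs from its import name (sklearn).
REQUIRED = ["numpy", "scipy", "matplotlib", "jupyter", "scikit-learn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` can be installed with pip under the name shown in the list.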