"It's tough to make predictions, especially about the future."
– Yogi Berra
Instructor: Brian Spiering
Contact: Slack @Brian Spiering (more preferred) | [email protected] (less preferred)
Office hours: Wednesdays 12n-1p in 522 & By Appointment
Grader: Sangyu Shen
Contact: Slack @michiko | [email protected]
Website: github.com/brianspiering/intro-to-ml
Communication: Slack #msds_621_2018
Location: 101 Howard, San Francisco, CA
Sections:
- Tuesdays & Thursdays at 10:00-11:55 in Room 154
 - Tuesdays & Thursdays at 1:10-3:00 in Room 154
 
This course focuses on the implementation and application of supervised and unsupervised machine learning algorithms using Python and related libraries. Students learn to properly select features and evaluate model accuracy. Models include at least kNN, naive Bayes, random forests, and clustering.
- Working knowledge of probability and statistics
 - Introductory knowledge of linear algebra (e.g., determinants and Singular Value Decomposition)
 - Intermediate level of Python (e.g., ability to create classes)
 - No previous knowledge of machine learning required
 
By the end of the course, you should be able to:
- Apply fundamental machine learning models and methodology to solve real-world problems.
 - Write idiomatic Python code to model data, primarily using the scikit-learn package and occasionally implementing algorithms from scratch.
 - Define common machine learning terms and identify applied examples.
 - Explain common regression, classification and clustering algorithms.
 - Recognize when to and when not to apply machine learning algorithms.
 - Build end-to-end machine learning models to answer meaningful Data Science questions.
 
- (10/18) Welcome ∧ What is ML? ∧ Data Science Workflow
 - (10/23) ML Workflow ∧ k-nearest neighbors (k-NN)
 - (10/25) Regression ∧ Regularization ∧ Bias-Variance
 - (10/30) Naive Bayes ∧ Evaluation Metrics
 - (11/01) Support Vector Machines (SVM) ∧ Kernels
 - (11/06) Information Theory ∧ Decision Trees I
 - (11/08) Decision Trees II
 - (11/13) Feature Engineering ∧ Cross-Validation ∧ Pipelines
 - (11/15) Ensemble Methods ∧ Random Forest I
 - (11/20) NO CLASS SESSION: Classes canceled due to smoke
 - (11/22) NO CLASS SESSION: Thanksgiving Holiday 🦃 🍗 😴
 - (11/27) Supervised ML Learning Potpourri
 - (11/29) Unsupervised Learning ∧ PCA
 - (12/04) K-Means Clustering
 - (12/07) Final Project Group Presentations
 
A possible additional session to make up for the missed session on 11/20 is to be determined.
- Theory (no proofs 🙂)
 - Research (this is an applied course 🔨)
 - R programming language (Python only 🐍)
 - Data acquisition (assume tabular data 📋)
 - Visualization (just basic plotting with matplotlib and Seaborn 📊)
 - Optimization (assume that we have a decent solver 📉)
 - Productizing models (let the Data Engineers do that 👷)
 - Distributing models (let AMZN and GOOGL do that for you 📈)
 - Bayesian approach (I wish we could… 😫)
 - Anomaly Detection (not enough time to get strange 👽)
 - Recommender Systems (wait for ML 2 ⌛)
 - Reinforcement Learning (we don't have time to play games 👾)
 - Ethics (not enough time to think about implications 🤔)
 - Algorithms
   - Boosting
   - Neural Networks / Deep Learning
   - Graphical Models / Bayes Nets
   - Linear Discriminant Analysis (LDA)
   - Expectation–Maximization (EM)
   - Gaussian Mixture Models (GMM)
   - Advanced clustering:
     - DBSCAN
     - Hierarchical
     - Mean-Shift
 
 
 
| Item | Weight | 
|---|---|
| Participation | 10% | 
| Quizzes | 30% | 
| Labs | 30% | 
| Final Project | 30% | 
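
As a rough illustration of how the weights above combine, here is a minimal sketch that computes a final percentage as a weighted average. The function name `final_percentage` and the example scores are hypothetical, and the sketch assumes each component is scored on a 0-100 scale; the actual gradebook may compute things differently.

```python
# Weights (in percent) copied from the table above.
WEIGHTS = {
    "Participation": 10,
    "Quizzes": 30,
    "Labs": 30,
    "Final Project": 30,
}


def final_percentage(scores):
    """Return the weighted average of component scores (each assumed to be 0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Sum weight * score, then divide by the total weight (100).
    return sum(WEIGHTS[item] * scores[item] for item in WEIGHTS) / sum(WEIGHTS.values())


# Made-up scores, for illustration only.
example_scores = {"Participation": 100, "Quizzes": 80, "Labs": 90, "Final Project": 90}
print(final_percentage(example_scores))  # 88.0
```

Keeping the weights as whole percentages and dividing once at the end avoids small floating-point drift in the result.
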
I try to create an active learning environment in my classroom, which is incentivized with the Participation grade. Attendance is mandatory: you can't participate if you don't attend. It is the responsibility of the student to attend all classes. If you have to miss class due to sickness or other circumstances, please notify your instructor by Slack in advance. Supporting documents (e.g., doctor’s notes) should accompany absences due to sickness.
Tardiness negatively impacts an active learning environment and will therefore lower your participation grade.
You must show up to each session prepared. Each person is important to the dynamic of the class, and therefore students are required to participate in class activities. Expect to be "cold called". I call on students at random not to put you on the spot but to keep you engaged in the material at all times.
Quizzes will be held every week (including the first week) on Thursdays from 8:55 am to 9:45 am. They are intended to test your understanding of the material. This includes recent material and all material from previous classes.
Please use the restroom before the quiz. If you have to use the restroom, surrender your cellphone to the instructor before leaving the room.
There are 3 parts to each quiz session: individual, small-group, and class.
- Individually, each student will answer all the questions on the quiz.
 - In small groups, teams of 3-4 will answer the same questions again; the goal is to reach consensus. This is an opportunity for peer-to-peer instruction, which is often more effective than just hearing me prattle on!
 - As a class, we'll go over the answers to the questions, taking time to resolve any remaining misunderstandings.
 
The labs will be hands-on activities. They will require a combination of coding and writing. The coding sections will be implementing algorithms from scratch or applying common libraries (e.g., scikit-learn). The writing sections will focus on communication to technical and nontechnical audiences.
The labs for each week, both Tuesday and Thursday, will be due on Sunday at 10 pm.
Late assignments will only be accepted for medical emergencies.
In lieu of a Final Exam, there will be a Final Project. Details are in the Final Project folder.
| Grade | Final Percentage | 
|---|---|
| A+ | ≥ 98% | 
| A | ≥ 93% and < 98% | 
| A- | ≥ 90% and < 93% | 
| B+ | ≥ 87% and < 90% | 
| B | ≥ 83% and < 87% | 
| B- | ≥ 80% and < 83% | 
| C+ | ≥ 77% and < 80% | 
| C | ≥ 73% and < 77% | 
| C- | ≥ 70% and < 73% | 
| F | < 70% | 
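
The cutoffs in the table can be read as a simple lookup from highest band to lowest. The sketch below is one way to express that; the name `letter_grade` is hypothetical, and labeling the top band (98% and above) as "A+" is an assumption. It is meant as an illustration, not the official grading procedure.

```python
# (minimum percentage, letter) pairs in descending order, following the table above.
# Assumption: the top band (>= 98%) is labeled "A+".
CUTOFFS = [
    (98, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
]


def letter_grade(percentage):
    """Map a final percentage to a letter grade using the cutoffs above."""
    for minimum, letter in CUTOFFS:
        if percentage >= minimum:
            return letter
    # Anything below the lowest cutoff (70%) is an F; the table has no D band.
    return "F"


print(letter_grade(88))    # B+
print(letter_grade(69.9))  # F
```

Because the list is scanned from the highest cutoff down, the first matching band is the correct one, and each lower bound is inclusive, as in the table.
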
Course grades range from "A" to "F". The MSDS program considers a grade of "A" to represent exceptional work with respect to both the instructor's expectations and peer student achievements. I consider an "A" grade to be above and beyond what most students achieve. A grade of "B" represents the expected outcome, what is called "competence" in a business setting. A "C" grade represents achievements lower than the instructor's expectations for competence in the subject. A grade of "F" represents little or no work in the course.
If you are a student with a disability or disabling condition, or if you think you may have a disability, please contact USF Student Disability Services (SDS) for information about accommodations.
All students are expected to behave in accordance with the Student Conduct Code and other University policies.
USF upholds the standards of honesty and integrity from all members of the academic community. All students are expected to know and adhere to the University's Honor Code.
CAPS provides confidential, free counseling to student members of our community.
For information and resources regarding sexual misconduct or assault, visit the Title IX coordinator or USF's Callisto website.
