Santiago – Machine Learning School
Many people know how to train Machine Learning models.
Unfortunately, this is around 5% of the work required to build an end-to-end system.
This program will show you the other 95%.
Three things will happen when you finish this program:
You’ll have a solid understanding of the main theoretical aspects of Machine Learning systems.
You’ll have experience building an end-to-end system using SageMaker. You’ll understand how to process data, train, tune, evaluate, deploy, and monitor models in a production environment. You’ll know a few tricks from somebody who spent many nights trying to figure these things out.
You’ll build connections with like-minded professionals working in the industry.
What You’ll Learn in Machine Learning School
Session 1 – Production Machine Learning is Different
- What makes production machine learning different from what you’ve learned
- Unlearning what you think Machine Learning is and how to start thinking like an engineer
- Sampling strategies when collecting data. An introduction to nonprobability sampling, random sampling, stratified sampling, and weighted sampling
- Labeling strategies. An introduction to weak supervision, active learning, and the blessing of natural labels
- Building good features. An introduction to data imputation, standardization, and encoding
- The importance of splitting data and why you should always do it before transforming it. How data leakage can destroy your models (see the first sketch after this session’s list)
- How to use pipelines to orchestrate machine learning workflows. Preparing a transformation pipeline, a training pipeline, and an inference pipeline
- A template architecture to solve some of the most critical aspects of any production machine learning system
- How to process incoming data automatically without having people work on weekends. Handling large amounts of data using Distributed Processing
- Understanding SageMaker’s Processing Step and Processing Jobs (sketched after this list)
- A quick look into Data Wrangler for data preparation and feature engineering
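To make the split-before-transform idea concrete, here is a minimal sketch in scikit-learn (not the course’s code; the dataset and column names are made up). The transformation pipeline is fitted on the training split only, so nothing about the test data leaks into the features:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical dataset: "age" and "income" are numeric, "island" is categorical.
df = pd.DataFrame({
    "age": [34, 51, None, 29, 44, 38],
    "income": [48_000, 72_000, 61_000, None, 55_000, 67_000],
    "island": ["A", "B", "A", "C", "B", "A"],
    "label": [0, 1, 0, 1, 1, 0],
})

# Split FIRST, so no statistic (mean, scale, categories) is computed on test data.
train, test = train_test_split(df, test_size=0.33, random_state=42)

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # data imputation
    ("scale", StandardScaler()),                 # standardization
])
transform = ColumnTransformer([
    ("numeric", numeric, ["age", "income"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["island"]),  # encoding
])

# Fit the transformation pipeline on the training split only...
X_train = transform.fit_transform(train.drop(columns="label"))
# ...and only apply (never refit) it to the test split.
X_test = transform.transform(test.drop(columns="label"))
```

The same fitted pipeline is what the inference pipeline should reuse later, which is why the program treats transformation as a pipeline of its own.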
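And a rough sketch of what a Processing Step can look like with the SageMaker Python SDK (exact arguments vary across SDK versions; the IAM role, the S3 URI, and the script name are assumptions, not the course’s code):

```python
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.steps import ProcessingStep

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,                    # assumed: an existing IAM execution role
    instance_type="ml.m5.xlarge",
    instance_count=1,             # raise this to distribute the processing
)

step = ProcessingStep(
    name="preprocess-data",
    processor=processor,
    code="preprocessing.py",      # hypothetical script with the transformations
    inputs=[ProcessingInput(source=input_s3_uri,  # assumed S3 location
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(output_name="train",
                              source="/opt/ml/processing/train")],
)
```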
Session 2 – Building a Model and the Training Pipeline
- Handling class imbalances and dealing with rare events. Choosing the right metric, sampling, cost-sensitive learning, and class weighting (see the class-weighting sketch after this session’s list)
- An introduction to data augmentation
- The first rule of Machine Learning Engineering and the reason you don’t want to use machine learning in the first place
- Building a model from simple heuristics to complex machine learning algorithms
- How to train a model when the data or the model won’t fit on a single node. Distributed Training using Data and Model Parallelism
- Squeezing a bit more performance using Hyperparameter Tuning in a training pipeline (sketched after this list)
- How to track, recreate, and compare experiments. Tracking and versioning everything you need to go back in time
- Understanding SageMaker’s Training Step and Training Jobs
- Understanding SageMaker’s Tuning Step and Tuning Jobs
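As a small illustration of class weighting for rare events, here is a sketch with scikit-learn on synthetic data (illustrative only, not the course’s code). Inverse-frequency weights make mistakes on the rare class roughly twenty times more expensive here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: the positive "rare event" is 5% of samples.
y = np.array([0] * 95 + [1] * 5)
X = np.random.default_rng(42).normal(size=(100, 3))
X[y == 1] += 1.5  # give the rare class some signal

# "balanced" assigns each class a weight inversely proportional to its frequency.
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))  # {0: ~0.53, 1: 10.0}

# The same weighting, applied inside the model's loss function.
model = LogisticRegression(class_weight="balanced").fit(X, y)
```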
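And a hedged sketch of hyperparameter tuning wired into a training pipeline with the SageMaker Python SDK (the estimator, the inputs, and the metric name are assumptions; details vary by SDK version):

```python
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter
from sagemaker.workflow.steps import TuningStep

tuner = HyperparameterTuner(
    estimator=xgb_estimator,  # assumed: an already-configured SageMaker Estimator
    objective_metric_name="validation:auc",
    objective_type="Maximize",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    max_jobs=20,          # total training jobs the tuner may launch
    max_parallel_jobs=4,  # how many of them run at the same time
)

# Wrapping the tuner in a Tuning Step lets it run inside the pipeline.
tune_step = TuningStep(name="tune-model", tuner=tuner, inputs=train_inputs)
```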
Session 3 – Model Evaluation and Versioning
- Why good models aren’t necessarily useful and useful models aren’t necessarily good
- Dealing with competing priorities when building machine learning systems. Decoupling objectives
- A different way to apply machine learning in the real world. Augmenting and creating instead of replacing
- Framing evaluation metrics in terms of their impact on business performance
- Contextualizing evaluation metrics with a baseline. Human and random baselines, simple heuristics, and using existing systems for context (see the baseline sketch after this session’s list)
- Evaluating the robustness and fairness of a model. Techniques to identify biases
- Evaluating whether individual predictions are useful
- An introduction to backtests and how to use them to evaluate models
- The importance of versioning models
- Understanding SageMaker’s Model Registry
- Understanding SageMaker’s Condition Step (sketched after this list)
- Understanding SageMaker’s Model Step
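Here is one way to contextualize a metric with a baseline, sketched with scikit-learn on synthetic data (illustrative only): a model that scores 90% accuracy on a 90/10 dataset may be no better than always predicting the majority class.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced dataset: ~90% negatives.
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A trivial baseline that always predicts the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = GradientBoostingClassifier().fit(X_train, y_train)

# The baseline's score is the floor the model has to clear to be useful.
print("baseline F1:", f1_score(y_test, baseline.predict(X_test)))  # 0.0
print("model F1:   ", f1_score(y_test, model.predict(X_test)))
```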
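And a rough sketch of a Condition Step gating registration into the Model Registry (SageMaker SDK details vary by version; the step names, the property file, and the threshold are assumptions):

```python
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.functions import JsonGet

# Assumed: an evaluation step wrote its metrics to the PropertyFile
# `evaluation_report`, and `register_step` registers the model version.
accuracy = JsonGet(
    step_name="evaluate-model",
    property_file=evaluation_report,
    json_path="metrics.accuracy.value",  # hypothetical report layout
)

condition_step = ConditionStep(
    name="check-model-quality",
    conditions=[ConditionGreaterThanOrEqualTo(left=accuracy, right=0.80)],
    if_steps=[register_step],  # register only if the model is good enough
    else_steps=[],             # otherwise, stop the pipeline here
)
```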
Session 4 – Model Deployment and Inference Pipelines
- On-demand predictions versus batch inference. Understanding when to use each of them and how to combine them
- The disadvantages of batch inference and how to work around them
- Making models run fast. Model compression and an introduction to Quantization and Knowledge Distillation (see the quantization sketch after this session’s list)
- Deploying multiple models that work together. A comparison between dedicated and multi-model endpoints
- Designing an Inference Pipeline using the transformation pipeline we used to preprocess the data
- Understanding the SageMaker Lambda Step. A quick introduction to serverless functions
- The internal structure of a SageMaker Endpoint
- Customizing SageMaker models with a custom inference process
- Understanding SageMaker’s PipelineModel (sketched below)
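A tiny NumPy sketch of the idea behind quantization (illustrative only; not how SageMaker or any particular toolkit implements it): a float32 weight matrix is mapped to int8 with a scale and a zero point, trading a small rounding error for roughly 4x less memory and faster integer arithmetic.

```python
import numpy as np

# Hypothetical trained weight matrix in float32.
w = np.random.default_rng(0).normal(size=(256, 128)).astype(np.float32)

# Affine post-training quantization: map [w.min(), w.max()] onto [-128, 127].
scale = (w.max() - w.min()) / 255.0
zero_point = np.round(-128 - w.min() / scale)

w_q = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)

# Dequantize to approximate the original weights at inference time.
w_dq = (w_q.astype(np.float32) - zero_point) * scale
print("max abs error:", np.abs(w - w_dq).max())  # small, for 4x less memory
```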
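And a hedged sketch of a PipelineModel chaining the transformation container and the model container behind a single endpoint (the two models, the role, and the endpoint name are assumptions):

```python
from sagemaker.pipeline import PipelineModel

# Assumed: `preprocessor_model` and `xgboost_model` are sagemaker.model.Model
# objects pointing at the fitted transformation pipeline and the trained model.
pipeline_model = PipelineModel(
    models=[preprocessor_model, xgboost_model],  # containers invoked in order
    role=role,                                   # assumed IAM execution role
)

predictor = pipeline_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="inference-pipeline",  # hypothetical endpoint name
)
```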
Session 5 – Data Distribution Shifts and Model Monitoring
- An introduction to data distribution shifts
- Catastrophic predictions and the problem with edge cases
- Unintended feedback loops and how to work around them
- An introduction to covariate shift and concept drift, and how these changes can happen
- How to identify data distribution shifts. An introduction to model monitoring (see the sketch after this session’s list)
- How to respond to data distribution shifts. An introduction to defensive design, retraining, and the advantage of additional data
- Making the case for Continual Learning
- Understanding SageMaker’s Transform Step and Transform Jobs
- Understanding SageMaker’s QualityCheck Step
- Understanding SageMaker’s Data and Model Monitoring Jobs
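A minimal sketch of one common way to detect covariate shift, using a two-sample Kolmogorov-Smirnov test from SciPy (synthetic data; SageMaker’s monitoring jobs compute their own statistics, so treat this as the idea rather than the tool):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Reference distribution captured at training time vs. live serving traffic.
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
serving_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)  # shifted inputs

# A small p-value suggests the serving distribution no longer matches the
# training distribution for this feature: a possible covariate shift.
statistic, p_value = ks_2samp(training_feature, serving_feature)
if p_value < 0.01:
    print(f"possible covariate shift (KS={statistic:.3f}, p={p_value:.1e})")
```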
Session 6 – Continual Learning and Testing in Production
- The importance of Continual Learning and why every company wants to get there
- The three main challenges when implementing Continual Learning
- A four-step plan to implement Continual Learning
- How to determine what data to use to retrain a model
- How frequently to retrain a model
- Retraining strategies. Training from scratch and incremental training. Advantages and disadvantages
- Using offline evaluation and backtests during Continual Learning
- An introduction to Testing in Production
- Five strategies to test models in production. An introduction to A/B testing, Shadow deployments, Canary releases, Interleaving experiments, and Multi-armed bandits (see the sketch below)
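To give the last topic some shape, here is a toy epsilon-greedy multi-armed bandit in plain Python (the click rates are made up; illustrative only). Most traffic is routed to whichever model version currently looks best, while a small exploration budget keeps checking the alternative:

```python
import random

random.seed(42)

# Hypothetical per-request reward (e.g., a click) for two model versions.
TRUE_RATES = {"model_a": 0.10, "model_b": 0.13}

counts = {m: 0 for m in TRUE_RATES}
rewards = {m: 0.0 for m in TRUE_RATES}
epsilon = 0.1  # fraction of traffic reserved for exploration

for _ in range(10_000):
    if random.random() < epsilon:
        model = random.choice(list(TRUE_RATES))  # explore a random version
    else:
        # Exploit: pick the version with the best observed reward rate.
        model = max(counts, key=lambda m: rewards[m] / counts[m] if counts[m] else 0.0)
    counts[model] += 1
    rewards[model] += random.random() < TRUE_RATES[model]

print(counts)  # most traffic ends up on the better model version
```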