import youtube_dl
ydl = youtube_dl.YoutubeDL({
    'ignoreerrors': True,
    'quiet': True,
})
url = "https://www.youtube.com/playlist?list=PLfYUBJiXbdtSyktd8A_x0JNd6lxDcZE96"
with ydl:
    r = ydl.extract_info(url, download=False)  # don't download, much faster
r.keys()
dict_keys(['_type', 'entries', 'id', 'title', 'uploader', 'uploader_id', 'uploader_url', 'extractor', 'webpage_url', 'webpage_url_basename', 'extractor_key'])
r['title']
'Introduction to Machine Learning for Coders'
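One thing to watch out for before iterating: because `ignoreerrors` is set, any video that fails to extract (private, deleted, region-locked) shows up as `None` in `r['entries']` rather than raising. A small helper to filter those out (standalone sketch; the sample data is made up, not real extractor output):

```python
def valid_entries(entries):
    """Drop entries that youtube_dl failed to extract (they come back as None)."""
    return [e for e in entries if e is not None]

# Hypothetical sample standing in for r['entries']
sample = [{'id': 'abc'}, None, {'id': 'xyz'}]
print(valid_entries(sample))  # [{'id': 'abc'}, {'id': 'xyz'}]
```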
for e in r['entries']:
    print('-' * 80)
    print(e['id'])
    print(e['title'])
    print(e['description'])
--------------------------------------------------------------------------------
CzdWqFTmn0Y
Intro to Machine Learning: Lesson 1
Introduction to Random Forests. Welcome to Introduction to Machine Learning for Coders! Lesson 1 will show you how to create a "random forest" - perhaps the most widely applicable machine learning model - to create a solution to the "Blue Book for Bulldozers" Kaggle competition, which will get you in to the top 25% on the leaderboard. You'll learn how to use a Jupyter Notebook to build and analyze models, how to download data, and other basic skills you need to get started with machine learning in practice.
--------------------------------------------------------------------------------
blyXCk4sgEg
Intro to Machine Learning: Lesson 2
Random Forest Deep Dive. Today we start by learning about metrics, loss functions, and (perhaps the most important machine learning concept) overfitting. We discuss using validation and test sets to help us measure overfitting. Then we'll learn how random forests work - first, by looking at the individual trees that make them up, then by learning about "bagging", the simple trick that lets a random forest be much more accurate than any individual tree. Next up, we look at some helpful tricks that random forests support for making them faster, and more accurate.
--------------------------------------------------------------------------------
YSFG_W8JxBo
Intro to Machine Learning: Lesson 3
Today we'll see how to read a much larger dataset - one which may not even fit in the RAM on your machine! And we'll also learn how to create a random forest for that dataset. We also discuss the software engineering concept of "profiling", to learn how to speed up our code if it's not fast enough - especially useful for these big datasets. Next, we do a deeper dive in to validation sets, and discuss what makes a good validation set, and we use that discussion to pick a validation set for this new data.
In the second half of this lesson, we look at "model interpretation" - the critically important skill of using your model to better understand your data. Today's focus for interpretation is the "feature importance plot", which is perhaps the most useful model interpretation technique.
--------------------------------------------------------------------------------
0v93qHDqq_g
Intro to Machine Learning: Lesson 4
Today we do a deep dive in to feature importance, including ways to make your importance plots more informative, how to use them to prune your feature space, and the use of a "dendrogram" to understand feature relationships. In the second half of the lesson we'll learn about two more really important interpretation techniques: partial dependence plots, and the "tree interpreter".
--------------------------------------------------------------------------------
3jl2h9hSRvc
Intro to Machine Learning: Lesson 5
In today's lesson we start by learning more about the "tree interpreter", including the use of "waterfall charts" to analyze their output. Next up, we look into the subtle but important issue of extrapolation. This is the weak point of random forests - they can't predict values outside the range of the input data. We study ways to identify when this problem happens, and how to deal with it. In the second half of this lesson, we start writing our very own random forest from scratch!
--------------------------------------------------------------------------------
BFIYUvBRTpE
Intro to Machine Learning: Lesson 6
In the first half of today's lesson we'll learn about how to create "data products" based on machine learning models, based on "The Drivetrain Method", and in particular how model interpretation is an important part of this approach. Next up, we'll explore the issue of extrapolation more deeply, using a Live Coding approach - we'll also take this opportunity to learn a couple of handy numpy tricks.
--------------------------------------------------------------------------------
O5F9vR2CNYI
Intro to Machine Learning: Lesson 7
Today we'll finish off our "from scratch" random forest interpretation! We'll also briefly look at the amazing "cython" library that you can use to get the same speed as C code with minimal changes to your python code. Then we'll start on the next stage of our journey - gradient descent based methods such as logistic regression and neural networks...
--------------------------------------------------------------------------------
DzE0eSdy5Hk
Machine Learning 1: Lesson 8
Today we start the second half of the course - we're moving from decision tree based approaches like random forests, to gradient descent based approaches like deep learning. Our first step in this journey will be to use Pytorch to help us implement logistic regression from scratch. We'll be building a model for the classic MNIST dataset of hand-written digits.
--------------------------------------------------------------------------------
PGC0UxakTvM
Machine Learning 1: Lesson 9
Today we continue building our logistic regression from scratch, and we add the most important feature to it: regularization. We'll learn about L1 vs L2 regularization, and how they can be implemented. We also talk more about how learning rates work, and how to pick one for your problem. In the second half of the lesson, we start our discussion of natural language processing (NLP). We'll build a "bag of words" representation of the popular IMDb text dataset, using sparse matrices to ensure good performance and reasonable memory use. We'll build a number of models from this, including naive bayes and logistic regression, and will improve these models by adding ngram features.
--------------------------------------------------------------------------------
37sFIak42Sc
Machine Learning 1: Lesson 10
In today's lesson we'll further develop our NLP model by combining the strengths of naive bayes and logistic regression together, creating the hybrid "NB-SVM" model, which is a very strong baseline for text classification. To do this, we'll create a new `nn.Module` class in pytorch, and look at what it's doing behind the scenes. In the second half of the lesson we'll start our study of tabular and relational data using deep learning, by looking at the "Rossmann" Kaggle competition dataset. Today, we'll start down the feature engineering path on this interesting dataset. We'll look at continuous vs categorical variables, and what kinds of feature engineering can be done for each, with a particular focus on using embedding matrices for categorical variables.
--------------------------------------------------------------------------------
XJ_waZlJU8g
Machine Learning 1: Lesson 11
Today, after a review of the math behind naive bayes, we'll do a deep dive into embeddings - both as used for categorical variables in tabular data, and as used for words in NLP.
--------------------------------------------------------------------------------
5_xFdhfUnvQ
Machine Learning 1: Lesson 12
In the first half of today's class we'll put everything we've learned together to create a complete model for the Rossmann dataset, including both categorical and continuous features, and careful feature engineering for all columns. In the second half of the class we'll study some ethical issues that arise when implementing machine learning models, and we'll see why they should matter to practitioners, and ways of thinking about them. Many students have told us they found this the most important part of the course!
e.keys()
dict_keys(['id', 'uploader', 'uploader_id', 'uploader_url', 'channel_id', 'channel_url', 'upload_date', 'license', 'creator', 'title', 'alt_title', 'thumbnail', 'description', 'categories', 'tags', 'subtitles', 'automatic_captions', 'duration', 'age_limit', 'annotations', 'chapters', 'webpage_url', 'view_count', 'like_count', 'dislike_count', 'average_rating', 'formats', 'is_live', 'start_time', 'end_time', 'series', 'season_number', 'episode_number', 'track', 'artist', 'extractor', 'webpage_url_basename', 'extractor_key', 'n_entries', 'playlist', 'playlist_id', 'playlist_title', 'playlist_uploader', 'playlist_uploader_id', 'playlist_index', 'thumbnails', 'display_id', 'requested_subtitles', 'requested_formats', 'format', 'format_id', 'width', 'height', 'resolution', 'fps', 'vcodec', 'vbr', 'stretched_ratio', 'acodec', 'abr', 'ext'])
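With fields like `id` and `title` available on every entry, the extracted metadata is enough to generate a simple index of the playlist. A minimal sketch (the function and sample data below are hypothetical, not part of youtube_dl; it only assumes the key names seen above, and skips the `None` entries that `ignoreerrors` can produce):

```python
def playlist_index(info):
    """Render a playlist info dict (shaped like extract_info's result) as markdown."""
    lines = ['# ' + info['title'], '']
    for e in info['entries']:
        if e is None:  # failed extractions show up as None under ignoreerrors
            continue
        url = 'https://www.youtube.com/watch?v=' + e['id']
        lines.append('- [{}]({})'.format(e['title'], url))
    return '\n'.join(lines)

# Hypothetical sample, standing in for the real extract_info result
sample = {'title': 'Demo playlist',
          'entries': [{'id': 'CzdWqFTmn0Y', 'title': 'Lesson 1'}, None]}
print(playlist_index(sample))
```

The same loop could just as easily pull `description`, `duration`, or `view_count` per entry, since they all appear in the key list above.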