Training Neural Networks: Crash Course AI #4

Hey, I’m Jabril and welcome to Crash Course AI! One way to make an artificial brain is by
creating a neural network, which can have millions of neurons and billions (or trillions)
of connections between them. Nowadays, some neural networks are fast and
big enough to do some tasks even better than humans can, like playing chess
or predicting the weather! But as we’ve talked about in Crash Course
AI, neural networks don’t just work on their own. They need to learn to solve problems by making
mistakes. Sounds kind of like us, right?

INTRO

Neural networks handle mistakes using an algorithm called backpropagation
to make sure all the neurons that contributed to an error get their math adjusted, and we’ll
unpack this a bit later. And neural networks have two main parts: the
architecture and the weights. The architecture includes neurons and their
connections. And the weights are numbers that fine-tune
how the neurons do their math to get an output. So if a neural network makes a mistake, this
often means that the weights aren’t adjusted correctly and we need to update them so they
make better predictions next time. The task of finding the best weights for a
neural network architecture is called optimization. And the best way to understand some basic
principles of optimization is with an example, with the help of my pal John Green-bot. Say that I manage a swimming pool, and I want
to predict how many people will come next week, so that I can schedule enough lifeguards. A simple way to do this is by graphing some
data points, like the number of swimmers and the temperature in Fahrenheit for every day
over the past few weeks. Then, we can look for a pattern in that graph
to make predictions. A way computers do this is with an optimization
strategy called linear regression. We start by drawing a random straight line
on the graph, which kind of fits the data points. To optimize though, we need to know how incorrect
this guess is. So we calculate the distance between the line
and each of the data points, add it all up, and that gives us the error. We’re quantifying how big of a mistake we
made. The goal of linear regression is to adjust
the line to make the error as small as possible. We want the line to fit the training data
as much as it can. The result is called the line of best fit. We can use this straight line to predict how
many swimmers will show up for any temperature, but parts of it defy logic. For example, super cold days have a negative
number, while dangerously hot days have way more people than the pool can handle. To get more accurate results, we might want
to consider more than two features, for example by adding the humidity, which would turn
our 2D graph into 3D. And our line of best fit would be more like
a plane of best fit. But if we added a fourth feature, like whether
it’s raining or not, suddenly we can’t visualize this anymore. So as we consider more features, we add more
dimensions to the graph, the optimization problem gets trickier, and fitting the training
data is tougher. This is where neural networks come in handy. Basically, by connecting together many simple
neurons with weights, a neural network can learn to solve complicated problems, where
the line of best fit becomes a weird multi-dimensional function. Let’s give John Green-bot an untrained neural
network. To stick with the same example, the input
layer of this neural network takes features like temperature, humidity, rain, and so on. And the output layer predicts the number of
swimmers that will come to the pool. We’re not going to worry about designing
the architecture of John Green-bot’s neural network right now. Let’s just focus on the weights. He’ll start, as always, by setting the weights
to random numbers, like the random line on the graph we drew earlier. Only this time, it’s not just one random
line. Because we have lots of inputs, it’s lots
of lines that are combined to make one big, messy function. Overall, this neural network’s function
resembles some weird multi-dimensional shape that we don’t really have a name for. To train this neural network, we’ll start
by giving John Green-bot a bunch of measurements from the past 10 days at the swimming pool,
because these are the days where we also know the output attendance. We’ll start with one day, where it was 80
degrees Fahrenheit, 65% humidity, and not raining (which we’ll represent with 0). The neurons will do their thing by multiplying
those features by the weights, adding the results together, and passing information
to the hidden layers until the output neuron has an answer. What do you think, John Green-bot? John Green-bot: 145 people were at the pool! Just like before, there is a difference between
the neural network’s output and the actual swimming pool attendance — which was recorded
as 100 people. Because we just have one output neuron, that
difference of 45 people is the error. Pretty simple. In some neural networks though, the output
layer may have a lot of neurons. So the difference between the predicted answer
and the correct answer is more than just one number. In these cases, the error is represented by
what’s known as a loss function. Moving forward, we need to adjust the neural
network’s weights so that the next time we give John Green-bot similar inputs, his
math and final output will be more accurate. Basically, we need John Green-bot to learn
from his mistakes, a lot like when we pushed a button to supervise his learning when he
had the perceptron program. But this is trickier because of how complicated
neural networks are. To help neural networks learn, scientists
and mathematicians came up with an algorithm called backpropagation of the error, or just
backpropagation. The basic goal is to look at the loss function
and then assign blame to neurons back in the previous layers of the network. Some neurons’ calculations may have been
more to blame for the error than others, so their weights will be adjusted more. This information is fed backwards, which is
where the idea of backpropagation comes from. So for example, the error from our output
neuron would go back a layer and adjust the weights that get applied to our hidden layer
neuron outputs. And the error from our hidden layer neurons
would go back a layer and adjust the weights that get applied to our features. Remember: our goal is to find the best combination
of weights to get the lowest error. To explain the logic behind optimization with
a metaphor, let’s send John Green-bot on a journey through the Thought
Bubble. Let’s imagine that weights in our neural
network are like latitude and longitude coordinates on a map. And the error of our neural network is the
altitude — lower is better. John Green-bot the explorer is on a quest
to find the lowest point in the deepest valley. The latitude and longitude of that lowest
point — where the error is the smallest — are the weights of the neural network’s global
optimal solution. But John Green-bot has no idea where this
valley actually is. By randomly setting the initial weights of
our neural network, we’re basically dumping him in the middle of the jungle. All he knows is his current latitude, longitude,
and altitude. Maybe we got lucky and he’s on the side
of the deepest valley. But he could also be at the top of the highest
mountain far away. The only way to know is to explore! Because the jungle is so dense, it’s hard
to see very far. The best John Green-bot can do is look around
and make a guess. He notices that he can descend down a little
by moving northeast, so he takes a step down and updates his latitude and longitude. From this new position, he looks around and
picks another step that decreases his altitude a little more. And then another… and another. With every brave step, he updates his coordinates
and decreases his altitude. Eventually, John Green-bot looks around and
finds that he can’t go down anymore. He celebrates, because it seems like he found
the lowest point in the deepest valley! Or… so he thinks. If we look at the whole map, we can see that
John Green-bot only found the bottom of a small gorge when he ran out of “down.” It’s way better than where he started, but
it’s definitely not the lowest point of the deepest valley. So he just found a local optimal solution,
where the weights make the error relatively small, but not the smallest it could be. Sorry, buddy. Thanks, Thought Bubble. Backpropagation and learning always involve
lots of little steps, and optimization is tricky with any neural network. If we go back to our example of optimization
as exploring a metaphorical map, we’re never quite sure if we’re headed in the right
direction or if we’ve reached the lowest valley with the smallest error — again that’s
the global optimal solution. But tricks have been discovered to help us
better navigate. For example, when we drop an explorer somewhere
on the map, they could be really far from the lowest valley, with a giant mountain range
in the way. So it might be a good idea to try different
random starting points to be sure that the neural network isn’t getting stuck at a
locally optimal solution. Or instead of restarting over and over again,
we could have a team of explorers that start from different locations and explore the jungle
simultaneously. This strategy of exploring different solutions
at the same time on the same neural network is especially useful when you have a giant
computer with lots of processors. And we could even adjust the explorer’s
step size, so that they can step right over small hills as they try to find and descend
into a valley. This step size is called the learning rate,
and it’s how much the neuron weights get adjusted every time backpropagation happens. We’re always looking for more creative ways
to explore solutions, try different combinations of weights, and minimize the loss function
as we train neural networks. But even if we use a bunch of training data
and backpropagation to find the global optimal solution… we’re still only halfway done. The other half of training an AI is checking
whether the system can answer new questions. It’s easy to solve a problem we’ve seen
before, like taking a test after studying the answer key. We may get an A, but we didn’t actually
learn much. To really test what we’ve learned, we need
to solve problems we haven’t seen before. Same goes for neural networks. This whole time, John Green-bot has been training
his neural network with swimming pool data. His neural network has dozens of features
like temperature, humidity, rain, day of the week, and wind speed… but also grass length,
number of butterflies around the pool, and the average GPA of the lifeguards. More data can be better for finding patterns
and accuracy, as long as the computer can handle it! Over time, backpropagation will adjust the
neuron weights, so that the neural network’s output matches the training data. Remember, that’s called fitting to the training
data, and with this complicated neural network, we’re looking for a multi-dimensional function. And sometimes, backpropagation is too good
at making a neural network fit to certain data. See, there are lots of coincidental relationships
in big datasets. For example, the divorce rate in Maine
may be correlated with U.S. margarine consumption, or skiing revenue may be correlated with the
number of people dying by getting trapped in their bedsheets. Neural networks are really good at finding
these kinds of relationships. And it can be a big problem, because if we
give a neural network some new data that doesn’t adhere to these silly correlations, then it
will probably make some strange errors. That’s a danger known as overfitting. The easiest way to prevent overfitting is
to keep the neural network simple. If we retrain John Green-bot’s swimming
pool program /without/ data like grass length and number of butterflies, and we observe
that our accuracy doesn’t change, then ignoring those features is best. So training a neural network isn’t just
a bunch of math! We need to consider how to best represent
our various problems as features in AI systems, and to think carefully about what mistakes
these programs might make. Next time, we’ll jump into our very first
lab of the course, where we’ll apply all this knowledge and build a neural network
together. Crash Course AI is produced in association
with PBS Digital Studios. If you want to help keep Crash Course free
for everyone, forever, you can join our community on Patreon. And if you want to learn more about the math
of k-means clustering, check out this video from Crash Course Statistics.
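
The optimization ideas from this episode can be sketched in a few lines of Python. This is only a rough illustration under assumptions that are not from the video: the data points are made up, the error is measured as a mean squared error instead of a plain sum of distances, the temperature is rescaled before training, and the names POOL_DATA and fit_line are hypothetical. It starts from a random line, measures the error, and repeatedly takes a small step "downhill", with the learning rate as the step size.

```python
import random

# Made-up numbers for illustration only: (temperature in Fahrenheit, swimmers that day).
POOL_DATA = [(60, 20), (65, 35), (70, 55), (75, 70), (80, 100), (85, 115), (90, 130)]


def fit_line(data, learning_rate=0.1, steps=2000, seed=0):
    """Fit swimmers = w * scaled_temperature + b by gradient descent."""
    temps = [t for t, _ in data]
    center = sum(temps) / len(temps)
    spread = max(temps) - min(temps)
    # Rescale temperature to roughly [-0.5, 0.5] so one step size suits both weights.
    points = [((t - center) / spread, y) for t, y in data]
    n = len(points)

    # Start from a random line, like the random first guess in the episode.
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)

    errors = []  # mean squared error measured before each step
    for _ in range(steps):
        residuals = [(w * x + b - y, x) for x, y in points]
        errors.append(sum(r * r for r, _ in residuals) / n)
        # Slope of the error with respect to each weight: which way is "downhill".
        grad_w = sum(2 * r * x for r, x in residuals) / n
        grad_b = sum(2 * r for r, _ in residuals) / n
        # Take one step downhill; the learning rate is the step size.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

    def predict(temperature):
        """Predict swimmers for a temperature given in Fahrenheit."""
        return w * (temperature - center) / spread + b

    return predict, errors


predict, errors = fit_line(POOL_DATA)
print(f"error at start: {errors[0]:.1f}, error at end: {errors[-1]:.1f}")
print(f"predicted swimmers at 80 F: {predict(80):.1f}")
print(f"predicted swimmers at 40 F: {predict(40):.1f}")
```

With this made-up data the fitted line predicts a negative number of swimmers on a 40 degree day, the same kind of illogical result the episode points out. A straight-line fit has only one valley, so this sketch cannot get stuck in a small gorge the way a neural network can; the local-versus-global problem only shows up with more complicated functions.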

73 thoughts on “Training Neural Networks: Crash Course AI #4”

  1. I work in CS, and this is an accurate and concise way of describing neural nets

  2. What happen to Forrest ai make it more it was good in your channel jebril??

  3. The subject is interesting, but honestly he doesnt feel like he actually knows what he is talking about, but much more like he just reads stuff other people wrote. It is a combination of his intonation, vocal rythm and (quite frankly overly) exaggerated body language – especially with his hands. Not a single motion or phrase is made without him moving his hands a lot and it is offputting.

    I am sure he knows his stuff, but his demeanor doesnt support much of this notion.

  4. Episode 100 of CC AI: * John Green bot is solving climate change whilst baking 1000 cookies of all different flavors on his way to Pluto *

  5. While this was a great explanation of what backpropagation is, I still feel like I don’t know how to actually do it.

  6. Im a software engineer by trade and have to say I love this series thus far. Shows a great simplification of neural networks and the concepts that help them run. I cant wait to see how the lab is put together 😀

  7. This is wonderful! I think I’m finally understanding this stuff! Great video!

  8. I unsubscribed from this channel over a year ago. So why am i suddenly re-subscribed? Imagine my confusion when I saw this video pop up in my subscription box… I've heard of other people mentioning things like this before but i always thought they were either lying or just not remembering correctly. Today i eat crow.
    Good-bye again and hopefully this time it will stick.

  9. Can't wait for the next episode! I love how this series makes this topic super accessible while not glossing over important information.

  10. Thank you for pointing out possible unrelated correlations when working on big datasets.

  11. Software Engineer here. I've already recommended this crash course to so many young engineers who want to get into ML and AI. Great series.

  12. "Hi, my name is Jabril."
    Hi, my name is Gibran.
    Our names have the same origin, the arabic name for the archangel Gabriel.

  13. This is excellent. It reminds me of many of the concepts I learned as part of how WinBugs works when doing Network Meta Analysis. Thank you.

  14. Just goes to show how important it is, to talk to industry SME's (Subject Matter Experts) when determining what's important. Anyone who has worked at a pool would tell you that determining whether a day is a weekend/school holiday or not, and whether the Swim school or Aqua classes are running (or not), will have a significant effect on customer numbers, and therefore on staffing requirements. Basically, if you're not reducing your error bars, maybe you've missed a significant factor.

  15. The majority of this video is essentially about gradient descent, but without actually mentioning it…

    PS: if you have a very large data set it can be faster to use stochastic gradient descent.

  16. Hey, this was a crazy simple explanation I fell in love with it. I am definitely going to follow and looking forward more such videos.

  17. John green bot: I’ve finally mastered the human universe. Now u can call me 🤙 skynet

  18. So far this is the best explanation of a Neural Network I have seen. You guys did a good job.

  19. I dont know why you guys making those videos but they are educative and easy to understand even by me.Thank you

  20. I just can’t take it anymore. “Connections AMONG many neurons”. Not “between”.

  21. Add-on fact: Computational optimization of molecular stability (i.e. in drug development) is done with a similar algorithm. The "plane" used in that case is entropy (or energy). Re-inserting energy into the system to watch it change molecular structure to a possibly lower energy level (higher stability) is a common step in that process.

  22. Loving the John Green Bot in the jungle metaphor, you guys are doing an awesome job!

  23. This is a great and simple introduction to some of the tricky parts in Neural Networks.

  24. is it my speaker (and earphone) got problem or the sound really low? i could barely hear what he talking

  25. for anyone interested I also recommend the machine learning series from 3blue1brown, covers many similar concepts in a different angle to help solidify the information!

  26. I'm a college student learning this stuff in class and you have helped me so much; also the altitude metaphor is an amazing way of visualizing this thanks Jabrils!

  27. I love that you use cassettes to programme Johngreenbot, when half your audience have probably never even come across a cassette!

  28. The over-fitting explanation is a bit under-fitted, but great video!
    Hope you'll explain the bias-variance trade-off later on 🙂

  29. As an AI scientist, I'm echoing the comments from others: this was an excellent explanation of the basics of neural network training, without digging into the complexities of gradient descent. Great job, can't wait for the lab!

  30. Thumbs up if you change which swimming pool you go to based on the number of butterflies around the pool.

  31. Love these videos. Ive done enough math to know about local vs global max/min so I was curious to see how john green bot would deal with that. 10 seconds later I HAD MY ANSWER.

  32. ski causes death by bed sheets, just as vaccines cause autism, they're common facts bro (JOKING)

  33. This isn't CrashCourse: Artificial intelligence, this isn't even CrashCourse: Machine learning, this is CrashCourse: Neural networks.

  34. A question: Do biases also get updated or just weights?

    PS: love the series so far 🙂

  35. hey crash course, i just want to say something. just read this.
    okay i think that you probably remember the start of your youtube channel. the subscriptions that won't increase just by one for months. but however you have achieved so many subscribers. i were there at that point. so i think now you can do a special favour. support me by subscribing. no one is subscribing. i think done my first leap. i got my first comment . but however i don't think that many other large youtubers will subscribe my tiny channel since they don't remember the start. please a one little subscriptionn

  36. Will you be discussing bias in neural networks? This episode can bring up a lot of issues with unintentional bias.

  37. I had high hopes for this series, but tbh, Im very disappointed.. way too diluted with John green it and poor storytelling analogies .. the material is exciting enough.. I will try and stick with it (no promises ) mostly because your computer science series was so fantastic

  38. Can you do a crash course literature of Watchmen? I have to read it in my English class

  39. Wow~ this is by far the most intuitive analogies and best graphics/animations I have seen to explain AI! Thank you!

  40. i still don't really understand backwards propagation…. how does it know what weights to change?
