Hey, I'm Jabril and welcome to Crash Course AI! One way to make an artificial brain is by creating a neural network, which can have millions of neurons and billions (or trillions) of connections between them. Nowadays, some neural networks are fast and big enough to do some tasks even better than humans can, like playing chess or predicting the weather! But as we've talked about in Crash Course AI, neural networks don't just work on their own. They need to learn to solve problems by making mistakes. Sounds kind of like us, right?

[INTRO]

Neural networks handle mistakes using an algorithm called backpropagation to make sure all the neurons that contributed to an error get their math adjusted, and we'll unpack this a bit later. And neural networks have two main parts: the architecture and the weights. The architecture includes neurons and their connections. And the weights are numbers that fine-tune how the neurons do their math to get an output. So if a neural network makes a mistake, this often means that the weights aren't adjusted correctly, and we need to update them so they make better predictions next time. The task of finding the best weights for a neural network architecture is called optimization. And the best way to understand some basic principles of optimization is with an example, with the help of my pal John Green-bot.

Say that I manage a swimming pool, and I want to predict how many people will come next week, so that I can schedule enough lifeguards. A simple way to do this is by graphing some data points, like the number of swimmers and the temperature in Fahrenheit for every day over the past few weeks. Then, we can look for a pattern in that graph to make predictions.

A way computers do this is with an optimization strategy called linear regression. We start by drawing a random straight line on the graph, which kind of fits the data points. To optimize, though, we need to know how incorrect this guess is. So we calculate the distance between the line and each of the data points, add it all up, and that gives us the error. We're quantifying how big of a mistake we made. The goal of linear regression is to adjust the line to make the error as small as possible. We want the line to fit the training data as much as it can. The result is called the line of best fit.

We can use this straight line to predict how many swimmers will show up for any temperature, but parts of it defy logic. For example, super cold days have a negative number of swimmers, while dangerously hot days have way more people than the pool can handle.
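
If you want to see what this looks like in code, here's a minimal sketch in Python with made-up pool data; the temperatures and attendance numbers below are invented just for illustration.

```python
# A minimal sketch of linear regression on made-up pool data: find the
# slope and intercept that minimize the (squared) distances between the
# line and the points, then notice the line predicts nonsense on cold days.
import numpy as np

temps_f = np.array([60, 65, 72, 78, 81, 85, 90, 95])    # hypothetical temperatures
swimmers = np.array([15, 22, 40, 55, 60, 72, 85, 95])    # hypothetical attendance

slope, intercept = np.polyfit(temps_f, swimmers, deg=1)  # least-squares line of best fit
error = np.sum((slope * temps_f + intercept - swimmers) ** 2)

print(f"best fit: swimmers ~ {slope:.1f} * temp + {intercept:.1f}, total squared error {error:.1f}")
print("prediction for a 20 degree day:", round(slope * 20 + intercept, 1))  # negative, which defies logic
```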

To get more accurate results, we might want to consider more than two features, like adding the humidity, which would turn our 2D graph into 3D. And our line of best fit would be more like a plane of best fit. But if we added a fourth feature, like whether it's raining or not, suddenly we can't visualize this anymore. So as we consider more features, we add more dimensions to the graph, the optimization problem gets trickier, and fitting the training data is tougher. This is where neural networks come in handy. Basically, by connecting together many simple neurons with weights, a neural network can learn to solve complicated problems, where the line of best fit becomes a weird multi-dimensional function.

Let's give John Green-bot an untrained neural network. To stick with the same example, the input layer of this neural network takes features like temperature, humidity, rain, and so on. And the output layer predicts the number of swimmers that will come to the pool. We're not going to worry about designing the architecture of John Green-bot's neural network right now. Let's just focus on the weights. He'll start, as always, by setting the weights to random numbers, like the random line on the graph we drew earlier. Only this time, it's not just one random line. Because we have lots of inputs, it's lots of lines that are combined to make one big, messy function. Overall, this neural network's function resembles some weird multi-dimensional shape that we don't really have a name for.
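
Here's a minimal sketch of what "setting the weights to random numbers" might look like in code. The layer sizes (3 input features, 4 hidden neurons, 1 output) are assumptions made up for illustration, not the architecture from the episode.

```python
# A minimal sketch of random weight initialization for a tiny network:
# 3 input features, one hidden layer of 4 neurons, 1 output neuron.
import numpy as np

rng = np.random.default_rng(seed=0)
weights_input_to_hidden = rng.normal(scale=0.1, size=(4, 3))   # one row of weights per hidden neuron
weights_hidden_to_output = rng.normal(scale=0.1, size=(1, 4))  # one weight per hidden neuron's output

print(weights_input_to_hidden)
print(weights_hidden_to_output)
```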

To train this neural network, we'll start by giving John Green-bot a bunch of measurements from the past 10 days at the swimming pool, because these are the days where we also know the output attendance. We'll start with one day, where it was 80 degrees Fahrenheit, 65% humidity, and not raining (which we'll represent with 0). The neurons will do their thing by multiplying those features by the weights, adding the results together, and passing information to the hidden layers until the output neuron has an answer. What do you think, John Green-bot?

John Green-bot: 145 people were at the pool!

Just like before, there is a difference between the neural network's output and the actual swimming pool attendance, which was recorded as 100 people. Because we just have one output neuron, that difference of 45 people is the error. Pretty simple. In some neural networks, though, the output layer may have a lot of neurons. So the difference between the predicted answer and the correct answer is more than just one number. In these cases, the error is represented by what's known as a loss function.
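
Here's a rough sketch of that forward pass and error in code. The weights are random, and I've added a tanh squashing function inside the hidden neurons (a common choice the episode doesn't mention), so the prediction won't be 145; the point is just the mechanics of multiply, add, and pass along.

```python
# A minimal sketch of the forward pass for one day's features, plus a loss.
import numpy as np

rng = np.random.default_rng(seed=1)
features = np.array([80.0, 65.0, 0.0])         # temperature, humidity, rain
actual_attendance = 100.0

w_hidden = rng.normal(scale=0.1, size=(4, 3))  # input -> hidden weights (random)
w_output = rng.normal(scale=0.1, size=(1, 4))  # hidden -> output weights (random)

hidden = np.tanh(w_hidden @ features)          # each hidden neuron: weighted sum, then squashed
prediction = (w_output @ hidden)[0]            # output neuron: weighted sum of hidden outputs

error = prediction - actual_attendance         # one output neuron: the error is a single number
print("prediction:", prediction, "error:", error)

# With many output neurons, we summarize all the differences with a loss
# function instead. Mean squared error is a common choice.
def mse_loss(predicted, target):
    return np.mean((np.asarray(predicted) - np.asarray(target)) ** 2)

print("loss:", mse_loss([prediction], [actual_attendance]))
```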

Moving forward, we need to adjust the neural network's weights so that the next time we give John Green-bot similar inputs, his math and final output will be more accurate. Basically, we need John Green-bot to learn from his mistakes, a lot like when we pushed a button to supervise his learning when he had the perceptron program. But this is trickier because of how complicated neural networks are.

To help neural networks learn, scientists and mathematicians came up with an algorithm called backpropagation of the error, or just backpropagation. The basic goal is to look at the loss function and then assign blame to neurons back in the previous layers of the network. Some neurons' calculations may have been more to blame for the error than others, so their weights will be adjusted more. This information is fed backwards, which is where the idea of backpropagation comes from. So, for example, the error from our output neuron would go back a layer and adjust the weights that get applied to our hidden layer neuron outputs. And the error from our hidden layer neurons would go back a layer and adjust the weights that get applied to our features. Remember: our goal is to find the best combination of weights to get the lowest error.
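
To make the blame-assignment idea concrete, here's a minimal sketch of backpropagation on that single training example. The one tanh hidden layer, the squared-error loss, the learning rate, and the rescaled inputs (temperature and humidity divided by 100, attendance counted in hundreds) are all assumptions added to keep the numbers tame; the episode doesn't specify any of them.

```python
# A minimal sketch of backpropagation on a single training example.
import numpy as np

rng = np.random.default_rng(seed=0)
x = np.array([0.80, 0.65, 0.0])    # temperature/100, humidity/100, rain
y_true = 1.00                       # attendance in hundreds (100 people)

W1 = rng.normal(size=(4, 3))        # input -> hidden weights
W2 = rng.normal(size=(1, 4))        # hidden -> output weights
learning_rate = 0.1

for step in range(200):
    # Forward pass: weighted sums flow from the features to the output.
    hidden = np.tanh(W1 @ x)
    y_pred = (W2 @ hidden)[0]

    error = y_pred - y_true
    loss = 0.5 * error ** 2

    # Backward pass: the error is fed backwards, and each weight gets blamed
    # in proportion to how much it contributed (the chain rule).
    grad_W2 = error * hidden.reshape(1, -1)              # blame on hidden -> output weights
    grad_hidden = error * W2.flatten()                   # blame passed back to the hidden outputs
    grad_W1 = (grad_hidden * (1 - hidden ** 2)).reshape(-1, 1) @ x.reshape(1, -1)

    # Update: nudge every weight a small step against its gradient.
    W2 -= learning_rate * grad_W2
    W1 -= learning_rate * grad_W1

# After training, the prediction should be close to the recorded 100 people.
print("predicted attendance:", round(y_pred * 100, 1), "loss:", round(loss, 6))
```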

To explain the logic behind optimization with a metaphor, let's send John Green-bot on a metaphorical journey through the Thought Bubble.

Let's imagine that the weights in our neural network are like latitude and longitude coordinates on a map. And the error of our neural network is the altitude: lower is better. John Green-bot the explorer is on a quest to find the lowest point in the deepest valley. The latitude and longitude of that lowest point, where the error is the smallest, are the weights of the neural network's global optimal solution. But John Green-bot has no idea where this valley actually is. By randomly setting the initial weights of our neural network, we're basically dumping him in the middle of the jungle. All he knows is his current latitude, longitude, and altitude. Maybe we got lucky and he's on the side of the deepest valley. But he could also be at the top of the highest mountain far away. The only way to know is to explore!

Because the jungle is so dense, it's hard to see very far. The best John Green-bot can do is look around and make a guess. He notices that he can descend a little by moving northeast, so he takes a step down and updates his latitude and longitude. From this new position, he looks around and picks another step that decreases his altitude a little more. And then another… and another. With every brave step, he updates his coordinates and decreases his altitude. Eventually, John Green-bot looks around and finds that he can't go down anymore. He celebrates, because it seems like he found the lowest point in the deepest valley! Or… so he thinks. If we look at the whole map, we can see that John Green-bot only found the bottom of a small gorge when he ran out of "down." It's way better than where he started, but it's definitely not the lowest point of the deepest valley. So he just found a local optimal solution, where the weights make the error relatively small, but not the smallest it could be. Sorry, buddy.

Thanks, Thought Bubble.
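
Here's the same idea stripped down to code: a made-up one-dimensional "altitude" curve standing in for the error, with a shallow gorge on one side and a deeper valley on the other. Starting the walk on the wrong side, it settles into the shallow gorge, just like John Green-bot.

```python
# A minimal sketch of the "walk downhill" idea on a made-up altitude curve.
# altitude(x) has a shallow dip near x = +1.9 and a deeper valley near x = -2.1.
def altitude(x):
    return x**4 - 8 * x**2 + 3 * x

def slope(x):                      # derivative of altitude: which way is downhill
    return 4 * x**3 - 16 * x + 3

position = 3.0                     # a random-ish starting point, near the shallow dip
step_size = 0.01                   # how far each step moves

for _ in range(1000):
    position -= step_size * slope(position)   # take a small step downhill

print("stopped at x =", round(position, 2), "altitude =", round(altitude(position), 2))
# Stops near x = 1.9: better than where it started, but not the deepest valley near x = -2.1.
```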

Backpropagation and learning always involve lots of little steps, and optimization is tricky with any neural network. If we go back to our example of optimization as exploring a metaphorical map, we're never quite sure if we're headed in the right direction or if we've reached the lowest valley with the smallest error, which again is the global optimal solution. But tricks have been discovered to help us navigate better. For example, when we drop an explorer somewhere on the map, they could be really far from the lowest valley, with a giant mountain range in the way. So it might be a good idea to try different random starting points to be sure that the neural network isn't getting stuck at a locally optimal solution.
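
In code, random restarts are only a few extra lines. This sketch reuses the made-up altitude curve from above and keeps whichever random start ends up lowest.

```python
# A minimal sketch of random restarts on the same made-up altitude curve:
# run the downhill walk from several random starting points and keep the best.
import random

def altitude(x):
    return x**4 - 8 * x**2 + 3 * x

def slope(x):
    return 4 * x**3 - 16 * x + 3

def walk_downhill(start, step_size=0.01, steps=1000):
    position = start
    for _ in range(steps):
        position -= step_size * slope(position)
    return position

random.seed(0)
starts = [random.uniform(-4, 4) for _ in range(5)]      # five random drop points
finishes = [walk_downhill(s) for s in starts]
best = min(finishes, key=altitude)                      # keep the lowest ending point
print("best ending point:", round(best, 2), "altitude:", round(altitude(best), 2))
```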

Or instead of restarting over and over again, we could have a team of explorers that start from different locations and explore the jungle simultaneously. This strategy of exploring different solutions at the same time on the same neural network is especially useful when you have a giant computer with lots of processors. And we could even adjust the explorer's step size, so that they can step right over small hills as they try to find and descend into a valley. This step size is called the learning rate, and it's how much the neuron weights get adjusted every time backpropagation happens.
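
Here's a tiny sketch of how the step size changes the walk on that same made-up altitude curve; the specific values are only for illustration.

```python
# A minimal sketch of how the step size changes the walk on the same
# made-up altitude curve. With a small step the walk settles in the nearby
# shallow gorge (x near 1.9); with a bigger step it hops right over that dip
# and reaches the deeper valley (x near -2.1). Too big a step would just
# overshoot wildly and never settle at all.
def slope(x):
    return 4 * x**3 - 16 * x + 3

for step_size in (0.01, 0.05):
    position = 3.0
    for _ in range(500):
        position -= step_size * slope(position)
    print(f"step size {step_size}: ended near x = {position:.2f}")
```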

We're always looking for more creative ways to explore solutions, try different combinations of weights, and minimize the loss function as we train neural networks.

But even if we use a bunch of training data and backpropagation to find the global optimal solution… we're still only halfway done. The other half of training an AI is checking whether the system can answer new questions. It's easy to solve a problem we've seen before, like taking a test after studying the answer key. We may get an A, but we didn't actually learn much. To really test what we've learned, we need to solve problems we haven't seen before. Same goes for neural networks.
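
In code, "problems we haven't seen before" usually means holding some data back. Here's a minimal sketch with made-up pool records: fit on the first chunk of days, then measure the error only on days the model never saw.

```python
# A minimal sketch of holding back test data, with made-up pool records.
import numpy as np

rng = np.random.default_rng(seed=3)
temps = rng.uniform(50, 100, size=30)                     # 30 hypothetical days
swimmers = 2.0 * temps - 60 + rng.normal(0, 8, size=30)   # a made-up pattern plus noise

train_t, test_t = temps[:20], temps[20:]                  # first 20 days to learn from
train_s, test_s = swimmers[:20], swimmers[20:]            # last 10 days kept hidden

slope, intercept = np.polyfit(train_t, train_s, deg=1)    # fit only on the training days
test_error = np.mean(np.abs(slope * test_t + intercept - test_s))
print("average error on unseen days:", round(test_error, 1), "swimmers")
```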

This whole time, John Green-bot has been training his neural network with swimming pool data. His neural network has dozens of features like temperature, humidity, rain, day of the week, and wind speed… but also grass length, number of butterflies around the pool, and the average GPA of the lifeguards. More data can be better for finding patterns and accuracy, as long as the computer can handle it!

Over time, backpropagation will adjust the neuron weights, so that the neural network's output matches the training data. Remember, that's called fitting to the training data, and with this complicated neural network, we're looking for a multi-dimensional function. And sometimes, backpropagation is too good at making a neural network fit to certain data. See, there are lots of coincidental relationships in big datasets. For example, the divorce rate in Maine may be correlated with U.S. margarine consumption, or skiing revenue may be correlated with the number of people dying by getting trapped in their bedsheets. Neural networks are really good at finding these kinds of relationships. And it can be a big problem, because if we give a neural network some new data that doesn't adhere to these silly correlations, then it will probably make some strange errors. That's a danger known as overfitting.

The easiest way to prevent overfitting is to keep the neural network simple. If we retrain John Green-bot's swimming pool program without data like grass length and number of butterflies, and we observe that our accuracy doesn't change, then ignoring those features is best.
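
Here's a rough sketch of that check using scikit-learn, with made-up data: train once with all the features (including the silly ones) and once without them, and compare accuracy on held-out days. The feature names, data, and network size are all invented for illustration.

```python
# A minimal sketch of the "retrain without the silly features" check.
# If the held-out score barely changes when grass length and butterfly
# counts are dropped, the simpler network is the safer bet.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(seed=7)
n_days = 300
temp = rng.uniform(50, 100, n_days)
humidity = rng.uniform(20, 90, n_days)
butterflies = rng.integers(0, 40, n_days).astype(float)   # no real effect on attendance
grass_length = rng.uniform(2, 10, n_days)                 # no real effect either
swimmers = 2.5 * temp - 0.4 * humidity + rng.normal(0, 10, n_days)

feature_sets = {
    "all features": np.column_stack([temp, humidity, butterflies, grass_length]),
    "useful features only": np.column_stack([temp, humidity]),
}

for name, X in feature_sets.items():
    X_train, X_test, y_train, y_test = train_test_split(X, swimmers, random_state=0)
    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0),
    )
    model.fit(X_train, y_train)
    print(name, "held-out R^2:", round(model.score(X_test, y_test), 3))
```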

So training a neural network isn't just a bunch of math! We need to consider how to best represent our various problems as features in AI systems, and to think carefully about what mistakes these programs might make. Next time, we'll jump into our very first lab of the course, where we'll apply all this knowledge and build a neural network together.

Crash Course AI is produced in association with PBS Digital Studios. If you want to help keep Crash Course free for everyone, forever, you can join our community on Patreon. And if you want to learn more about the math of k-means clustering, check out this video from Crash Course Statistics.
