Hello World – Machine Learning Recipes #1

[MUSIC PLAYING] Six lines of code
is all it takes to write your first
Machine Learning program. My name’s Josh
Gordon, and today I’ll walk you through writing Hello
World for Machine Learning. In the first few
episodes of the series, we’ll teach you how to
get started with Machine Learning from scratch. To do that, we’ll work with
two open source libraries, scikit-learn and TensorFlow. We’ll see scikit in
action in a minute. But first, let’s talk quickly
about what Machine Learning is and why it’s important. You can think of Machine
Learning as a subfield of artificial intelligence. Early AI programs typically
excelled at just one thing. For example, Deep
Blue could play chess at a championship level,
but that’s all it could do. Today we want to
write one program that can solve many problems without
needing to be rewritten. AlphaGo is a great
example of that. As we speak, it’s competing
in the World Go Championship. But similar software can also
learn to play Atari games. Machine Learning is what
makes that possible. It’s the study of
algorithms that learn from examples
and experience instead of relying
on hard-coded rules. So that’s the state-of-the-art. But here’s a much
simpler example we’ll start coding up today. I’ll give you a problem
that sounds easy but is impossible to solve
without Machine Learning. Can you write code to
tell the difference between an apple and an orange? Imagine I asked you to write
a program that takes an image file as input,
does some analysis, and outputs the types of fruit. How can you solve this? You’d have to start by
writing lots of manual rules. For example, you
could write code to count how many orange pixels
there are and compare that to the number of green ones. The ratio should give you a
hint about the type of fruit. That works fine for
simple images like these. But as you dive deeper
into the problem, you’ll find the real world
is messy, and the rules you write start to break. How would you write code to
handle black-and-white photos or images with no apples
or oranges in them at all? In fact, for just about
any rule you write, I can find an image
where it won’t work. You’d need to write
tons of rules, and that’s just to tell the
difference between apples and oranges. If I gave you a new problem, you’d
need to start all over again. Clearly, we need
something better. To solve this, we
need an algorithm that can figure out
the rules for us, so we don’t have to
write them by hand. And for that, we’re going
to train a classifier. For now you can think of a
classifier as a function. It takes some data as input
and assigns a label to it as output. For example, I
could have a picture and want to classify it
as an apple or an orange. Or I have an email, and
I want to classify it as spam or not spam. The technique to
write the classifier automatically is called
supervised learning. It begins with examples of
the problem you want to solve. To code this up, we’ll
work with scikit-learn. Here, I’ll download and
install the library. There are a couple
different ways to do that. But for me, the easiest
has been to use Anaconda. This makes it easy to get
all the dependencies set up and works well cross-platform. With the magic of
video, I’ll fast forward through downloading
and installing it. Once it’s installed,
you can test that everything is
working properly by starting a Python script
and importing sklearn. Assuming that worked, that’s
line one of our program down, five to go. To use supervised
learning, we’ll follow a recipe with
a few standard steps. Step one is to
collect training data. These are examples of the
problem we want to solve. For our problem, we’re
going to write a function to classify a piece of fruit. For starters, it will take
a description of the fruit as input and
predict whether it’s an apple or an orange as
output, based on features like its weight and texture. To collect our
training data, imagine we head out to an orchard. We’ll look at different
apples and oranges and write down measurements
that describe them in a table. In Machine Learning
these measurements are called features. To keep things simple,
here we’ve used just two– how much each fruit weighs in
grams and its texture, which can be bumpy or smooth. A good feature makes
it easy to discriminate between different
types of fruit. Each row in our training
data is an example. It describes one piece of fruit. The last column is
called the label. It identifies what type
of fruit is in each row, and there are just
two possibilities– apples and oranges. The whole table is
our training data. Think of these as
all the examples we want the classifier
to learn from. The more training data you
have, the better a classifier you can create. Now let’s write down our
training data in code. We’ll use two variables–
features and labels. Features contains the
first two columns, and labels contains the last. You can think of
features as the input to the classifier and labels
as the output we want. I’m going to change the
variable types of all features to ints instead of strings,
so I’ll use 0 for bumpy and 1 for smooth. I’ll do the same for our
labels, so I’ll use 0 for apple and 1 for orange. These are lines two and
three in our program. Step two in our recipe is to
use these examples to train a classifier. The type of classifier
we’ll start with is called a decision tree. We’ll dive into
the details of how these work in a future episode. But for now, it’s OK to think of
a classifier as a box of rules. That’s because there are many
different types of classifier, but the input and output
type is always the same. I’m going to import the tree. Then on line four of our script,
we’ll create the classifier. At this point, it’s just
an empty box of rules. It doesn’t know anything
about apples and oranges yet. To train it, we’ll need
a learning algorithm. If a classifier
is a box of rules, then you can think of
the learning algorithm as the procedure
that creates them. It does that by finding
patterns in your training data. For example, it might notice
oranges tend to weigh more, so it’ll create a rule saying
that the heavier a fruit is, the more likely it
is to be an orange. In scikit, the
training algorithm is included in the classifier
object, and it’s called fit. You can think of fit as being
a synonym for “find patterns in data.” We’ll get into
the details of how this happens under the
hood in a future episode. At this point, we have
a trained classifier. So let’s take it for a spin and
use it to classify a new fruit. The input to the classifier is
the features for a new example. Let’s say the fruit
we want to classify is 150 grams and bumpy. The output will be 0 if it’s an
apple or 1 if it’s an orange. Before we hit Enter and see
what the classifier predicts, let’s think for a sec. If you had to guess, what would
you say the output should be? To figure that out, compare
this fruit to our training data. It looks like it’s
similar to an orange because it’s heavy and bumpy. That’s what I’d guess
anyway, and if we hit Enter, it’s what our classifier
predicts as well. If everything
worked for you, then that’s it for your first
Machine Learning program. You can create a new
classifier for a new problem just by changing
the training data. That makes this approach
far more reusable than writing new rules
for each problem. Now, you might be wondering
why we described our fruit using a table of features
instead of using pictures of the fruit as training data. Well, you can use
pictures, and we’ll get to that in a future episode. But, as you’ll see later
on, the way we did it here is more general. The neat thing is that
programming with Machine Learning isn’t hard. But to get it right,
you need to understand a few important concepts. I’ll start walking you through
those in the next few episodes. Thanks very much for watching,
and I’ll see you then. [MUSIC PLAYING]
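
The six lines described in the video can be sketched roughly as follows. This is a minimal sketch, assuming Python 3 and an installed scikit-learn. The transcript never states the numbers in the training table, so the feature values here are borrowed from the code that commenters posted below and should be treated as illustrative.

```python
from sklearn import tree

# Each row is one fruit: [weight in grams, texture].
# Texture encoding from the transcript: 0 = bumpy, 1 = smooth.
# The weights are illustrative values taken from the comments, not the video.
features = [[140, 1], [130, 1], [150, 0], [170, 0]]

# Label encoding from the transcript: 0 = apple, 1 = orange.
labels = [0, 0, 1, 1]

# Create the empty "box of rules", then let fit() find patterns in the data.
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

# Classify a new fruit: 150 grams and bumpy.
prediction = clf.predict([[150, 0]])
print(prediction)  # expected: [1], meaning orange
```

Note that `print(...)` is written as a function call, which Python 3 requires.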

100 thoughts on “Hello World – Machine Learning Recipes #1”

  1. I had installed anaconda and other packages with it and when I tried the example got this as output sir

    C:\Users\pikachu>python H:\py.py

    File "H:\py.py", line 6

    print clf.predict([[150, 0]])


    SyntaxError: invalid syntax

    my code

    from sklearn import tree

    features =[[140,1],[130,1],[170,0],[150,0]]

    labels =[0,0,1,1]

    clf =tree.DecisionTreeClassifier()

    clf =clf.fit(features,labels)

    print clf.predict([[150, 0]])
    please help me

  2. I have completed the Coursera, Udemy (Machine Learning A-Z) and Udacity courses.
    But this is the
    best getting-started video I have ever seen for a developer.

  3. this is how i did it and its working
    but make sure u are using python 3
    and also have installed scikit-learn

    import sklearn

    from sklearn import tree

    features = [[140,1],[130,1],[150,0],[170,0]]

    labels = [0,0,1,1]

    clf = tree.DecisionTreeClassifier()

    clf = clf.fit(features, labels)

    [x]=(clf.predict(X = [[150, 1]]))

    if [x]==1:


    else :


  4. so with this code can i make a tiny robot project which can differentiate between fruits?

  5. I'm a mathematician and proper ML is too difficult . No way I'm going to learn the inner nature of the perceptor, neural layers and the stochastic models for the approximators and classifiers. Good luck learning the core of the inference behind GANs and convolutional models without years of intense study.

  6. It would be fun to program an AI which can play LoL and learn from every game from scratch.I dont know anything about self learning AIs my knowledge is like how to program on java a tree, a house and how to get them on another position or change their colour and i know how to start a database that's , literally , it.So my question is : Is it even realistic or is it something i need to go to a university for?

  7. Brilliant way to number your videos — with Roman numbers (of an irregular number of digits) tacked on at the right-hand end of the title.

  8. Hey you might want to show what to do when you download anaconda because when I did it didn’t show up nor did anything else work

  9. I was thinking of Machine learning and boom here it is! What kind of sorcery is this Google?

  10. Why do we pass 2D array in this line instead of 1D array? – clf.predict([[150, 0]])

  11. #version: 3.7.2

    from sklearn import tree

    features = [[140, 1], [130, 1], [150, 0], [170, 0]]

    labels = ['Orange', 'Orange', 'Apple', 'Apple']

    clf = tree.DecisionTreeClassifier()

    clf = clf.fit(features, labels)

    print(clf.predict([[145, 1]]))

  12. 6:40 – The text got blurred, the ai doesnt want people to learn to code them lol

  13. Hey this is awesome! =)

    Oh by the way, is it ok if I can also seek your advice in this open source android app I have posted below? Just need some feedback about it…

    Kindly search ' pub:Path Ahead ' in Play Store (P & A are case sensitive).

    Thank you!

  14. In the video he told that : "The more training data you have – the better", however what about overfitting?

  15. -Don't think; let the machine do it for you!
    -Thanks "They Live-movie"-Cow.

  16. Problem: every time i type import sklearn or from sklearn imort tree, it gives me a unresolved error. Please help.

  17. Wow, google developers that use a mac book. How do I take your video seriously after this?

  18. Fire whoever used a red apple in the green and orange pixel example.

  19. code in the video didnt work for me. it shows msg like this >
    AttributeError: module 'sklearn.tree' has no attribute 'DecisionTreeClassifier'

    after that i make some changes and its work!
    from sklearn.tree import DecisionTreeClassifier
    clf = DecisionTreeClassifier()

    #weight in grams
    #0 – bumpy 1 – smooth
    labels = [0,0,1,1]
    # 1 – org 0 – apples

    clf = clf.fit(features,labels)
    print (clf.predict([[130,1]]))

  20. No, the easiest way to download any python library is to use Pycharm.

  21. But let me know what decision it makes if the data input is out of training data, <100,bumpy what decision it takes? thats why we study machine learning else i could have satisfied with c program itself atleast DOS

  22. Awesome! Just wrote a bot for 2048 which learns from me (And I suck lol) using the sklearn toolkit to predict the best move 🙂

  23. Can somebody tell me the best source to learn machine learning….Please provide its link too i would be greatful

  24. Doesn't make sense. Nothing was stored in memory or a database. So how does the "machine" retain information in order to learn from the previous guesses?

  25. would you recommend making a graph like such when making machine learning?

  26. if it says python 3.7 (32-bit) on windows is that the same as hello-world.py on mac?

  27. Great video but just two things i wanna point out:
    1. While technically not wrong, when you labeled bumpy as 0 but orange as 1, then smooth as 1 but apple as 0. Idk just since you want them to correlate makes more sense to have them 0:0 1:1
    2. The example you wanted it to predict was already one of the small sample size so it didn't really show its capabilities as well as it could've.
    Nice vid regardless though.
    edit few missed words :p

  28. Excuse me! For the training statement, "clf = clf.fit(features, labels)", is the assignment required? I see in the later recipes, the training statement is simply "clf.fit(…, …)" without assignment. Could you please help? Thanks!

  29. can someone tell me , what is the IDE he is using there !!!

  30. Hello little dude, ML is not a recipe. Stop confusing people, this is a disgrace to the field.

  31. what is the difference between data mining and machine learning?

  32. Build your first music recommendation system model. Feel free to fork and star this project: github.com/rkat

  33. Well done sir. Thanks for the help. I really appreciate it and considering I'm understanding this at a young age (12) tells me that other people should understand it.

  34. cant import sklearn on windows i have everything installed why?

  35. I feel like you skipped the billion steps it takes to open your magical python file. I'm on a windows, but how did you go from Anaconda to a normal python file?

  36. help, i keep getting this error

    Traceback (most recent call last):
    File "C:\Users\kaiser\Desktop\mlscript.py", line 1, in <module>
    from sklearn import tree
    File "C:\Users\kaiser\Anaconda3\lib\site-packages\sklearn\__init__.py", line 76, in <module>
    from .base import clone
    File "C:\Users\kaiser\Anaconda3\lib\site-packages\sklearn\base.py", line 13, in <module>
    import numpy as np
    File "C:\Users\kaiser\Anaconda3\lib\site-packages\numpy\__init__.py", line 140, in <module>
    from . import _distributor_init
    File "C:\Users\kaiser\Anaconda3\lib\site-packages\numpy\_distributor_init.py", line 34, in <module>
    from . import _mklinit
    ImportError: DLL load failed: The specified module could not be found.


    this is my code

    from sklearn import tree

    info = [31, 21, 40, 71, 60, 80]
    labels = [1, 1, 1, 0, 0, 0]
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(info, labels)
    print (clf.predict([27]))

  37. What is the System.out.println(“hello world”) of machine learning?

  38. I was trying to follow this project on my Raspberry Pi 2 Raspbian OS but couldn't even install scikit-learn. Any suggestions? I tried a few forums but couldn't find a solution that worked.

  39. 6
    Of Code
    And 6
    Of Video

    Did you see my comment was 7-1 lines a second ago????

  40. Does Transfer learning comes in Machine Learning or Deep learning? Neural networks come in Deep learning and transfer learning uses
    Neural networks..ugh I'm confused 🙆🙆🙆 please help..
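
Two questions come up more than once in the comments above: why `predict` takes a nested list (comment 10), and whether the assignment in `clf = clf.fit(...)` is needed (comment 28). The sketch below is one way to check both, assuming Python 3 and a recent scikit-learn. It reuses the single-feature data from comment 36; the names `X`, `returned` and `same_object` are introduced here for illustration only. It does not address the DLL import error reported in that comment, which is an installation problem.

```python
from sklearn import tree

# One feature per sample, written as a flat list (values copied from comment 36).
info = [31, 21, 40, 71, 60, 80]
labels = [1, 1, 1, 0, 0, 0]

# scikit-learn expects inputs shaped (n_samples, n_features), so each sample
# becomes its own inner list even when it has only one feature.
X = [[value] for value in info]

clf = tree.DecisionTreeClassifier()

# fit() trains the classifier in place and returns that same object,
# so the assignment is optional.
returned = clf.fit(X, labels)
same_object = returned is clf

# predict() also takes a 2D input: a list of samples, here just one sample.
prediction = clf.predict([[27]])
print(same_object, prediction)
```

Because `predict` accepts a list of samples, several fruits can be classified in one call by adding more inner lists.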
