Image Synthesis From Text With Deep Learning | Two Minute Papers #116


Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This is what we have been waiting for. Earlier, we talked about a neural network that was able to describe in a full sentence what we can see on an image. And it did a damn good job at that. Then, we talked about a technique that did something really crazy, the exact opposite: we wrote a sentence, and it created new images according to that. This is already incredible. And we can create an algorithm like this by training not one, but two neural networks: the first is the generative network, which creates millions of new images, and the discriminator network judges whether these are real or fake images. The generative network can improve its game based on the feedback and will create more and more realistic-looking images, while the discriminator network gets better and better at telling real images from fake ones. Much like with humans, this rivalry drives both neural networks toward perfecting their craft. This architecture is called a generative adversarial network. It is also like the classical, ongoing arms race between criminals who create counterfeit money and the government, which seeks to implement newer and newer measures to tell a real hundred dollar bill from a fake one.

The previous generative adversarial networks were adept at creating new images, but due to their limitations, their image outputs were the size of a stamp at best. And we were wondering: how long until we get much higher resolution images from such a system? Well, I am delighted to say that, apparently, within the same year. In this work, a two-stage version of this architecture is proposed. The Stage-1 network is close to the generative adversarial network we described. And most of the fun happens in the Stage-2 network, which takes this rough, low-resolution image and the text description, and is told to correct the defects of the previous output and create a higher resolution version of it. In the video, the input text description and the Stage-1 results are shown, and building on that, the higher resolution Stage-2 images are presented. And the results are… unreal.

There was a previous article and Two Minute Papers episode on the unreasonable effectiveness of recurrent neural networks. If that is unreasonable effectiveness, then what is this? The rate of progress in machine learning research is unlike any other field I have ever seen. I honestly can’t believe what I am seeing here. Dear Fellow Scholars, what you see might very well be history in the making. Are there still faults in the results? Of course there are. Are they perfect? No, they certainly aren’t. However, research is all about progress, and it’s almost never possible to go from 0 to 100% with one new revolutionary idea. And I am sure that in 2017, researchers will start working on generating full HD animations with an improved version of this architecture. Make sure to have a look at the paper, where the ideas, challenges, and possible solutions are very clearly presented. And for now, I need some time to digest these results. Right now, I feel like I have been dropped into the middle of a science-fiction movie.

And, this one will be our last video for this year. We have had an amazing year with some incredible growth on the channel; way more of you Fellow Scholars decided to come with us on our journey than I would have imagined. Thank you so much for being a part of Two Minute Papers, we’ll be continuing full steam ahead next year, and for now, I wish you a Merry Christmas and happy holidays. 2016 was an amazing year for research, and 2017 will be even better. Stay tuned! Thanks for watching and for your generous support, and I’ll see you next time!
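The adversarial rivalry described above can be sketched in a few lines. This is a toy one-dimensional example with hand-derived gradients, nothing like the deep convolutional networks in the paper; the data distribution, learning rate, and step count are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Discriminator D(x) = sigmoid(w*x + c); generator G(z) = a*z + b.
w, c = 0.0, 0.0
a, b = 1.0, 0.0
lr, batch = 0.03, 64

for step in range(3000):
    real = rng.normal(4.0, 0.5, batch)   # "real" data: samples near 4.0
    z = rng.normal(0.0, 1.0, batch)      # noise fed to the generator
    fake = a * z + b                     # generated samples

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    s_r, s_f = sigmoid(w * real + c), sigmoid(w * fake + c)
    gw = np.mean(-(1 - s_r) * real + s_f * fake)
    gc = np.mean(-(1 - s_r) + s_f)
    w -= lr * gw
    c -= lr * gc

    # Generator step: push D(fake) -> 1 (non-saturating generator loss).
    s_f = sigmoid(w * fake + c)
    ga = np.mean(-(1 - s_f) * w * z)
    gb = np.mean(-(1 - s_f) * w)
    a -= lr * ga
    b -= lr * gb

# The generator's offset b should drift toward the real data mean of 4.0.
print(round(float(b), 2))
```

The only feedback the generator ever receives is the discriminator's gradient, which is exactly the rivalry the episode describes: each player's improvement forces the other to improve.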

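The two-stage pipeline can likewise be sketched structurally. The function bodies below are stand-ins, not the paper's conditional GANs; only the data flow follows the described architecture: text embedding plus noise in, rough 64×64 image out of Stage-1, then that image plus the text again in, refined 256×256 image out of Stage-2.

```python
import numpy as np

rng = np.random.default_rng(1)

def stage1_generator(text_emb, z):
    """Stand-in for the Stage-1 GAN: text embedding + noise -> rough 64x64 RGB image."""
    mix = np.tanh(np.outer(z[:64], text_emb[:64]))   # (64, 64) toy "features"
    return np.stack([mix] * 3, axis=-1)              # (64, 64, 3)

def stage2_generator(low_res, text_emb):
    """Stand-in for the Stage-2 GAN: re-reads the text, upsamples 64x64 -> 256x256.
    The real network would also correct defects and add detail here."""
    hi = low_res.repeat(4, axis=0).repeat(4, axis=1)  # naive 4x upsampling
    return np.tanh(hi + 0.01 * text_emb[:1])          # token "conditioning" on the text

text_emb = rng.normal(size=128)   # e.g. an embedding of "a small red bird with..."
z = rng.normal(size=100)

low = stage1_generator(text_emb, z)
high = stage2_generator(low, text_emb)
print(low.shape, high.shape)
```

The key design point the episode highlights survives even in this skeleton: Stage-2 is conditioned on both the low-resolution output and the original sentence, so it can fix mistakes rather than merely enlarge them.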
72 thoughts on “Image Synthesis From Text With Deep Learning | Two Minute Papers #116”

  1. We have a Patreon post on the improvements you can expect from Two Minute Papers in 2017. Lots of goodies behind the link, have a look! https://www.patreon.com/posts/7607896

  2. Thanks! I need more information about the second stage. How can it generate high-res images with so much detail?

  3. This seems unbelievable. I wonder how the images compare to the datasets; are there any similar photos, I wonder?

  4. How soon until we can ask the computer for a version of Lord of the Rings featuring dogs and the algorithm spits it out?

    Man, the entertainment of the future will be weird. This is so cool!

  5. This looks amazing! Is it only trained on birds and flowers or were those just the examples that they used?

  6. Holy ****. This also is a bit scary considering how much easier it will be to use this to fool people and/or to create fake evidence. Fake news 2017 here we go. 😉

  7. Wow, those results are beautiful! I'm waiting for a long-time coherent version which you can feed entire books to and turn them into movies. I suspect that will have to wait for quite a bit longer though.

  8. "A black goat with brown beard is sodomized by a cute little squirrel with long fluffy tail"

  9. This is mind-blowing! Maybe we will see the historic problem of perception cracked in our lifetime 🙂 I wonder if this method can be used to extend previously published methods like conditional GANs

  10. Am I the only one who thought of the applications this could have for porn? If there's some porn you want, just tell the computer and it'll create it for you…

  11. Nice to see you can share your passion with so many people! keep up the amazing work

  12. "The ball is barely (if at all) visible on the image!"

    If the ball was missing, what else would you conclude a figure clad in a baseball uniform was doing in that posture? It's nothing to do with the ball.

  13. This is amazing. The concepts are so simple to understand. Thank you Two Minute Papers

  14. Thank you for this update. I believe this is a crucial step to create AI that simulate possible scenarios prior to tackling an unfamiliar problem the way humans do.

  15. This is great; however, I believe that the algorithm should go beyond birds and flowers before having some real-life applications… For example, imagine gathering the concept art for movie scripts just by analyzing the script or related books/texts. For that, however, it would take too much data and time to train the neural nets, so we won't be seeing such technology anytime soon, unless big companies such as Google step in.

  16. 50 years from now it's just gonna be used to create 3 boobed woman gifs and you know it.

  17. I'd love to see a two minute paper about this topic: https://www.sciencedaily.com/releases/2017/01/170103122333.htm

  18. I wonder how much overfitting is in this (if you can call it that in this case?). In other words, how much does this act as a search engine for the training set?

  19. It seems like the only thing that actually matches the text description in generated pictures is body color.

  20. I think we got the main components to make an AI alive.
    When it can turn its dreams to words and pictures….

  21. I'd love to see how the generated images compare to training data. I've shown this video to my colleagues and some of my more skeptical colleagues think that the net might simply be somehow encoding training examples in the parameters and recalling them to generate the images. (Of course the information in the training data is being captured in the network parameters, but I don't think it's in the sense that my colleagues are implying.)

    I wonder if you could grab the top N images from the training set that match each generated image the closest. It seems like you could use something like the locality-sensitive hashing scheme described here: https://www.youtube.com/watch?v=AyzOUbkUf3M&t=35m40s

    EDIT: added clarification

  22. This is the beginning of Artificial Intelligence. Much like the room-sized gigabyte computers of days past now fitting terabytes into your pocket, what renders images of birds today will conceptualize technologies human minds could never imagine tomorrow.

  23. Simply mind blowing to see how fast neural networks are advancing. I thought we were at least another 5 years away from this amount of detail in NN generated images.

  24. Soon enough you'll be able to convert a book into a movie with an algorithm. I hope to still be around when it happens so I can witness it.

  25. This is amazing!
    Question: (Not sure if you are already doing this but:) can you generate a picture of a non-existing bird just as easily as describing an existing species with the neural network finding a match basically? What are the limits, if you ask more unreal things like a bird with the head of a dog, will it work but just at a lesser quality (for now)?

  26. wait till we get this running creating augmented reality experiences in realtime!

  27. Only a matter of time till sudo gets an easter egg where "sudo make me a sandwich" generates realistic images of sandwiches.

  28. Thank you for all your machine learning videos so far. Check out this paper for image construction using recurrent networks. Would be awesome if you made a video on it! arxiv.org/pdf/1502.04623.pdf

  29. @[email protected]

    One question, the examples in the paper are only birds and flowers. If the system is so awesome, I'd expect them to be willing to share more results. Are there important limitations? I don't want to read the paper (yet?), but am interested in how much this amazing AI really accomplished

  30. Finally!! I have been looking for a channel like this for ages…

  31. I have tried to express my excitement and concern to people about AIs like this, which have popped up in the last year. When I have shown folk the most recent advancements in AI, the common response is, "So I can do that too." Which not only shows that those folk don't get it, but makes it easy for me to say, "That's exactly the point! Now show me anything else on the planet besides you and this AI that can do that." Then they kind of get it, sort of. Then I tell them to look up Uber's recent investment in driverless cars and they still don't fully get it. I tell them that Uber is investing in a huge fleet of driverless cars, and each car puts a human out of work. Then they rattle off several sentences about how their job is safe. Then I ask them what is easier: driving a car across the country without crashing, or pulling groceries over a scanner in an automated supermarket and asking to be paid? What happens next is usually, "I don't want to talk about it." BUT WE REALLY NEED TO TALK ABOUT IT!

  32. Interesting stuff. Are there any online demos of this? When I download the source code, it turns out it depends on all sorts of stuff which depends on yet more stuff. Furthermore, most of these things, such as TensorFlow and CUDA, will not install or run properly. I have to search for workaround upon workaround until I have downloaded and installed all sorts of software, and the demo still refuses to do anything but spit out error messages. Has anyone else gotten the source code to work properly?

  33. New paper called "Text-to-image Synthesis via Symmetrical Distillation Networks" has been released. Image synthesis has been improved by a lot!
