RubyConf 2018 – Make Ruby Write Your Code for You by Alex Stephen

(upbeat music) – Hi everyone, my name is Alex Stephen. I go by @rambleraptor
throughout the interwebs. And today I’m going to talk to you about how you can make Ruby
write your code for you. So some quick background about myself. I work at Google, I’m
a part of Google Cloud and for those of you that don’t know much about Google
Cloud, it does a lot. You can make virtual machines, you can make container clusters, and you can make load balancers and databases and all this stuff. And my team in particular does open source integrations. So we look at open source tools, like Puppet or Chef or
Ansible or Terraform. And we try to give those tools, add features to those tools, so that you can use them to create virtual machines and SQL databases, all the different things. And so just kind of way back when, when I first started on the team, we added support for virtual machines over to Puppet and Chef. And then we followed that up with I believe, load balancers. And by the time we got to the third or fourth feature
that we are adding in, we said hey, if you kind of take the code for those different things,
for those different features, and put them next to each other and kind of squint at them a little bit, they basically all look the exact same. Like yeah, they’re a little bit different. They’re trying to do different things but they’re interacting
with very similar APIs, they’re doing it in very similar manners, like all of this is, it looks almost the exact same. Why are we writing all of this code over and over again? Especially when we’re anticipating making 50 or 60 different features. And so we actually went about
building a code generator. A code generator called Magic Modules. It’s on GitHub, it’s open sourced. And Magic Modules is a
Ruby based code generator that takes in some
information about Google Cloud and it spits out actual Ruby code that we develop, that
we contribute upstream to these tools that they
can interact with GCP. And we’ve since extended it, so now it spits out some Python code that we ship off to Ansible and some Go code that we
ship off to Terraform. And so code generation, like writing code that in turn writes other code is something that I’ve been doing now for a really long time. And so today, I’m gonna kinda tell you more about the ideas that, the ideas that we use in order to create Magic Modules, how approachable this whole
code generation process is, and when you should, or should not, actually use this process when you’re developing your own code. So I’ve used this word
auto-generation a lot. It sounds super scary. I said I was writing Ruby
that outputs Python and Go, and a lot of you must be like what is going on here,
this sounds absolutely, absolutely wacky. So auto-generation, big word. And what this means is that computers are writing your code for you. And this still, this doesn’t
sound very approachable. This sounds very black boxy, it sounds kind of like
the Matrix is happening. It makes you sound very powerless, like nothing that you do actually matters and so I’m going to reframe this idea of auto-generation then. And I’m gonna reframe it in terms of what you’re actually doing throughout this auto-generation process. And so really when you
are auto-generating code, when you want your computer
to write code for you, you’re telling the computer exactly what you want it to write. And you’re telling the computer exactly how you want the
computer to write it. And the computer is going to go ahead and it’s actually going
to perform those actions, it’s gonna write what you want exactly where you want it. And so hopefully this feels
a little less black box, this feels more like yeah, I know kung fu, this is just another tool
in my awesome Ruby toolbox. I don’t think I’m using this
GIF in the right context. I don’t know The Matrix that well. But hopefully the juxtaposition works. So I feel like in tech we love this meme of like something once,
something anywhere. And I think the way that I kind of think about auto-generation is you’re writing something once and the computer is going to write it anywhere else with about 17 different asterisks on it. So that’s what auto-generation is. And the next thing you
must be wondering is well, how do you auto-generate code? This sounds super, super complicated, like what are the actual steps involved in this black magic? And so has anybody here
ever done a Mad Lib before? Okay, so that’s like half the room. So half of the room has
been to middle school. So Mad Libs, you get
like a packet of them, you get about 20 of them and each one is a piece
of paper with a template, so it’s a couple sentences with various words that have
been replaced by blanks and then a word bank. So you know, a bunch of different nouns and a bunch of different
adverbs and adjectives, et cetera, et cetera, et cetera. And so then you take a couple
words from the word bank and you fill them into the template. So you know, I’ll take a couple nouns and we’ll say the dog ran up the hill. Then it, I’ll grab an adverb, then it angrily jumped under the chair. And hopefully you made
something funny out of it and then you rip it out,
throw it in the trash, and move onto the next Mad Lib. What we kind of ignored here is I have one template and I’ve got four, six, eight different words. And I can actually use this template and this set of words to create a whole mess of different
sentences, different paragraphs. And so these are a bunch of the different paragraphs that I can create with one template and a batch of eight words. And some of these sentences
are utter nonsense and some of them are
grammatically correct, but ultimately, this is
kind of an interesting idea that I have one template,
I have a word bank, and I can use that to create many, many different kinds of sentences. Cool, so now let’s try applying this to something like code. And for the rest of this talk, I’m gonna kind of use this, it’s a little bit of, it’s not, anyways, I’m gonna use this
example relating to food. So I’m gonna have all of these different food APIs that allow me to like add toppings
to food and order food. So you can see in my
word bank on the right, I’ve got pizza class
and pizza can take various API options like
pepperoni and sausage and salad, et cetera, et cetera. On the left, we’ve got something that actually kind of looks like Ruby code. And it’s Ruby code with a
bunch of different words or a bunch of different key
words taken out of them. So the reason we can kinda look at this and see Ruby code, you know, the first line we see .new, that’s obviously a Ruby-ism. And then we set something to an object, the second line we can see that we’re calling this add function and we’re calling with
some kind of parameter. The third line is just
completely valid Ruby, so we’re just saying on
some object called A, we’re gonna call the order function. So what I could do is I could take various parts from this word bank and I could apply it to my template just as I have been in the last example where I was using English. So let’s see what that looks like. So I’ve got these six examples and all of these look
like valid Ruby code, provided that I have
some kind of pizza object or some kind of salad object, some kind of breadstick object. And so this is kind of
an interesting idea that, you know, instead of having
a template full of words, which is kind of like a text file, instead I’ve got a template full of words that when everything’s all said and done, it just happens to look like code and it just happens to be
code that a Ruby interpreter could perfectly handle. And so if there’s anything that I really want you to get out of this talk today, it’s that auto-generation
is just like Mad Libs. It’s taking a template file that looks kind of like Ruby if
you squint at it a little bit, and it’s a file that you
have written out by hand, it’s a text file, and you’re injecting in
certain Ruby key words such that when you have a final result, it’s valid Ruby code that can be run through a Ruby interpreter. Now I think the important
thing to think about is like what isn’t auto-generation? Auto-generation means a lot of things to a lot of different people and in the context of Magic Modules, in the context of how I’ve
been doing auto-generation, these are the things
that aren’t being used. I’m not doing any blockchain, there is no Bitcoin, there
is no initial coin offering, insert Bitcoin joke here. But there’s also no machine learning. I haven’t talked at all about linear regression models or TensorFlow or black boxes or artificial intelligence. That’s because none of those things are necessary in order for a computer to be writing code. There’s also no compiler magic. I’m not gonna be building out an abstract syntax tree full of my Ruby code and trying to transpile it
over to a different language or anything like that. And there’s also no meta-programming. I’m not gonna be writing Ruby code that on the fly creates classes and adds methods to those classes. There are auto-generation approaches that use all of these. Maybe not blockchain, I don’t
really know what blockchain is but there are auto-generation approaches that try to use machine learning and try to use meta-programming and those are perfectly valid options. It’s not gonna be what I’m
gonna be talking about today. So auto-generation, cool little trick, I can build this template
and I can get something that happens to look like Ruby code. When is this actually useful? When do I want to be trying this approach? And so before I dive into that, I kinda want to talk a little bit about the abstractions
that we have in our toolbox and the things that we use every day as Ruby programmers because I think it’s really useful to look at the tools we already have and say, “Oh, this is why we use X, this is why we use Z.” And then understand where those approaches don’t necessarily work and why we might be
searching for other options. So I’ve got a little bit of Ruby code here using the same example as before and I’ve got a little script and at some point during my script, I call the same block of code twice. I order a pepperoni pineapple pizza twice. And so most of us would say, “Hey, wait a second, that’s code reuse.” We’re using the same code
in two different places. You know what we should do with that? We should apply a function. So we take all that code, we remove it, we put it into a function. In this instance it’s called
the orderPizza function and then we replace those two instances of code with function calls. And as Rubyists, I think
we’ve all done this before. And if we wanna get really fancy, we can actually say oh, okay, well a function can take in parameters and so let’s go ahead
and instead of having all those constants in my function, we’ll have them inserted
in as a function parameter and then we can call this
function with that parameter. So this code and this code functionally do the exact same thing. And so why do we actually
choose to use a function here? I think we all kind of have built up this gut intuition of
when to use functions and when they are good
to avoid code reuse. But we use functions, for one thing, to provide us the ability to reuse code. To provide us the ability to
have the same blocks of code in one spot that can be
called from multiple areas. But they also provide us an abstraction and they provide us an
abstraction over actions. I can call the order pizza function and I really don’t need to know what happens under the hood. Is Ruby firing up a pizza oven? Like what is happening, I don’t know, I don’t really care. I just know that if I call
the order pizza function, I will get a pizza. So let’s move onto the other abstraction that we use very commonly
which is classes. So in this case I can
actually create a class called a pizza orderer class and I can initialize it with toppings and then I can create a new
instantiation of this class, certain list of toppings
and I can call order. And again this functionally does the exact same thing that I noticed that I was doing a couple slides ago. So when we think about classes, we have always been taught in object oriented programming that classes are an abstraction over things. Classes, we, if our code
is going to be using pizza, pizza is something we’re going to be interacting with in our code, let’s create a pizza class and let’s put all of the functionality relating to pizza inside of this class. And then we can create
various instantiations of this and we don’t really have to worry too much about the underlying ideas behind what this class is doing. But classes also provide
an abstraction over inputs. Once you instantiate a class with a particular set of options, a couple slides ago that
happened to be toppings, I no longer have to think about what toppings are inside my pizza and do I have to interact
with my pizza differently if there are lot of
toppings or no toppings. At this point it’s just a pizza object. I interact with it using the standard API that we’ve defined regardless of what is actually within this pizza. So now I’m gonna move on to what if we actually had multiple different variations of this? So I’ve got that pizza
function from before where I’m interacting with a pizza and now I’m gonna add a new set of ideas. So in a slightly separate script, I’m now interacting with salads. And so it’s the same idea, I have an order salad function that interacts with a salad class and I’m calling this order salad function a couple times. Now how many of you have ever taken a block of code that you’ve written, copied and pasted it somewhere else, and changed it just a little bit? Cool, and of you, if you could all keep your hands up for just a quick second. How many of you think
that is a good practice? Okay, cool. So just for everybody, about three quarters of the room put their hands up for
that first question, a quarter of you are lying. And pretty much everybody put their hands down for the second question. So ultimately, you know, that is the same approach that I would take looking at these two sets of code, is I would copy and paste
everything from the first script, copy everything from the first script, paste it over to the second script, and then just change a
couple things by hand. You know, I’ll change pizza to salad, and I’ll change pepperoni to lettuce, et cetera, et cetera. And then we’ve got a
third one, breadsticks. Okay, well same idea. I’ll copy and paste
salad over to breadsticks and I’ll change salad to breadsticks and oh, you know, instead of using the .add function, it turns out breadsticks actually has a .add sauce option. Okay, well fine, that’s a one line change. I’ll make that change just fine. And we can kind of see
what this would look like is one large code base. So you know, this is kinda the same idea just rearranged a little bit. But as we keep going, what happens if we add in a fourth option? What if we add in
interacting with hamburgers? Or a fifth option, interacting with pasta? Do we wanna just keep copying and pasting these blocks of code over and over again and slightly tweaking them. And then comes the question of, well what happens when we
decide to do a refactor? And we decide that oh,
all of these different order functions actually need to work a bit differently because
all of the underlying code within these functions changed. When I’m interacting
with two or three things, that’s just fine, I can go ahead and make two or three manual changes. If I’m interacting with 50 or 100, now I’ve actually given myself a huge maintenance burden to deal with because every time I copy and paste code, that’s giving me an obligation to go and change that
code later on, if need be. So what we’re really looking for is we’re looking for a way to
fix this copy-paste problem. How do you deal with the situations where you’re copying and pasting code and then just making very
slight variations to them? Variations that are slight enough that if you squint at
your two various places that you put code, they pretty much look the exact same. And so going back to abstractions, we need some way to abstract
over similar functionality. And more importantly we need a way to abstract over the problem of similar functionality at scale. At a very small scale, we have a great way to deal
with similar functionality. It’s copy and paste. It works very, very well when you only have to copy or paste two or three times. So now why talk about
this idea of at scale? Like what exactly does scale look like? And I’m gonna show you
this lovely unlabeled graph which is going to make everything clearer. So this graph is kind of showing the amount of effort something takes versus the number of features
that you’re adding in, versus the number of similar features that you’re adding in. So if you look at this first line, if I’m adding very
similar features by hand, every single time I do it it’s gonna take roughly the same amount of time. You know I’m gonna copy and paste the block of code and I’m gonna slightly alter a couple things. Sometimes those alterations
might be very major, sometimes they might be very minor. And every time I do this, it’s gonna take roughly
the same amount of time and it’s not gonna take
a huge amount of effort. And gradually, as I get up to, gradually when I get up to the point that I’ve done this 20
times, 30 times, 40 times, I’m gonna have the problem
of tech debt occurring. And so gradually as more code is introduced into my code base, it’s going to take more effort and more time for me to actually introduce new things into my code base. Now the problem of auto-generation is the first time you
auto-generate something, it takes a tremendous amount of effort. You have to write up some templates and you have to write up a word bank and you have to figure out how the two interact with each other and then you’ve gotta use some bug bashing and then you get to the second time that you’re going to add a feature and oh, now you’ve gotta alter your code generator a little bit, you’ve gotta alter your
templates a little bit. And then you get to the third time, and okay, I’m starting
to get the swing of this. And eventually you hit an inflection point where you’ve generated enough features that alright, it doesn’t
take any effort really, a slight tweak of my word bank and I can spit out a new feature. A slight tweak, a slight
addition to my word bank, I’ve got another new feature. Over and over and over again. So when do you actually auto-generate? And I think the key thing here is you auto-generate when you have lots and lots of similar looking code. And because of that the overhead of writing out templates and writing out word banks and getting them to coalesce
properly makes sense. And ultimately you’re gonna be asking this question of should I be auto-generating pretty much every time that you add a new feature. Not just at the beginning of the process. Should I be auto-generating this feature or is it so different from the features that I’m currently auto-generating, that I should just do this one by hand and leave the rest of
them to be auto-generated. That’s not necessarily a question that I’ve got a great answer for you for. It’s really gonna depend on
your particular situation because the other thing to remember is once you start to auto-generate, there really is no going back. Once you have this
template and this word bank and you start spitting
out auto-generated code, you can’t just make
handwritten alterations to auto-generated code. You run your auto-generator, you get all of this auto-generated code, and then you make an
alteration to that code. Well the next time you try
to rerun your auto-generator, your auto-generator is
going to rewrite that code. That little alteration you made isn’t in your original template so everything that you write has to be back-ported to your template or you’re going to lose it. There is no way to merge
auto-generated code and handwritten code. The auto-generated code always wins. So the topic of my talk was make Ruby write your code for you and a lot of you at this
point might be thinking, “Hey, wait a second, where is the Ruby?” So for the rest of this talk, I’m going to be talking about why is Ruby a great choice for
writing out code generators and for doing this template and word bank Mad Lib-ing process. So Ruby. When we’re trying to auto-generate code, we need three distinct things and I think we’ve talked about all three of these so far. We need a word bank, so somehow computationally we need a way to define this word bank. We need a way to write
out these templates. These templates that are text files that happen to look like Ruby code. And then we need some kind of a tool that can inject values from my word bank into my templates. And I’m not gonna talk too much about that third bullet point just because depending how you built out your word
bank and your templates, the actual little script that injects values from one to another might not actually be all that complicated and it also might wildly vary depending on what your templates
and word banks look like. So let’s start out with templates. Ruby has this amazing
library built in called ERB. ERB stands for embedded
Ruby and out of curiosity, how many of you have ever
done any Rails development, like webdev before? Wow, that is a lot of hands. Okay, so that’s everybody. So this is all a pretty familiar process to most of you where you write out a template file in ERB and it’s gonna be a text file that’s a combination of both text as well as some Ruby statements that are noted via this
bracket percent mark, that’s what those are called. Bracket percent mark thing. And then you’re going
to hand the ERB code, you’re gonna hand Ruby
both your ERB template as well as some Ruby objects. And so in this case I’m gonna hand Ruby this array of four different things and I’m gonna hand it this template and it’s gonna run through and it’s gonna return
back to me a text file that has no embedded Ruby in it, it just has a bunch of text. And if you’re thinking, “Man, this sounds a lot like what we’re trying to do with
this Mad Lib-ing thing,” you are completely correct. So now what if my template
actually looked like Ruby code? I think that’s what we’re
getting towards here. And so on the left here, I’ve got this segment of code that we’ve been using throughout this talk and on the right here, I’ve got a templatized version of it. And you can start to see the
similarities between them. You can start to see, you know, various Ruby keywords, def order, okay, now I’m injecting
the name of my food, which is pizza, and then I’m going to loop
over some various toppings and I’m going to do the a.add statement and then at the end here, I have a.order followed by end, which is used to denote the
end of a block or a function. And so the template doesn’t look, it doesn’t look very clean. It’s a little bit difficult to parse. But you can definitely get the sense that this is Ruby code. Now everybody raised their hands and said that they were a Rails developer and so a lot of you must be having your minds blown right now thinking, “Oh hey, webdev, I’ve done this before, I’ve used views, I’ve used ERB, I have all of these thoughts
and all of these opinions and I can’t wait to start
auto-generating using them.” And just all kinds of deja-vu. And the ideas that we use
towards creating our templates are actually fairly different. So I used to do some Rails
development back in the day, not a ton by any means, and some of the opinions that I got while trying to build
out Rails applications are very different from the opinions that I have been getting when
trying to auto-generate code. And the big thing is in your templates that are being used to auto-generate code, you want to put as much
logic as possible in there. You don’t want to be clever, you don’t want to be making function calls that are writing out full functions. You want these to be
as verbose as possible. And that’s because once
you start auto-generating, your code base isn’t the code
that you’re auto-generating. That’s just the end product. The code base that you’re working on on a daily basis really is your template. The alterations that you
make to your template are going to be what you’re
doing on a daily basis in order to get to your final state, which is almost a byproduct. And so your biggest goal should be I want my template to be human readable. I want them to have good coding standards, I want them to be something that I find hopefully joyful to interact with on a daily basis as I’m
auto-generating code. So now we get to the second part which is your word bank. And there’s a million different ways to define your word bank. The way that I’ve been using in my work with Magic Modules is, oh, we’re not quite there yet. Okay, so word banks. Your word banks contain everything that you need in your template. And so I think the best way to figure out, what information do I
need in my word bank, is to actually reverse
engineer your word bank from your template. So you start out with one feature that you wrote out by hand, just to understand what
does my template look like, and then you take out some of the keywords in that handwritten example and you replace them with blanks. You say “Okay, well this
is now my template.” And then you look at your
template and you say, “Okay, well in this blank I’m going to need this API information, for this blank I’m going
to need this information.” And from there that’s how you engineer what your word bank is going to look like and what information is going
to be contained within it. So now that you have some idea of this is the information that
my word bank actually needs, how are you going to format it? Like what technical prowess can you use to actually get this into
a computer readable format? And I love YAML, which I feel like is becoming a very unpopular opinion by the day. YAML, I just think is really cool. So you can, Ruby has a really,
really cool YAML library. It lets you both take a Ruby object and serialize it to YAML
and also do the opposite. So in this case I can write out some YAML that represents an array and I can say YAML.load
and I get a Ruby object that is, I get a Ruby array with all those given values on it. And I can do the same thing with a hash or really any other primitive. I can also do this with classes, which I think is the absolute
coolest thing in the world. So I can write out by hand in YAML what I want a various
object to look like in Ruby and I can say YAML.load and for the most part, I’m gonna get to some
caveats in a second here, for the most part that’s the same thing as saying and
setting all of these values. And the first time I heard that, it pretty much broke me in a good way. Like I thought that was just
the coolest thing in the world that I could write up Ruby objects by hand and then just say, “Give them to me,” and all of the sudden I would have them. There are a couple caveats in this. These are things, we use YAML within our code generator, Magic Modules, so these are a lot of the caveats that we’ve had to deal with which is like initializers
are never called, and so you just kind of get an object with all of the various fields that you’ve written up. And there are no rules, because just because something doesn’t have a getter or a setter doesn’t mean that you can’t actually inject those values from YAML. And you could have some security issues depending on whether or not you trust the source of your YAML. And the TL;DR is there are no rules, you can do anything that you want. And that might make some
of you a little unhappy. So kind of just to wrap up, the TL;DR of this entire talk is code generation really is like Mad Libs. It’s taking a template
that looks like Ruby code or Python code or Go code, and it’s taking a word bank of things from various API’s that
you’re interacting with or whatever it is you’re
trying to auto-generate and it’s combining them in order to build functional Ruby code that is run through a Ruby interpreter or a Python interpreter or a Go compiler or whatever else, in the exact same way that
your human written code is. So with that, thanks a lot. I’m available throughout the interwebs. Twitter, GitHub, Speaker
Deck, rambleraptor. Slides are already posted up there. If you are interested in seeing the code generator that I’ve
been working on full-time, And good luck. Hopefully this is useful and you can find uses for it in the real world. (clapping)
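
A minimal sketch of the Mad Libs idea from the talk, using Ruby's built-in ERB. This is not the Magic Modules code: the `WORD_BANK` entries, the `TEMPLATE` text, and the `generate` helper are all hypothetical names, chosen only to mirror the food examples used throughout the talk.

```ruby
require "erb"

# Hypothetical word bank: one entry per food API. Breadsticks use a
# different method name, as in the talk's ".add sauce" example.
WORD_BANK = [
  { name: "pizza",       class_name: "Pizza",       add_method: "add",       toppings: %w[pepperoni pineapple] },
  { name: "salad",       class_name: "Salad",       add_method: "add",       toppings: %w[lettuce tomato] },
  { name: "breadsticks", class_name: "Breadsticks", add_method: "add_sauce", toppings: %w[marinara] }
].freeze

# The template: plain text that happens to look like Ruby, with blanks.
# The "-%>" endings drop the newline after the loop control lines.
TEMPLATE = <<~'TEMPLATE'
  def order_<%= food[:name] %>
    a = <%= food[:class_name] %>.new
  <% food[:toppings].each do |topping| -%>
    a.<%= food[:add_method] %>("<%= topping %>")
  <% end -%>
    a.order
  end
TEMPLATE

# The injector: fill the template's blanks from one word bank entry
# and return the generated Ruby source as a string.
def generate(food)
  ERB.new(TEMPLATE, trim_mode: "-").result_with_hash(food: food)
end

# Print one generated function per word bank entry.
puts WORD_BANK.map { |food| generate(food) }.join("\n")
```

Each call to `generate` fills the same template from a different word bank entry, so adding a new food is one more hash in the word bank rather than another copied and tweaked function.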

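The YAML caveats mentioned near the end of the talk can be illustrated with a small sketch. The `Food` class and the `secret` field are hypothetical. The talk uses `YAML.load`; this sketch uses `YAML.safe_load` with an explicit `permitted_classes` list instead, on the assumption that you want to limit which classes untrusted YAML may build (recent Ruby versions also refuse arbitrary classes in plain `YAML.load` by default).

```ruby
require "yaml"

# Hypothetical class standing in for a word bank entry.
class Food
  attr_reader :name, :toppings

  def initialize(name, toppings)
    @name = name
    @toppings = toppings
    @initialized = true
  end

  # True only when the object was built through Food.new.
  def initialized?
    @initialized == true
  end
end

# A Food object written out by hand. "secret" has no getter or setter
# on the class, but it is still injected as an instance variable.
doc = <<~YAML
  --- !ruby/object:Food
  name: pizza
  toppings:
    - pepperoni
    - pineapple
  secret: 42
YAML

# Only the classes listed here may be instantiated from the document.
food = YAML.safe_load(doc, permitted_classes: [Food])

puts food.name                            # value taken from the YAML
puts food.toppings.inspect                # array taken from the YAML
puts food.initialized?                    # initialize was never called
puts food.instance_variable_get(:@secret) # set despite having no accessor
```

Because the loader allocates the object and sets instance variables directly, any validation or defaults in `initialize` are skipped, so checks of that kind would need to run as a separate step after loading.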