[MUSIC PLAYING] DAVID MALAN: All right. Thank you so much for coming. This is the CS50 seminar on Docker, a technology that we ourselves in CS50 have been using for some time now. My name is David Malan, and I teach Harvard's Introduction to Computer Science. For quite some years, we've been giving students downloadable client-side virtual machines on which they do their problem sets. We have now transitioned to a cloud environment that actually uses this technology called Docker, such that all of the CS50 students now have their own Docker containers, which you'll soon hear all about. Moreover, on CS50's server-side cluster, for many years we were using Amazon's cloud, running individual virtual machines. That, too, we've begun to transition to these things called Docker containers, so that all of our applications are now perfectly isolated from one another. So for that and more, allow me to introduce our friends, Nico and Mano, from Docker itself.

NICOLA KABAR: Thanks, David. Hello, everyone. My name is Nico and this is Mano. We're from Docker, and we're going to be giving you guys an intro to Docker. Hopefully, by the end of this talk you'll realize how much you can use Docker to accelerate your application development and deployment. We're going to start off real quick with some background information: what Docker is all about, how it works, and how it's architected. I'll be doing some demos, and Mano is going to describe how you can use Docker and give you specific steps for getting started. I would appreciate it if you guys could hold your questions until the end; I may well address them over the course of the presentation, and we'll leave some time at the end for questions. So, just real quick, who has actually ever worked with Docker, like played with it? Awesome. Cool. Great.

So, I'm going to start with some history. Back in the '90s and early 2000s, when web developers and app developers went to deploy an application, it was tied to bare metal. It was one server. It was one application. A traditional example would be a LAMP stack, where you actually had to bring up the pool of resources-- CPU, memory, disk, network-- and install an operating system on top of that. If you're actually serving a website, you need something like Apache to serve it. If your application needs a database backend, you would install something like MySQL, and so on. And if you needed a runtime, PHP or Python was there. So you actually had to take all of those steps in order to get your application up and running. If you needed more compute power, you basically had to call your ops guy or gal to go rack up a new piece of hardware, connect it, and repeat those processes again and again. This process was relatively expensive. It was definitely very slow. It was inefficient. And in a lot of cases, your hardware was underutilized.

So, in the late '90s and early 2000s, hardware virtualization came along. As you can see here in the picture, basically what it did was abstract the pool of hardware resources and serve those up to the layers above-- in this case, a guest operating system. The whole idea of virtual machines came about, and that truly helped cloud computing as we know it today. What that meant is you can run multiple VMs-- which means multiple stacks, multiple applications-- on the same physical machine.
This definitely helped with the speed of application deployment, and definitely with expenses: you don't have to go spend energy, time, and resources racking more servers to get more compute, and the speed of actually bringing those resources up is much faster. Great. So we solved world hunger, right? No, not really. Virtualization, as much as it actually helped address the problem, also introduced a lot of challenges. The hypervisor definitely introduced a lot of complexity in handling that underlying pool of resources. It's heavier, in the sense that before, you had a single operating system, which is like three or four gigs on disk; now, if you have 10 machines on a single piece of hardware, you have to multiply that by the number of machines. It's definitely more expensive, in the sense that you still have to get licensing for the virtualization technology if it's not open source. But let's not take all the credit away from virtualization, because what happened is that a lot of stacks and lots of software technologies were enabled by how fast you were able to get to resources with the cloud boom.

So, today a single app or service can be using any of the following runtimes or databases: PHP, Python, MySQL, Redis, whatnot. There's a lot of complexity in this number of stacks just to bring up a single service. And along with that, you had a lot of underlying resources or infrastructure types on which to test, deploy, and basically take to production the applications you're developing. Especially as the teams working on those apps have grown, there's a lot of complexity, and challenges arise in ensuring that the cycle-- basically the application development cycle-- is actually successful. The fact that your application is working locally on your local VM does not guarantee that your colleague is going to see the same results. And when the operations team is involved in taking what you have and deploying it at production scale, there's also no guarantee that that's actually going to happen. So this leaves us with a lot of question marks-- a lot of challenges, actually, very similar to ones faced back in the day by the shipping industry.

The shipping industry had a lot of goods, as you can see on the left-hand side, and on the right-hand side there were a lot of ways to ship those goods. What happened is a couple of folks came together and said, we need to standardize how we actually ship those goods. And boom, you have the intermodal shipping container. They agreed on the most common sizes for the container, how to handle them, and the exact method you need to load and unload them. And that truly helped the shipping industry. Now more than 90% of goods transported globally use those containers, and that definitely decreased the expense as well as the damage due to shipping.

So we take the same model and apply it to app development and software architecture, in the sense that containerization took virtualization up one level. Instead of doing it at the hardware level, it became more of an operating-system-level virtualization. And we do that by providing each application with its own lightweight, isolated, runnable, and-- most importantly-- portable package: a way to actually package everything that it needs to run, anywhere it can be run. So it doesn't matter whether you're running it in your local dev environment, your production environment, or your staging or testing environment.
No matter what the underlying infrastructure is, you have a functional, working app. That's exactly what containers do for this problem: they address it by packaging the application in such a way that it's guaranteed to deploy successfully no matter where it lives. And if you're still confused by what I'm saying, that's OK-- I'm going to be elaborating on it.

So how does Docker itself fit into this picture? Docker is an open platform to easily-- emphasis on easily-- build, ship, and run lightweight, portable, self-sufficient app containers anywhere. So if you take one thing from this talk, please take the following: if you have your app running locally and you developed it using the Docker platform, you can expect it to be successfully deployed, no matter what the underlying infrastructure is. If you have a Docker container and it's working, then as long as there's a Docker Engine on the other side-- whether your operations infrastructure is using any cloud, be it AWS, or Google's, or Microsoft's, or any of the public clouds, or your own cloud, or your OpenStack cloud, or your local environment-- if you have an Engine running, it's going to be successfully deployed there. It's going to run with exactly the same behavior you architected it to have.

So I'm going to go through what the main components of Docker actually are. Engine is at the core of Docker. It is the brains. It orchestrates building, shipping, deploying, and managing the containers themselves. I'll dig into what Engine does in more detail in a second. Because Docker was built around a client-server architecture, in order to interact with the Engine you need some sort of client. Images are the templates from which containers are built; images are basically just static files, templates, and containers are what's actually running at runtime, serving your application or doing something with the data. Registry addresses the problem of how you actually distribute images. If you need to share an image that you worked on with your colleague or with the ops team, you do it using a registry. You can download an open-source version of it that Docker worked on and open-sourced, or you can use Docker Hub, which is the cloud version, to push and pull images out there. That's a huge thing, because there's a huge ecosystem around Docker and it really heavily utilizes the Hub.

To summarize, this is the minimal Docker workflow: as the client, you interact with the host-- in this case the Docker daemon, which is the same thing as the Engine. You issue commands like docker build, pull, run, and the Engine goes and does those things. It either interacts with a registry to pull those images and the layers of the images, or, if you want, deploys and runs containers, kills them, tears them down, whatnot. So that summarizes the workflow across all of these components.

Now let's take every component by itself. Engine is just a daemon. It's currently only supported on Linux, because it does require certain Linux kernel features, but Windows is working on doing the same thing; it's supposed to be supported in Windows Server 2016. Again, the Engine's responsibilities are to build images; to pull images from Docker Hub or your own registry; to push images back to a registry, once you're done with them or have created new ones, to distribute them to other teams; and to run containers locally and manage the container life cycle locally.
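Sketched as commands, that build-ship-run workflow between the client and the Engine looks something like the following; the image name and the Docker Hub username are placeholders, not anything from the talk.

    docker pull ubuntu                   # Engine pulls an image and its layers from a registry (Docker Hub by default)
    docker build -t <username>/my_image .   # Engine builds a new image from a Dockerfile in the current directory
    docker run -d <username>/my_image       # Engine creates and starts a container from that image
    docker ps                            # lists the running containers
    docker push <username>/my_image      # Engine pushes the image to a registry so others can pull it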
The Engine is built around an HTTP REST API, so technically you can write your own client as long as it uses HTTP, which is a very standard mechanism for talking to the Engine and a lot of other services. And you can see from here that regardless of what the infrastructure is, all you need is an operating system-- Linux, specifically. You can install Docker Engine on top of that, have it running, and it orchestrates everything; these app one, two, and three are actual containers. So that's Engine.

As I mentioned earlier, because you need to interact with the Engine, there's the client. When you install Docker, it actually ships with it-- it gets installed as a single binary. And you can make local calls to your Docker Engine, or remote calls to remote Engines. It does use HTTP, as I mentioned earlier. There's a GUI client called Kitematic from Docker, and there are definitely a lot of other folks building GUIs that basically implement HTTP calls to talk to the Engine.

Just some sample commands: if you do docker version, it shows you the client version as well as the server version. If you do docker info, it tells you all the information about how many containers are running or have been created, how many images you have, and so on and so on. Here, in the next-to-last box, I have docker run. That's how I'm actually creating a container. I'm telling it to echo "hello world" and sleep for a second and whatnot, and you can see the result. So it's ongoing. And similar to Linux ps, with docker ps you can see all the processes-- in this case, all the running containers. This one refers back to the container I just created.

Now, this is really important, because it can be a bit confusing. Images are a read-only collection of files, right? They are what a container is based on, but they're read-only. You start off with a base image-- it tends to be OS-like, so an Ubuntu, CentOS, whatnot base image-- and then you start building certain layers on top of that, which make up your end image, the end result here. Each of those layers has a parent image that it references when it's created. The layers are immutable, in the sense that because they're read-only, you cannot actually make changes to them. You use them to create a container from an image, which pulls in all the required images underneath it. You can make changes in a different layer-- the read-write layer I'll talk about in a second-- but each of these layers is never changed.

Basically, images use something called a union file system, UFS, and there are different storage backends that utilize this technology. What that means is that it brings together distinct file systems to make them look like one. So from an application's perspective, you have a top-level view that shows all the different file systems needed for that application to run, but they're actually in separate places underneath and being utilized by other containers as well. And as you can see here, we start with a Debian image as the base image, then we go in and add Emacs-- that's another layer-- and then add Apache-- that's another layer-- and then we spin up the container from that. Each of those images, each of those layers, is distinct and can be reused by other containers.
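In command form, the examples just described look roughly like this; the echo command and the ubuntu image name are just illustrative choices, not what was on the slide.

    docker version                         # client version and server (Engine) version
    docker info                            # how many containers and images exist, the storage driver, and so on
    docker run ubuntu echo "hello world"   # create a container from a base image and run a single command in it
    docker ps                              # running containers, similar to the Linux ps
    docker ps -a                           # also include containers that have already exited
    docker history ubuntu                  # the stack of read-only layers that make up an image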
If you look at containers themselves, they're somewhat VM-like, but they're not treated the same way. Technically, they do not have a full operating system underneath them. They use the single kernel of the host operating system, and they build on top of that. They mimic how an operating system looks-- they mimic the root file system of the operating system-- but they're not actually replicating it. So, on top of those immutable layers, the last layer-- which is the container itself-- is a read-write layer. That's also what runs the processes of your application, and it depends on the underlying layers. Every container is created from an image, and that image can be a single-layer or a multi-layer image.

And I want to note here that Docker heavily uses, or is based on, a copy-on-write mechanism, so that if you are not making changes to the container, it's not going to take extra space. That's basically how you'd summarize copy-on-write. It also definitely speeds up the boot time of the container, because if you're not making changes, it's utilizing what's already there.

So, how does it actually work? Right now it utilizes at least two key kernel features, and that is basically what creates that level of isolation for the containers themselves. Those features are namespaces and cgroups. Namespaces are a way to create isolated resources, so that within the container itself you can only see certain resources-- such as the networking interface, or certain users, or whatnot-- and those are only visible and only accessible within the container. Cgroups, on the other side, limit how you use those resources: CPU, memory, and disk. Those are actually features that are part of the Linux kernel, so they were not reinvented or recreated by Docker; Docker uses them. What Docker really did here is orchestrate creating the namespaces for each container and creating the cgroups, so that it's ridiculously easy to create containers using those features. Of course, as I described earlier, union file systems and copy-on-write truly help the speed and the disk utilization of containers. And once you get your hands around Docker, you're going to see how fast it is to actually spin up containers and tear them down.

So, you might ask, how can you actually build images? We build images by a process of creating containers, making changes to them, altering them, and committing them into an image. So there's a chicken-and-egg situation here, because all containers come from images, and images come from committed containers, for the most part. There are three options for creating images; I'm going to describe the first and the last. You can manually go run a container and make changes, like you would do on any VM or any operating system-- installing new binaries, adding file systems, and whatnot-- and then you exit, as you can see up there; I am exiting my container. And then I do docker commit and commit that container. You can see that the number here is just the first 12 characters of the container's UUID, and then I'm calling it my image. So now Docker takes care of recording everything I did and creating the new image based on that. I'm not going to talk about tarballs, but there's a way you can create a single-layer image using a tarball. What I am going to talk about, and what's mostly used today, is the Dockerfile, which is technically that first option, automated by Docker itself.
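As a minimal sketch of that first, manual option, the steps look like this; the base image, the shell, and the image name my_image are made up for illustration.

    docker run -it ubuntu bash              # start an interactive container from a base image
    # ... inside the container: install binaries, add files, and so on, then exit ...
    docker ps -a                            # note the first 12 characters of the container's ID
    docker commit <container-id> my_image   # record those changes as a new image
    docker images                           # my_image now shows up locally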
Dockerfiles are things that you're going to see in a lot of GitHub repos today. A Dockerfile is basically just a text file describing exactly how to build an image. For every line, Docker actually creates a container, executes that line, commits that container into a new image, and uses it for all subsequent operations until it gets to the last image-- which is the end goal here. After you write your Dockerfile, which is pure text, you do a docker build with the name of the image, and you point it at where the Dockerfile is. And you can expect to see my image as an image that you have locally.

So that's just a visual example of what goes on. You start with a base image. You run it as a container, which doesn't alter the base image itself, but instead creates a read-write layer on top of it where you make the changes, which you commit, and you repeat the process until you get to your final image. And by doing so, every other build process can use the same layers-- basically, Docker caches those layers. So if I'm doing the same exact process, but instead of installing PHP I'm installing Python, it's going to reuse Apache and Ubuntu. That way you're utilizing your disk: it's using the cache and the images already available there.

The final piece is Registry, which is how you distribute your images. As I mentioned earlier, there's a cloud version of it, which is Docker Hub. You can go and explore it-- basically it's a public SaaS product where you can still have private images, but there are a lot of public images. It's actually unlimited; you can push unlimited public images there. And this is how you can collaborate with your team: you can just point them at your repo or your image and they can download it.

So, enough with the talk-- who wants to see some demos real quick? All right. So here I have-- can you guys see my screen? All right. I have Docker running here, so I can check-- this is the version of Docker that's running. I can do docker info and check all the information about how many images I have, and so on and so on. docker ps: there's nothing running.

The first thing I want to do is show you how easily you can run a container. The beauty of docker run is that if it does not actually find an image locally, by default it talks to Docker Hub, tries to find it there, and downloads it for you-- so it includes a docker pull, naturally. So if I do a docker run hello-world, first it's going to try to locate it. Otherwise, as you can see here, it could not find it locally, so it just pulled the two layers that make up that image, and then it ran it. hello-world basically just outputs what you have done. So this is one of the easiest examples. I actually just ran and terminated the container real quick. And by the way, if I want to time that-- just so you know, this is how long it takes to actually spin up a container; we're measuring it in milliseconds. So you can see how much this can actually help you, not only in testing, but even in deployment. That's a quick note on that.

The next thing I'm going to do is run an image I've already prepared. So: docker run. -d is just a flag telling it to run in the background, and -P publishes the container's ports-- because by default containers are isolated, you have to specify exactly how they can be accessed.
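The demo commands so far, sketched out; the name of the prepared web service image is a placeholder, since the talk doesn't give it.

    docker run hello-world             # not found locally, so the Engine pulls it from Docker Hub, then runs it
    time docker run hello-world        # on a second run the image is already local; startup is measured in milliseconds
    docker run -d -P <prepared-image>  # -d runs the container in the background; -P publishes its exposed ports on random host ports
    docker ps                          # shows the new container and the host port it was mapped to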
In this case, I'm telling Docker to map a random port on the host to a specified port within the container itself. And that's basically where the image-- hopefully this is the right one. So it downloads each of those layers in parallel, as you can see here. Those are the layers making up the end image that I built. It's going to take a second. And voila. So now if I do a docker ps, I should see something running. I should see the ID, the image it was based off of, and the command that was executed. And to access it, you basically go to that port. So I'm going to go to-- I'm running this on AWS-- I'm going to go to 32769. Oops. And here we go. So this is actually just a web service that shows which container it's being served from. You can see that it is from container a9f, and here, this is the name of the container. So you guys can see how quick it was to not only pull but also deploy this container.

Now the next step is to look into Dockerfiles and how we can actually build new images. I'm just going to git clone a sample repo with a Dockerfile based on the earlier diagram-- the one with Apache and PHP. Hopefully I remember my repo. So I have my repository right now. And you're going to see this a lot, actually-- I did not install tree. So basically you're going to see your source code, the documentation around it, and then a Dockerfile for how to actually package it. It's just a sample PHP app that echoes "hello CS50."

So if I want to run it, I'll do docker build-- I have to build it first. I'm going to name it demo_cs50, and you need a tag for it too, so let's call it v1, and then dot for the build context. So as I described earlier, what I'm doing here is telling Docker to go use that-- actually, sorry, my bad, we did not take a look at the Dockerfile itself. The only things in here are index.php, as well as the readme file and a Dockerfile. If you take a look at the Dockerfile, it's very similar to what I described earlier: it's just a bunch of steps that Docker executes by creating and tearing down containers and committing them into an image. And basically you can see-- [INAUDIBLE] it here-- but this is from the local repo: it's going to go and grab index.php. That's the only source code that's actually part of your application. All the rest is basically operating system plumbing-- getting the right packages, and Apache, and PHP, and whatnot. But this line is actually taking index.php and committing it into the container, into the image.

So if you go ahead and run the command by doing the following, it's going-- actually, this might take a bit. Hopefully it doesn't take too long. You can see the steps. And I encourage you to go back home today and try it-- Mano will describe exactly how to do that-- because it's really great to see exactly what's happening behind the scenes. And it's ridiculously easy to build images and deploy them using Docker. It's taking a bit longer than I expected. Let's see what happens when you-- cool. So as you can see, each of those steps represents a line in the Dockerfile, and it shows here that it successfully built this image. So if I do docker images, I'm going to see all the images that I have locally. And one of them is named with my username, the name of the image, and the tag-- mainly a version tag. So now if I want to run it, I do docker run-- I just want to do a -d and a -P-- and then v1.
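The actual Dockerfile in the repo isn't reproduced in the transcript, but a Dockerfile along the lines described-- Ubuntu plus Apache plus PHP, copying in index.php-- might look roughly like this. The base image version, package names, and paths are assumptions, not the real file.

    FROM ubuntu:14.04
    # operating system plumbing: web server and PHP runtime
    RUN apt-get update && apt-get install -y apache2 php5 libapache2-mod-php5
    # the application's only source file, committed into the image
    COPY index.php /var/www/html/index.php
    EXPOSE 80
    CMD ["apache2ctl", "-D", "FOREGROUND"]

And the build-and-run steps from the demo, in command form:

    docker build -t demo_cs50:v1 .   # the trailing "." is the build context containing the Dockerfile
    docker images                    # the new image shows up locally
    docker run -d -P demo_cs50:v1    # run it in the background and publish its port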
So I can see now that I have two containers running: the one that I just created, and the hello Docker one from before. And you can see here that it assigned it a different port. So if I go to the same IP but with a different port-- hopefully I did not mess that up-- so now this is the application that I just deployed. If I want to make changes, I can quickly edit the source code and do the following. Let's do "hello Harvard." So now what's going to happen is that I'm going to tag it with a different version-- oh, not this guy-- tag it with a different version. And you're going to see-- do you guys expect it to take the same amount of time to build it a second time, or not? All right, and does anyone know why? Speak up.

AUDIENCE: [INAUDIBLE]

NICOLA KABAR: Basically, we only changed one of the later steps, and therefore it's going to use the cache and reuse each of those layers. That's truly one of the killer features of Docker: how it reuses what's already on your disk instead of filling it with the same exact pieces of information. So if we do the same thing, it takes just a couple of seconds. If we want to redeploy-- so now I should have three containers. And this one is being served on the-- the seven-one port. So now it's the third container. Does everyone understand what I just did here?

So now, if you want to share this image real quick with your friends, you can just do docker push and the name of the image, hopefully. So now it's going to push it to-- I'm not signed in here. Sorry about that. I'm not going to troubleshoot this now, but basically that one command just goes and pushes it. And you're going to be able to see it if you go to Docker Hub and log in. Then you can just point whoever is going to use that image to go and pull it, and they can use it. With that, hopefully I've demonstrated how easy it is to work with Docker. And I'm just going to give it back to Mano, and he's going to take it from here.
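The rebuild-and-share workflow just demonstrated, in command form; the Docker Hub username and the tags are placeholders for whatever names you use.

    # edit index.php, then rebuild with a new tag; unchanged steps come straight from the layer cache
    docker build -t <username>/demo_cs50:v2 .
    docker run -d -P <username>/demo_cs50:v2
    # sharing requires being signed in to Docker Hub (or another registry)
    docker login
    docker push <username>/demo_cs50:v2
    # a teammate can then run: docker pull <username>/demo_cs50:v2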
MANO MARKS: All right, thanks, Nico. So what? One of the things I wanted to do is put together why this is important-- why Docker and containers are such an important new development, a new way of actually doing software. And before I do, I'm going to introduce a few stats. I'm not going to read all of these, but they show you a lot about how popular this is in the community. The core Docker technologies are open source-- that's Docker Engine, Compose, Swarm, a bunch of other stuff, all open source. And we have, what did I say, 1,300 contributors. If you look at the number of job openings, the last time we looked it was about 43,000 job openings specifically mentioning familiarity with Docker. Hundreds of millions of images have been downloaded from Docker Hub. And, well, many more big stats. For those who are curious, it was originally written in Python and then rewritten in Go. And it's only been open source-- it's only been released-- for about two and a half years, which means that in two and a half years we've seen a tremendous amount of growth in the importance of this in the community. And so I want to talk a little bit about why.

So, just to reiterate some of Nico's key points: Docker is fast. It is portable. It is reproducible. And it sets up a standard environment. And what-- this is my crappy "stamp out monoliths" slide-- what it's helping people do, which a lot of the software industry started doing in the early 2000s, is move from these monolithic single applications, where every dependency had to be tested before the entire app could be deployed-- which could mean a website only got deployed once every three months, or more-- to a much more service-oriented, componentized type of application architecture. And it allows these kinds of architectures that take advantage of Docker to run in the three principal areas of development: development, meaning writing your actual code; testing your code; and deploying it.

So why is this important? Let me give an example. If you are a website developer, you're developing a website that's based on the database that David produced over here-- sorry, David, I'm calling you out. If you wanted to deploy the whole thing under a traditional monolithic software development environment, you'd have to wait until he was done with the database before you could actually make any changes to your website, and you'd have to redeploy the entire application to do so. What Docker helps you do is have each person work on different components and update them as they go, just making sure that the interfaces stay the same. So what it has done is shift people from doing this massive, monolithically architected software that deployed every month to a continuous integration and continuous deployment environment. Now, this isn't unique to Docker, but Docker makes it so much easier, which means you're basically constantly deploying. We talk to enterprises that are deploying public-facing applications thousands of times a day, because they see the value in just making small changes and, as long as it runs through the tests, letting it go out into production. Nico was telling me earlier that in many environments the standard life cycle of a container is measured in seconds, whereas a virtual machine's might be measured in months.

I wanted to take a slight turn here, because we're at an educational institution, and give an example of how this works in an educational research situation. There's an organization called bioboxes. Bioboxes does DNA analysis for researchers. What they found was that when a researcher-- and this is not the fault of any particular researcher-- deployed an algorithm to analyze a DNA sample in a particular way, they would write the software, publish it, maybe to GitHub or somewhere else, and then they were done. Well, the problem was that it wasn't necessarily reproducible, because in order to run the software, you would need the exact development environment that researcher used-- usually their laptop, or a server, or a data center they were using. Consequently, it was very difficult to reproduce research results when analyzing DNA samples to look at things like the incidence of heart attacks based on certain genes being present, for instance, or cancer risk, or any of those other kinds of things. So what they did instead was start creating containers. You can go to bioboxes.org-- it's a great organization. What they do is produce containers based on the research, and then whenever somebody sends in a sample, they can run it, and it has all the environment needed to run that algorithm and produce the results.
And they're finding that they're much more likely, and much more quickly, able to return results to people. In fact, what people are doing is running their own analysis on DNA, sending that in to bioboxes, and then bioboxes just takes the data and runs it against a variety of different containers to see different results based on different research. So it's a very powerful way in which researchers can make a single instance that allows other people to try to reproduce the results.

So how do you get started? We are well supported on Linux, so if you want to install on Linux, you use your standard package manager. If you're using Debian, it's apt-get. CentOS is yum. Fedora or Red Hat is rpm-- I don't remember. Anyway, it's all there. We support a large variety of Linux distributions; you can check those out. We also have options so you can run on Mac or Windows. Now, Nico mentioned earlier that Docker is only supported on Linux. That's true, because it needs a Linux kernel. But you can run it in a virtual machine, and what Docker Toolbox, which you can download, does is give you that virtual machine. So, just a quick 48-second, I think, download: you just search for Docker Toolbox, download it to the Mac-- and this part is of course sped up, because who wants to watch a download? A standard Mac installation, and then you're going to see Jerome put in his password. That's very exciting. And then it installs a whole bunch of tools; in particular, it will install a command line. And then you can see Jerome testing his images. And then, based on this, you can see that YouTube thinks that Nico is interested in Star Wars, the Jimmy Kimmel show, and, I think, Ellen. I think that last one is a clip from an Ellen show.

Docker Toolbox, though, comes with more than just Docker Machine. Docker Machine is the thing that helps you set up a virtual machine on your Windows box or your Mac box and helps you do provisioning. But it also comes with Swarm and Compose, which are designed to help you do large-scale deployments of your application. So if you want to manage clusters of nodes, clusters of containers, Compose and Swarm are the way to go about that. And of course it comes with Docker Engine and Kitematic, which is the desktop GUI. I should also mention Docker Registry, which is not included in Toolbox, but is a way for you to run your own registry of Docker images, like Docker Hub-- or you can also just use Docker Hub for that. And, plot twist, you're seeing it running in a container. That's how we're distributing our slides: this whole presentation is actually an HTML slide deck, and it is running in a container, which you can get by--

NICOLA KABAR: Yes, so it's running full time on my Mac, and I'm presenting from it. After you install your Toolbox, you can just do a docker run to get it and use the slides.

MANO MARKS: And that's it. So we thank you all for coming, and we're happy to answer questions. I should mention before anybody leaves that there are T-shirts over there. Sorry to anybody who is watching this on the livestream or on video, but we have Docker T-shirts over there. And we know students-- and, in my experience, professors too-- like free clothing. So thank you all for coming out. And follow us on Twitter if you want, or don't, I don't care. Also follow Docker on Twitter-- that's also interesting. And then that's it. Docker.com. Thank you.

[APPLAUSE]
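For reference, the getting-started steps described in the talk look roughly like this on the command line; the package name docker-engine and the VirtualBox driver reflect the Docker Toolbox era and are assumptions to check against the current instructions on docker.com.

    # Linux: install through the distribution's package manager, e.g. on Debian/Ubuntu
    sudo apt-get install docker-engine
    # or on CentOS / Red Hat / Fedora
    sudo yum install docker-engine
    # Mac or Windows: install Docker Toolbox, then provision the Linux VM that hosts the Engine
    docker-machine create --driver virtualbox default
    # verify the installation
    docker run hello-world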