KEVIN SCHMID: Hello everybody. Welcome to the CS50 seminar on Node.js. My name is Kevin. I'm a CS50 TF. And I'm sort of like really excited about this seminar. I think Node.js is very cool. I hope that this seminar can be used as a good, I guess, springboard for some of your final projects if you're interested in using something like Node.js. We'll sort of start the seminar off by just talking about a little bit of the kind of background scalability perspectives of Node.js, and then we'll move to some code examples. And I'll have the code on a website, and you can look at the code. And after the seminar, I'll sort of talk about how you can set up Node.js on your computer. OK. So let's get started. So I guess I just want to talk about web servers, really, first. And to start this discussion, I basically have a diagram which is from the textbook used for CS61, which basically shows the interaction between a client process, like your web browser or like your aim client or something like that, and a web server. So this kind of looks similar to the picture that you saw in lecture on Wednesday where basically we have some client process like Google Chrome. And then step one is the client sends a request. So that can be something like well let's visit, I don't know, CS50.net. So we issue that request. And does anybody remember the name of the protocol that specifies how that request should be structured? Yep. AUDIENCE: [INAUDIBLE]. KEVIN SCHMID: Exactly. So it's like HTTP, right? So basically the specification for how that request should actually be laid out, because at the end of the day, that request is really just like a string that basically says I want this. And the specification for that is HTTP. So that's like a protocol. So then the server receives that request. So you guys have a web server installed in the CS50 appliance. It's Apache. And this week when you work on problem set seven, you'll actually be working with that web server. 
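To make the "that request is really just like a string" point concrete, here is a minimal sketch of what such a request might look like and how its first line breaks apart. The request text is a simplified, hand-written assumption (a real browser sends more headers), and `parseRequestLine` is a hypothetical helper, not part of any library.

```javascript
// A hand-written example of the text a browser might send (assumed, simplified).
const rawRequest =
  'GET / HTTP/1.1\r\n' +
  'Host: cs50.net\r\n' +
  '\r\n';

// Hypothetical helper: split the first line into its three parts.
function parseRequestLine(raw) {
  const firstLine = raw.split('\r\n')[0];
  const [method, path, version] = firstLine.split(' ');
  return { method, path, version };
}

console.log(parseRequestLine(rawRequest));
```

HTTP is what fixes this layout: a method, a path, and a version on the first line, followed by key-value header lines.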
So the server receives that request, and then it has to kind of scratch its head and say like well what do I do with this? So based on what it decides to do, then it may have to contact some kind of resource. And that resource could be a lot of different things. For one, it could be just like a static HTML file. So it could just be like some HTML that is like for your personal website. It could be a static file like an image or like a movie that you have. It could even have to talk to some kind of database like a MySQL database. So it doesn't always have to communicate with a resource, but in some cases, it could. So then what it's going to do after that is it's going to send back the response. And the response for this is also specified by HTTP. So then the client can receive it. It can tear it apart and process it. And then you get a web page like Google or CS50.net or whatever you went to. OK? So this is the basic interaction that we're going to be dealing with. And we're pretty much going to be focusing on this part of the interaction, the server. OK. Cool. Anybody have any questions so far? OK. So as we said, the web server receives this HTTP request and then issues this HTTP response. And like we talked about before, the CS50 appliance web server is Apache. So when you guys work on P set seven, you're going to be working with the Apache web server. You'll never have to really work with Apache directly too much. You sort of configure Apache a little when you specify the virtual hosts or the v hosts, and we'll get to that in a little bit. But basically, the Apache web server is set up to work with PHP kind of out of the box. So what really happens when you go to one of your websites like, say, localhost slash index.php or something, is your browser sends that request, and then Apache is sitting there and figures out what to do with it. And the action is to execute that code in index.php and then send the result back. So there's that. So we sort of talked about this. 
So it could just serve a static file or run some PHP code and then issue the response. So then a common question that can come up is well, how do we really deal with having multiple users at the same time? So imagine if you were writing a web server, if you had a web server that you were trying to write in something like C or something like that, basically you can think about how there could be some kind of code that would receive the request, but then it has to do all this work on it. It may have to, for example, contact the database or something like that. Right? And then it would do that kind of processing and then sent back the response. So that's like the high level overview. But it's not immediately obvious how you can do that so that two people or even 1,000 people could work with your web server at the same time. So the solution that Apache uses is called threads or processes. So you may have heard of these terms before. It's OK if you haven't, but just think about threads or processes as ways for an operating system or a user program or something like that or a web server to sort of execute multiple things at once. So you may have heard the term like threads of execution. So it's kind of like you're sort of multitasking. And if you've seen on the box of your laptop, or something like that, multicore, what you can do is you can run two different threads on different parts of the CPU so that they can actually happen at the same time. So this is really powerful. And this is kind of Apache's solution to this problem. So are there kind of like any issues with this approach though? So I guess I kind of wrote them there. But both of them sort of use a lot of memory. It's very expensive to create a thread or a process. And part of the reasoning is that just like when you're running a C program like your main and then that calls another function, that has some kind of stack. So threads also require an entirely separate stack which can be quite large. 
And if you can imagine having tons of users on your website, you would have a lot of different threads. That's a lot of stacks to manage and maintain. So it's large memory consumption. And then, also, let's say you only have one CPU, or let's say you have more threads than you have those multicores. Right? So let's say you had 10 threads and you only had five CPUs. You kind of have to do this thing where you switch between the current one that's running because you can't run all 10 at once. And that's called a context switch. And that term actually has a couple of different contexts, but let's just think of it as switching between two threads. That can be pretty expensive because basically what you have to do is you have to stop what you're doing, save the state of that running thread, and then switch to somewhere else. So does everybody kind of see the motivation of why threads and processes might be a little bulky? And did you have a question? OK. Cool. Anybody have any questions? OK. So if we take a step back for a second, there's kind of like an observation that we can make about a lot of web applications. And that's really that a lot of them actually don't do that much useful work inside of a thread. So has anybody started on P set seven at all? So do you want to maybe describe some of the parts? Have you worked on login or something like that? AUDIENCE: No. KEVIN SCHMID: OK. Never mind. Sorry. But basically, in the P set, you're going to be making a lot of sort of queries to a database to get some information from that database. And what your code is going to be doing, what that Apache process or that Apache thread is going to be doing while it has to contact the database is it's sort of going to be sitting there and it's going to be waiting for the database to reply. Now that might not sound like that big a deal because the database is on your CS50 appliance, right? 
But there is some kind of network latency there because now the web server has to issue its own request to the database to communicate with the database and then get that information back. So now it's like well wait for me, I'm going to go get something from the database and then there's a lot of waiting going on. Does that make sense? And for some things it's not that bad. If it just has to, for example, access memory, that's not like horrible I/O latency. And when I say, I/O latency, what I'm referring to is like any kind of like input output. But to access a file on the disk, like if I wanted to serve the static HTML file that was on my web page or something like that, I kind of have to stop for a bit, read that file in from the disk, and then in that process I'm waiting. I'm not doing useful work. This is not true of everything, but it is common in applications like P set seven and a lot of applications that you're not actually doing much thinking. And when I say thinking, I mean like computational work. So computational work could be something like, say, you wanted to write a web server that just computed the nth Fibonacci number. That doesn't sound like a particularly fun web server. Like I wouldn't expect that site to be the next Facebook, but that is some kind of computational work. And you can imagine replacing that with some other kind of interesting computational work. Let's say you were writing something that calculated the degrees of separation between two people or something like that. So that does involve some kind of calculation, right? And even then, to do that you still have to do a lot of waiting for maybe you have to query a database to look up who's friends with who or something like that. So there is that kind of notion of computational work. Does that make sense? Does anybody have any questions? Oh and I guess I put chat servers there because chat servers are kind of another good example of this. 
A chat server doesn't have to do much thinking. It just has to wait for people to send messages and then when they do, send them. OK? So just to recap again, Apache and similar web servers like that fork a lot of threads and processes which can be kind of wasteful. So I guess the question that may come from that is do we need to have multiple threads and processes? What if we just had one? So let's kind of paint a picture of what this would look like. So let's use only one thread. OK? So just imagine this with one thread. Let's suppose we weren't really doing that much useful-- and when I say useful, I mean computational work-- in those multiple threads before. So let's kind of consolidate everything into one thread. So what if we had one thread that kind of just goes around in the loop and constantly checks did something new happen. So for example, something new happened could mean I got something back from the database, or somebody sent me a new HTTP request. So those are kind of events that happen, right? And then what I can do when those new things happen is in this same thread of execution, this single thread of execution, I can call some code that would handle that particular thing. So for example, if I got something back from the database, I could run my small computational part of it that actually just prepares the thing to send back to the user. So does that kind of make sense? But what are really the implications of this? Right? Because we've written a lot of code that-- and I'm just going to jump ahead in the slides if that's OK. So if you don't mind, I'm just going to take a step back. So this kind of thing is called an event loop. OK? And it's kind of the basic idea behind Node.js. So what Node.js is really doing as a web server is there's a single thread that is basically going around in a loop like a while one kind of under the hood of Node.js that's constantly checking, did we receive new things? And then it will run handlers that you set up. 
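The "one thread going around in a loop" idea can be sketched as a toy model. This is only an illustration of the concept, not how Node.js is implemented: the event names, the `handlers` table, and `runLoop` are all hypothetical, and the loop stops when its queue is empty instead of running forever.

```javascript
// Toy model of an event loop: one thread, one queue, handlers run one at a time.
// Everything here is hypothetical; Node's real loop lives inside Node itself.
const handlers = {
  httpRequest: (data) => `handled request for ${data}`,
  dbReply: (data) => `prepared response from ${data}`,
};

// Pretend these events arrived while the server was running.
const pendingEvents = [
  { type: 'httpRequest', data: '/index.html' },
  { type: 'dbReply', data: 'row 42' },
  { type: 'httpRequest', data: '/about.html' },
];

function runLoop(events) {
  const log = [];
  // The real loop is effectively `while (true)`; this one ends with the queue.
  while (events.length > 0) {
    const event = events.shift();          // "Did something new happen?"
    const handler = handlers[event.type];  // Find the code registered for it.
    log.push(handler(event.data));         // Run it to completion, then check again.
  }
  return log;
}

const eventLog = runLoop(pendingEvents);
console.log(eventLog);
```

Because each handler runs to completion before the next event is looked at, a slow handler delays everything queued behind it.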
But a good question to ask is, how can we make this happen with existing things? So I put a line of C code here that basically looks like it's opening a file, right? [INAUDIBLE] She just came out with an album. So I had to open her a new file. So the way our C code for operating-- and I guess the reason I chose files was because this is kind of the extent of the I/O work that we've done in C in a sense that there's input output. So we call this code that does this f open. And then on the next line of our program, we can now work with f. So this would be an example of something that's like synchronous or blocking because on that first line there we're waiting until we get the file open. So on the second line, we know that we can work with f, but this means that that second line can't really run until the first line is done. Does that make sense? So this would be bad to put in an event handler. And the reason for that is that this kind of waits, right? So this would revert us back to the same thing. And now we wouldn't even have the benefit of multiple threads or processes because we've got one thread in Node.js. Does that make sense to everybody? AUDIENCE: Wait. So what's the replacement? KEVIN SCHMID: Oh, so yes. So I'm going to get to the replacement. OK. So what if we had something that looked like this? So what if now I edited f open a little? So I'm passing in the same two arguments as before. I still love the new song that she came out with. But I'm passing a third thing which is this variable called code. But what is code actually in this context? Is it like a regular C variable? It's a function, right? And that may be a little weird because I'm actually like now passing a function into another function. So a couple things to note about this. One, I'm not actually calling the code function. So you don't see code with the left paren, right paren. I'm just passing in code. 
And in C, what this would actually do is give me a pointer to that actual code, and then this could run it. But just think about it as you're passing the code to run when that file is opened. But what this means is that now the rest of my program which could do other stuff, can continue doing other stuff while we, not really wait, but just have in the back of our heads that when that file's open, run that code at the top. Does that make sense? And now the idea behind Node.js is that the code in the do stuff with f part should be pretty short and simple and straightforward and not really be very computationally intensive. It may have to open another file, but that should also be pretty quick because it should just say do another f open and then call this other code. So just to be completely clear, the f open that does the new Katy Perry song done mp3, that's going to pretty much return immediately. And then we can just continue doing other stuff because all that now f open call does is tell basically the underlying f open code open this file and when you're done opening this file or when you get it back, then run that code. But it doesn't actually run that code. And you had a question? AUDIENCE: You seemed to imply a few times that adding computationally intensive code sort of break the [INAUDIBLE] driven system. [INAUDIBLE]? KEVIN SCHMID: That's a great question. So I actually have an example of how you could integrate computationally intensive code in a little bit. So when we get to the code examples, I'll be sure to pull that one. Is that OK? Thank you. What was your name? AUDIENCE: Aaron. KEVIN SCHMID: Aaron brings up a very good point, which is that if I had some computationally intensive code in the do stuff with f part, the rest of my program can't run and can't listen for new requests or anything until all that stuff is finished. 
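The callback-style f open from the slide is hypothetical C, but Node's own `fs.open` follows the same shape. Here is a minimal sketch, assuming a throwaway file in the system temp directory (the file name is a placeholder) and reusing the name `code` from the talk for the callback.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Setup only: create a small file to open. The name is a placeholder.
const demoPath = path.join(os.tmpdir(), 'node-seminar-demo.txt');
fs.writeFileSync(demoPath, 'la la la\n');

const order = [];

// Plays the role of `code` in the talk: it is passed to fs.open, not called here.
function code(err, fd) {
  if (err) throw err;
  order.push('file is open, doing stuff with it');
  console.log(order);
  fs.closeSync(fd);        // tidy up: close the descriptor
  fs.unlinkSync(demoPath); // and delete the demo file
}

order.push('asking for the file to be opened');
fs.open(demoPath, 'r', code); // returns right away; `code` runs later
order.push('doing other stuff');
```

When the last line runs, `order` holds only the first and last entries; the callback's entry is added afterward, once the file has been opened.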
So if I'm writing Node code in general unless we do something like I'm going to suggest later when we look at the code examples, I have to be sure that my code doesn't tie up this event loop. Does that make sense? OK. Cool. So Node.js offers this framework that you can build these event driven servers with. So it has these kind of asynchronous non-blocking I/O libraries, whereas the standard C libraries that we've been working with, like if you just use them in the same way that we've been using them with f opens and stuff, those are blocking because you actually have to wait for that file to open. But Node.js gives you that and it basically ties into Google's V8 JavaScript engine which is the reason that Chrome is so fast at processing JavaScript because it has this V8 engine. So I know that sounds like one of those WWDC developer conferences thing where they just throw a bunch of the letter number things for processors and say this is so cool. But it is cool that they did this because JavaScript-- or maybe if you're not familiar with JavaScript yet because we haven't had the lectures on it-- but JavaScript is an interpreted language. And this is an important point too. So it's important for our web servers to be fast, right? And if we were just running JavaScript code that was interpreted with just any old interpreter it might be slow. So Node benefits from having this super fast V8 interpreter. And I don't know if they named it because the V8 slap in the forehead thing, but OK. So I've prepared some examples at this URL. After the seminar, I'm sort of going to talk about how you can get Node set up, but for now, I just sort of want to walk through some code examples. So if you want to follow along, all the source code is available there. OK? So I'll leave this URL up for a little. And then I'm just going to switch into the terminal. Is everybody good with this URL? So I'm going to switch over to my terminal here. So here's the code that I have for today. 
Why don't we start with simpler.js file? The other thing is that all of this code is going to be written in JavaScript which you may or may not be familiar with. I guess a couple things is that a lot of JavaScript code is the kind of syntax and structure is very similar to C, so you can kind of pick it up as you go along. I've tried to write a lot of the starting code for this in a way that's similar to C so that it's a little more readable. But as we progress, I'll be demonstrating some of the additional features of JavaScript that are kind of cool. But let's look at this sample program. I guess everything's cut off there. I'm just going to fix that real fast if that's OK or not. I don't know what this is going to do. Is that a little better? Can you see the var and stuff? OK. So the first line is like the JavaScript version of a variable declaration. So just to highlight what this would look like in C. So this is just like me saying index equals three or something like that. So I didn't specify the type. JavaScript does have types, but it's very dynamically typed in nature, so didn't provide any kind of type on it. So it just has var. That's like variable. OK? And I'm calling this variable HTTP. And on my right hand side, I have the expression that I want to put in HTTP. And this says require HTTP. So this is kind of similar to include. It's a little more like powerful than include in the sense that include would just copy and paste the header file for the function prototypes or whatever with the type definitions. But require is actually going to get us the code. So you can think of it as importing some code. So somewhere in the Node.js module system or whatever, they have all this HTTP server code so I'm just fetching it for my own personal use in this program. OK? So then I have this function that I've written. And notice I didn't have to specify the return type or the type of the arguments again. So kind of loose typed in that kind of sense. 
Two arguments that it takes in, the request and response. So that's conceptually kind of like familiar from the picture that we had on the screen before because we get this request that we have from the user. And then we have a response that we can write things to. So the first line of this does res.writeHead 200 and then this content type text plain. So let's piece this apart a little. So let's just focus on res.write for a little. So write is basically, and write head, are just ways to sort of write out things to the response. OK? So write head, if anybody remembers from the HTTP lecture, do you guys remember headers at the top of the HTTP thing? So why don't I just demo headers real quick. Would that be helpful? Or should we just sort of-- OK. Sure. So when your browser goes to google.com or something like that, there's actually a little more-- this is like a secret-- there's like a little more information that comes through the pipe than just the little search and everything. So to show you this, I'm going to use a program called Curl. OK? So this is something that you can run at your Mac OSX command line or in the appliance or whatever. And so if I do Curl HTTP google.com, I'm going to see the HTML. And this is, in fairness, just the HTML that sort of tells you to redirect to www if your browser doesn't automatically handle the redirection. So this is just HTML, but I'm going to add to Curl this hyphen I flag. OK? And this is going to show me the headers. So this is also information that comes through when I get this response. OK? So at the top, you see this HTTP 301 move permanently. And this is kind of important because this refers to the status code. 
So the 301 here is the status code, which is basically just an integer that tells the browser or whoever's reading this, if you pretend that you're a browser and you're seeing this, basically now if you look at that and you see a 301, you know I have to do something special based on 301, or something special happened based on the 301. So it says moved permanently. And then, basically, we have a bunch of key value pairs. So we get the location is www.google.com. And then kind of all this other stuff, but basically, what the location is saying is the new location is at www.google.com. So now if you go to google.com, you'll sort of see the browser kind of blink for a second and then redirect you right back to www.google.com. So the responses can contain these headers. And a couple of things to point out. So let's say we were actually successful in visiting a web page. So let me go to-- what's a good website? I'm bad at thinking of good websites on the spot. AUDIENCE: Wikipedia. KEVIN SCHMID: OK. Let's do Wikipedia. So here I was moved. Oh wait. Was I? Yes, I was. OK. So I got to do www. So I'm going to do www. And as you can see, here's all the HTML that the browser would process for Wikipedia. But if I keep scrolling up here, what I'll see at the top-- wow, there's a lot of HTML on Wikipedia-- but what I can see at the top here is this 200 status code as opposed to the 301 that I saw earlier. And notice that it has a nice friendly OK next to it. So this is like the good status code. Does that 200 number look familiar? Yes because when I did simpler.js, I wrote a 200 there. So that's basically saying tell the browser or whoever is trying to get to this that they were successful. Or that kind of like we were successful too. And there's this kind of special syntax in Javascript for declaring a map of these keys like content type and these values like text plain. 
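As a small sketch of the keys-and-values idea: the first line below uses JavaScript's object-literal syntax for a header map, and the rest pulls a status code and headers out of a response's opening lines. The sample text is hand-written for illustration (not captured output), and `parseHead` is a hypothetical helper.

```javascript
// Object literal: keys on the left, values on the right.
const headers = { 'Content-Type': 'text/plain' };

// Hand-written sample of a response's first lines (illustrative only).
const sampleHead =
  'HTTP/1.1 301 Moved Permanently\r\n' +
  'Location: http://www.google.com/\r\n';

// Hypothetical helper: extract the status code and the key-value pairs.
function parseHead(text) {
  const lines = text.split('\r\n').filter((line) => line.length > 0);
  const statusCode = Number(lines[0].split(' ')[1]);
  const parsed = {};
  for (const line of lines.slice(1)) {
    const colon = line.indexOf(':'); // the first colon separates key from value
    parsed[line.slice(0, colon)] = line.slice(colon + 1).trim();
  }
  return { statusCode, headers: parsed };
}

console.log(headers['Content-Type']);
console.log(parseHead(sampleHead));
```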
So if you look at the response that we got back from Wikipedia before-- I'm going to try to scroll up a little faster-- you have these keys like server and these values Apache. So you've got keys and values. And in Node, you can specify what to send back. So this is actually kind of, in some ways, and in some ways it's not really, but it's a little lower level than the PHP code that you might be writing for P set seven because PHP and Apache sort of take care of some of these things for you. In PHP, you can override the default behavior by writing your own headers. But for the purposes of this, we get to write out our own headers. So does that line make sense to everybody, the write head line? OK. Awesome. So then what I do is I end the response by saying hello world. OK. But that's just a function called request handler. So now I actually have to kind of do something with this function, right? So here what I do is there is this line which does var server equals http.createServer, and then I pass in the request handler. So this is kind of the Node way of creating a server. And notice that I'm passing in the request handler. So this is telling the createServer function that I want you to make me a server, and when that server receives a request, I need you to call this request handler function. OK? So that line pretty much finishes right away. So the var server line is done right after you do that pretty much. I mean, it has to set up some internal state to know that you would have to call that request handler function, but it's not going to sit there and say has the user sent me a request yet? Has the user sent me a request yet? So it doesn't block. OK? So what this will do is it basically now stores a pointer to this code, this request handler function, and then will run that code when somebody makes a request. And then we do server.listen. The 1337 there is pretty arbitrary. I had no particular reason for picking that number. It was totally random. 
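Pieced together from the walkthrough, simpler.js might look roughly like the sketch below. This is a reconstruction from the description, so the original file may differ in details such as the exact response text. The block after `server.listen(1337)` is not part of the talk's program: it requests the page once and then shuts the server down so the sketch exits on its own.

```javascript
// Reconstruction of simpler.js from the talk's description; details may differ.
var http = require('http');

function requestHandler(req, res) {
  // Status code 200 plus one header, then the body.
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
}

// Returns immediately; requestHandler is only stored for later.
var server = http.createServer(requestHandler);
server.listen(1337);

// Not in the talk's file: fetch the page once, print it, and stop the server.
server.on('listening', function () {
  // agent: false asks for a one-off connection that is closed after the response.
  http.get('http://127.0.0.1:1337/', { agent: false }, function (response) {
    var body = '';
    response.setEncoding('utf8');
    response.on('data', function (chunk) { body += chunk; });
    response.on('end', function () {
      console.log(response.statusCode, body);
      server.close();
    });
  });
});
```

Without that last block, the program keeps running and serving requests until it is stopped by hand, which is the behavior shown in the seminar.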
But that just specifies the port. So most web servers you'll see that they use port 80 because that's kind of like the convention. So if I go to something like, I don't know, Wikipedia.org, and I put colon 8-- oh wow, you can't see that. I'm sorry. But if I do Wikipedia-- I'll write it here just so that it's clear on the camera. But if I take this into a browser with a colon 80, that specifies go to Wikipedia.org at port 80. So it's like how the United States has multiple ports like where you can ship things to kind of. So it's like go to this particular place on this server. OK. So I just chose 1337. There's a whole range of numbers that you can pick. That wasn't totally special. But what I'm going to do now is I'm going to run Node. Let me actually enter that a couple lines down so that you can see it. I'm going to do Node, and I'm going to run simpler.js. And we'll talk about how to get Node set up in a little bit. But now it's just running the server. So one thing we can try which may not be that exciting is we can actually try to access it in Curl. So I can do Curl, and my machine is local host. You'll also see this written like this sometimes. Local host and 127.0.0.1 are kind of like your home computer. So it's like talking to your own computer. OK. And then I can say 1337. So if I run this line of code, it says hello world. And if I wanted to see that stuff that had content type text plain or whatever, I could even put this here. And notice that it does say OK. And I do have text plain. And then there's kind of all this other stuff that Node will add in there for me. That's not super important. I mean, there are some kind of technical aspects of at that are kind of cool to talk about, but just to show you, I also have the power to change these around. So I can just add a bunch of stuff like that. And then now, if I look in my output, it will be that. So these headers mean certain things to browsers and things like that. 
And headers can basically tell a browser how to respond to something. If you've ever heard of cookies before, or if you've ever been annoyed by a web page setting cookies, or turned on cookie block or something like that. You can actually set cookies in these headers. So they tell a browser how to behave in some cases. OK. So that was simpler.js. Does anybody have any questions on that source code file? OK. Cool. So let's remove the r from that and look at simple.js. So this is pretty much the same program. I just wrote it a little differently because I wanted to sort of highlight some features of JavaScript. So notice that the request handler function has totally vanished. Oh yep, did you have a question? AUDIENCE: Yeah, the arguments that are passed to that function, what are they? KEVIN SCHMID: So those are JavaScript objects. In the Node.js documentation, it basically says what methods are available on them. We just happen to have the access to this method called write head and end and stuff like that. But there's a whole bunch more methods. And for example, like one of them in particular on req, you can do something like req.method which will tell you whether it's an HTTP GET or HTTP POST request and things like that. So there's all kinds of different properties, but they're both JavaScript objects, and they just have functions attached to them that you can write things to. OK? So notice that request handler is totally gone. But the code that I had in request handler is still there. I still have this res.writeHead and I still have this res.end. And what this is an example of in JavaScript is this idea of an anonymous function. And anonymous is like a fitting name for it because it literally doesn't have a name. There's no function request handler in there. It has no name, but it is still taking arguments. So I still got req and res. And I still have the code. This is perfectly fine JavaScript code. 
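To see that a named handler and an anonymous one are interchangeable, here is a small sketch that uses Node's `EventEmitter` as a stand-in for the server. The `fakeServer`, `fakeReq`, and `fakeRes` objects are hypothetical and carry just enough to run; a real server passes much richer request and response objects.

```javascript
const EventEmitter = require('events');

// Stand-in for a server: something that calls our handlers when a request "arrives".
const fakeServer = new EventEmitter();

// Version 1: a named function, passed by name (no parentheses, so not called here).
function requestHandler(req, res) {
  res.end('named handler saw ' + req.method);
}
fakeServer.on('request', requestHandler);

// Version 2: the same idea with an anonymous function written inline.
fakeServer.on('request', function (req, res) {
  res.end('anonymous handler saw ' + req.method);
});

// Hypothetical request and response objects, just enough for this sketch.
const replies = [];
const fakeReq = { method: 'GET' };
const fakeRes = { end: function (text) { replies.push(text); } };

// Simulate one incoming request; both handlers run, in registration order.
fakeServer.emit('request', fakeReq, fakeRes);
console.log(replies);
```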
So I can declare a function without explicitly giving it a name. It's a little confusing at first. There are some like useful things that you can do with these anonymous functions. Does anybody have any questions on this, or is it OK just to, for now, sort of just accept that it will do the same thing? Yep? AUDIENCE: Are functions first class in JavaScript? KEVIN SCHMID: They are first class in JavaScript. And just know that these concepts of passing in an anonymous function like this apply to the JavaScript that you may write in your final project for the web browser too. So for example, in the JavaScript in your browser, it's also somewhat event driven in the sense that what you'll have is when the user clicks this button, I want you to run this code. So it's the same kind of idea on the client side: when they click the mouse or they mouse over some image on your web page, run this code. That can apply to servers. So that's kind of like the exciting reason why JavaScript is a really suitable or some people think it's a suitable language for this kind of event-driven server because you have these anonymous functions. You have the whole idea of this asynchronous code. OK. Anybody have any questions? OK. So that was simple.js. So let's look at one more or a couple more. So this is sleep.js. So is anybody familiar with the C function sleep? From maybe one of the earlier lectures or something like that? So basically you can pass in a number of seconds, or if you're using usleep, a number of microseconds. And basically the program will just stop running for that amount of time. Right? And then it will wake up eventually and then it'll just continue running the program. So this server sort of gives the impression of sleeping. So notice that we have the same res.writeHead 200 with the header as before, but then we're calling this function called setTimeout. setTimeout is also available in your web browser, Google Chrome or Safari or whatever. 
And basically what it's doing here is it's taking in a function. Notice, again, it's an anonymous function. So that's kind of cool because we're using an anonymous function within an anonymous function which can be a little weird. But it's taking that function, which is basically saying-- and the way this works is in 5,000 milliseconds, I want you to execute that function which just ends the response and writes hey. So this gives the impression of like sleeping, but the way this actually works is we'll run through this line very quickly. We're just writing something. And then we'll also run through this line very quickly. So we're not actually going to wait five seconds. We're just going to run this code instantly. And then there's, again, this little event loop that now has this thing registered, and it's basically just constantly going around in a circle and looking at the clock in a single thread and saying, has five seconds passed yet? And then when it sees that the second hand has moved like five seconds or whatever, then it wakes up and says, oh, what do I have to do? Oh I have to run this code. And then it's going to run res.end hey. So again, we're never waiting here. So it's not that this code inside of this function is going to take five seconds to run. This code will run pretty much instantaneously, at least relative to the five seconds that we were talking about earlier. So just to show this in action, I can do node sleep.js. And did I mess up something? Possibly. Sorry. Let's see what we can do to fix this. OK. So definitely use Node.js. I'm just kidding. OK. Just one sec. OK. I know what it is. So the issue is that in my other tab here, I was running Node already on that same port, 1337. So the error that this threw, if we look at it real closely, is address in use, EADDRINUSE. So I was already using 1337 here. So if I shut this off, and then I now try to run this, hopefully, everything will be fine. OK. 
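The claim that the setTimeout line itself finishes right away can be checked with a small sketch. The `steps` array is a hypothetical stand-in for the server's work, and the delay is 100 milliseconds here rather than the talk's 5,000 so the sketch finishes quickly.

```javascript
const steps = [];
const DELAY_MS = 100; // the talk uses 5000; shortened so this finishes quickly

steps.push('before setTimeout');

// Registers the anonymous function with the event loop and returns immediately.
setTimeout(function () {
  steps.push('timer fired');
  console.log(steps);
}, DELAY_MS);

// This runs right away, long before the timer fires.
steps.push('after setTimeout');
```

At the point where the last line has run, `steps` contains only the "before" and "after" entries; the timer's entry shows up later, once the event loop notices that the delay has passed.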
So you can only have one thing listening on a port at once. Another solution would have been for me to just edit that program and make it be like 1338 or something like that. But now sleep is running. So let's actually try it out in the browser this time because it's a little unexciting to see it in a terminal. So I'm just going to go to that 127 address again at 1337. And if you can see it-- I don't know if you can-- but my browser's taking a very, very long time to load, like five seconds. And then after that, it finally ended the response. And you can't see it because the thing is moved over a little, but if I make this a little smaller, you can see it says hey. So I got the hey, but after five seconds. And it might be a little cleaner to see it here on the terminal, so I'm going to do a-- let's do in here-- let's do Curl on that address again with the 1337. And I just kind of have to sit here for five seconds. So it prints hey. But notice that the server can accept new requests in the meantime. And to demo this, basically what I can do in this other tab-- so let's say I do this in another tab, I'm going to do Curl and the same thing again. And I'm going to try to kick these guys off at the same time. So I'm going to do this, and I'm going to race over here and I'm going to do it again. And let's make it so that you can see both of them. That one printed hey and that one printed hey all the way at-- let's do that experiment again. Actually, let's use this trick, if that's OK. So I'm going to use a shell thing that allows me to basically run two copies of this program in parallel. So it'll run the first program and the second program in parallel. So now if I press Enter, it's going to make those requests pretty much at the same time. So let's give this a shot. So now notice it says two processes. And if you're curious, that 27,000 number is basically the process ID. And then notice, they printed hey at the same time.
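The two-requests-at-once experiment can also be sketched in a single Node script. This is only an approximation of the demo, under several assumptions: the handler is reconstructed from the spoken description of sleep.js, the delay is shortened to 500 milliseconds, the server listens on an ephemeral port instead of 1337, and the script plays the role of both Curl clients and then shuts itself down.

```javascript
const http = require('http');

// The talk used 5000 ms; shortened here so the sketch finishes quickly.
const DELAY_MS = 500;

const server = http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  // Register the callback and return; the handler itself never waits.
  setTimeout(function () {
    res.end('hey\n');
  }, DELAY_MS);
});

// Port 0 asks the operating system for any free port, which avoids EADDRINUSE.
server.listen(0, '127.0.0.1', function () {
  const port = server.address().port;
  const start = Date.now();
  let pending = 2;

  // Two requests started back to back, standing in for the two Curl commands.
  for (let i = 1; i <= 2; i++) {
    const options = { host: '127.0.0.1', port: port, path: '/', agent: false };
    http.get(options, function (res) {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        const elapsed = Date.now() - start;
        console.log('request ' + i + ' got ' + JSON.stringify(body) + ' after ' + elapsed + ' ms');
        pending -= 1;
        if (pending === 0) server.close(); // let the script exit
      });
    });
  }
});
```

If the timer blocked the thread, the second request would finish roughly one full delay after the first. Since the timer only registers a callback, both elapsed times should come out close to a single delay.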
It wasn't like we had to wait five seconds for one and then, five seconds later, get the second. So in some ways, it's not really proof, but it's intuitive evidence that it's not just waiting five seconds and blocking the entire thread. OK cool. So Aaron asked a question earlier that was, well what if we do do something-- Yep? AUDIENCE: Wait. How is that different from printf buffer, though? Doesn't it automatically do that? Why do we have to worry about it? KEVIN SCHMID: Oh, could you say that one more time? AUDIENCE: Doesn't like printf buffer do the exact same thing? KEVIN SCHMID: The printf buffer? AUDIENCE: Yeah. OK. Wasn't in one of the quizzes they were talking about how if you write printf something and then have it pause one second, and then you have it loop ten times, it'll wait ten seconds and then printf everything together? KEVIN SCHMID: Oh, OK. AUDIENCE: Is it doing the same thing then in this case? KEVIN SCHMID: So the question was basically, in one of the former quizzes or something, there was a question where if you printf 10 things one at a time and sleep in the process of printing those out, at the end, for some reason, it would just dump those all out on the screen. So there's kind of two different concepts here. So I guess one thing is that, in this case, we're dealing with two different people asking the server for things at the same time. And the reason that the printf kind of waits like that and dumps it all out at once is more related to how printf is actually implemented: it basically has to talk to the operating system to write that stuff to the console. So it doesn't want to do all of that stuff immediately when you say printf some string because that could get expensive if it has to do that every time. So if you do printf hey, your program might not actually print that immediately to the console. It might say, OK, I wrote it.
And then kind of wait for you to give it a little more before actually writing it out to the console. So the reason that that was the case-- and it's kind of unrelated to the sleep-- is that the sleep was just injected in there to demonstrate the fact that it doesn't write it synchronously. But the reason for that is just performance, so that you don't have to make that many calls to the operating system. But here, what we're really trying to do with this sleep thing is just show that when we have two people visiting this website, it's not going to put them in a line where it's going to say I have to help you, and then when I'm totally finished helping you after these five seconds, then I'm going to move on to the next person. So the first person's request doesn't tie up that event loop, if that makes sense. But here is actually an example of something that will tie up the event loop. So here's a horrible function to compute the nth Fibonacci number. It's literally the worst way you can compute the nth Fibonacci number. And just to acknowledge where this came from-- I mean, you can try to go find it-- there's a very lengthy blog post that somebody wrote. It's like one of those Reddit things. But somebody criticized Node.js, and they used this as an example. So I kind of wanted to just show you two different perspectives just to get a general understanding of the concepts behind these two things. But this is chosen as just a horribly inefficient, computationally intensive way to compute the nth Fibonacci number. So just as a side note, why is it horrible? Yep? AUDIENCE: Say you start out with 1,000. 1,000 splits into 999 and 998. Each of those splits into two things. Each of those splits into two things. KEVIN SCHMID: Right. AUDIENCE: All the way down. KEVIN SCHMID: Exactly.
So just to repeat for the camera, if I call fib on like 1,000 or something like that, it's obviously not less than or equal to one so I'm going to go to this else case, and then I'm going to call fib 999 plus fib 998. And then pretty much all of that work that fib 999 does is kind of at this level. If you go down, it's even more redundant than that, but if you just think computing fib 998 gets us pretty close to fib 999. So we should really be a little more clever about how we kind of reuse these, but we're not reusing these things at all. So you can imagine this gigantic, gigantic tree that's just horrible. But anyway, OK. So that was fib. It just takes a while to run. Yep? AUDIENCE: [INAUDIBLE]. KEVIN SCHMID: Oh, could you repeat the question? AUDIENCE: [INAUDIBLE]. KEVIN SCHMID: Oh so this is just code that's going to be sort of on the server side. So this is not going to be found in the browser or anything. It's basically what we have is that when the user here pretty much makes their request again, when we sort of make a request, we're going to call this function on the server side. And then we'll get the result back from calling that function. And then we'll just print it to the user. So the user doesn't really deal with this function too much. Was that the question? Does that make sense? OK. Cool. So again, we do this whole res.writeHead thing where we print out the header. And then I end the response by doing the magic number is fib 45. So let's just run this server. So I'm going to do a Node fib.js. So now my fib server is running. And then here, I'm going to do one of these. OK? So I'm just going to say, Curl. So it's going to take a little while but hopefully soon it will finish and it will print out that 45th Fibonacci number. AUDIENCE: [INAUDIBLE]. KEVIN SCHMID: It should get done pretty soon. So it should take five to six seconds. 
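The doubly recursive function being described can be sketched as follows. This is a reconstruction from the spoken description, not the seminar's fib.js: the base case returning n is an assumption, and the calls counter is added here only to make the redundant work visible.

```javascript
// Counts how many times fib is entered (added for illustration only).
let calls = 0;

// Naive recursion: every call above the base case spawns two more calls.
function fib(n) {
  calls += 1;
  if (n <= 1) {
    return n; // assumed base case: fib(0) = 0, fib(1) = 1
  } else {
    return fib(n - 1) + fib(n - 2);
  }
}

const value = fib(20);
console.log('fib(20) = ' + value + ', computed with ' + calls + ' calls');
```

Even for n = 20 the function is entered tens of thousands of times, because fib(18) is recomputed inside fib(19), fib(17) inside both of those, and so on down the tree. The count grows roughly exponentially with n, which is why fib(45) takes seconds.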
I don't know, that's just V8 being super fast, but in any case, this is a very short, purposely inelegant example of a non-trivial computation. So after a while, it does get this. But now, what if I do that same kind of experiment as before where I make two requests at the same time? So here I'm going to do a Curl on that address, and I'm going to do another Curl. And remember, when we did this for the sleep server, after five seconds they pretty much both came back right around the same time. So it wasn't particularly tied up. But let's try it now. OK, so we got our two processes. Remember those are the process IDs. This is going to be a little awkward while we stall. So let's just stay here and wait. So one of them should come back after like-- OK, so one came back. But then why didn't the second one come back just yet? Yep? AUDIENCE: The server can't do anything while it's computing that big number. KEVIN SCHMID: Right. So the response was just that the server really can't do anything while it's computing that Fibonacci number. So now I just got my two things back. But I guess just to think about the code a little more, how it's working and everything. So this function here is the code that I've told this server to run when it receives a new incoming request. So it's just going to run through this entire code, and then it's going to go back to the event loop and then continue checking for new events. So basically what we have happening is the server is listening for new things. The first person asks for what fib 45 is. We run this code to compute it. This code takes roughly five to six seconds to run. Then we go back to the event loop and check for new requests, and only then does the second person's request get handled.
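The way a long synchronous computation ties up the event loop can be shown with a timer instead of a second request. This is a rough sketch under stated assumptions: fib(30) stands in for the talk's fib 45 so that it finishes quickly, and timerFired and firedDuringCompute are names invented for the illustration.

```javascript
// Same naive recursion as in the talk, with an assumed base case.
function fib(n) {
  return n <= 1 ? n : fib(n - 1) + fib(n - 2);
}

let timerFired = false;
const start = Date.now();

// Ask the event loop to run this as soon as possible (0 ms).
setTimeout(function () {
  timerFired = true;
  console.log('timer ran ' + (Date.now() - start) + ' ms after it was requested');
}, 0);

// Synchronous work: the single thread is busy until this returns,
// so the event loop cannot get to the timer in the meantime.
const result = fib(30);
const firedDuringCompute = timerFired;

console.log('fib(30) = ' + result + ', timer fired during compute: ' + firedDuringCompute);
```

The timer plays the role of the second visitor: it was ready almost immediately, but it only gets served once the computation hands control back to the event loop.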
So this is an example of how, if you have things that are so-called compute bound, meaning they're computationally intensive, you can tie up the event loop. I guess one thing to say about this is that this function is doing, for the most part, pretty useful work, right? The entire time that that callback function was running, it was pretty much spending most of its time just computing that nth Fibonacci number. But we only had one thread to deal with. In the Apache model, when two people made the request to get fib 45, we would have had two different threads. And then the operating system's job, or the job of the user level code that manages the threads, would have been to slice that up on the CPU, or if you had multiple CPUs, distribute them evenly across the CPUs so that they both finish at roughly the same time. So just to show you how we can sort of-- and this is not a total perfect solution, but sort of how we can make a comeback here and do a little bit better. So what I have here is a program called Fib C. And this basically uses another one of Node's modules called the Child Process module. So I've included that at the top kind of like I would do a pound include child process.h or something. Now I have access to this cp variable which has all my functionality. So now what I'm doing in this response handler is I'm running this program dot slash fib 45. So what I've done-- and I'm just going to step out of this program for a little bit-- is I've written a C program that basically computes the nth Fibonacci number. So here's just a program I've written in C that computes this. I can compile it, and I can run it at the command line. And it's going to compute the 45th Fibonacci number. So notice it takes pretty much as long. I probably could have used -O3 to optimize it or something like that, but I just did regular compiler settings. And it prints it out. But now, what am I kind of doing? Oh sorry, wrong file.
So I do the same stuff with the header as before. Then I do this cp.exec. So what this is going to do is it's going to run this program. But the way this works is that it's not going to wait for that program to finish. It just basically says execute this program. So basically type this into the command prompt, kind of. And then, when you're done with it, run this function. So now we get back this whole thing of, we're not waiting. Does that kind of make sense? Yep? AUDIENCE: [INAUDIBLE]? KEVIN SCHMID: So this will actually open up a new process to do it. So this is actually, in some ways, evil, not super evil, but it is important to say that this is kind of going back to the Apache model where we use threads or processes for each request. So this is kind of analogous to what Apache does. In some cases, it will just use a new thread, which is a little more lightweight than a process, but Apache could end up forking a new process, which is kind of what we do here implicitly by doing dot slash fib 45. And then in that case, we incur the same expenses of processes. So this is just one thing you can do. But just to show this running. And this talk is really aimed at presenting these programs as a way to show different perspectives on how to design servers like that. So this is running, and then now if I do this again, I got two process IDs. Let's just talk about things to point out. So notice that they're incrementing. That's cool. Because it was 27,122 before. But notice now, they came back at roughly the same time. And now, a good question to ask about why that was the case is, whose job was it now to make these things play fair with each other, these two instances of dot slash fib 45 that I ran or that Node ran? Who makes it fair that they both get balanced run time? AUDIENCE: [INAUDIBLE]. KEVIN SCHMID: Yeah.
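The hand-it-to-a-child-process idea can be sketched without the compiled C program. Several things here are assumptions made so the example runs anywhere Node is installed: the child is a second Node process rather than the talk's dot slash fib, it computes fib 30 rather than fib 45, and it is launched with execFile (a sibling of the exec call mentioned in the talk that skips the shell). The names fibInChild and finished are invented for the illustration.

```javascript
const { execFile } = require('child_process');

// Number of children that have reported back so far.
let finished = 0;

// Runs the naive fib in a separate Node process and calls back with the result.
// The call returns immediately; the callback runs when the child exits.
function fibInChild(n, callback) {
  const script =
    'function fib(n) { return n <= 1 ? n : fib(n - 1) + fib(n - 2); } ' +
    'console.log(fib(' + Number(n) + '));';
  execFile(process.execPath, ['-e', script], function (error, stdout) {
    if (error) return callback(error);
    callback(null, Number(stdout.trim()));
  });
}

const start = Date.now();

// Two "requests" at once, like the two Curl commands in the demo.
for (let i = 1; i <= 2; i++) {
  fibInChild(30, function (error, value) {
    if (error) throw error;
    finished += 1;
    console.log('child ' + i + ': fib(30) = ' + value + ' after ' + (Date.now() - start) + ' ms');
  });
}

// Reached before either child has finished: the parent's event loop stays free.
console.log('both children started, finished so far: ' + finished);
```

The trade-off is the one raised in the talk: each request now costs a whole process, and the operating system rather than Node decides how the two computations share the CPUs.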
So basically, when I do dot slash fib 45 or something like that, now it's up to the operating system to handle the runtime of those programs. And now it can schedule them on different CPUs, or it can slice up the time that they get to run on one CPU. So that's the idea behind that. Does that make sense to everybody? So now Node isn't really playing a part in dividing up these tasks. OK. So that's almost it for examples. I just wanted to show one more thing because a lot of this so far has been not totally super practical in some cases. I can imagine coming home after this talk or something and saying, well, what I got out of that talk is that I can make a Fibonacci server for my final project. So here's just one more example that hopefully will be-- maybe not, but maybe-- a little more relevant to final projects and thinking ahead for things like that. So this is chat.js. So this is some sample server side code that you could use to set up a small chat server like you may have seen on Facebook Chat or whatever. So I'm not saying this is like Facebook Chat, but this is kind of a good-- maybe not good, but maybe good-- starting point for a chat server for your website for a final project. So let's look at what it's doing. So we're getting this special thing at the top, this var SIO equals require Socket.IO. So this is another thing that doesn't actually come bundled with Node but you can install it. It's a Node module. So it's just like some extension to Node. Socket.IO is actually really kind of cool. It's an abstraction that basically allows you to have this stream of communication between a web browser and a web server. So for the most part so far, we've had these very quick one second or two second communications between a web browser and the web server. So it's basically go to google.com, get the stuff, send it back, and then we're done.
We're never talking again until the user types in something else. But what Socket.IO and similar kinds of things-- and Socket.IO is actually one of the things that is built on WebSocket, which is sort of available as part of HTML5-- allow you to do is have this continuing dialogue. And this is very useful in a chat server kind of thing because it is a continuing dialogue in some ways: if you're chatting with somebody, you can now just send a message down the pipe, and then the server can send a message down the pipe to the other person you're chatting with. And then you can have this exchange like that. So that's kind of what Socket.IO is good for. The reason that Socket.IO uses WebSockets as just one thing is that, in addition to plain old WebSockets, it also does some tricks to basically make it browser compatible. So browsers like Internet Explorer unfortunately don't support WebSockets right out of the box. So it uses some other kind of cool neat things with Adobe Flash to allow you to have cross browser support. So that's really useful. And actually, I know I'm kind of running low on time here, but on CS50 Discuss, have you ever seen something like, I don't know, blank so and so is replying to this post or something like that, that feature? That's Socket.IO. So when somebody starts typing in the discuss box to make a reply or something, your browser does what's called, in Socket.IO, emitting some kind of event that says somebody's replying to this post. Then the server says, OK, what do I have to do? Well now I have to tell those other guys who are on CS50 Discuss looking at this post that somebody's replying. So that's kind of what Socket.IO is good for, this continuing stream of dialogue. OK. So what I have here-- and we're just going to ignore the connections array for a little bit-- what I do is I do another listen. So that's just the way in Socket.IO of saying let's listen on this port. And then I do this on connection.
So that's just basically Socket.IO's way of saying, when we receive a connection, I want you to run this code. And notice that instead of having req and res passed in there I have Socket. And this Socket idea is basically this thing that you can write to and read from that has the user's messages possibly. And the messages that you would send can go through that Socket. Does that make sense? So it's this continuing thing. So what I do is I call Socket.emit. And emit takes pretty much two arguments. The first argument is a string just representing the type of thing you're emitting. So for this case, I've used this string new message. And that's just basically saying that the type of this thing, what I'm sending, is a new message. So you can listen for specific types like new message or whatever by using dot on. So connection and user sent there, if you look at where we call dot on, those are other strings that represent types of user messages. So basically you can emit one of these message types, and then do something in response to one of these message types. So I'm emitting this new message. We're going to ignore connections.push for a second. But then I say, Socket.on user sent. So now it's kind of like when the user sends me a message, I want you to run this code. And notice that that anonymous function is taking in this variable called data which is basically going to have the user's message. So now let's kind of talk about the connections array. So this is designed for a chat client where basically everybody's in the same chat room. So basically, what we need to keep around is some array that represents all the people chatting, if that makes sense. Right? Because we need to know who those guys are so we can send them the messages that other people send to us. So what this code does is when a user sends a message-- that's the type of the event-- we're going to run this code.
And what we do is we run through this array that we have called connections. And pretty much for every connection except the one that's ours, that's what this code says, we send a new message with that attached message information. So if you notice here, what I did when the user actually makes a new connection is I've added, with the JavaScript .push method, that Socket as a value into our connections array. So now when this code runs, it will send things to those particular connections. So this can be a good starting point for making a chat server or something similar. And the kind of cool thing is that the code that you see here for on and emit and stuff like that is the same kind of JavaScript code that you would write in the browser to interact with the server. So that's why Socket.IO is kind of neat and useful in that way. Oh and just one more thing real quick. There was a CS50 final project last year that basically implemented a chat server in Node.js. I think it's Harvardchats.org but I'm not-- OK. I'm not sure what the URL is, but I can send that out afterwards. But it's kind of cool what you can do with Node.js. So I hope, in general, you guys have a good sense of what Node.js is useful for and how you could maybe apply it to your final project. I will be sending out some more resources along with this. And thank you for coming. Thank you. [APPLAUSE]
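As a follow-up to the chat.js walkthrough, the broadcast logic (remember every connection, then relay each user's message to everyone except the sender) can be sketched without installing Socket.IO. Everything below is a stand-in under stated assumptions: FakeSocket is a hypothetical class that only imitates the on and emit shape described in the talk, the event names are taken from the spoken description, and the welcome text is invented. It is not the seminar's code and not the Socket.IO API.

```javascript
// Hypothetical stand-in for a connected client socket.
class FakeSocket {
  constructor(name) {
    this.name = name;
    this.handlers = {};  // event type -> server-side handler
    this.received = [];  // what the server has sent to this client
  }
  // Server registers interest in a message type from this client.
  on(type, handler) { this.handlers[type] = handler; }
  // Server sends a typed message to this client.
  emit(type, data) { this.received.push({ type: type, data: data }); }
  // Test helper: pretend the client sent something to the server.
  clientSends(type, data) {
    if (this.handlers[type]) this.handlers[type](data);
  }
}

// Everyone currently in the single shared chat room.
const connections = [];

// What the server does when a new client connects.
function onConnection(socket) {
  socket.emit('new message', 'welcome'); // placeholder greeting (assumed)
  connections.push(socket);
  socket.on('user sent', function (data) {
    // Relay to every connection except the sender's own.
    for (let i = 0; i < connections.length; i++) {
      if (connections[i] !== socket) {
        connections[i].emit('new message', data);
      }
    }
  });
}

const alice = new FakeSocket('alice');
const bob = new FakeSocket('bob');
onConnection(alice);
onConnection(bob);

alice.clientSends('user sent', 'hi from alice');
console.log('bob received:', bob.received.map(function (m) { return m.data; }));
console.log('alice received:', alice.received.map(function (m) { return m.data; }));
```

With a real Socket.IO server, the sockets would come from the library's connection event and the messages would travel over the network. One thing a real version would also need, which the walkthrough does not cover, is removing a socket from the array when its user disconnects.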