DAVID MALAN: Hello, world, this is CS50Live, and boy, do we have an amazing episode for you today. First, an inside look at how Dropbox works, then a close look at tiny hamsters eating tiny burritos, and lastly, a behind-the-scenes look at CS50's new film, Persistence. But first, Dropbox.com, where we recently traveled in San Francisco, California, to meet with CS50's former head teaching fellow, Thomas Carriero, who gave us a tour of Dropbox and exactly what it's like to work and, daresay, live there. Now we sat down not only with Thomas while there, but also with CS50's former head teaching fellow Alex Allain to talk about the underlying workings of Dropbox.com and its distributed architecture. Let's take a look. THOMAS CARRIERO: I'm Thomas Carriero. I'm a software engineer at Dropbox. ALEX ALLAIN: I'm Alex Allain. I am an engineer here at Dropbox. THOMAS CARRIERO: Yeah, so I was actually the first head TF for CS50 when David Malan took over the class. I had already been teaching CS50 for two semesters with Mike Smith, who was the prior professor there. ALEX ALLAIN: So I actually didn't take CS50, but I did TF it twice, once as a regular TF and then in my senior year, I was actually Head TF of CS50, which was a lot of fun. THOMAS CARRIERO: So when David reached out to me about setting up Dropbox in the CS50 appliance, I was really excited, because we actually have a Linux client. Most of our users use either the Windows or the Macintosh clients, but the Linux, Macintosh, and Windows clients are all, actually, very similar. So what we did is we pre-installed the Dropbox Linux client in the CS50 appliance, and it runs just like all of our other Linux users. ALEX ALLAIN: So the way Dropbox works is it runs as the client on many different operating systems and devices. The Dropbox desktop client is one of the most well known, one of the most interesting.
THOMAS CARRIERO: So Dropbox basically takes all of the files that you put in the folder, and it chunks those files into four-megabyte chunks. So we'll take a 100-megabyte PDF file, and we'll chunk it into 25 four-megabyte chunks. Those chunks are then encrypted, and then we send them to our block servers. ALEX ALLAIN: The block servers are the storage for the blocks themselves, and so each block is stored in the block server with the data and a SHA-256 hash of that block. That's a very basic encryption primitive that summarizes, in some sense, the data in a way that's unique to that data. You could upload the whole file all at once, but it turns out if you do that with really large files, they take a really long time to upload, and if you have a failure, you're out of luck and you have to restart it. What we then do is we tell another server in our system, what we call the meta server, the metadata server, hey, this is a file and it's composed of the following list of blocks. And we pass up the hashes to identify those blocks rather than re-uploading the whole block. The meta server then checks with the block servers, makes sure the blocks are there-- if they are, perfect, everything is good. THOMAS CARRIERO: When we want to, basically, download the file from the internet, let's say, we'll ask meta server first, hey, can you tell me about where this file is located, and meta server will say, oh, well, this file is actually 25 four-megabyte chunks, and here they are. And then we'll go to block server and we'll actually download each of those chunks, and then we'll reconstruct the file from there, and then we'll start the download. Yeah, so Dropbox deals with scale, basically, by very, very aggressive sharding.
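[EDITOR'S SKETCH: The chunk-and-hash flow described above can be illustrated with a small Python example. This is only a sketch under stated assumptions, not Dropbox's actual code: the names `BlockStore`, `upload`, and `download` are hypothetical, the "block server" is an in-memory dictionary, and the encryption step mentioned in the interview is left out.]

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, the chunk size mentioned in the interview


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class BlockStore:
    """Hypothetical in-memory stand-in for a block server: hash -> block bytes."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        # Each block is filed under the hex SHA-256 digest of its contents.
        digest = hashlib.sha256(block).hexdigest()
        self._blocks[digest] = block
        return digest

    def has(self, digest: str) -> bool:
        return digest in self._blocks

    def get(self, digest: str) -> bytes:
        return self._blocks[digest]


def upload(data: bytes, store: BlockStore, block_size: int = BLOCK_SIZE) -> list[str]:
    """Store every block and return the ordered list of hashes.

    That ordered list plays the role of what the metadata server is told:
    "this file is composed of the following blocks."
    """
    return [store.put(block) for block in split_into_blocks(data, block_size)]


def download(block_list: list[str], store: BlockStore) -> bytes:
    """Rebuild a file by fetching its blocks in order and joining them."""
    return b"".join(store.get(digest) for digest in block_list)
```

[With the default block size, a 100-megabyte file yields 25 blocks, matching the example in the interview. Because blocks are identified by their hashes, a client in this sketch could call `store.has(digest)` and skip re-sending any block the store already holds.]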
ALEX ALLAIN: So sharding is when you take all of the users in your startup or your company, and maybe they used to be on one database, and that works great until you hit a certain number of users, and really what you want to do is find some way to split those across two databases or maybe more than two-- ideally, enough that you can have every user in the world. So when you shard, what you do is you find some way of deciding which database to go to that doesn't require hitting a central directory, or maybe it's a very quick, cheap look-up in that central directory. THOMAS CARRIERO: We never have everything stored in one database, because that's almost never going to scale. So instead, what we do is we'll take all of that information, all of the files or all of the metadata, and we'll shard it across hundreds or thousands of logical databases. That means that when we have a request for a user's information, we'll first say, hey, which database is this user's information stored in, and then we'll basically use that decision to go find that database, and that's where we'll load all the files or all the metadata about the files. So we use a lot of sharding, but sharding's not always enough. You actually need to cache a lot of the common requests, because database queries can be expensive. So we also do progressive caching strategies to make sure that the most common requests are quite easy to compute, and basically, that makes it a lot faster and makes it work at scale. So that's, at a very high level, kind of how Dropbox works. ALEX ALLAIN: My name's Alex Allain. THOMAS CARRIERO: I'm Thomas Carriero. ALEX ALLAIN: And this is CS50. DAVID MALAN: Now if you've ever wondered where this quote on CS50's website comes from, it's actually Alex who is the original author.
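[EDITOR'S SKETCH: The sharding and caching ideas from the interview can be illustrated roughly as follows. Everything here is an assumption made for illustration: the interview does not say how Dropbox picks a shard, so this sketch simply hashes the user ID, uses 256 dictionaries as stand-ins for logical databases, and uses Python's `functools.lru_cache` as a stand-in for a real caching layer.]

```python
import hashlib
from functools import lru_cache

NUM_SHARDS = 256  # assumed; the interview only says "hundreds or thousands"


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from the user ID alone, with no central directory lookup."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# One dictionary per logical database, standing in for real database servers.
shards: list[dict[str, list[str]]] = [{} for _ in range(NUM_SHARDS)]


def save_file_list(user_id: str, files: list[str]) -> None:
    """Write a user's file list to that user's shard."""
    shards[shard_for(user_id)][user_id] = list(files)
    # Crude invalidation: drop all cached results so readers never see stale data.
    file_list.cache_clear()


@lru_cache(maxsize=1024)
def file_list(user_id: str) -> tuple[str, ...]:
    """Read a user's file list; repeated reads are answered from the cache."""
    return tuple(shards[shard_for(user_id)].get(user_id, []))
```

[The same user always maps to the same shard, so any server can find the right database by computing `shard_for(user_id)`. A fixed modulus like this makes it awkward to add shards later, which is one reason a real system might prefer a cheap directory lookup, the other option Alex mentions.]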
Now, speaking of Dropbox, I recently received this email from them in my inbox-- Hi, David, you may notice that some of your shared links aren't working, and we wanted to reach out to you personally to let you know why. Well, what's a shared link? Well, if you've used Dropbox beyond simply saving your source code inside of the appliance, you might know that you can create shared links by typically right-clicking on a file and copying the URL to your clipboard. That shared link might look a little something like this, but instead of the word secret, there's actually something more cryptic there, like a sequence of random letters and numbers. The idea being that I can now email or Gchat this kind of URL to a friend, and he or she could access CS50.txt and download it onto his or her computer. And only by knowing that URL, or with super, super low probability, guessing that URL, could someone else actually access the file. Unfortunately, a company known as Intralinks recently posted on their Collaborista Blog that there's actually a couple of threats to this particular workflow. It turns out that if you accidentally make a mistake, as I, frankly, have done in the past, and paste a URL, like a Dropbox shared link, into not your browser's address bar but, as pictured here, your search bar, that URL, of course, is going to be submitted to a search engine like Google. Of course, Google is not going to necessarily recognize that shared link, and so you're going to get more generic search results like a link to Dropbox.com itself, and in this case, an advertisement, and in fact, advertisements, potentially, for competitors of Dropbox. In fact, that's how Intralinks noticed this-- they, too, were running an AdSense campaign alongside of keywords that Dropbox themselves might use. And so if we zoom in on the bottom results here, you'll see that Intralinks has this link to their own service.
Now one of the features of Google and other search engines' advertising campaigns is that when a user like me clicks on this link, now, I am going to be disclosing the URL that I typed into Google in order to find these search results. The idea being that companies would like to know how people are finding their website. Of course, if I found this page of results by pasting an otherwise secret URL into Google, I've now, effectively, told Intralinks and their web logs exactly what secret URL I was visiting, thereby disclosing, potentially, the contents of CS50.txt. Now, there's another threat altogether-- you may know, too, from Dropbox shared links that you can typically open them inside of your own browser and preview them inside of a frame like this. But if that preview contains a hyperlink, as pictured here to Example.com, and you or a user click that hyperlink, thereby opening a new tab or window with that page's URL, what you've also just told the web server, by nature of how HTTP works, is the HTTP referer address from whence you came. In other words, you informed the destination website that you were previously at this supposedly secret URL. Now, what Intralinks discovered by looking through their own logs is that they found quite a bit of information that was surely meant to be secret-- for instance, someone's mortgage application, someone's tax return, and bunches of more documents, as well. Now, if you'd like to learn more about this particular threat, head to Dropbox's blog at this URL here, and the reality is that you can't really defend against a threat in which people like me accidentally paste what should be secret URLs into search engines. You and I are simply going to have to be a bit more careful. But they have been working on redressing the other issue whereby links that are embedded in a Dropbox preview were disclosing the referer URL. But head to that URL for more details. But now, as promised, a closer look at tiny hamsters eating tiny burritos.
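[EDITOR'S SKETCH: The search-bar leak described above comes down to a secret URL ending up inside another URL's query string, which the advertiser's server can then read from the referring address it receives. The Python below is a rough illustration under stated assumptions: the shared-link shape `https://www.dropbox.com/s/secret/CS50.txt` is a made-up placeholder, the helper names are hypothetical, and the `q` parameter reflects how the episode describes the search URL rather than any guarantee about what a given search engine sends.]

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlsplit


def search_url(pasted_text: str) -> str:
    """Build a search URL like the one a browser requests for pasted text."""
    # safe="" percent-encodes every reserved character, including ":" and "/".
    return "https://www.google.com/search?q=" + quote(pasted_text, safe="")


def query_from_referrer(referrer: str) -> Optional[str]:
    """Recover the search text from a referring URL, as a log analysis might."""
    values = parse_qs(urlsplit(referrer).query).get("q")
    return values[0] if values else None


if __name__ == "__main__":
    secret_link = "https://www.dropbox.com/s/secret/CS50.txt"  # placeholder
    referrer = search_url(secret_link)
    print(referrer)
    print(query_from_referrer(referrer))
```

[The second threat works the same way without any search engine involved: the referring address is the preview page's own URL, so no decoding is needed at all.]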
[MUSIC PLAYING] DAVID MALAN: Now CS50's team recently had an opportunity to participate in a 48-hour film project, an international competition during which teams had, indeed, 48 hours alone to make a film. The catch is that you only find out what film you need to make at the very start of those 48 hours. In particular, on a recent Friday evening at 7:00 pm, we at CS50 learned that we'd be making one, a silent film, two, that the film needed to feature a character named Jeremiah Jones, a teacher, three, that the film needed to feature a diary, this one here, and four, that we needed to somehow include the line "it is what it is" even though, of course, we were making a silent film. Now, 26 members of CS50's team participated in this 48-hour film project, among them Colton, Dan, Padraig, and Shelley Westover, whom you may recall from such films as this one here. Now, also involved, of course, was CS50's own Ramon Galvan. Ramon, welcome to the show. RAMON GALVAN: Thank you for having me. DAVID MALAN: And CS50's own Daven Farnham. Now, Ramon, what was your role in the film? RAMON GALVAN: I was co-director with Dan, actually. DAVID MALAN: And Daven, yourself? DAVEN FARNHAM: I was the star, so I basically made the project. I saved the film. DAVID MALAN: You saved the film. DAVEN FARNHAM: I did. DAVID MALAN: Now, you say this, but I believe we have your screen test for this film. If we could roll this clip here. DAVEN FARNHAM: My name's Daven Farnham, and this is CS50. I wanted to say CSS. This is CSS. DAVID MALAN: Now this was your first film? DAVEN FARNHAM: Uh, no, maybe. DAVID MALAN: No, well, at least this time around it was a silent film. DAVEN FARNHAM: Yes. DAVID MALAN: So at 7:00 pm, we found out those required ingredients, and then we immediately dived in as a group to figure out what movie we were actually going to make. Do you want to walk us through what that night was like?
DAVEN FARNHAM: So basically we got the idea at 7:00, we basically started to brainstorm, so we all kind of gathered around a whiteboard and started brainstorming ideas, and then by 9:00, we tried to throw it off to writers, and the writers took it from there. DAVID MALAN: And meanwhile, Dan and Shelley and I actually headed to Target, of course, our favorite nearby store, to pick up all the props for the movie we had decided on, which at that point was-- DAVEN FARNHAM: We had decided on a parody adventure film. DAVID MALAN: Which was going to be quite like Indiana Jones. DAVEN FARNHAM: Yes, so we needed a bullwhip and we needed a fedora and stuff. DAVID MALAN: And a very ornate piece of jewelry that he would then find at the end of the episode. Of course, we get back at midnight or so from Target and realize, nope, that's not the movie we're making-- psych. DAVEN FARNHAM: Completely different film. RAMON GALVAN: We had a film noir for a couple of hours, then we had a romantic comedy at the end. DAVID MALAN: So by 4:00 a.m., we had a romantic comedy, and around 5:00 a.m., you and Dan, the other director, showed up. RAMON GALVAN: Yeah, so we got together and we kind of planned out where we would shoot, what scenes we would shoot first, and then around 7:00 or 8:00 a.m., we actually went out and started shooting. DAVID MALAN: Well, if you can stick around, we'd love to do some behind-the-scenes looks at how the film was made, but I think first, shall we give folks the world premiere of CS50's film, Persistence. [MUSIC PLAYING] DAVID MALAN: Guys, I mean-- so let's start from the top. So the very first scene we all shot as a group that morning took place around 8:00 a.m., and we were actually here, Jefferson Hall, which is actually one of the physics lecture halls on campus. And what was the goal with this scene?
RAMON GALVAN: So we were here to start the movie, Daven as a teacher, a teaching fellow, or a teaching assistant, something like that, and he was really upset that he sees this couple walking out and he wants that. He wants to be in a relationship, he just doesn't have it. DAVID MALAN: And then the next scene we transition to actually wasn't shot in order. In fact, here, you are-- DAVEN FARNHAM: So here, actually, we shot this-- this was one of the last scenes we shot, but this actually shows up at the very beginning of the film. And so in this scene, it's a montage, and so what I'm doing is I'm putting on cologne, I'm combing my hair. DAVID MALAN: Do you use cologne? DAVEN FARNHAM: Uh, Ramon's cologne, lots of cologne. DAVID MALAN: And whose shirt? DAVEN FARNHAM: Uh, Ramon's shirt. DAVID MALAN: So that was more than one take, and the shirt by the end was pretty-- DAVEN FARNHAM: Yes, I think we had to take three or four takes, so each take was three squirts, so there were about 12 squirts of cologne. So I smelled like that cologne for the rest of the day. DAVID MALAN: Well, at least, very quickly. We transitioned outside, and, in fact, if you look closely, this is actually CS50's own Lauren Caraballo. But what were you thinking with this scene? DAVEN FARNHAM: Right, so in this scene, we're trying to get her attention. So I'm walking by her, I'm peacocking, of course. DAVID MALAN: Peacocking? DAVEN FARNHAM: Oh, you don't know? RAMON GALVAN: Uh, should I? DAVEN FARNHAM: Yeah, of course, of course. So normal walking, of course, is just normal walking. DAVID MALAN: So this is normal walking? DAVEN FARNHAM: That's normal walking. Peacocking, throw a little hips in there. RAMON GALVAN: It's really all right here. DAVEN FARNHAM: It's all right here. It's from this qua-- It's all in the hips. And then at the very end, you have to pop and lock. It's key to the maneuver-- it's key. DAVID MALAN: Pop and lock.
All right, well, you actually did a lot of physical comedy in the film. In fact, one of the next scenes was here at Lamont Library, outside the door. DAVEN FARNHAM: Yes, they're right here, so I'm actually trying to-- I think it's a pull door and I'm pushing, and as that scene progresses, I'm pushing and pushing ever more aggressively. And I think at the end, someone actually knocks me out of the way. DAVID MALAN: Yeah, and in fact, we didn't notice the-- until the editing phase. RAMON GALVAN: Yeah, so, if we zoom into this shot, and Dan, can we enhance a little bit? OK, perfect. So you get to see me crouched down about to pop up and knock Daven in the face with the door. DAVID MALAN: That was fun found footage just hours before we had to ship the films for the deadline. All right, well, thank you both so much for joining and for starring in such-- DAVEN FARNHAM: Oh, no, thank you. RAMON GALVAN: Thank you. DAVID MALAN: --a moving film. Well, that is it for CS50Live. Thanks so much to our friends at Dropbox, thanks so much to everyone behind the camera, CS50's own Ramon Galvan and Daven Farnham. This was CS50 and this was our favorite scene that didn't make it into the film. DIRECTOR 1: Are we gonna get the car in the road? DIRECTOR 2: It's coming, that's OK. [HORN HONKING] ACTRESS: Whoa. Oh, god.