THOMAS CARRIERO: I'm Thomas Carriero. I'm a software engineer at Dropbox.

ALEX ALLAIN: I'm Alex Allain. I am an engineer here at Dropbox.

THOMAS CARRIERO: Yes, I was actually the first head TF for CS50 when David Malan took over the class. I had already been teaching CS50 for two semesters with Mike Smith, who was the prior professor there.

ALEX ALLAIN: So I actually didn't take CS50, but I did TF it twice. Once as a regular TF, and then my senior year I was actually head TF of CS50, which was a lot of fun.

THOMAS CARRIERO: So when David reached out to me about setting up Dropbox in the CS50 appliance, I was really excited, because we actually have a Linux client. Most of our users use either the Windows or the Macintosh clients, but the Linux, Macintosh, and Windows clients are all actually very similar.

So what we did is we pre-installed the Dropbox Linux client in the CS50 appliance, and it runs just like all of our other Linux users. 

ALEX ALLAIN: So the way Dropbox works is it runs as a client on many different operating systems and devices. The Dropbox desktop client is one of the most well known, and one of the most interesting. 

THOMAS CARRIERO: So Dropbox basically takes all the files that you put in the folder and it chunks those files into four-megabyte chunks. So we'll take a 100-megabyte PDF file and we'll chunk it into 25 four-megabyte chunks. Those chunks are then encrypted and then we send them to our block servers. 
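To make the chunking arithmetic concrete, here is a minimal Python sketch, not Dropbox's actual code. The names `CHUNK_SIZE`, `chunk_count`, and `split_into_chunks` are hypothetical, a megabyte is assumed to be 1024 * 1024 bytes, and the encryption step mentioned above is left out.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # four megabytes, the chunk size mentioned in the talk


def chunk_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunks needed for a file of file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into consecutive chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# A 100-megabyte file needs 25 four-megabyte chunks.
print(chunk_count(100 * 1024 * 1024))       # 25

# The same splitting logic on a tiny input with a 4-byte chunk size.
print(split_into_chunks(b"abcdefghij", 4))  # [b'abcd', b'efgh', b'ij']
```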

ALEX ALLAIN: The block servers are the storage for the blocks themselves, and so each block is stored in the block server with the data and a SHA-256 hash of that block. That's a very basic cryptographic primitive that summarizes, in some sense, the data in a way that's unique to that data.
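As an illustration of the hash described above, the following sketch uses Python's standard `hashlib` module to compute a SHA-256 digest for a block. The helper name `block_hash` is hypothetical; how Dropbox computes and stores its hashes internally is not covered in the talk.

```python
import hashlib


def block_hash(block: bytes) -> str:
    """Return the SHA-256 digest of a block as a 64-character hex string."""
    return hashlib.sha256(block).hexdigest()


a = block_hash(b"the same bytes")
b = block_hash(b"the same bytes")
c = block_hash(b"different bytes")

print(a == b)  # True: identical data always gives the identical hash
print(a == c)  # False: different data gives a different hash
print(len(a))  # 64 hex characters, i.e. 256 bits
```

Because the digest depends only on the bytes, it can serve as a compact identifier for a block.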

You could upload the whole file all at once, but it turns out if you do that, really large files take a really long time to upload, and if you have a failure, you're out of luck and you have to restart it. 

What we then do is we tell another server in our system, what we call the metadata server, that hey, this is a file, and it's composed of the following list of blocks. And we pass up the hashes to identify those blocks rather than re-uploading the whole block. The metaserver then checks the block servers, makes sure the blocks are there. If they are, perfect. Everything is good.
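The upload handshake described above can be sketched with two toy in-memory classes. Everything here is an assumption made for illustration: the class names `BlockServer` and `MetadataServer`, the method names, and especially the choice to return the list of missing hashes, since the talk does not say what happens when a block is absent.

```python
import hashlib


class BlockServer:
    """Toy stand-in for a block server: stores each block under its SHA-256 hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        self.blocks[digest] = block
        return digest

    def has(self, digest: str) -> bool:
        return digest in self.blocks


class MetadataServer:
    """Toy stand-in for the metadata server: maps a file path to its block hashes."""

    def __init__(self, block_server: BlockServer) -> None:
        self.block_server = block_server
        self.files: dict[str, list[str]] = {}

    def commit(self, path: str, hashes: list[str]) -> list[str]:
        """Record the file if every block is present; return any missing hashes."""
        missing = [h for h in hashes if not self.block_server.has(h)]
        if not missing:
            self.files[path] = list(hashes)
        return missing


block_server = BlockServer()
metaserver = MetadataServer(block_server)

chunks = [b"first chunk", b"second chunk"]
hashes = [block_server.put(chunk) for chunk in chunks]  # upload the blocks
missing = metaserver.commit("/notes.txt", hashes)       # then send only the hashes
print(missing)  # [] because every block is already on the block server
```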

THOMAS CARRIERO: When we want to basically download the file from the internet, let's say, we'll say to the metaserver first, hey, can you tell me about where this file's located? And the metaserver will say, oh, this file's actually 25 four-megabyte chunks, and here they are. And then we'll go to a block server and actually download each of those chunks. And then we'll reconstruct the file from there, and then we'll start the download.

Yes, so Dropbox sort of deals with scale basically by very, very aggressive sharding.
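The download path Thomas describes (ask the metaserver for the block list, fetch each block, reassemble the file) might look roughly like the sketch below. Plain dictionaries stand in for the metaserver and the block server, the names are hypothetical, and the integrity check on each block is an added assumption rather than something stated in the talk.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def download_file(path: str,
                  file_index: dict[str, list[str]],
                  block_store: dict[str, bytes]) -> bytes:
    """Rebuild a file from its ordered list of block hashes."""
    block_hashes = file_index[path]          # step 1: ask the "metaserver"
    parts = []
    for digest in block_hashes:              # step 2: fetch each block
        block = block_store[digest]
        if sha256_hex(block) != digest:      # integrity check (an assumption)
            raise ValueError(f"corrupt block {digest}")
        parts.append(block)
    return b"".join(parts)                   # step 3: reassemble in order


# Tiny example with 8-byte chunks instead of four-megabyte ones.
original = b"hello, dropbox blocks"
chunks = [original[i:i + 8] for i in range(0, len(original), 8)]
block_store = {sha256_hex(c): c for c in chunks}
file_index = {"/greeting.txt": [sha256_hex(c) for c in chunks]}

restored = download_file("/greeting.txt", file_index, block_store)
print(restored == original)  # True
```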

ALEX ALLAIN: Sharding is when you take all of the users in your startup or your company, and maybe they used to be in one database, and that works great until you hit a certain number of users. And really what you want to do is find some way to split those across two databases, or maybe more than two. Ideally, enough that you can have every user in the world.

And so when you shard, what you do is you find some way of deciding which database to go to that doesn't require hitting a central directory. Or maybe it's a very quick, cheap look-up central directory. 

THOMAS CARRIERO: We never have everything stored in one database, because that's almost never going to scale. So instead, what we will do is take all that information, all the files that are stored on the metadata server, and shard it across hundreds or thousands of logical databases. And that means that when we have a request for a user's information, we'll first say, hey, which database is this user's information stored in? Then we'll basically use that decision to go find that database, and that's where we'll load all the files or all the metadata about the files.
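One common way to pick a database without consulting a central directory is to hash the user ID and take it modulo the number of shards. The sketch below shows that technique only as an example; the talk does not say which scheme Dropbox uses, and `NUM_SHARDS`, `shard_for_user`, and the in-memory "databases" are all hypothetical.

```python
import hashlib

NUM_SHARDS = 256  # hypothetical number of logical databases


def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash (no directory lookup)."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each "logical database" is just a dict here: user_id -> list of file paths.
shards: list[dict[int, list[str]]] = [{} for _ in range(NUM_SHARDS)]


def save_file_list(user_id: int, paths: list[str]) -> None:
    shards[shard_for_user(user_id)][user_id] = list(paths)


def load_file_list(user_id: int) -> list[str]:
    # First decide which database holds this user, then query only that one.
    return shards[shard_for_user(user_id)].get(user_id, [])


save_file_list(42, ["/notes.txt", "/photo.jpg"])
print(load_file_list(42))  # ['/notes.txt', '/photo.jpg']
```

A plain modulo scheme makes adding shards later awkward, because most users would map to a different database; that trade-off is outside what the speakers discuss.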

So we use a lot of sharding. But sharding is not always enough. You actually need to cache a lot of the common requests, because even those database queries can be expensive. So we also use aggressive caching strategies to make sure that the most common requests are quite easy to compute. And basically that makes it a lot faster and it makes it work at scale. So that's, at a very high level, how Dropbox works.
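The caching idea can be illustrated with Python's standard `functools.lru_cache`: a repeated request is answered from memory instead of hitting the database again. This is a minimal sketch under stated assumptions; `query_database`, `list_files`, and the `stats` counter are hypothetical, and the talk does not describe which caching system Dropbox actually uses.

```python
from functools import lru_cache

stats = {"db_queries": 0}  # counts how often the (pretend) database is hit


def query_database(user_id: int) -> tuple[str, ...]:
    """Stand-in for an expensive database query."""
    stats["db_queries"] += 1
    return (f"/user{user_id}/notes.txt",)


@lru_cache(maxsize=1024)
def list_files(user_id: int) -> tuple[str, ...]:
    """Cached lookup: only the first call per user reaches the database."""
    return query_database(user_id)


list_files(7)
list_files(7)  # served from the cache
print(stats["db_queries"])  # 1
```

A real cache also has to be invalidated when a user's files change, which this sketch does not attempt.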

ALEX ALLAIN: I'm Alex Allain. 

THOMAS CARRIERO: And I'm Thomas Carriero.

ALEX ALLAIN: And this is CS50.