1 00:00:00,000 --> 00:00:11,664 2 00:00:11,664 --> 00:00:13,830 MICHAEL D. SMITH: This afternoon I have the pleasure 3 00:00:13,830 --> 00:00:16,830 of introducing Mark Zuckerberg, which is one of our guest speakers 4 00:00:16,830 --> 00:00:21,030 this semester to come and talk a little bit about computer science 5 00:00:21,030 --> 00:00:22,040 in the real world. 6 00:00:22,040 --> 00:00:25,350 As most of you probably know, as you guys all do this much more 7 00:00:25,350 --> 00:00:30,090 than I do, founder of Facebook.com, which is a social networking 8 00:00:30,090 --> 00:00:32,200 program, whatever you want to call it. 9 00:00:32,200 --> 00:00:37,094 Used at over 2000 schools across the nation, and possibly the world too. 10 00:00:37,094 --> 00:00:38,760 Is it the world too, or just the nation? 11 00:00:38,760 --> 00:00:39,750 >> MARK ZUCKERBERG: [INAUDIBLE]. 12 00:00:39,750 --> 00:00:40,740 >> MICHAEL D. SMITH: OK. 13 00:00:40,740 --> 00:00:45,007 So good influence for doing some things in computer science. 14 00:00:45,007 --> 00:00:47,090 He's going to tell us some of the background of it 15 00:00:47,090 --> 00:00:48,780 and what's been important and so forth. 16 00:00:48,780 --> 00:00:50,140 So please join me in welcoming. 17 00:00:50,140 --> 00:00:56,080 18 00:00:56,080 --> 00:00:57,360 >> MARK ZUCKERBERG: Yo. 19 00:00:57,360 --> 00:00:57,540 All right, cool. 20 00:00:57,540 --> 00:01:00,248 This is the first time I've ever had to hold one of these things. 21 00:01:00,248 --> 00:01:02,818 So I'm just going to attach it really quickly, one second. 22 00:01:02,818 --> 00:01:14,270 23 00:01:14,270 --> 00:01:14,770 All right. 24 00:01:14,770 --> 00:01:16,264 Can you hear? 25 00:01:16,264 --> 00:01:17,740 Is this good? 26 00:01:17,740 --> 00:01:19,160 Is this amplified at all? 27 00:01:19,160 --> 00:01:19,797 >> AUDIENCE: Yeah. 28 00:01:19,797 --> 00:01:20,922 MARK ZUCKERBERG: All right. 29 00:01:20,922 --> 00:01:21,979 Sweet. 
30 00:01:21,979 --> 00:01:24,895 This is like one of the first times I've been to a lecture at Harvard. 31 00:01:24,895 --> 00:01:31,950 32 00:01:31,950 --> 00:01:35,480 I guess what's probably going to be most useful for you guys is if I just 33 00:01:35,480 --> 00:01:39,104 take you through some of the courses that I took at Harvard where I actually 34 00:01:39,104 --> 00:01:40,270 did go to lecture sometimes. 35 00:01:40,270 --> 00:01:42,230 I was joking. 36 00:01:42,230 --> 00:01:44,120 And sort of, like, how different decisions 37 00:01:44,120 --> 00:01:46,711 that I had to make when I was moving along with Facebook 38 00:01:46,711 --> 00:01:49,460 got impacted by different stuff that I was learning in the classes 39 00:01:49,460 --> 00:01:50,427 that I was taking. 40 00:01:50,427 --> 00:01:53,010 And if all goes according to plan, then maybe some of you guys 41 00:01:53,010 --> 00:01:55,759 will come out of this thinking that taking CS or engineering stuff 42 00:01:55,759 --> 00:01:57,760 at Harvard is actually sort of useful. 43 00:01:57,760 --> 00:02:00,640 So that's the game plan. 44 00:02:00,640 --> 00:02:02,670 >> I think that this is slotted for two hours. 45 00:02:02,670 --> 00:02:04,670 There's no way I'm going to speak for two hours. 46 00:02:04,670 --> 00:02:07,110 I'll probably speak for like 20 minutes, or 15 minutes, 47 00:02:07,110 --> 00:02:08,750 and then I'll just let you guys ask questions. 48 00:02:08,750 --> 00:02:10,958 Because I'm sure you guys have more interesting stuff 49 00:02:10,958 --> 00:02:14,816 to ask me than I could come up with to talk about myself. 50 00:02:14,816 --> 00:02:16,760 >> So I guess I'll just kind of get started. 51 00:02:16,760 --> 00:02:19,810 52 00:02:19,810 --> 00:02:23,580 When I was here, I started off taking 121. 53 00:02:23,580 --> 00:02:26,297 I never actually took 50. 
54 00:02:26,297 --> 00:02:28,130 You should have gotten the other guy who was 55 00:02:28,130 --> 00:02:31,190 doing Facebook, Dustin Moskovitz, who was my roommate. 56 00:02:31,190 --> 00:02:34,792 When we got started the site was written in PHP, which isn't something 57 00:02:34,792 --> 00:02:36,500 that you learned in one of these classes. 58 00:02:36,500 --> 00:02:38,960 But fortunately, if you have a good background in C, 59 00:02:38,960 --> 00:02:42,790 the syntax is very similar, and you can pick it up in a day or two. 60 00:02:42,790 --> 00:02:49,020 >> So I started writing the site and launched it at Harvard 61 00:02:49,020 --> 00:02:51,040 in February 2004. 62 00:02:51,040 --> 00:02:52,750 So I guess almost two years ago now. 63 00:02:52,750 --> 00:02:55,770 And within a couple of weeks, a few thousand people had signed up. 64 00:02:55,770 --> 00:02:57,686 And we started getting some emails from people 65 00:02:57,686 --> 00:03:00,800 at other colleges asking for us to launch it at their schools. 66 00:03:00,800 --> 00:03:02,725 >> And I was taking 161 at the time. 67 00:03:02,725 --> 00:03:05,350 So I don't know if you guys know the reputation of that course, 68 00:03:05,350 --> 00:03:07,662 but it was kind of heavy. 69 00:03:07,662 --> 00:03:10,370 It was a really fun course, but it didn't leave me with much time 70 00:03:10,370 --> 00:03:12,510 to do anything else with Facebook. 71 00:03:12,510 --> 00:03:16,360 So my roommate Dustin, who I guess had just finished CS50, 72 00:03:16,360 --> 00:03:18,370 was like, hey, I want to help out. 73 00:03:18,370 --> 00:03:22,009 I want to do the expansion and help you figure out how to do the stuff. 74 00:03:22,009 --> 00:03:24,050 So I was like, you know, that's pretty cool dude, 75 00:03:24,050 --> 00:03:26,500 but you don't really know any PHP or anything like that. 
76 00:03:26,500 --> 00:03:29,919 So that weekend he went home, bought the book Perl for Dummies, 77 00:03:29,919 --> 00:03:31,960 came back and was like, alright, I'm ready to go. 78 00:03:31,960 --> 00:03:35,560 I was like dude, the site is written in PHP, not Perl, but you know, 79 00:03:35,560 --> 00:03:36,270 that's cool. 80 00:03:36,270 --> 00:03:41,800 >> So he picked up PHP over a few days because, I 81 00:03:41,800 --> 00:03:44,090 promise that if you have a good background in C, then 82 00:03:44,090 --> 00:03:46,230 PHP is a very simple thing to pick up. 83 00:03:46,230 --> 00:03:49,390 And he just kind of went to work. 84 00:03:49,390 --> 00:03:53,120 So I mean, the first big decision that we really had to make 85 00:03:53,120 --> 00:03:57,140 was in how to kind of expand the architecture 86 00:03:57,140 --> 00:04:01,920 to go from the single school type set up that we had when it was just at Harvard 87 00:04:01,920 --> 00:04:03,830 to something that supported multiple schools. 88 00:04:03,830 --> 00:04:06,890 >> So this was a decision that had to be made on a bunch of levels, 89 00:04:06,890 --> 00:04:10,000 both in the product and how we wanted privacy to work, 90 00:04:10,000 --> 00:04:12,510 but I think that one really important decision that's 91 00:04:12,510 --> 00:04:16,180 helped us scale pretty well is how we decided to distribute the data. 92 00:04:16,180 --> 00:04:21,680 >> So I don't know how much of complexity stuff like big O notation you guys 93 00:04:21,680 --> 00:04:23,620 [do] in this class. 94 00:04:23,620 --> 00:04:28,610 So I mean, one of the most complicated computations that we do on the site 95 00:04:28,610 --> 00:04:32,080 is the computation to tell how you're connected to people. 
96 00:04:32,080 --> 00:04:34,680 >> Because if you can imagine, that's stored 97 00:04:34,680 --> 00:04:42,210 as sort of a series of undirected-- it's not weighted-- so undirected, 98 00:04:42,210 --> 00:04:45,970 unweighted pairs of ID numbers of people in the database. 99 00:04:45,970 --> 00:04:49,647 Then if you want to figure out who is friends with someone, 100 00:04:49,647 --> 00:04:51,230 you have to look at all their friends. 101 00:04:51,230 --> 00:04:51,730 Right? 102 00:04:51,730 --> 00:04:54,000 So that's maybe like 100 or 200 people. 103 00:04:54,000 --> 00:04:57,050 >> But then if you want to figure out who's a friend of a friend, 104 00:04:57,050 --> 00:04:59,800 or what the closest connection is there, then you kind of 105 00:04:59,800 --> 00:05:03,440 have to look at the 100 or 200 friends of each of those friends. 106 00:05:03,440 --> 00:05:10,020 So it becomes at each level there's another factor of n multiplied in, where 107 00:05:10,020 --> 00:05:13,861 n is the number of friends that each of your friends has. 108 00:05:13,861 --> 00:05:16,110 So you can see that this kind of becomes exponentially 109 00:05:16,110 --> 00:05:20,076 difficult to solve for the shortest path between people. 110 00:05:20,076 --> 00:05:22,950 So if you're just looking for a friend of a friend, that's n squared. 111 00:05:22,950 --> 00:05:25,520 If you're looking for a friend of a friend of a friend, that's n cubed. 112 00:05:25,520 --> 00:05:27,311 And that's something that traditionally was 113 00:05:27,311 --> 00:05:31,590 pretty difficult for a lot of the predecessor sites to Facebook. 114 00:05:31,590 --> 00:05:34,210 And for example Friendster had large problems with this 115 00:05:34,210 --> 00:05:37,520 because they were trying to compute paths six degrees out, 116 00:05:37,520 --> 00:05:38,870 or like seven degrees out. 
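The connection computation described here can be sketched as a breadth-first search over undirected, unweighted ID pairs. This is only an illustration under stated assumptions, not the site's actual code: the function names, the `max_depth` cutoff, and the sample IDs are all hypothetical. With roughly n friends per person, a search to depth d may touch on the order of n to the power d people, which is the n squared and n cubed growth mentioned in the talk.

```python
from collections import defaultdict, deque


def build_adjacency(pairs):
    """Turn undirected, unweighted (id_a, id_b) pairs into an adjacency map."""
    friends = defaultdict(set)
    for a, b in pairs:
        friends[a].add(b)
        friends[b].add(a)
    return friends


def degrees_of_separation(friends, start, target, max_depth=3):
    """Breadth-first search; returns the hop count, or None beyond max_depth."""
    if start == target:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand past the cutoff
        for friend in friends.get(person, ()):
            if friend == target:
                return depth + 1
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    return None


def worst_case_lookups(n, depth):
    """Upper bound on people examined when everyone has n friends."""
    return n ** depth


# Hypothetical sample data: a simple chain 1-2-3-4-5.
friends = build_adjacency([(1, 2), (2, 3), (3, 4), (4, 5)])
print(degrees_of_separation(friends, 1, 3))  # 2 (friend of a friend)
print(degrees_of_separation(friends, 1, 5))  # None (4 hops, past max_depth=3)
print(worst_case_lookups(150, 3))            # 3375000
```

Capping the depth is one way to keep the search bounded; going from three degrees to seven with 150 friends each multiplies the worst case by 150 four more times.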
117 00:05:38,870 --> 00:05:42,330 >> And that's something that when you're doing like n seventh, 118 00:05:42,330 --> 00:05:47,560 that just is really very hard and it took down their site for a while. 119 00:05:47,560 --> 00:05:51,950 So one of things that we kind of had in mind when we were figuring out 120 00:05:51,950 --> 00:05:56,070 how to do this was how do you distribute the database in such a way 121 00:05:56,070 --> 00:05:58,820 that this computation becomes manageable. 122 00:05:58,820 --> 00:06:03,570 >> So what we decided was that everyone on the site 123 00:06:03,570 --> 00:06:06,800 does most of their activity at the school that they're kind of based at. 124 00:06:06,800 --> 00:06:09,767 So if you're at Harvard, then most of the people 125 00:06:09,767 --> 00:06:12,350 who you're going to be seeing and transacting with on the site 126 00:06:12,350 --> 00:06:13,475 are going to be at Harvard. 127 00:06:13,475 --> 00:06:16,600 It's actually probably like 90% of the stuff that you do on the site. 128 00:06:16,600 --> 00:06:20,510 >> So we decided to split up the databases and create 129 00:06:20,510 --> 00:06:25,740 one instance of MySQL database for each school in the network. 130 00:06:25,740 --> 00:06:30,680 And in doing that, if you notice the paths that we compute 131 00:06:30,680 --> 00:06:32,050 are only within the school. 132 00:06:32,050 --> 00:06:35,120 So instead of say, like now we're at six million users, 133 00:06:35,120 --> 00:06:41,080 and instead of having to do n cubed over some portion of six million, 134 00:06:41,080 --> 00:06:43,850 it's just n cubed over 10,000, which is a much more 135 00:06:43,850 --> 00:06:47,760 manageable type of computation. 136 00:06:47,760 --> 00:06:50,920 >> So that was sort of the first big architectural decision 137 00:06:50,920 --> 00:06:55,210 that we had to make that contributed to us not dying a few months later. 138 00:06:55,210 --> 00:06:58,250 And it was probably a pretty important one. 
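The one-database-per-school split can be sketched as a small router that hands every query to the database for a single school. This is a minimal sketch under assumptions: in-memory SQLite from Python's standard library stands in for the MySQL instances mentioned in the talk, and the class name, table layout, and school keys are hypothetical.

```python
import sqlite3


class SchoolShards:
    """One independent database per school (SQLite in memory stands in for MySQL)."""

    def __init__(self):
        self._shards = {}

    def shard_for(self, school):
        # Lazily create the school's database the first time it is needed.
        if school not in self._shards:
            conn = sqlite3.connect(":memory:")
            conn.execute(
                "CREATE TABLE friendships (user_a INTEGER, user_b INTEGER)"
            )
            self._shards[school] = conn
        return self._shards[school]

    def add_friendship(self, school, user_a, user_b):
        conn = self.shard_for(school)
        conn.execute("INSERT INTO friendships VALUES (?, ?)", (user_a, user_b))
        conn.commit()

    def friends_of(self, school, user_id):
        # Pairs are undirected, so look for the user in either column.
        rows = self.shard_for(school).execute(
            "SELECT user_b FROM friendships WHERE user_a = ? "
            "UNION SELECT user_a FROM friendships WHERE user_b = ?",
            (user_id, user_id),
        ).fetchall()
        return sorted(row[0] for row in rows)


shards = SchoolShards()
shards.add_friendship("harvard", 1, 2)
shards.add_friendship("harvard", 3, 1)
shards.add_friendship("penn_state", 1, 9)
print(shards.friends_of("harvard", 1))     # [2, 3]
print(shards.friends_of("penn_state", 1))  # [9]
```

Because a path search only ever opens one school's database, its cost is bounded by that school's size (on the order of 10,000 users in the talk's example) rather than by the whole user base.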
139 00:06:58,250 --> 00:07:04,935 >> So when we first set up the site we had just one computer that we were running. 140 00:07:04,935 --> 00:07:06,060 It wasn't in our dorm room. 141 00:07:06,060 --> 00:07:06,851 We were renting it. 142 00:07:06,851 --> 00:07:10,780 I kind of learned my lesson for trying to run a site out of my dorm 143 00:07:10,780 --> 00:07:15,040 room a few months earlier, and Harvard almost tried to kick me out. 144 00:07:15,040 --> 00:07:18,750 >> So I ended up renting a server off site this time. 145 00:07:18,750 --> 00:07:26,540 And I guess running originally the database and the web server. 146 00:07:26,540 --> 00:07:29,280 So Apache is what we were using in this instance 147 00:07:29,280 --> 00:07:31,940 to serve the pages from the same machine. 148 00:07:31,940 --> 00:07:35,710 And because we distributed the databases in the way that we did, 149 00:07:35,710 --> 00:07:40,750 we were able to, as time went on, just add more machines linearly and sort of 150 00:07:40,750 --> 00:07:43,630 grow the site without having any kind of exponential expansion 151 00:07:43,630 --> 00:07:45,640 on the amount of machinery that we had. 152 00:07:45,640 --> 00:07:49,470 >> But after we hit about like 30 or 50 schools, 153 00:07:49,470 --> 00:07:54,020 we started realizing that we could start getting more performance out 154 00:07:54,020 --> 00:07:55,130 of MySQL or Apache. 155 00:07:55,130 --> 00:07:57,980 156 00:07:57,980 --> 00:08:02,270 Some of the way that stuff was set up just wasn't as optimal as it could. 
157 00:08:02,270 --> 00:08:10,840 >> So for example, when you have MySQL machines and Apache 158 00:08:10,840 --> 00:08:14,500 running on the same server, then if something happens to that server, 159 00:08:14,500 --> 00:08:18,500 then not only does the database for that school or the schools 160 00:08:18,500 --> 00:08:20,700 on that server just stop kind of responding 161 00:08:20,700 --> 00:08:24,367 in a way that will get you anything useful, 162 00:08:24,367 --> 00:08:25,950 but you can't even load any web pages. 163 00:08:25,950 --> 00:08:27,075 So you get page not founds. 164 00:08:27,075 --> 00:08:28,250 And that kind of sucks. 165 00:08:28,250 --> 00:08:33,586 >> But another issue is that the variance in the use from school to school 166 00:08:33,586 --> 00:08:34,919 is also not going to be perfect. 167 00:08:34,919 --> 00:08:38,049 So some schools are always going to have heavier use. 168 00:08:38,049 --> 00:08:40,760 We have schools now like Penn State that have 50,000 users. 169 00:08:40,760 --> 00:08:44,942 And then the majority of the schools still have less than 2000 users. 170 00:08:44,942 --> 00:08:47,400 Because there's a lot of small schools and a lot of schools 171 00:08:47,400 --> 00:08:49,600 that don't have complete ubiquity. 172 00:08:49,600 --> 00:08:54,920 >> So in trying to deal with this issue and make it 173 00:08:54,920 --> 00:08:59,630 so that you could deal with the fact that Penn State had 174 00:08:59,630 --> 00:09:02,240 50,000 people and just a ton of users all the time, 175 00:09:02,240 --> 00:09:05,380 and then you have some schools that don't, what we decided to do 176 00:09:05,380 --> 00:09:09,280 is separate out some of the web servers from the database servers. 177 00:09:09,280 --> 00:09:14,910 And make it so that we just had a pool of Apache web servers 178 00:09:14,910 --> 00:09:18,100 that we could load balance between. 
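The split between a shared web tier and per-school databases can be illustrated with a round-robin pool. This is a sketch under assumptions, since the talk does not say which balancing policy was used: round-robin is simply one common choice, and the host names and the `route` helper are hypothetical.

```python
from itertools import cycle


class WebPool:
    """Hands out web servers in round-robin order, regardless of school."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def next_server(self):
        return next(self._servers)


def route(pool, school_to_db, school):
    """Pick any web server from the pool, but the one database for the school."""
    return pool.next_server(), school_to_db[school]


# Hypothetical hosts: two interchangeable web servers, one database per school.
pool = WebPool(["web1", "web2"])
school_to_db = {"harvard": "db-harvard", "penn_state": "db-penn-state"}

print(route(pool, school_to_db, "penn_state"))  # ('web1', 'db-penn-state')
print(route(pool, school_to_db, "penn_state"))  # ('web2', 'db-penn-state')
print(route(pool, school_to_db, "harvard"))     # ('web1', 'db-harvard')
```

A busy school's page requests spread evenly across the web tier, while its data still lives in one place, so losing a web server no longer takes a school's pages down with it.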
179 00:09:18,100 --> 00:09:20,300 And make it so that you can use those uniformly 180 00:09:20,300 --> 00:09:23,690 while just having the database layer be sort of consistent. 181 00:09:23,690 --> 00:09:27,840 >> So I don't know if this stuff is interesting to you guys at all. 182 00:09:27,840 --> 00:09:35,800 Or if this is anything that matters to what you guys are studying now. 183 00:09:35,800 --> 00:09:39,260 So if there's more stuff that you guys would rather 184 00:09:39,260 --> 00:09:42,730 know about in terms of the architecture, then I'll leave that open to questions 185 00:09:42,730 --> 00:09:43,310 later. 186 00:09:43,310 --> 00:09:48,440 So I don't spend a lot of time just talking about random applications 187 00:09:48,440 --> 00:09:52,625 that you guys might not ever care to use. 188 00:09:52,625 --> 00:09:55,080 >> Let me try to find some interesting examples. 189 00:09:55,080 --> 00:10:04,020 190 00:10:04,020 --> 00:10:12,860 So I mean, I guess one of the things that was pretty interesting 191 00:10:12,860 --> 00:10:19,850 was when we got to a point in terms of traffic 192 00:10:19,850 --> 00:10:23,110 where we started maxing out the performance of some 193 00:10:23,110 --> 00:10:27,620 of these open source applications that are generally pretty performant. 194 00:10:27,620 --> 00:10:32,149 >> So for example, MySQL is a really good open source database. 195 00:10:32,149 --> 00:10:34,690 I don't know if any of you guys sort of in your own time mess 196 00:10:34,690 --> 00:10:39,920 around and make anything with MySQL or have used it in any way. 197 00:10:39,920 --> 00:10:41,310 But it's pretty easy to use. 198 00:10:41,310 --> 00:10:43,029 It's also decently quick. 199 00:10:43,029 --> 00:10:44,070 Indices work pretty well. 200 00:10:44,070 --> 00:10:48,090 It's not as fully featured as something like Oracle, but it's pretty good. 
201 00:10:48,090 --> 00:10:50,460 >> And we got to a point where, I think around 202 00:10:50,460 --> 00:10:54,400 when we started doing like maybe 100 million pages a day, 203 00:10:54,400 --> 00:10:59,230 that we started running into some bottlenecks on that. 204 00:10:59,230 --> 00:11:07,530 So for example, a typical query on MySQL might take two to four milliseconds. 205 00:11:07,530 --> 00:11:09,220 And that's not that much. 206 00:11:09,220 --> 00:11:12,900 But when you're doing 100 million page views a day, 207 00:11:12,900 --> 00:11:15,679 and each page view might have 30 to 50 queries, 208 00:11:15,679 --> 00:11:18,220 especially if you're doing something like a profile view that 209 00:11:18,220 --> 00:11:23,150 queries all kinds of different information, then that starts to suck. 210 00:11:23,150 --> 00:11:29,450 >> So we started to develop a caching layer that 211 00:11:29,450 --> 00:11:31,750 allowed quicker access to some of the information. 212 00:11:31,750 --> 00:11:35,460 And originally we were using another open source application Memcache, 213 00:11:35,460 --> 00:11:38,320 which I don't know if any of you guys have any experience with that. 214 00:11:38,320 --> 00:11:40,700 But it was pretty quick. 215 00:11:40,700 --> 00:11:43,950 It got access times down to I guess the 0.3 216 00:11:43,950 --> 00:11:46,840 to 0.5 milliseconds, which is pretty good. 217 00:11:46,840 --> 00:11:52,170 >> But it also has a bunch of distribution issues. 218 00:11:52,170 --> 00:11:56,000 It's supposed to be a distributed hash table sort of application, 219 00:11:56,000 --> 00:12:02,540 where you can just attach any number of Memcache boxes in a cluster 220 00:12:02,540 --> 00:12:05,610 and be able to hook it up and have it go. 221 00:12:05,610 --> 00:12:08,710 But we ran into a lot of issues there where 222 00:12:08,710 --> 00:12:11,170 different Memcache boxes would go down. 223 00:12:11,170 --> 00:12:13,270 And there was no redundancy on the information. 
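The caching layer described here follows what is often called the cache-aside pattern: check the cache first, and fall back to the database only on a miss. The sketch below is an assumption-laden illustration, not the talk's implementation: a plain dictionary stands in for Memcache, and `CachedStore`, `drop_cache`, and `daily_query_seconds` are hypothetical names. The arithmetic helper uses the figures quoted in the talk (about 100 million page views a day, 30 to 50 queries per view, 2 to 4 milliseconds per query).

```python
class CachedStore:
    """Cache-aside reads: a dict stands in for a Memcache box."""

    def __init__(self, db_lookup):
        self._db_lookup = db_lookup  # slow path, e.g. a database query
        self._cache = {}
        self.db_hits = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # fast path: served from the cache
        self.db_hits += 1            # cache miss: go to the database
        value = self._db_lookup(key)
        self._cache[key] = value
        return value

    def drop_cache(self):
        """Simulate a cache box going down with no redundant copy."""
        self._cache.clear()


def daily_query_seconds(page_views, queries_per_view, ms_per_query):
    """Total seconds of query time per day across all servers."""
    return page_views * queries_per_view * ms_per_query / 1000


store = CachedStore(lambda user_id: {"id": user_id, "name": f"user{user_id}"})
store.get(42)
store.get(42)
print(store.db_hits)  # 1 (the second read was served from the cache)
store.drop_cache()
store.get(42)
print(store.db_hits)  # 2 (after the cache is lost, reads hit the database again)

print(daily_query_seconds(100_000_000, 30, 2))  # 6000000.0
```

Six million seconds of query time is about 69 days of work to finish every 24 hours, even at the low end of the quoted ranges, which is why shaving each lookup to a fraction of a millisecond matters. The `drop_cache` call also shows the failure mode from the talk: when a cache box disappears without a replica, its traffic lands back on the databases behind it.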
224 00:12:13,270 --> 00:12:17,120 So when a Memcache box went down and you had a cache miss, 225 00:12:17,120 --> 00:12:19,640 then all of a sudden you had a lot more traffic 226 00:12:19,640 --> 00:12:22,740 going to a specific set of databases. 227 00:12:22,740 --> 00:12:26,170 And that would suck. 228 00:12:26,170 --> 00:12:32,830 >> So as time went on, we even outgrew Memcache and the indices on MySQL. 229 00:12:32,830 --> 00:12:33,890 We still use that stuff. 230 00:12:33,890 --> 00:12:37,490 But we had to build on top of that extra redundancy. 231 00:12:37,490 --> 00:12:41,870 And I think that's something that's probably maybe a little interesting. 232 00:12:41,870 --> 00:12:45,580 But I'll let you guys ask me more questions about that later. 233 00:12:45,580 --> 00:12:51,432 >> I'm not really sure what would be interesting to talk about right now. 234 00:12:51,432 --> 00:12:53,220 Maybe you guys could help out a little? 235 00:12:53,220 --> 00:12:57,170 236 00:12:57,170 --> 00:12:58,073 Go for it. 237 00:12:58,073 --> 00:13:03,496 >> AUDIENCE: I'm curious about, thinking of [INAUDIBLE] 238 00:13:03,496 --> 00:13:10,891 going into an online business like this, how you felt the atmosphere was 239 00:13:10,891 --> 00:13:15,058 with big players all bringing it to market and other big players 240 00:13:15,058 --> 00:13:16,807 who you thought might [INAUDIBLE] to mark, 241 00:13:16,807 --> 00:13:19,765 or what your experience was with that. 242 00:13:19,765 --> 00:13:24,202 I'd be interested, just on a technical side, [INAUDIBLE] just ramping 243 00:13:24,202 --> 00:13:26,667 up and technically how you [INAUDIBLE]. 244 00:13:26,667 --> 00:13:29,625 245 00:13:29,625 --> 00:13:33,710 >> MARK ZUCKERBERG: Yeah, so that's not a technical question at all. 246 00:13:33,710 --> 00:13:41,250 But I guess I'll just like go into question time now. 247 00:13:41,250 --> 00:13:45,439 Because I'm not really sure what's relevant stuff for me to be discussing. 
248 00:13:45,439 --> 00:13:46,480 So I'll just answer this. 249 00:13:46,480 --> 00:13:49,313 Then anyone else who wants to ask me questions can just go for that. 250 00:13:49,313 --> 00:13:51,470 251 00:13:51,470 --> 00:13:54,640 >> I guess I'd never really spent a lot of time worrying about stuff like-- I 252 00:13:54,640 --> 00:13:56,598 mean, there are companies out there like Google 253 00:13:56,598 --> 00:14:00,600 that could just get into your space and do whatever you want at any time. 254 00:14:00,600 --> 00:14:08,050 And I think one of the cool things about this time in technology 255 00:14:08,050 --> 00:14:13,340 is that individuals are leveraged and able to do way more than they've really 256 00:14:13,340 --> 00:14:14,950 ever been able to do before. 257 00:14:14,950 --> 00:14:20,090 >> And even four years ago when Google was started, 258 00:14:20,090 --> 00:14:22,830 now they have hundreds of thousands of machines 259 00:14:22,830 --> 00:14:26,780 and probably billions of dollars spent on equipment. 260 00:14:26,780 --> 00:14:29,340 I think the generation before Google, you couldn't even 261 00:14:29,340 --> 00:14:32,410 make a site without some big piece of hardware. 262 00:14:32,410 --> 00:14:40,000 I think eBay, for example, ran off of two $50,000 machines. 263 00:14:40,000 --> 00:14:43,640 You just can't start doing that if you're just a kid in a dorm room. 264 00:14:43,640 --> 00:14:51,610 >> So I think the fact that we could rent machines for $100 a month 265 00:14:51,610 --> 00:14:56,820 and use that to scale up to a point where we had 300,000 users 266 00:14:56,820 --> 00:14:57,830 is pretty cool. 267 00:14:57,830 --> 00:15:02,810 It's a pretty unique thing that's going on in technology right now. 
268 00:15:02,810 --> 00:15:08,390 It makes it so that instead of worrying about who is the big player 269 00:15:08,390 --> 00:15:15,356 and what is Google going to do next, you can do more of-- you 270 00:15:15,356 --> 00:15:16,730 can just get a lot of stuff done. 271 00:15:16,730 --> 00:15:24,460 >> And instead of having to go out and have some of the traditional business 272 00:15:24,460 --> 00:15:27,927 problems, like you have to raise capital before you can make anything, 273 00:15:27,927 --> 00:15:29,010 that's no longer an issue. 274 00:15:29,010 --> 00:15:32,100 So you're leveraged to do a lot more on your own now. 275 00:15:32,100 --> 00:15:35,300 I don't know if that answers the question that you're asking. 276 00:15:35,300 --> 00:15:38,790 >> But I mean, it's one of the reasons why I think that, at this point, 277 00:15:38,790 --> 00:15:41,040 it makes a lot of sense to be studying this stuff. 278 00:15:41,040 --> 00:15:47,110 Because at no point in the past could you leverage such a small amount 279 00:15:47,110 --> 00:15:49,460 of money to get powerful enough technology 280 00:15:49,460 --> 00:15:52,000 to really touch people in the way that you can today. 281 00:15:52,000 --> 00:15:55,990 Google does about 250 million page views a day. 282 00:15:55,990 --> 00:16:01,970 They have hundreds of thousands of machines and 5,000 employees. 283 00:16:01,970 --> 00:16:05,480 >> Facebook does 400 million page views a day. 284 00:16:05,480 --> 00:16:10,260 That's a lot more than Google does. 285 00:16:10,260 --> 00:16:12,340 And we have hundreds of machines. 286 00:16:12,340 --> 00:16:15,600 And we just passed 50 employees. 287 00:16:15,600 --> 00:16:19,860 And that's just a technical generation of three or four 288 00:16:19,860 --> 00:16:22,910 years in the architectures that were created. 
289 00:16:22,910 --> 00:16:27,162 >> And then you go three or four years back before that from like eBay to Google, 290 00:16:27,162 --> 00:16:28,620 and it's just completely different. 291 00:16:28,620 --> 00:16:32,510 Because at least Google is running off of a lot of distributed equipment 292 00:16:32,510 --> 00:16:34,930 that they have hundreds of thousands of machines, 293 00:16:34,930 --> 00:16:40,200 but the idea there was to get a lot of shitty machines that are really cheap. 294 00:16:40,200 --> 00:16:41,530 I mean, that's a big step up. 295 00:16:41,530 --> 00:16:44,539 >> Because then it's like, OK, that's more redundant. 296 00:16:44,539 --> 00:16:45,830 They're not losing information. 297 00:16:45,830 --> 00:16:47,455 They don't expect stuff to always work. 298 00:16:47,455 --> 00:16:51,307 It's a much more mature attitude than eBay's, which 299 00:16:51,307 --> 00:16:53,390 was the only thing that they could do at the time. 300 00:16:53,390 --> 00:16:56,406 301 00:16:56,406 --> 00:16:58,676 >> AUDIENCE: I have a question about the DHT stuff. 302 00:16:58,676 --> 00:16:59,759 >> MARK ZUCKERBERG: The what? 303 00:16:59,759 --> 00:17:01,551 AUDIENCE: The Distributed Hash Table stuff. 304 00:17:01,551 --> 00:17:02,925 MARK ZUCKERBERG: Yeah, which one? 305 00:17:02,925 --> 00:17:05,074 AUDIENCE: I was just wondering if you [INAUDIBLE] 306 00:17:05,074 --> 00:17:08,511 all your extensions for Memcache, because one thing I've noticed 307 00:17:08,511 --> 00:17:12,930 is that, yeah, there aren't really good available libraries for DHT stuff. 308 00:17:12,930 --> 00:17:14,972 There's all this wonderful research, but in terms 309 00:17:14,972 --> 00:17:18,138 of implementations that actually deal with all the redundancy issues and all 310 00:17:18,138 --> 00:17:18,822 those things-- 311 00:17:18,822 --> 00:17:22,920 >> MARK ZUCKERBERG: Yeah, a lot of the stuff-- we 312 00:17:22,920 --> 00:17:25,280 didn't necessarily extend Memcache. 
313 00:17:25,280 --> 00:17:29,480 We built a bunch of stuff ourselves. 314 00:17:29,480 --> 00:17:32,470 Right now, it's not open source. 315 00:17:32,470 --> 00:17:33,590 We considered doing it. 316 00:17:33,590 --> 00:17:37,410 And I mean, there's a lot of work that goes into making stuff open source. 317 00:17:37,410 --> 00:17:42,020 And it's on top of whether or not you want to lose the competitive advantage. 318 00:17:42,020 --> 00:17:43,150 It's kind of unfortunate. 319 00:17:43,150 --> 00:17:46,935 >> Because I think that if it were just easier to make something like that, 320 00:17:46,935 --> 00:17:47,810 then you could do it. 321 00:17:47,810 --> 00:17:49,950 You could just release the code. 322 00:17:49,950 --> 00:17:55,430 But then there's a lot of support and licensing and all that stuff. 323 00:17:55,430 --> 00:17:57,030 We found that it's been annoying. 324 00:17:57,030 --> 00:17:59,930 >> One of the things that we actually considered making open source 325 00:17:59,930 --> 00:18:03,740 was this search server that actually that guy sitting right there 326 00:18:03,740 --> 00:18:09,050 made while he was still out in California. 327 00:18:09,050 --> 00:18:16,380 And I guess we got to a point where MySQL was lagging a little on some 328 00:18:16,380 --> 00:18:18,520 of the searches that we were trying to do. 329 00:18:18,520 --> 00:18:22,330 And we decided that it would be a cool thing 330 00:18:22,330 --> 00:18:26,750 to do to make a series of distributed machines 331 00:18:26,750 --> 00:18:29,147 that could-- he doesn't use a hash table. 332 00:18:29,147 --> 00:18:30,980 What's the structure that you use, McCollum? 333 00:18:30,980 --> 00:18:33,729 >> ANDREW MCCOLLUM: [INAUDIBLE]. 334 00:18:33,729 --> 00:18:36,270 MARK ZUCKERBERG: So, yeah, we thought about making that open. 335 00:18:36,270 --> 00:18:42,670 But that's when we kind of had to do all this work to come up with a license. 
336 00:18:42,670 --> 00:18:44,910 And we're just like, all right, screw that. 337 00:18:44,910 --> 00:18:51,490 338 00:18:51,490 --> 00:18:51,990 Yo. 339 00:18:51,990 --> 00:18:56,157 >> AUDIENCE: What do you spend most of your work time doing these days? 340 00:18:56,157 --> 00:18:57,475 >> MARK ZUCKERBERG: Hiring people. 341 00:18:57,475 --> 00:19:01,160 342 00:19:01,160 --> 00:19:06,060 I guess when, as you grow, the most important thing 343 00:19:06,060 --> 00:19:07,060 is to have smart people. 344 00:19:07,060 --> 00:19:09,630 345 00:19:09,630 --> 00:19:13,915 If you think about how, the technical leverage stuff that I was talking about 346 00:19:13,915 --> 00:19:20,960 in answering that guy's question, as technology becomes 347 00:19:20,960 --> 00:19:23,940 more generic and less expensive, the leverage point 348 00:19:23,940 --> 00:19:26,110 becomes more in the people. 349 00:19:26,110 --> 00:19:29,860 So if you think about this from a perspective 350 00:19:29,860 --> 00:19:36,610 of a person to people time spent or user time spent, or page view 351 00:19:36,610 --> 00:19:40,590 analysis, because of technology now, people 352 00:19:40,590 --> 00:19:46,220 are much more leveraged to do more things 353 00:19:46,220 --> 00:19:49,380 and be more important in the equation. 354 00:19:49,380 --> 00:19:53,130 >> Because of that, it's really important to get the most intelligent people. 355 00:19:53,130 --> 00:19:58,660 And also, I mean, when you're a small company, you can be really nimble 356 00:19:58,660 --> 00:20:00,050 and get a lot of stuff done. 357 00:20:00,050 --> 00:20:02,845 And there's relatively little bureaucracy. 358 00:20:02,845 --> 00:20:06,397 So if you have smart people who can take advantage of that to build cool things, 359 00:20:06,397 --> 00:20:07,230 then that's awesome. 360 00:20:07,230 --> 00:20:10,790 361 00:20:10,790 --> 00:20:15,990 >> I guess, besides that, designing new things. 
362 00:20:15,990 --> 00:20:18,530 There's not much corporate bureaucracy yet. 363 00:20:18,530 --> 00:20:20,342 So I don't have to waste much time on that. 364 00:20:20,342 --> 00:20:26,820 365 00:20:26,820 --> 00:20:29,630 Keep on going? 366 00:20:29,630 --> 00:20:36,090 >> AUDIENCE: Yeah, how much have you spoken and consulted with lawyers so far? 367 00:20:36,090 --> 00:20:38,860 >> MARK ZUCKERBERG: I have a lawyer who works for me full-time. 368 00:20:38,860 --> 00:20:43,830 >> AUDIENCE: OK, it is a big part of running a business? 369 00:20:43,830 --> 00:20:47,309 Would you recommend working on [INAUDIBLE] early on? 370 00:20:47,309 --> 00:20:50,291 371 00:20:50,291 --> 00:20:52,550 >> MARK ZUCKERBERG: We didn't. 372 00:20:52,550 --> 00:20:59,980 And that, I guess, provided some annoyance later on. 373 00:20:59,980 --> 00:21:04,502 Getting stuff set up really well is good. 374 00:21:04,502 --> 00:21:05,960 Getting stuff clean is really good. 375 00:21:05,960 --> 00:21:09,590 >> And, I mean, no one's ever going to tell you a lawyer is bad. 376 00:21:09,590 --> 00:21:13,790 It's all just a question of opportunity cost and what you prioritize. 377 00:21:13,790 --> 00:21:19,820 I guess that, in our case, we now have to deal with a bunch of stuff that 378 00:21:19,820 --> 00:21:23,030 wasn't set up properly in the beginning. 379 00:21:23,030 --> 00:21:25,010 Most of the stuff is dealt with. 380 00:21:25,010 --> 00:21:26,620 It's not even a big deal anymore. 381 00:21:26,620 --> 00:21:33,450 >> But instead of talking to lawyers early on, we were making stuff. 382 00:21:33,450 --> 00:21:37,960 And I think that that was probably the right use of our time. 
383 00:21:37,960 --> 00:21:41,530 I think that one cool characteristic of a lot of the companies that end up 384 00:21:41,530 --> 00:21:44,860 being really successful, not that we are really successful, 385 00:21:44,860 --> 00:21:46,720 but I guess we also fall into this bucket, 386 00:21:46,720 --> 00:21:49,424 is that they started off as someone trying to make something 387 00:21:49,424 --> 00:21:51,340 cool and not someone trying to make a company. 388 00:21:51,340 --> 00:21:54,800 389 00:21:54,800 --> 00:21:59,540 You kind of have-- Google came out of Larry and Sergey's PhD Dissertation 390 00:21:59,540 --> 00:22:04,500 at Stanford, and Yahoo came out of just, I guess, also some Stanford guys 391 00:22:04,500 --> 00:22:06,510 just kind of screwing around in their dorm room. 392 00:22:06,510 --> 00:22:11,840 And eBay came out of some guy trying to build a marketplace for his girlfriend 393 00:22:11,840 --> 00:22:14,342 to exchange PEZ dispensers. 394 00:22:14,342 --> 00:22:15,842 Amazon was a little more calculated. 395 00:22:15,842 --> 00:22:20,290 396 00:22:20,290 --> 00:22:24,067 >> So I can't imagine that any of those people really had that much advice, 397 00:22:24,067 --> 00:22:25,900 and it seems to have worked out OK for them. 398 00:22:25,900 --> 00:22:28,191 But, I mean, at the same time I'm not going to sit here 399 00:22:28,191 --> 00:22:30,920 and tell you not to get advice on stuff. 400 00:22:30,920 --> 00:22:35,810 And a lot of times people are just too careful, too. 401 00:22:35,810 --> 00:22:40,600 I think it's more useful to make things happen and then apologize later 402 00:22:40,600 --> 00:22:43,740 than it is to make sure that you dot all your I's now and then 403 00:22:43,740 --> 00:22:44,740 just not get stuff done. 404 00:22:44,740 --> 00:22:47,500 405 00:22:47,500 --> 00:22:49,430 Yeah. 406 00:22:49,430 --> 00:22:50,460 Go for it. 
407 00:22:50,460 --> 00:22:53,436 >> AUDIENCE: When do you think that Facebook will reach the point where 408 00:22:53,436 --> 00:23:02,860 it could become that big company [INAUDIBLE] new idea, [INAUDIBLE]? 409 00:23:02,860 --> 00:23:05,836 Do you think it will reach that point any time soon? 410 00:23:05,836 --> 00:23:09,340 How would you keep it from [INAUDIBLE]? 411 00:23:09,340 --> 00:23:12,214 >> MARK ZUCKERBERG: Well, I mean, I think that-- I 412 00:23:12,214 --> 00:23:14,253 think you're kind of always at that point. 413 00:23:14,253 --> 00:23:18,290 414 00:23:18,290 --> 00:23:21,830 I mean, most companies are started on like a couple of ideas, 415 00:23:21,830 --> 00:23:25,800 and those are a few things that they do well. 416 00:23:25,800 --> 00:23:29,120 So, I mean, Yahoo's was like we're going to organize all this information 417 00:23:29,120 --> 00:23:31,160 in the world like by directory. 418 00:23:31,160 --> 00:23:33,350 And that was what they started off doing, 419 00:23:33,350 --> 00:23:38,860 and then they kind of diversified out as time went on and built more stuff. 420 00:23:38,860 --> 00:23:42,910 And a lot of that stuff is like the core of their business now. 421 00:23:42,910 --> 00:23:45,460 I mean, it's like they didn't originally do search. 422 00:23:45,460 --> 00:23:47,740 And now directory just doesn't exist. 423 00:23:47,740 --> 00:23:49,280 It sucks. 424 00:23:49,280 --> 00:23:52,880 There's no utility for it. 425 00:23:52,880 --> 00:23:56,320 >> I mean, Google's big thing was just like they did PageRank. 426 00:23:56,320 --> 00:24:02,320 And then, I guess, out of PageRank, they have search. 427 00:24:02,320 --> 00:24:05,960 And now they kind of extend that to do other similar type of algorithms, 428 00:24:05,960 --> 00:24:07,830 searching in other spaces. 429 00:24:07,830 --> 00:24:11,090 But, I mean, you can kind of tell how all the other stuff that they're doing 430 00:24:11,090 --> 00:24:12,330 is sort of tangential. 
431 00:24:12,330 --> 00:24:16,220 And it's like they're trying really hard to make PageRank 432 00:24:16,220 --> 00:24:19,080 and other types of algorithms that are very 433 00:24:19,080 --> 00:24:23,660 similar to that work in their spaces, and it's just not as elegant 434 00:24:23,660 --> 00:24:27,460 or pure of an idea as the original one was. 435 00:24:27,460 --> 00:24:30,590 >> So in Facebook, for example, when it just got started, 436 00:24:30,590 --> 00:24:32,840 what I thought was the most interesting thing was just 437 00:24:32,840 --> 00:24:36,010 to be able to type in someone's name and find out information about them. 438 00:24:36,010 --> 00:24:38,800 And there was hardly any of the stuff that was there now. 439 00:24:38,800 --> 00:24:41,010 There was no groups. 440 00:24:41,010 --> 00:24:44,982 There was no messages even. 441 00:24:44,982 --> 00:24:45,690 There was poking. 442 00:24:45,690 --> 00:24:49,790 443 00:24:49,790 --> 00:24:51,250 >> Yeah. 444 00:24:51,250 --> 00:24:56,590 I mean, so it's like you kind of get started on some kind of core idea. 445 00:24:56,590 --> 00:24:59,680 And generally, the company will do well, because I 446 00:24:59,680 --> 00:25:02,520 guess the people who are starting off working on that core idea 447 00:25:02,520 --> 00:25:06,717 kind of understand that single core idea in some sort of unique way. 448 00:25:06,717 --> 00:25:09,800 But that doesn't imply that they have any better understanding of anything 449 00:25:09,800 --> 00:25:12,749 else, than anyone else. 450 00:25:12,749 --> 00:25:15,290 So that's why surrounding yourself with a lot of smart people 451 00:25:15,290 --> 00:25:18,615 is really important. 452 00:25:18,615 --> 00:25:20,833 >> AUDIENCE: What was-- was there any sort of model 453 00:25:20,833 --> 00:25:26,010 that was [INAUDIBLE] photo features [INAUDIBLE] on Facebook? 454 00:25:26,010 --> 00:25:27,982 Was there any sort of [INAUDIBLE]? 
455 00:25:27,982 --> 00:25:31,824 456 00:25:31,824 --> 00:25:34,740 MARK ZUCKERBERG: I mean, there's a lot of applications on the internet 457 00:25:34,740 --> 00:25:36,220 now that do that stuff. 458 00:25:36,220 --> 00:25:39,540 So, I mean, Flickr's a pretty photo application. 459 00:25:39,540 --> 00:25:42,470 Although I think in three weeks we passed them in the number of photos 460 00:25:42,470 --> 00:25:43,470 that we had on our site. 461 00:25:43,470 --> 00:25:49,030 462 00:25:49,030 --> 00:25:51,155 I mean, I think that the coolest thing about photos 463 00:25:51,155 --> 00:25:54,849 is that you can tag them and the way that 464 00:25:54,849 --> 00:25:56,390 makes them link to people's profiles. 465 00:25:56,390 --> 00:25:58,750 And I think that that's something that you can really 466 00:25:58,750 --> 00:26:05,960 only do if you have the context of everyone around you on the site. 467 00:26:05,960 --> 00:26:08,190 That kind of requires the ubiquity of usage. 468 00:26:08,190 --> 00:26:09,829 469 00:26:09,829 --> 00:26:13,120 So I don't know if any of the other guys would have done that if they have that 470 00:26:13,120 --> 00:26:16,810 kind of use, but they didn't. 471 00:26:16,810 --> 00:26:20,150 472 00:26:20,150 --> 00:26:20,740 >> I don't know. 473 00:26:20,740 --> 00:26:24,085 Don't any of you guys have any CS questions? 474 00:26:24,085 --> 00:26:25,055 >> AUDIENCE: I'm curious. 475 00:26:25,055 --> 00:26:27,513 How do you decide as you're moving forward with the company 476 00:26:27,513 --> 00:26:30,897 to pursue a technology or not pursue a technology? 477 00:26:30,897 --> 00:26:32,230 MARK ZUCKERBERG: What's an idea? 478 00:26:32,230 --> 00:26:33,185 What's in the example? 479 00:26:33,185 --> 00:26:36,614 >> AUDIENCE: Well, I actually don't know much about Facebook. 480 00:26:36,614 --> 00:26:39,939 What's the next thing you want to do with pictures 481 00:26:39,939 --> 00:26:41,105 and linking people together? 
482 00:26:41,105 --> 00:26:45,097 How do you go about figuring out which technologies are good ones? 483 00:26:45,097 --> 00:26:48,091 How do you mine to find technology? 484 00:26:48,091 --> 00:26:51,579 Do you have any processes in place today that 485 00:26:51,579 --> 00:26:54,230 are directed towards those sorts of things, 486 00:26:54,230 --> 00:26:56,158 or does technology just come into the company 487 00:26:56,158 --> 00:26:57,866 because you're out someplace and somebody 488 00:26:57,866 --> 00:27:00,980 mentioned something you might want to do in terms of Facebook? 489 00:27:00,980 --> 00:27:04,670 >> MARK ZUCKERBERG: So I think that our process for filtering what technologies 490 00:27:04,670 --> 00:27:08,560 to use is to trust the smart people. 491 00:27:08,560 --> 00:27:15,010 So we definitely have some people at the company who are just really smart, 492 00:27:15,010 --> 00:27:19,795 and I think that most of the people at the company are generally pretty smart. 493 00:27:19,795 --> 00:27:22,670 >> But there are a few guys in particular-- I'm 494 00:27:22,670 --> 00:27:32,050 not one of them-- who I think that when they say that something is a generally 495 00:27:32,050 --> 00:27:36,590 good practice to go at it, then it's relatively-- then 496 00:27:36,590 --> 00:27:39,471 they can get support for that pretty easily. 497 00:27:39,471 --> 00:27:42,720 And I think that a lot of the engineers sort of build a consensus around that. 498 00:27:42,720 --> 00:27:45,877 499 00:27:45,877 --> 00:27:47,480 I'm trying to think of a good example. 500 00:27:47,480 --> 00:27:51,470 >> I think it's somewhat goal oriented. 501 00:27:51,470 --> 00:27:56,610 So then with photos, we knew that we wanted 502 00:27:56,610 --> 00:27:58,969 to support just people uploading unlimited photos. 503 00:27:58,969 --> 00:28:01,010 So, I mean, there's no real concept of unlimited. 504 00:28:01,010 --> 00:28:05,240 It's just you have to keep on adding stuff, keep on adding storage.
505 00:28:05,240 --> 00:28:09,337 And you want to make it so that it kind of works as seamlessly as possible. 506 00:28:09,337 --> 00:28:11,170 So the first thing that we were trying to do 507 00:28:11,170 --> 00:28:16,140 is, well, let's evaluate these companies that 508 00:28:16,140 --> 00:28:18,950 just do large storage for a living. 509 00:28:18,950 --> 00:28:21,420 Or it's like NetApp or something, Network Appliance. 510 00:28:21,420 --> 00:28:24,080 So we talk to them for a while. 511 00:28:24,080 --> 00:28:25,440 And then we're like, all right. 512 00:28:25,440 --> 00:28:29,200 Well, we don't really want to go with this single, big box approach. 513 00:28:29,200 --> 00:28:33,530 We want to go with having just a series of distributed smaller 514 00:28:33,530 --> 00:28:37,400 boxes with a lot of hard drive and a lot of RAM. 515 00:28:37,400 --> 00:28:40,360 >> And so I think that the architecture that we first built 516 00:28:40,360 --> 00:28:42,770 was one where we had a bunch of those machines 517 00:28:42,770 --> 00:28:49,090 with relatively slow but very stable disk behind a level of-- a layer 518 00:28:49,090 --> 00:28:54,250 of caching boxes with a ton of RAM that could hold most of the thumbnails 519 00:28:54,250 --> 00:29:01,670 and the most frequently accessed images in-- I guess in RAM at any time. 520 00:29:01,670 --> 00:29:04,610 And then right before we launched, it occurred to us 521 00:29:04,610 --> 00:29:07,480 that we were going to have some issues with this. 522 00:29:07,480 --> 00:29:11,450 And the issues that we were going to have 523 00:29:11,450 --> 00:29:16,635 were going to be network issues, not hardware issues. 524 00:29:16,635 --> 00:29:20,360 >> So, for example, if you take a photo album of 30 photos 525 00:29:20,360 --> 00:29:23,000 and each of your photos is three megabytes, 526 00:29:23,000 --> 00:29:25,330 then you can upload 90 megabytes to Facebook. 527 00:29:25,330 --> 00:29:26,610 And that kind of sucks. 
528 00:29:26,610 --> 00:29:27,110 All right. 529 00:29:27,110 --> 00:29:31,790 I mean, it sucks because people tend to have not optimal connections 530 00:29:31,790 --> 00:29:37,770 and because our router-- I guess most routers are set up 531 00:29:37,770 --> 00:29:39,960 to only be able to handle a gigabit at a time, 532 00:29:39,960 --> 00:29:42,127 and routers are kind of expensive. 533 00:29:42,127 --> 00:29:43,460 They are big pieces of equipment. 534 00:29:43,460 --> 00:29:46,043 I don't think that there is a distributed version of that yet. 535 00:29:46,043 --> 00:29:48,030 536 00:29:48,030 --> 00:29:50,930 >> So we couldn't, in the time frame that we wanted to launch it, 537 00:29:50,930 --> 00:29:54,720 just get a new router and get it set up. 538 00:29:54,720 --> 00:30:02,000 So what we ended up doing was building a Java applet and an ActiveX control that 539 00:30:02,000 --> 00:30:04,370 coupled the choosing of the photos that people wanted 540 00:30:04,370 --> 00:30:08,650 to upload with compression on the client side to make it smaller, 541 00:30:08,650 --> 00:30:14,380 and then that way people can just upload their photos relatively quickly. 542 00:30:14,380 --> 00:30:16,540 We also saved CPU on our side because we don't 543 00:30:16,540 --> 00:30:18,290 have to do the decompression on our side, 544 00:30:18,290 --> 00:30:21,190 although that wasn't that huge of a bottleneck. 545 00:30:21,190 --> 00:30:22,110 So that worked. 546 00:30:22,110 --> 00:30:25,330 >> And then we got it to a point where we were 547 00:30:25,330 --> 00:30:27,720 having uploads at a rate of 100 a second, 548 00:30:27,720 --> 00:30:31,000 and people were using the feature way more than we thought we were going to. 549 00:30:31,000 --> 00:30:34,226 And even though we had this caching tier setup, 550 00:30:34,226 --> 00:30:35,600 it just still wasn't fast enough. 551 00:30:35,600 --> 00:30:36,490 I'm sure you guys remember this.
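The caching tier described in this answer (RAM-heavy boxes in front of slow but stable disks) can be sketched roughly as a read-through cache. This is only an illustration: the talk does not say which eviction policy or software was used, so the least-recently-used policy and the names `ReadThroughCache`, `slow_read`, and `album_upload_megabytes` are assumptions made for the example. The last helper just restates the speaker's arithmetic of 30 photos at three megabytes each.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Hypothetical sketch: hold recently used images in RAM, read the rest from slow storage."""

    def __init__(self, capacity, slow_read):
        self.capacity = capacity    # maximum number of items kept in memory
        self.slow_read = slow_read  # callable standing in for the slow disk tier
        self.items = OrderedDict()  # insertion order doubles as recency order
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = self.slow_read(key)      # cache miss: go to the slow tier
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
        return value


def album_upload_megabytes(photo_count, megabytes_per_photo):
    """Uncompressed upload size for one album, as in the 30 x 3 MB example."""
    return photo_count * megabytes_per_photo
```

With a cache like this, repeated requests for popular thumbnails never touch the disk tier, which is the effect the speaker describes wanting; it does nothing for upload bandwidth, which is why client-side compression was a separate fix.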
552 00:30:36,490 --> 00:30:39,090 A few weeks ago, the site was not having a good time. 553 00:30:39,090 --> 00:30:41,990 554 00:30:41,990 --> 00:30:45,180 >> So what we ended up doing at that point was 555 00:30:45,180 --> 00:30:49,200 using edge caching, like Akamai type of stuff 556 00:30:49,200 --> 00:30:53,440 to make these photos which are static content just be closer to people. 557 00:30:53,440 --> 00:31:00,610 So that way we can sort of offload some of the equipment and the-- sort 558 00:31:00,610 --> 00:31:05,610 of having to transfer these still somewhat large files to people. 559 00:31:05,610 --> 00:31:10,890 So that's where we are now, and it seems to be working pretty well. 560 00:31:10,890 --> 00:31:14,700 >> It wasn't that we had any upfront technical genius about it. 561 00:31:14,700 --> 00:31:19,270 It was just sort of that at each point we sort of anticipated the issues 562 00:31:19,270 --> 00:31:21,390 or picked them out pretty quickly and then 563 00:31:21,390 --> 00:31:23,370 had enough competence to evaluate, I think, 564 00:31:23,370 --> 00:31:25,690 what the options were that we had and make 565 00:31:25,690 --> 00:31:28,458 what I think were decent decisions about how to execute on them. 566 00:31:28,458 --> 00:31:29,354 What's that? 567 00:31:29,354 --> 00:31:31,399 >> AUDIENCE: Take that to the next level, too, in terms of the problems 568 00:31:31,399 --> 00:31:32,315 you just talked about. 569 00:31:32,315 --> 00:31:33,836 MARK ZUCKERBERG: Yeah. 570 00:31:33,836 --> 00:31:38,069 >> AUDIENCE: Students get one year of-- you know, one computer science working 571 00:31:38,069 --> 00:31:44,294 with, like, I go sit in the corner, type on my [INAUDIBLE]. 572 00:31:44,294 --> 00:31:47,282 How did the company work through-- what do the software engineers do 573 00:31:47,282 --> 00:31:50,300 when you guys all have to put curly braces in the same place? 574 00:31:50,300 --> 00:31:51,508 >> MARK ZUCKERBERG: What's that? 
575 00:31:51,508 --> 00:31:54,958 AUDIENCE: Curly braces for the programmers in the same place. 576 00:31:54,958 --> 00:31:58,766 How is the structure of the software engineering actually done [INAUDIBLE]? 577 00:31:58,766 --> 00:32:01,800 578 00:32:01,800 --> 00:32:06,720 >> MARK ZUCKERBERG: So the way that-- I guess the methodology that we have is 579 00:32:06,720 --> 00:32:12,520 that I wanted to be sort of-- as much of a meritocracy as possible 580 00:32:12,520 --> 00:32:16,600 where the people who can come up with the coolest solutions 581 00:32:16,600 --> 00:32:20,790 and implement them the quickest and have like the fewest bugs get 582 00:32:20,790 --> 00:32:23,950 to work on the stuff that they think is the most interesting 583 00:32:23,950 --> 00:32:26,600 and go off and have the most influence in the company. 584 00:32:26,600 --> 00:32:29,070 >> So we're also on-boarding a lot of people, 585 00:32:29,070 --> 00:32:31,270 because we're hiring relatively quickly. 586 00:32:31,270 --> 00:32:36,240 And in doing so, we sort of have-- we pair up 587 00:32:36,240 --> 00:32:40,370 new people who are coming in with some-- like the better people 588 00:32:40,370 --> 00:32:44,720 who are sort of at the top of the chain, and then we 589 00:32:44,720 --> 00:32:48,650 have them sort of work with those people when they first come in, 590 00:32:48,650 --> 00:32:51,340 to learn the stuff that they're working on that-- so 591 00:32:51,340 --> 00:32:53,580 that the new guys, like the incoming class, 592 00:32:53,580 --> 00:32:56,870 can sort of learn what some of the people that are currently 593 00:32:56,870 --> 00:32:58,290 at the company are working on. 594 00:32:58,290 --> 00:33:02,270 And I think in doing that, they pick up the style and the methods that we 595 00:33:02,270 --> 00:33:03,540 use for doing stuff. 596 00:33:03,540 --> 00:33:07,940 >> But I think that it changes pretty quickly. 
597 00:33:07,940 --> 00:33:12,340 I think one difference between the way stuff works in a company 598 00:33:12,340 --> 00:33:16,600 and the way stuff works in school is that this is a very iterative process. 599 00:33:16,600 --> 00:33:21,880 And it's nice when you get stuff right the first time, but we don't need to. 600 00:33:21,880 --> 00:33:24,810 And I think that a lot of companies go through phases, or stages, 601 00:33:24,810 --> 00:33:26,810 where they don't get stuff right the first time. 602 00:33:26,810 --> 00:33:29,560 >> Like Microsoft-- I mean, I don't know when 603 00:33:29,560 --> 00:33:32,589 the last time was that they had a good product before Version 4. 604 00:33:32,589 --> 00:33:34,380 But by the time they get to Version 4, it's 605 00:33:34,380 --> 00:33:37,286 like always good for the most part. 606 00:33:37,286 --> 00:33:39,380 And I think that works out pretty well for them. 607 00:33:39,380 --> 00:33:42,240 And, I mean, Google always releases their stuff in beta. 608 00:33:42,240 --> 00:33:50,350 >> So I guess we try to have multiple people work on the same thing, 609 00:33:50,350 --> 00:33:53,810 so everyone can learn from each other and kind of pick off 610 00:33:53,810 --> 00:33:58,800 some of the mistakes that might be made that we can reduce pretty quickly. 611 00:33:58,800 --> 00:34:01,676 But like, I guess in general, the idea is 612 00:34:01,676 --> 00:34:04,050 that it doesn't have to be perfect the first time around. 613 00:34:04,050 --> 00:34:07,457 And as long as you get the architecture as right as possible, 614 00:34:07,457 --> 00:34:09,290 then a lot of the other implementation stuff 615 00:34:09,290 --> 00:34:11,581 isn't going to be as big of a deal, and you can sort of 616 00:34:11,581 --> 00:34:13,190 work that out at any time. 617 00:34:13,190 --> 00:34:16,449 I don't know if that's sort of answering the question that you asked me.
618 00:34:16,449 --> 00:34:20,199 >> AUDIENCE: So now, when you find something 619 00:34:20,199 --> 00:34:22,449 that you want to do that you don't know so much about, 620 00:34:22,449 --> 00:34:24,449 you can ask some of these people that are working for you, 621 00:34:24,449 --> 00:34:25,449 or you can get new people. 622 00:34:25,449 --> 00:34:28,657 But when you started, it was just sort of you and your roommate as a student. 623 00:34:28,657 --> 00:34:32,199 And obviously, there were domain knowledge issues of computer science 624 00:34:32,199 --> 00:34:34,449 that you had to deal with and you didn't know about. 625 00:34:34,449 --> 00:34:37,449 >> I mean, how did you go about figuring out how to do things? 626 00:34:37,449 --> 00:34:39,222 Did you decide to take certain classes? 627 00:34:39,222 --> 00:34:39,971 Did you get books? 628 00:34:39,971 --> 00:34:43,278 Did you go hire or get involved with some more people? 629 00:34:43,278 --> 00:34:45,758 How did you work through those issues of learning 630 00:34:45,758 --> 00:34:48,494 computer science as you worked through this? 631 00:34:48,494 --> 00:34:50,660 MARK ZUCKERBERG: The internet is a pretty good tool. 632 00:34:50,660 --> 00:34:54,300 633 00:34:54,300 --> 00:35:00,120 I think that that's how we did most of it. 634 00:35:00,120 --> 00:35:04,470 I mean, we kind of make a point of not hiring people for skills, 635 00:35:04,470 --> 00:35:08,760 because I guess the theory is if someone has skills in an area 636 00:35:08,760 --> 00:35:11,600 and has been doing it for 10 or 15 years, 637 00:35:11,600 --> 00:35:13,890 then that's probably what they can do. 638 00:35:13,890 --> 00:35:16,230 And that's good, and that means that they can do that.
639 00:35:16,230 --> 00:35:19,310 >> But if you hire someone, say, right out of college, 640 00:35:19,310 --> 00:35:22,520 or someone younger who you're just hiring them for raw intelligence, 641 00:35:22,520 --> 00:35:25,907 then the idea is that they're going to be able to learn stuff really quickly. 642 00:35:25,907 --> 00:35:28,490 And there's a lot of information available all over the place, 643 00:35:28,490 --> 00:35:32,900 and now, within recent years, there's good tools for sorting through that. 644 00:35:32,900 --> 00:35:38,320 And I think that the most performant people we have 645 00:35:38,320 --> 00:35:43,080 are sort of younger people, who didn't necessarily know that much about 646 00:35:43,080 --> 00:35:45,190 anything specific coming out of college. 647 00:35:45,190 --> 00:35:48,020 >> I mean, a good example is-- Dustin, my roommate at Harvard 648 00:35:48,020 --> 00:35:49,030 wasn't even a CS major. 649 00:35:49,030 --> 00:35:50,170 He was an economics major. 650 00:35:50,170 --> 00:35:54,260 And he's just a really smart dude, and was able to pick it up. 651 00:35:54,260 --> 00:35:56,510 Some of the other good people we have are 652 00:35:56,510 --> 00:36:00,220 EE majors out of Stanford or Berkeley. 653 00:36:00,220 --> 00:36:02,610 And they aren't even CS all the time. 654 00:36:02,610 --> 00:36:05,040 Like math people-- if you studied math, you 655 00:36:05,040 --> 00:36:07,610 can learn the stuff relatively quickly a lot of the time. 656 00:36:07,610 --> 00:36:13,170 657 00:36:13,170 --> 00:36:14,042 Yeah? 658 00:36:14,042 --> 00:36:17,706 >> AUDIENCE: I guess, since you have the infrastructure in place, right now, 659 00:36:17,706 --> 00:36:21,414 when you focus on your hiring, do you still look for tech skill people? 660 00:36:21,414 --> 00:36:24,747 Or do you look for people who might have the business knowledge to help grow you 661 00:36:24,747 --> 00:36:25,913 further and make more money?
662 00:36:25,913 --> 00:36:32,099 What's actually the priority right now in growing the company? 663 00:36:32,099 --> 00:36:33,890 MARK ZUCKERBERG: I never really hire people 664 00:36:33,890 --> 00:36:37,850 just because they have business skills. 665 00:36:37,850 --> 00:36:42,320 It's actually kind of funny, but knowledge of a lot of core CS stuff 666 00:36:42,320 --> 00:36:44,049 is really important in business, too. 667 00:36:44,049 --> 00:36:46,590 One of the main things that you learn when you're studying CS 668 00:36:46,590 --> 00:36:52,820 is complexity and scale, and that is a huge issue in business, too. 669 00:36:52,820 --> 00:36:56,370 How do you go from having five people to 100 people, 670 00:36:56,370 --> 00:37:00,410 and what's the change in the dynamic there? 671 00:37:00,410 --> 00:37:03,010 And like, how are certain processes-- how 672 00:37:03,010 --> 00:37:07,320 is a sales force going to scale from five people to 100 people? 673 00:37:07,320 --> 00:37:10,760 >> It's like the same type of intelligence that 674 00:37:10,760 --> 00:37:12,680 can figure out both of those problems. 675 00:37:12,680 --> 00:37:15,805 And it might be a different type of person who cares to solve the problems. 676 00:37:15,805 --> 00:37:21,670 >> But I think that the second part of my answer to what you said 677 00:37:21,670 --> 00:37:24,480 is that I think we're sort of continually 678 00:37:24,480 --> 00:37:26,810 in the process of building out infrastructure, 679 00:37:26,810 --> 00:37:29,110 and I don't think you ever get out of that process. 680 00:37:29,110 --> 00:37:32,850 And we're kind of focusing not on just building something 681 00:37:32,850 --> 00:37:34,810 and figuring out how to make money off of it 682 00:37:34,810 --> 00:37:38,550 and sort of maximizing the value of our business in the short term-- 683 00:37:38,550 --> 00:37:45,250 but instead, sort of always looking to maximize 684 00:37:45,250 --> 00:37:47,340 what the long term value would be. 
685 00:37:47,340 --> 00:37:49,690 And I think that in doing that, you kind of 686 00:37:49,690 --> 00:37:52,836 need to always just be building out your base, and not at any time 687 00:37:52,836 --> 00:37:54,460 be worried about maximizing your money. 688 00:37:54,460 --> 00:37:59,828 689 00:37:59,828 --> 00:38:02,268 >> AUDIENCE: This is sort of back to the [INAUDIBLE] 690 00:38:02,268 --> 00:38:05,137 Facebook, but do you guys have issue like the day after college, 691 00:38:05,137 --> 00:38:07,636 maybe something like that, with everybody uploading pictures 692 00:38:07,636 --> 00:38:12,125 all at the same time, [INAUDIBLE]? 693 00:38:12,125 --> 00:38:14,000 MARK ZUCKERBERG: Our peaks are pretty strong. 694 00:38:14,000 --> 00:38:17,700 So like at 5:00 in the morning, no matter 695 00:38:17,700 --> 00:38:20,750 how many users we have signed up, there's always like 5,000 people, 696 00:38:20,750 --> 00:38:21,660 and that's it. 697 00:38:21,660 --> 00:38:26,980 And then if you get to 9:00 PM Pacific-- so like midnight here-- 698 00:38:26,980 --> 00:38:29,900 which I guess is like the peak across the country, 699 00:38:29,900 --> 00:38:34,390 it's close to 400,000 people using it simultaneously. 700 00:38:34,390 --> 00:38:41,500 >> And it's actually kind of interesting, because we monitor these graphs 701 00:38:41,500 --> 00:38:43,741 and we have this huge LCD in our office, and whenever 702 00:38:43,741 --> 00:38:46,490 there's a blip in the traffic, we're like, oh crap, what happened? 703 00:38:46,490 --> 00:38:48,796 And a lot of times it's like Laguna Beach. 704 00:38:48,796 --> 00:38:53,330 >> [CHUCKLES] 705 00:38:53,330 --> 00:38:58,617 >> But usually it doesn't swing that far the other way. 706 00:38:58,617 --> 00:39:01,563 >> AUDIENCE: With your archive [INAUDIBLE], if someone deletes something 707 00:39:01,563 --> 00:39:05,719 from their profile, do you keep a cache of that, and how long? 
708 00:39:05,719 --> 00:39:07,260 MARK ZUCKERBERG: Right now, we don't. 709 00:39:07,260 --> 00:39:10,334 But we may at some point in the future. 710 00:39:10,334 --> 00:39:13,564 >> AUDIENCE: To follow up on that, what kind of issues 711 00:39:13,564 --> 00:39:15,634 do you talk about at the company in terms 712 00:39:15,634 --> 00:39:19,280 of privacy and security, all those things? 713 00:39:19,280 --> 00:39:21,765 Are you worried about it at all? 714 00:39:21,765 --> 00:39:25,360 You've put your [INAUDIBLE] privacy and security statement online. 715 00:39:25,360 --> 00:39:28,240 So you just put it up and then not worry about it? 716 00:39:28,240 --> 00:39:33,010 >> MARK ZUCKERBERG: Well, I think that what makes Facebook fun 717 00:39:33,010 --> 00:39:37,140 and useful is that there's a lot of information about a lot of people 718 00:39:37,140 --> 00:39:37,970 that you can get. 719 00:39:37,970 --> 00:39:40,820 But what's more important is that the information 720 00:39:40,820 --> 00:39:43,740 is available to the people who that person wants that information 721 00:39:43,740 --> 00:39:44,810 to be available to. 722 00:39:44,810 --> 00:39:47,271 And the flip side of that is that the information 723 00:39:47,271 --> 00:39:50,270 is available to the people that want to have access to that information. 724 00:39:50,270 --> 00:39:54,080 >> So one of the kind of core decisions that we made 725 00:39:54,080 --> 00:39:59,160 was only to let people at the same school see each other's profiles. 726 00:39:59,160 --> 00:40:02,580 And I guess the idea behind that was that you're at Harvard. 727 00:40:02,580 --> 00:40:05,400 You probably wouldn't have that hard of a time just letting 728 00:40:05,400 --> 00:40:07,510 someone else at Harvard see your information. 
729 00:40:07,510 --> 00:40:11,030 But at the same time, it's like only people at Harvard, 730 00:40:11,030 --> 00:40:14,210 who you're probably going to see on a day-to-day basis and maybe meet, 731 00:40:14,210 --> 00:40:16,750 who are ever going to want to look you up. 732 00:40:16,750 --> 00:40:19,206 It's not like some kid out at Stanford who you will never 733 00:40:19,206 --> 00:40:22,330 talk to is going to be interested in knowing what your cell phone number is 734 00:40:22,330 --> 00:40:23,900 or what you're interested in. 735 00:40:23,900 --> 00:40:28,030 >> So by limiting the scope of the information 736 00:40:28,030 --> 00:40:32,434 to sort of as narrow as makes sense, I think 737 00:40:32,434 --> 00:40:34,100 that we've solved a lot of those issues. 738 00:40:34,100 --> 00:40:36,050 And then, we also give people complete control 739 00:40:36,050 --> 00:40:39,630 over what parts of their profile get showed. 740 00:40:39,630 --> 00:40:42,100 So we don't force anyone to show anything, 741 00:40:42,100 --> 00:40:48,280 and we give people granular control over some of the more sensitive stuff. 742 00:40:48,280 --> 00:40:50,220 >> So like, right next to the cell phone field, 743 00:40:50,220 --> 00:40:53,160 there's another field that's like, who do you want to show this to? 744 00:40:53,160 --> 00:40:57,300 Just your friends, just people at your school, what? 745 00:40:57,300 --> 00:40:59,060 We care about it, because if people stop-- 746 00:40:59,060 --> 00:41:01,330 if people feel like their information isn't private, 747 00:41:01,330 --> 00:41:05,880 then that screws us in the long term, too. 748 00:41:05,880 --> 00:41:09,050 >> AUDIENCE: Just furthering on that, I guess even though you 749 00:41:09,050 --> 00:41:11,840 put the information up yourself, what's the recourse in case, 750 00:41:11,840 --> 00:41:15,065 say, you have a photo, and somebody puts that photo up 751 00:41:15,065 --> 00:41:17,190 on some message board or some Hot or Not type site. 
752 00:41:17,190 --> 00:41:21,052 How do you control what users do with the information that's 753 00:41:21,052 --> 00:41:22,432 input onto your servers? 754 00:41:22,432 --> 00:41:25,515 MARK ZUCKERBERG: It's very hard to control what people do with information 755 00:41:25,515 --> 00:41:27,604 that they have access to. 756 00:41:27,604 --> 00:41:28,470 Right? 757 00:41:28,470 --> 00:41:33,880 I mean, the best that we can do is give people control over their information 758 00:41:33,880 --> 00:41:34,750 and who can see it. 759 00:41:34,750 --> 00:41:38,120 And then once they let someone see it, it's sort of out of anyone's control. 760 00:41:38,120 --> 00:41:41,114 761 00:41:41,114 --> 00:41:45,106 >> AUDIENCE: I'm curious a bit about [INAUDIBLE] Wall feature. 762 00:41:45,106 --> 00:41:48,553 It seemed to start out maybe more like blackboard type of thing, and then it 763 00:41:48,553 --> 00:41:51,094 completely changed around. [INAUDIBLE] like one or the other, 764 00:41:51,094 --> 00:41:53,260 or if there was something that you were thinking of? 765 00:41:53,260 --> 00:41:57,090 Or was there a design change in the process of doing [INAUDIBLE]? 766 00:41:57,090 --> 00:42:00,410 >> MARK ZUCKERBERG: So I originally threw that together in like a half an hour. 767 00:42:00,410 --> 00:42:07,640 And I guess it was pretty complicated, because-- or it 768 00:42:07,640 --> 00:42:10,170 was more complicated than I thought it was going to be. 769 00:42:10,170 --> 00:42:12,055 And I think part of the reason why we changed 770 00:42:12,055 --> 00:42:14,430 it was because it didn't work as well as we wanted it to. 771 00:42:14,430 --> 00:42:17,520 I mean, the original goal was to sort of make it 772 00:42:17,520 --> 00:42:22,250 so that you can have this wiki type thing on people's profiles, 773 00:42:22,250 --> 00:42:29,400 that when you moused over something, it showed who added that part of it. 
774 00:42:29,400 --> 00:42:33,540 >> But I guess there were a lot of cases that we missed, 775 00:42:33,540 --> 00:42:35,960 or it just wasn't well designed by me. 776 00:42:35,960 --> 00:42:40,090 And I don't know if you guys remember, but you used to mouse over stuff, 777 00:42:40,090 --> 00:42:41,350 and it just wasn't as good. 778 00:42:41,350 --> 00:42:43,870 And like, it might tell you the wrong person, 779 00:42:43,870 --> 00:42:46,120 or it might highlight more than it was supposed to. 780 00:42:46,120 --> 00:42:53,186 >> So I kind of coupled that with thinking, this isn't even the best feature. 781 00:42:53,186 --> 00:42:56,310 It would be much more interesting if instead of having to mouse over stuff, 782 00:42:56,310 --> 00:43:00,330 people could just see the picture and the name of the person who 783 00:43:00,330 --> 00:43:04,140 posted everything, without having to go through the whole wall. 784 00:43:04,140 --> 00:43:07,350 So over the summer, we just kind of went through 785 00:43:07,350 --> 00:43:10,670 and wrote a better parser for the walls and tried to decompose them. 786 00:43:10,670 --> 00:43:13,420 And then, going forward, we made it so that you just added a post, 787 00:43:13,420 --> 00:43:14,878 and it went to the top of the wall. 788 00:43:14,878 --> 00:43:18,020 789 00:43:18,020 --> 00:43:20,400 >> AUDIENCE: [INAUDIBLE] question. 790 00:43:20,400 --> 00:43:23,270 Where'd you get the idea from, for creating Facebook? 791 00:43:23,270 --> 00:43:24,410 >> MARK ZUCKERBERG: I just wanted to make something 792 00:43:24,410 --> 00:43:26,368 where people can type in someone's name and get 793 00:43:26,368 --> 00:43:28,594 some information about a person. 794 00:43:28,594 --> 00:43:29,977 I thought that would be cool. 795 00:43:29,977 --> 00:43:35,566 796 00:43:35,566 --> 00:43:37,060 Oh, yeah? 
797 00:43:37,060 --> 00:43:39,301 >> AUDIENCE: I'm interested in the feature that you 798 00:43:39,301 --> 00:43:44,540 could SMS some [INAUDIBLE] information if you wanted and send it back. 799 00:43:44,540 --> 00:43:46,290 I didn't know about people using it. 800 00:43:46,290 --> 00:43:49,970 So I'm just wondering if there are actual considerations [INAUDIBLE]? 801 00:43:49,970 --> 00:43:58,160 >> MARK ZUCKERBERG: So the SMS gateways also have an email counterpart, 802 00:43:58,160 --> 00:44:05,400 so if your phone number is x and you have Cingular as your provider, 803 00:44:05,400 --> 00:44:11,080 then you could email x@cingular.com or some variant of that, 804 00:44:11,080 --> 00:44:13,500 and the text message would go to your phone. 805 00:44:13,500 --> 00:44:15,950 And that's a free gateway. 806 00:44:15,950 --> 00:44:18,880 So, you know when you text message people, a lot of times 807 00:44:18,880 --> 00:44:22,070 depending on what your cell phone plan is, it will cost you money. 808 00:44:22,070 --> 00:44:24,850 If you do it through email, it actually doesn't cost any money. 809 00:44:24,850 --> 00:44:30,370 So that's how we chose to do it. 810 00:44:30,370 --> 00:44:33,710 We were doing a high volume of them and we 811 00:44:33,710 --> 00:44:40,450 decided that it would just be a better thing for us to-- to actually do it 812 00:44:40,450 --> 00:44:44,352 the legit way and send a text message directly to the cell phone, 813 00:44:44,352 --> 00:44:46,310 as opposed to going through the email gateways. 814 00:44:46,310 --> 00:44:48,643 So we're kind of in the process of getting that set up now. 815 00:44:48,643 --> 00:44:51,784 816 00:44:51,784 --> 00:45:00,568 >> AUDIENCE: [INAUDIBLE] Myspace [INAUDIBLE]? 817 00:45:00,568 --> 00:45:04,340 >> MARK ZUCKERBERG: I think that we're always looking for more stuff to do. 818 00:45:04,340 --> 00:45:07,170 I don't think that we're competing with Myspace.
819 00:45:07,170 --> 00:45:10,030 And I think it's kind of a different type of application. 820 00:45:10,030 --> 00:45:10,530 Yeah. 821 00:45:10,530 --> 00:45:13,860 822 00:45:13,860 --> 00:45:14,985 AUDIENCE: I'm just curious. 823 00:45:14,985 --> 00:45:20,182 Is there a particular reason why on a person's profiles and school emails 824 00:45:20,182 --> 00:45:24,885 and stuff [INAUDIBLE] and not as text can be copied and pasted? 825 00:45:24,885 --> 00:45:26,865 Is that [INAUDIBLE]? 826 00:45:26,865 --> 00:45:30,200 >> MARK ZUCKERBERG: So I did that so that people 827 00:45:30,200 --> 00:45:32,700 couldn't go through and scrape the pages. 828 00:45:32,700 --> 00:45:35,260 We have a lot of stuff that we put in place 829 00:45:35,260 --> 00:45:39,240 to make sure that people don't aggregate information off of Facebook. 830 00:45:39,240 --> 00:45:42,520 You obviously, you can't see profiles of people at other schools. 831 00:45:42,520 --> 00:45:45,010 But also if you try to view a lot of profiles, 832 00:45:45,010 --> 00:45:50,770 it picks up that you're just viewing an abnormal number of profiles. 833 00:45:50,770 --> 00:45:54,160 >> And we also sort of-- just by analyzing user activity, 834 00:45:54,160 --> 00:45:58,710 we've built these Bayesian filters that I guess just let us pick out 835 00:45:58,710 --> 00:46:02,190 abnormal activity, like really quickly, and just kind of show 836 00:46:02,190 --> 00:46:04,630 very limited information to those users. 837 00:46:04,630 --> 00:46:06,849 But one of the things that we wanted to do, 838 00:46:06,849 --> 00:46:09,890 we want to make sure-- we want to make it especially difficult for anyone 839 00:46:09,890 --> 00:46:12,100 to try to scrape email addresses, because that's 840 00:46:12,100 --> 00:46:14,310 really annoying-- if people get spammed. 
841 00:46:14,310 --> 00:46:16,470 So we figured that by making it an image, 842 00:46:16,470 --> 00:46:20,020 instead of plain text, that just added an extra level of complexity 843 00:46:20,020 --> 00:46:21,870 in terms of scraping. 844 00:46:21,870 --> 00:46:27,337 845 00:46:27,337 --> 00:46:33,301 >> AUDIENCE: [INAUDIBLE] pretty valuable resources that [INAUDIBLE]. 846 00:46:33,301 --> 00:46:36,780 847 00:46:36,780 --> 00:46:40,259 Do you do anything [INAUDIBLE]? 848 00:46:40,259 --> 00:46:47,220 >> MARK ZUCKERBERG: Well, we can use it to target posters to you, for example. 849 00:46:47,220 --> 00:46:49,470 I don't know if any of you bought posters off of that. 850 00:46:49,470 --> 00:46:55,230 But we sort of-- we're trying to figure out what we can do with that, 851 00:46:55,230 --> 00:46:58,710 but we're obviously really sensitive to people's privacy. 852 00:46:58,710 --> 00:47:00,240 And what's that? 853 00:47:00,240 --> 00:47:02,740 >> AUDIENCE: Not so much for individual [INAUDIBLE], 854 00:47:02,740 --> 00:47:05,774 but just as a whole [INAUDIBLE]? 855 00:47:05,774 --> 00:47:06,690 MARK ZUCKERBERG: Yeah. 856 00:47:06,690 --> 00:47:08,940 I think we're actually going to be releasing something 857 00:47:08,940 --> 00:47:13,740 late this week or next week that shows some aggregate statistics that we 858 00:47:13,740 --> 00:47:15,710 think are interesting. 859 00:47:15,710 --> 00:47:19,637 I mean, this stuff is kind of cool, but it's not the type of thing 860 00:47:19,637 --> 00:47:20,970 that you come back to every day. 861 00:47:20,970 --> 00:47:25,948 862 00:47:25,948 --> 00:47:27,675 No CS questions? 863 00:47:27,675 --> 00:47:31,548 864 00:47:31,548 --> 00:47:33,756 MICHAEL D. SMITH: Do you have any questions for Mark?
865 00:47:33,756 --> 00:47:37,619 He might be willing to stay around for a couple of minutes, 866 00:47:37,619 --> 00:47:40,035 in case people want to not ask you in public, but have a-- 867 00:47:40,035 --> 00:47:40,920 >> MARK ZUCKERBERG: I'm especially 868 00:47:40,920 --> 00:47:43,990 disappointed that Will Chen didn't ask me any questions. 869 00:47:43,990 --> 00:47:46,490 >> MICHAEL D. SMITH: We'll work on Will later. 870 00:47:46,490 --> 00:47:46,990 That's it? 871 00:47:46,990 --> 00:47:47,490 No more? 872 00:47:47,490 --> 00:47:51,484 We've got a couple more. 873 00:47:51,484 --> 00:47:52,400 MARK ZUCKERBERG: Cool. 874 00:47:52,400 --> 00:47:54,240 AUDIENCE: Do you ever procrastinate on Facebook, 875 00:47:54,240 --> 00:47:55,620 like everyone else in the room? 876 00:47:55,620 --> 00:47:57,115 >> MARK ZUCKERBERG: What's that? 877 00:47:57,115 --> 00:47:59,355 >> AUDIENCE: Do you ever procrastinate on Facebook? 878 00:47:59,355 --> 00:48:00,606 >> MARK ZUCKERBERG: Of course. 879 00:48:00,606 --> 00:48:01,562 >> AUDIENCE: [INAUDIBLE]. 880 00:48:01,562 --> 00:48:04,570 >> MARK ZUCKERBERG: I mean, I think that there's 881 00:48:04,570 --> 00:48:07,640 a value to what people do on the site. 882 00:48:07,640 --> 00:48:11,824 883 00:48:11,824 --> 00:48:14,264 >> AUDIENCE: I just know that probably many of us 884 00:48:14,264 --> 00:48:16,140 would feel that the hours [INAUDIBLE]. 885 00:48:16,140 --> 00:48:17,390 >> MICHAEL D. SMITH: [INAUDIBLE]. 886 00:48:17,390 --> 00:48:20,300 887 00:48:20,300 --> 00:48:22,891 >> MARK ZUCKERBERG: Yeah, of course. 888 00:48:22,891 --> 00:48:25,974 AUDIENCE: I don't know if you can say this, but what kinds of features can 889 00:48:25,974 --> 00:48:27,918 we expect in the future? 890 00:48:27,918 --> 00:48:30,348 [INAUDIBLE] 891 00:48:30,348 --> 00:48:34,782 >> MARK ZUCKERBERG: Well, I can tell you what we're going to do in the next two weeks.
892 00:48:34,782 --> 00:48:36,740 There's the thing that I just mentioned before, 893 00:48:36,740 --> 00:48:40,830 where we're aggregating a bunch of stats, and just show what's hot 894 00:48:40,830 --> 00:48:42,760 and what's changing. 895 00:48:42,760 --> 00:48:45,010 And also surprising statistics that we've 896 00:48:45,010 --> 00:48:48,492 found, like 2% of people at Harvard are Libertarian, for example, 897 00:48:48,492 --> 00:48:49,450 or something like that. 898 00:48:49,450 --> 00:48:52,240 899 00:48:52,240 --> 00:48:57,100 I think another thing that we're going to launch hopefully 900 00:48:57,100 --> 00:48:59,770 sometime either late this week or next week, 901 00:48:59,770 --> 00:49:03,250 is something that allows people to clarify 902 00:49:03,250 --> 00:49:05,160 their relationships with other people. 903 00:49:05,160 --> 00:49:10,130 >> So a lot of the problems that we kind of deal with at Facebook 904 00:49:10,130 --> 00:49:14,140 aren't always technical, but sometimes, like, they're social problems. 905 00:49:14,140 --> 00:49:16,310 And it's like-- one thing that I think is 906 00:49:16,310 --> 00:49:20,976 really interesting is-- if you have 100 or 150 friends, how well do 907 00:49:20,976 --> 00:49:24,700 you know each of those people, and who are maybe like the five people 908 00:49:24,700 --> 00:49:27,640 who you actually care about, like a lot. 909 00:49:27,640 --> 00:49:29,490 And that's not something that you can really 910 00:49:29,490 --> 00:49:32,110 answer right now, because the connections are binary. 911 00:49:32,110 --> 00:49:33,970 You either are connected or you're not. 912 00:49:33,970 --> 00:49:39,740 So I've been trying to think for a while about how we could design something 913 00:49:39,740 --> 00:49:43,990 that would make it so that people could express how close they were 914 00:49:43,990 --> 00:49:46,610 to people, in sort of an unbiased way.
915 00:49:46,610 --> 00:49:49,790 >> So you can imagine, if you made a feature that was just like-- rate 916 00:49:49,790 --> 00:49:52,406 your friendship on a scale of 1 to 10, that would not work. 917 00:49:52,406 --> 00:49:54,280 Because first of all, no one would want to do 918 00:49:54,280 --> 00:49:58,370 that because you're insulting someone if you're like, you're a three. 919 00:49:58,370 --> 00:50:01,910 But it's also kind of boring, and so no one 920 00:50:01,910 --> 00:50:03,410 would want to do it because of that. 921 00:50:03,410 --> 00:50:05,993 And it would just be skewed by social pressure in the same way 922 00:50:05,993 --> 00:50:06,930 that the friends are. 923 00:50:06,930 --> 00:50:11,730 Some people have a different sense of what a friend is to them, 924 00:50:11,730 --> 00:50:13,040 than another person would. 925 00:50:13,040 --> 00:50:16,050 So if someone has 30 friends and another person has 150 friends, 926 00:50:16,050 --> 00:50:18,510 does that person actually have more friends in real life? 927 00:50:18,510 --> 00:50:20,810 Maybe or maybe not, and maybe the person with 30 just 928 00:50:20,810 --> 00:50:24,060 has a higher threshold for making someone a friend on Facebook. 929 00:50:24,060 --> 00:50:28,490 >> So I mean, I guess that the solution that we came up with for this 930 00:50:28,490 --> 00:50:32,860 was to make-- to judge relationships based 931 00:50:32,860 --> 00:50:36,660 on bi-directional, factual statements. 932 00:50:36,660 --> 00:50:40,570 So for example, I took CS50 with this person. 933 00:50:40,570 --> 00:50:43,730 Or I lived in a house with this person. 934 00:50:43,730 --> 00:50:49,920 And there's just kind of a bunch of different ways to do stuff like that. 935 00:50:49,920 --> 00:50:54,730 But I figured that that would probably be a little more accurate, 936 00:50:54,730 --> 00:50:58,570 because no one is going to-- there's no pressure 937 00:50:58,570 --> 00:51:00,080 to lie about something like that.
938 00:51:00,080 --> 00:51:01,830 It's not like, what are you talking about? 939 00:51:01,830 --> 00:51:03,126 I didn't take CS50 with you. 940 00:51:03,126 --> 00:51:05,500 But if someone aggregates a lot of different connections, 941 00:51:05,500 --> 00:51:07,340 then that kind of means something. 942 00:51:07,340 --> 00:51:10,842 So when you take someone like Dustin, who's my roommate here, 943 00:51:10,842 --> 00:51:13,300 and it's like OK, well we lived together at Kirkland House. 944 00:51:13,300 --> 00:51:16,290 Then we worked on Facebook. 945 00:51:16,290 --> 00:51:20,760 Then we moved out to Palo Alto, and now we're still working on Facebook-- then 946 00:51:20,760 --> 00:51:26,470 maybe that's enough connections to say OK, well this person clearly 947 00:51:26,470 --> 00:51:28,810 has a lot to do with this person. 948 00:51:28,810 --> 00:51:34,820 Whereas if the only category that you know someone through is, 949 00:51:34,820 --> 00:51:39,384 this person's my Facebook friend, then that also means something. 950 00:51:39,384 --> 00:51:40,050 So I don't know. 951 00:51:40,050 --> 00:51:41,008 We'll see how it works. 952 00:51:41,008 --> 00:51:42,771 Nothing is for sure. 953 00:51:42,771 --> 00:51:43,270 What's up? 954 00:51:43,270 --> 00:51:47,243 >> AUDIENCE: Do you actually [INAUDIBLE] people typing in information 955 00:51:47,243 --> 00:51:47,743 [INAUDIBLE]? 956 00:51:47,743 --> 00:51:53,707 957 00:51:53,707 --> 00:51:55,760 >> MARK ZUCKERBERG: It's a combination. 958 00:51:55,760 --> 00:52:01,670 So I think that another thing that's pretty important for each 959 00:52:01,670 --> 00:52:03,750 of these events is the date at which they occur.
960 00:52:03,750 --> 00:52:08,940 So if you had, for example, a date on each person's friendship 961 00:52:08,940 --> 00:52:14,662 with each person then that would give you a more accurate representation 962 00:52:14,662 --> 00:52:16,370 of what that meant, because right now you 963 00:52:16,370 --> 00:52:20,482 don't know what friend means to each of the people on the network. 964 00:52:20,482 --> 00:52:22,940 And because you don't know when that friendship was formed, 965 00:52:22,940 --> 00:52:25,476 you don't know what has changed in that relationship 966 00:52:25,476 --> 00:52:26,850 since that friendship was formed. 967 00:52:26,850 --> 00:52:29,560 >> I mean if the person-- if friendship means very little to someone 968 00:52:29,560 --> 00:52:34,130 if you know that that happened yesterday, that they became friends, 969 00:52:34,130 --> 00:52:37,160 you still know that there's some-- that there's some strength. 970 00:52:37,160 --> 00:52:39,030 It's like a certainty thing. 971 00:52:39,030 --> 00:52:41,330 There's a lower certainty that their relationship 972 00:52:41,330 --> 00:52:45,320 has diverged since that point if the date at which the action occurred 973 00:52:45,320 --> 00:52:45,820 was sooner. 974 00:52:45,820 --> 00:52:48,390 975 00:52:48,390 --> 00:52:49,374 Sorry, more recent. 976 00:52:49,374 --> 00:52:52,040 So I think that's one of the things that we're focusing on here. 977 00:52:52,040 --> 00:52:54,990 So I took a course-- I took CS50 with someone 978 00:52:54,990 --> 00:52:57,730 this term is a lot different than saying I'm a senior now 979 00:52:57,730 --> 00:53:02,680 and I took CS50 with this person when I was a freshman. 980 00:53:02,680 --> 00:53:06,050 >> A lot of these-- the analysis of how people look at this 981 00:53:06,050 --> 00:53:09,494 and see the relationships isn't necessarily-- 982 00:53:09,494 --> 00:53:11,410 Facebook isn't going to rate the relationship. 
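As a rough illustration of the idea being described here (two-way factual statements, each carrying a date), the following Python sketch stores such statements and lists what links two people, newest first. It deliberately computes no score, since the talk says Facebook would not rate the relationship. The `Connection` class, the `shared_history` function, and the names and dates in the usage note are all hypothetical and are not taken from Facebook's actual design.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Connection:
    """One factual, two-way statement linking two people (hypothetical model)."""
    person_a: str
    person_b: str
    statement: str  # for example "took CS50 together"
    when: date      # date the shared event happened

    def involves(self, x: str, y: str) -> bool:
        # Order does not matter: the statement is bi-directional.
        return {self.person_a, self.person_b} == {x, y}


def shared_history(connections, x, y):
    """Return the statements linking x and y, most recent first."""
    linked = [c for c in connections if c.involves(x, y)]
    return sorted(linked, key=lambda c: c.when, reverse=True)
```

With placeholder data such as `Connection("alice", "bob", "took CS50 together", date(2002, 9, 1))`, calling `shared_history(connections, "bob", "alice")` returns the same list as `shared_history(connections, "alice", "bob")`, and the reader is left to judge what a recent or an old shared event means.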
983 00:53:11,410 --> 00:53:14,290 It's sort of-- people have an implicit understanding 984 00:53:14,290 --> 00:53:17,540 of what the difference is between having taken CS50 with someone this term 985 00:53:17,540 --> 00:53:20,840 and having taken CS50 with them three years ago. 986 00:53:20,840 --> 00:53:22,920 And I think that will kind of help out. 987 00:53:22,920 --> 00:53:26,320 988 00:53:26,320 --> 00:53:27,340 What's up? 989 00:53:27,340 --> 00:53:30,479 >> AUDIENCE: When you get a new idea and you 990 00:53:30,479 --> 00:53:33,619 think it's pretty cool, how [INAUDIBLE] with how you go about it? 991 00:53:33,619 --> 00:53:38,460 992 00:53:38,460 --> 00:53:40,470 >> MARK ZUCKERBERG: Not too. 993 00:53:40,470 --> 00:53:42,500 Because I think that a lot of the stuff, we sort 994 00:53:42,500 --> 00:53:44,950 of have a very unique platform for building it. 995 00:53:44,950 --> 00:53:48,010 I don't think there's any other company or group of people 996 00:53:48,010 --> 00:53:50,335 in the world who could develop this right now. 997 00:53:50,335 --> 00:53:53,470 998 00:53:53,470 --> 00:53:56,510 I mean even Google, with their like 5,000 engineers 999 00:53:56,510 --> 00:54:00,680 is not in the place to make an application that sort 1000 00:54:00,680 --> 00:54:04,180 of characterizes people's relationships like this. 1001 00:54:04,180 --> 00:54:06,410 >> And it's like the same thing with the photo tagging. 1002 00:54:06,410 --> 00:54:11,070 We can do that because photo tagging only works if everyone around you 1003 00:54:11,070 --> 00:54:11,965 is on the site. 1004 00:54:11,965 --> 00:54:14,090 Because otherwise you're going to get a type of use 1005 00:54:14,090 --> 00:54:15,845 for it where you go and you upload a photo 1006 00:54:15,845 --> 00:54:18,970 and you go to tag a bunch of people, and they're not there, and that sucks.
1007 00:54:18,970 --> 00:54:23,840 So even if 50% of the people at Harvard were on Facebook, then the tagging 1008 00:54:23,840 --> 00:54:25,740 and the way that we set up would still suck. 1009 00:54:25,740 --> 00:54:30,550 So it only works because 97% of the people at Harvard are on Facebook, 1010 00:54:30,550 --> 00:54:31,260 or whatever. 1011 00:54:31,260 --> 00:54:37,530 So because of that, it's like not that big of a concern. 1012 00:54:37,530 --> 00:54:40,780 1013 00:54:40,780 --> 00:54:41,320 Yeah? 1014 00:54:41,320 --> 00:54:43,361 >> AUDIENCE: So from sort of a software engineering, 1015 00:54:43,361 --> 00:54:46,721 sort of dynamic [INAUDIBLE] way, when somebody 1016 00:54:46,721 --> 00:54:51,046 has one of these ideas-- like let's aggregate this [? wider ?] statistic 1017 00:54:51,046 --> 00:54:53,921 and tell people, or I have a way to measure this, that, and the other 1018 00:54:53,921 --> 00:54:57,610 about these people and mark up this thing on people's profiles-- 1019 00:54:57,610 --> 00:55:00,060 how do they go about getting the go-ahead from everyone 1020 00:55:00,060 --> 00:55:03,490 else in the company to spend some of their time technically working on that? 1021 00:55:03,490 --> 00:55:07,410 Or get other people to work on it with them, and stuff like that? 1022 00:55:07,410 --> 00:55:08,880 >> MARK ZUCKERBERG: Mhm. 1023 00:55:08,880 --> 00:55:14,340 I think that a lot of people-- I mean, the people who work at Facebook really 1024 00:55:14,340 --> 00:55:17,430 like working at Facebook, I think, for the most part, 1025 00:55:17,430 --> 00:55:19,840 and spend a lot of their time doing that. 1026 00:55:19,840 --> 00:55:22,800 And like, a lot of the time that they're spending, 1027 00:55:22,800 --> 00:55:25,240 they spend working on stuff that might be 1028 00:55:25,240 --> 00:55:28,890 sort of strategically important to what we're trying to do at that point. 
1029 00:55:28,890 --> 00:55:31,760 But also, a lot of people just mess around with the code base, 1030 00:55:31,760 --> 00:55:36,090 and kind of put if-statements in there that's like, if the user is me, 1031 00:55:36,090 --> 00:55:39,240 then put this in there. 1032 00:55:39,240 --> 00:55:44,050 >> And so I walk around to different people's places during the day, 1033 00:55:44,050 --> 00:55:45,330 or people come and talk to me. 1034 00:55:45,330 --> 00:55:49,610 Like, I hold CEO office hours as a joke, like from 2:00 to 4:00 every day-- 1035 00:55:49,610 --> 00:55:51,077 not today. 1036 00:55:51,077 --> 00:55:53,910 And people just come and show me different stuff that they're doing, 1037 00:55:53,910 --> 00:55:57,060 and a lot of it is relatively cool, and stuff 1038 00:55:57,060 --> 00:55:59,530 that I wouldn't have necessarily thought of. 1039 00:55:59,530 --> 00:56:02,740 >> So I mean, you asked before if we were saving, 1040 00:56:02,740 --> 00:56:06,400 if we were archiving, old profile information, and one of the reasons 1041 00:56:06,400 --> 00:56:08,940 why I said that we might start doing it is 1042 00:56:08,940 --> 00:56:13,124 because one of the guys at the company came up with something where it's like, 1043 00:56:13,124 --> 00:56:16,290 so you go to your friend's page, and it shows your recently updated friends. 1044 00:56:16,290 --> 00:56:18,250 And then you click on that, and it shows their new profile. 1045 00:56:18,250 --> 00:56:20,180 But there's no indication of what changed. 1046 00:56:20,180 --> 00:56:25,290 >> So one of the guys made something that keeps an old version of his profile, 1047 00:56:25,290 --> 00:56:29,690 and then makes it so that when you go to his profile when he updates it, 1048 00:56:29,690 --> 00:56:32,380 it highlights in yellow the parts of it that were changed. 1049 00:56:32,380 --> 00:56:33,880 And I think that that's pretty cool. 
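The profile-change highlighting described above can be sketched in a few lines of Python: keep the previous snapshot of a profile, compare it field by field with the current one, and mark what differs. This is only a minimal sketch under the assumption that a profile is a flat dictionary of field names to values; `changed_fields` and `render` are hypothetical names, and a text marker stands in for the yellow highlight mentioned in the talk.

```python
def changed_fields(old_profile: dict, new_profile: dict) -> list:
    """Return the names of fields whose values differ between two snapshots."""
    all_fields = sorted(set(old_profile) | set(new_profile))
    # A field missing from one snapshot is compared as None.
    return [f for f in all_fields if old_profile.get(f) != new_profile.get(f)]


def render(new_profile: dict, changed: list) -> str:
    """Render the current profile as text, marking changed fields with '*'."""
    lines = []
    for field in sorted(new_profile):
        marker = "*" if field in changed else " "
        lines.append(f"{marker} {field}: {new_profile[field]}")
    return "\n".join(lines)
```

Storing a full old copy of every profile is the simple choice here, and it is also the cost the speaker points to: doing this for every user means keeping at least one extra snapshot per profile.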
1050 00:56:33,880 --> 00:56:37,385 And it's not a huge project-- I mean, it actually kind of is, 1051 00:56:37,385 --> 00:56:39,630 if we have to start storing everyone's information. 1052 00:56:39,630 --> 00:56:42,720 >> But I mean, it's somewhat cool. 1053 00:56:42,720 --> 00:56:48,250 It's not the type of thing that you necessarily are bound to come up with, 1054 00:56:48,250 --> 00:56:52,820 but I definitely think it's a pretty big improvement over what we have now. 1055 00:56:52,820 --> 00:56:57,330 Now, it's really hard to go to someone's profile and tell what changed. 1056 00:56:57,330 --> 00:57:01,080 And that's just the most recent example that I have. 1057 00:57:01,080 --> 00:57:05,380 >> AUDIENCE: Do you have time to allow people to change the look of each page? 1058 00:57:05,380 --> 00:57:05,880 [INAUDIBLE]? 1059 00:57:05,880 --> 00:57:09,730 1060 00:57:09,730 --> 00:57:12,970 >> MARK ZUCKERBERG: So, I don't want to do that. 1061 00:57:12,970 --> 00:57:17,230 And the reason is because I think that Facebook is a directory, 1062 00:57:17,230 --> 00:57:20,051 and the primary purpose is to look up someone. 1063 00:57:20,051 --> 00:57:20,550 Right? 1064 00:57:20,550 --> 00:57:22,870 Like type in their name and get some information about them. 1065 00:57:22,870 --> 00:57:24,690 And one of the things that's really useful 1066 00:57:24,690 --> 00:57:27,190 is that everyone's page is structured in the same way. 1067 00:57:27,190 --> 00:57:29,100 >> So if you want to see if someone's single, 1068 00:57:29,100 --> 00:57:32,780 you don't have to scan down the columns until you get to relationship status. 1069 00:57:32,780 --> 00:57:34,430 You just know where that is. 1070 00:57:34,430 --> 00:57:38,040 So you click, go-- your eyes just go to that thing.
1071 00:57:38,040 --> 00:57:43,117 But if you had different people changing their CSSes in different ways, 1072 00:57:43,117 --> 00:57:44,950 then that could become annoying-- especially 1073 00:57:44,950 --> 00:57:49,140 if people are doing stuff like dark blue text on black backgrounds. 1074 00:57:49,140 --> 00:57:52,985 It just gets kind of obnoxious. 1075 00:57:52,985 --> 00:57:57,440 >> AUDIENCE: How successful has the Facebook [INAUDIBLE] been, 1076 00:57:57,440 --> 00:58:02,390 and what do you see as differences in the purpose [INAUDIBLE]? 1077 00:58:02,390 --> 00:58:05,360 1078 00:58:05,360 --> 00:58:08,520 >> MARK ZUCKERBERG: The purpose-- for me, the high school one was the same. 1079 00:58:08,520 --> 00:58:12,130 I think that the application-- this is going to probably 1080 00:58:12,130 --> 00:58:16,460 sound pretty stupid-- but wanting to look people up, I think, 1081 00:58:16,460 --> 00:58:19,260 is kind of a core human desire. 1082 00:58:19,260 --> 00:58:20,470 Right? 1083 00:58:20,470 --> 00:58:23,310 I think that people just want to know stuff about other people. 1084 00:58:23,310 --> 00:58:26,434 So I think that providing an interface where people can just 1085 00:58:26,434 --> 00:58:28,850 type in someone's name and get some information about them 1086 00:58:28,850 --> 00:58:31,050 is generally a pretty useful thing. 1087 00:58:31,050 --> 00:58:32,410 So growth has been pretty good. 1088 00:58:32,410 --> 00:58:35,680 >> It was tough to figure out exactly how to gauge it, 1089 00:58:35,680 --> 00:58:38,676 because when we did college, we opened it up at Harvard. 1090 00:58:38,676 --> 00:58:41,050 Then we opened it up at a couple colleges around Harvard. 1091 00:58:41,050 --> 00:58:45,710 And the idea was always, we were really short on money and equipment. 1092 00:58:45,710 --> 00:58:48,390 So while getting as little equipment as possible, 1093 00:58:48,390 --> 00:58:49,740 we want to maximize our growth. 
1094 00:58:49,740 --> 00:58:53,709 So we want to launch at the schools that we 1095 00:58:53,709 --> 00:58:56,000 think are going to grow the quickest, based on the fact 1096 00:58:56,000 --> 00:58:58,458 that the people at those schools are going to have the most 1097 00:58:58,458 --> 00:59:01,655 number of friends at the schools that we're already at. 1098 00:59:01,655 --> 00:59:03,530 We took a different approach for high school, 1099 00:59:03,530 --> 00:59:05,670 because we could just launch it everywhere at the same time. 1100 00:59:05,670 --> 00:59:07,580 So we didn't really know how it was going to grow. 1101 00:59:07,580 --> 00:59:10,704 I think it's growing at more than 5,000 people a day, which is pretty good. 1102 00:59:10,704 --> 00:59:14,533 1103 00:59:14,533 --> 00:59:15,487 Yeah? 1104 00:59:15,487 --> 00:59:17,395 >> AUDIENCE: When you started Facebook, did you 1105 00:59:17,395 --> 00:59:19,727 intend for it to become this full-fledged business? 1106 00:59:19,727 --> 00:59:20,560 MARK ZUCKERBERG: No. 1107 00:59:20,560 --> 00:59:22,412 AUDIENCE: Well, how did you [INAUDIBLE]? 1108 00:59:22,412 --> 00:59:28,900 1109 00:59:28,900 --> 00:59:32,020 >> MARK ZUCKERBERG: I remember thinking that it would be cool 1110 00:59:32,020 --> 00:59:35,030 if you could have a directory of everyone. 1111 00:59:35,030 --> 00:59:38,320 I remember arguing with my parents about this, because after I almost 1112 00:59:38,320 --> 00:59:44,020 got kicked out of school for this project that I did before Facebook, 1113 00:59:44,020 --> 00:59:47,217 they were like, what good could possibly come of doing something new? 1114 00:59:47,217 --> 00:59:48,800 And I'm like, no, this is pretty cool. 1115 00:59:48,800 --> 00:59:52,605 Just imagine how cool it would be if you could just type in someone's name 1116 00:59:52,605 --> 00:59:54,120 and get some information about them. 1117 00:59:54,120 --> 00:59:56,050 And they were just like, I don't see it. 
1118 00:59:56,050 --> 00:59:58,662 And I'm like, well, we'll just do it at Harvard for now, 1119 00:59:58,662 --> 01:00:01,620 but imagine what happens if one day, you can just type in anyone's name 1120 01:00:01,620 --> 01:00:02,940 and get some information about them. 1121 01:00:02,940 --> 01:00:04,790 And like, that would be kind of cool, right? 1122 01:00:04,790 --> 01:00:08,190 1123 01:00:08,190 --> 01:00:11,831 So they didn't buy it, but now they do. 1124 01:00:11,831 --> 01:00:15,100 >> [LAUGHTER] 1125 01:00:15,100 --> 01:00:16,620 >> Yeah, so I don't know. 1126 01:00:16,620 --> 01:00:20,450 I guess at each phase, we're just kind of looking at a natural way 1127 01:00:20,450 --> 01:00:22,890 to preserve the integrity of the network, 1128 01:00:22,890 --> 01:00:28,190 and also to make it so that it's more useful-- I 1129 01:00:28,190 --> 01:00:32,668 guess is the answer to that question. 1130 01:00:32,668 --> 01:00:34,129 Yeah? 1131 01:00:34,129 --> 01:00:38,512 >> AUDIENCE: Are there certain skills, particularly [INAUDIBLE], 1132 01:00:38,512 --> 01:00:42,895 that you [INAUDIBLE] or you would suggest for someone to study? 1133 01:00:42,895 --> 01:00:45,779 1134 01:00:45,779 --> 01:00:49,070 MARK ZUCKERBERG: I just suggest that you take the hardest courses that you can, 1135 01:00:49,070 --> 01:00:51,653 because you learn the most when you challenge yourself, right? 1136 01:00:51,653 --> 01:00:57,980 So like 161 just ruined my life, and I learned so much from it. 1137 01:00:57,980 --> 01:01:01,620 121 I also found pretty hard. 1138 01:01:01,620 --> 01:01:03,880 124 kind of changed the way I thought about stuff. 
1139 01:01:03,880 --> 01:01:06,700 1140 01:01:06,700 --> 01:01:09,430 >> What 124 taught me that I think was really useful 1141 01:01:09,430 --> 01:01:13,840 was that there are-- I think a lot of people focus 1142 01:01:13,840 --> 01:01:16,630 on how to do stuff as well as possible, and how 1143 01:01:16,630 --> 01:01:18,620 to make the most efficient algorithm. 1144 01:01:18,620 --> 01:01:23,870 But what has always gotten us by isn't doing stuff in the most efficient way, 1145 01:01:23,870 --> 01:01:27,140 but laying the framework in a pretty efficient way. 1146 01:01:27,140 --> 01:01:29,640 So I mean, it kind of teaches you both sides of the problem, 1147 01:01:29,640 --> 01:01:33,980 like data structures and algorithms, and how the setup is really important. 1148 01:01:33,980 --> 01:01:36,843 And that's definitely saved our ass in scaling a lot of times. 1149 01:01:36,843 --> 01:01:40,100 1150 01:01:40,100 --> 01:01:40,870 >> I don't know. 1151 01:01:40,870 --> 01:01:42,400 Work with smart people. 1152 01:01:42,400 --> 01:01:43,150 Learn from people. 1153 01:01:43,150 --> 01:01:47,750 1154 01:01:47,750 --> 01:01:50,666 AUDIENCE: One of the things that I've noticed about Facebook, compared 1155 01:01:50,666 --> 01:01:55,388 to other social networking space, is that it's actually a lot easier to use. 1156 01:01:55,388 --> 01:02:01,350 Do you have people-- like your employees just putting whatever pieces they think 1157 01:02:01,350 --> 01:02:01,849 are cool. 1158 01:02:01,849 --> 01:02:06,830 Do you have separate stability people to ensure it all works all together? 1159 01:02:06,830 --> 01:02:09,280 >> MARK ZUCKERBERG: People can make whatever they want, 1160 01:02:09,280 --> 01:02:11,810 but that doesn't mean they can put it on the site. 1161 01:02:11,810 --> 01:02:21,650 So I think that before stuff goes on the site, a lot of people see it. 1162 01:02:21,650 --> 01:02:24,667 I mean, I definitely check off on it before it can go live. 
1163 01:02:24,667 --> 01:02:27,750 But I mean, I think that people have a lot of creativity to do cool stuff. 1164 01:02:27,750 --> 01:02:32,520 And a lot of times, it's like someone can come up with a cool idea, 1165 01:02:32,520 --> 01:02:36,200 but that doesn't mean it's the final way that it would happen. 1166 01:02:36,200 --> 01:02:40,710 >> So for example, people highlighting in yellow what the changes are 1167 01:02:40,710 --> 01:02:44,510 in their profile-- I think that just the concept of highlighting 1168 01:02:44,510 --> 01:02:47,520 stuff that has changed is really good, but the interface 1169 01:02:47,520 --> 01:02:50,960 that that guy used for it isn't what I think is the best one. 1170 01:02:50,960 --> 01:02:54,130 And the way that he's storing the old profile information 1171 01:02:54,130 --> 01:02:55,410 isn't optimal either. 1172 01:02:55,410 --> 01:02:58,201 And that kind of is cool, because he was just doing it for himself. 1173 01:02:58,201 --> 01:03:01,790 But if we were ever going to make something live out of that, which 1174 01:03:01,790 --> 01:03:04,060 I want to, we'd do it in a different way. 1175 01:03:04,060 --> 01:03:05,540 And it's more just like a mock-up. 1176 01:03:05,540 --> 01:03:07,814 >> AUDIENCE: So like, the ideas come from the ground, up, 1177 01:03:07,814 --> 01:03:10,230 and then [? it's just ?] [? tossed ?] [? down the line? ?] 1178 01:03:10,230 --> 01:03:12,260 >> MARK ZUCKERBERG: I mean, it goes both ways. 1179 01:03:12,260 --> 01:03:14,995 And I'm not completely unopinionated. 1180 01:03:14,995 --> 01:03:21,072 1181 01:03:21,072 --> 01:03:22,322 MICHAEL D. SMITH: [INAUDIBLE]. 1182 01:03:22,322 --> 01:03:28,298 1183 01:03:28,298 --> 01:03:30,788 >> AUDIENCE: I actually have a question about the [INAUDIBLE]. 1184 01:03:30,788 --> 01:03:35,270 So, going back about the [INAUDIBLE] and [INAUDIBLE] privacy. 1185 01:03:35,270 --> 01:03:37,760 And it's a different platform?
1186 01:03:37,760 --> 01:03:38,756 >> MARK ZUCKERBERG: Yeah. 1187 01:03:38,756 --> 01:03:41,744 >> AUDIENCE: So college people are over 18 and allowed 1188 01:03:41,744 --> 01:03:44,483 to post whatever pictures they want, and they're not really 1189 01:03:44,483 --> 01:03:47,720 incriminating themselves, except possibly for drugs and alcohol? 1190 01:03:47,720 --> 01:03:52,202 I've seen pictures on Facebook where my younger 1191 01:03:52,202 --> 01:03:54,692 cousins are drinking and stuff like that. 1192 01:03:54,692 --> 01:04:00,170 But when you go to the high school kids, they're 15 and 16 and younger. 1193 01:04:00,170 --> 01:04:03,158 >> And are you guys just saying, it's the internet, 1194 01:04:03,158 --> 01:04:06,644 and if they want to incriminate themselves and things like that, 1195 01:04:06,644 --> 01:04:07,640 is that OK? 1196 01:04:07,640 --> 01:04:11,624 Or do you guys filter the pictures that high school students put up 1197 01:04:11,624 --> 01:04:13,118 and the information they write? 1198 01:04:13,118 --> 01:04:15,435 Or do you just [INAUDIBLE]? 1199 01:04:15,435 --> 01:04:18,310 MARK ZUCKERBERG: So a lot of the solutions that we come up with for stuff 1200 01:04:18,310 --> 01:04:23,230 aren't technical or organizational, but just applying social pressure 1201 01:04:23,230 --> 01:04:24,580 in good ways. 1202 01:04:24,580 --> 01:04:28,740 So Myspace has-- almost a third of their staff 1203 01:04:28,740 --> 01:04:32,660 is monitoring the pictures that get uploaded for pornography. 1204 01:04:32,660 --> 01:04:36,270 We hardly ever have any pornography uploaded, 1205 01:04:36,270 --> 01:04:39,290 and I think that a lot of the reason is that people 1206 01:04:39,290 --> 01:04:44,470 use their real names on Facebook, and your real email address for school. 1207 01:04:44,470 --> 01:04:47,900 And if you have that, then you're not going to upload pornography.
1208 01:04:47,900 --> 01:04:50,830 And I think that that's a really simple social solution 1209 01:04:50,830 --> 01:04:56,060 to a possibly complex technical issue. 1210 01:04:56,060 --> 01:05:02,367 >> So that said, we changed some of the features around for high school. 1211 01:05:02,367 --> 01:05:04,200 For example, we took parties out, because we 1212 01:05:04,200 --> 01:05:06,370 figured that parents would get pissed off 1213 01:05:06,370 --> 01:05:09,280 or they would just break up all the keg parties really quickly, 1214 01:05:09,280 --> 01:05:10,744 and that would suck for everyone. 1215 01:05:10,744 --> 01:05:13,470 >> [CHUCKLES] 1216 01:05:13,470 --> 01:05:16,250 >> I don't know. 1217 01:05:16,250 --> 01:05:20,290 We deemphasize contact information in high school. 1218 01:05:20,290 --> 01:05:22,746 Yeah. 1219 01:05:22,746 --> 01:05:24,120 AUDIENCE: All right, we end here. 1220 01:05:24,120 --> 01:05:26,220 If you have other questions, feel free to come down and talk to Mark. 1221 01:05:26,220 --> 01:05:27,120 Thank you very much. 1222 01:05:27,120 --> 01:05:28,036 >> MARK ZUCKERBERG: Yeah. 1223 01:05:28,036 --> 01:05:34,457 [APPLAUSE]