DAVID MALAN: Well, hello, world. My name is David Malan, and this is the last of CS50's seminars here at Harvard in fall of 2022. As you might know, we've been filming the class live again, and all of this past semester's lectures, and now seminars, will end up on edX and YouTube and other platforms on January 1 as part of what will be called CS50x 2023.

In this particular seminar, I'm so excited that we're joined by Andrew Sellergren, an alum of Harvard who actually took CS50 the very first year I taught it, for better or for worse. I think it's fair to say that surely everything he knows can be traced back to me, it would seem. But that's actually not the case, because if you read the description for today's talk on applications of artificial intelligence, or AI, in medicine, Andrew has been amazingly self-taught. In fact, when he was in college here, he only took CS50, but then went on nonetheless to work for Google in various roles since. Only just a few years ago did he dive into and teach himself the world of machine learning, which he now applies to real-world problems in medicine that are so incredibly impactful that we're so glad he's here today to share not only that same trajectory, but a little bit about what he now does at Google. So without further ado, here is CS50's own, and now Google's, Andrew Sellergren.

ANDREW SELLERGREN: Thanks. Thanks a lot, David. That was a really nice introduction. You said you would be funny, but I guess you didn't come through on that. No, really, I know he exaggerates, but I do honestly trace a lot of this back to CS50 in particular.

Let me go ahead and start sharing, and I'll tell my story as well. OK, awesome. Great. Yeah, I'll start by welcoming everyone, and thank you for joining. It's really exciting to be here, and it's really, really cool to see all the faces from around the world. That's been a real joy of mine to watch as the course grew.

So as David mentioned, I took the course the first time it was offered. I am going to date us by saying this, but it was 2007. It was my senior year in college, and I was a pre-med. I was getting ready to go to medical school.
And in my last year at Harvard, I was taking courses that I wouldn't normally take, just for fun, and CS50 was one of them. I actually already knew David from a previous venture and was really excited that he was going to be teaching it. So it was everything I wanted in terms of getting me excited about a whole new set of knowledge that I didn't have yet.

I wouldn't say there was a lightning strike there, where all of a sudden I decided I wanted to be a software engineer. I won't tell the story that way. But it was the beginning of a lot of lifelong learning. And that's probably the best advice I can give anyone: as long as you are continuing to learn and challenge yourself, and you're excited about what you're doing, you don't necessarily need to concern yourself too much with what that makes you, whether that makes you a software engineer or a scientist or an artist or what have you. It's really just about a skill set that you're gathering. And that's really what the start of it was with CS50. And I don't exaggerate when I say that that was what gave me the confidence to learn all of these things.

So CS50 is in large part about learning the fundamentals: the programming languages that you might begin with and some of the very foundational concepts. But it empowers you to learn anything you want after that. You can always trace back and say, well, I remember this concept from CS50, or yeah, I got stuck at this problem set, but then I worked my way through. And that's been invaluable. And to be honest, I think it continues to serve me even today. There isn't anything where I limit myself and say, oh, I can't learn that, or oh, that's not what I'm supposed to be learning now. It's really just something that I'm always challenging myself with. And CS50 was a huge part of that to begin with.

So the course itself has grown tremendously since then, which is also really exciting. It was still really cool then, but it's gotten a lot bigger. Oops, OK.
So today, I'm here to talk to you about what I currently do, which is AI, artificial intelligence, for medical imaging, and a tool that we just released, along with some research that we did behind it, called CXR Foundation, CXR for chest x-ray. And I'm a member of the Google Health AI team. I do promise I used to look like that. David can attest that I did look like that at some point. But I guess the pandemic hit pretty hard.

So I'm a software engineer. A little bit more about my journey, too: after college, when I decided I didn't want to go to medical school, I wasn't really sure what I wanted to do. I did end up teaching CS50 a bit. I taught a few of the other courses that David offers, really enjoyed it, and really used it as an opportunity to keep learning myself.

And then I applied for a job at Google as a technical analyst, which is a kind of data analyst, for the ad traffic quality team. We were looking for invalid ad activity on the AdSense and AdWords networks. And that was also a great opportunity, because I learned a lot of skills, and there was a big push to scale yourself. If you wanted to have a bigger impact, you could write some scripts and some tools of your own that would help you scale better, and that was a huge benefit. So that's where I also started learning more of my coding skills. And that was the time when I decided, hey, I really like this, and I want to do this more frequently, or I want this to be my core job.

So I transferred into software engineering in 2014. I worked on the Google Fit app for a couple of years, if you've used that. Then in 2017, I transferred to the Google Surveys team. In both of those teams, I worked a lot on infrastructure, so large-scale distributed systems, getting data to flow from here to there, kind of like plumbing. And it was a fantastic experience, obviously, at Google, because of the scale of the data that we were working with. And it also just hardened my skills in terms of writing good code, writing good tests, and so on.
And then in 2019, I decided that I wanted to pivot back a little bit toward health and something that I knew was intrinsically meaningful to me. So I reached out to the team working on some of the health AI ventures at the time. I joined in an infrastructure role, and I was working on some of the infrastructure there. I had no modeling experience, but I wanted to learn. So again, I just picked up the mantle and said, OK, I'm going to learn this. I took Andrew Ng's course on machine learning on Coursera and then just dove in and really started modeling extensively. And it's been three years. I would definitely not say I'm an expert, but I'm really excited with what I've learned.

And the cool thing about the AI field in general is that it's advancing so rapidly that pretty much no one can keep pace with it. So if you are trying to stay on the cutting edge of it, you may be ahead of some other people who have studied a lot longer but haven't been keeping up with the state of the art. So that's where we find ourselves now.

And I would be remiss if I didn't show my great team here in at least one slide. Most of these are my co-authors on the paper. They run the gamut from pure software engineers who work on machine learning theory, to product managers; we have radiologists in there; we have clinicians, scientists, research scientists. And this isn't even all of our co-authors. These are just the ones at Google. We also collaborated with a few different hospital systems. So it's really a huge team effort. And that also hopefully gives you inspiration that you don't have to be a pure software engineer to contribute to things like this. And I'll get into how valuable they were in each of their different roles and why it's important to come together in this interdisciplinary way, because you don't get great results without that kind of teamwork.

So you may know some of our work at Google in terms of AI. It was not too many years ago that Sundar put out a missive basically saying that we would focus on AI first in a lot of things.
So Google has tremendous research experience in this area. And we can use this AI to improve the experience in core Google products like Google Photos and Google Translate. You may have used Photos, where you can search your personal photos with tags, or you can even search for objects. You can search for "bike" or "banana," and it will find it. And the Google Translate functionality is something I've used myself, where you can read a menu in a different language in real time. It's fantastic.

And we're also using AI to tackle hard problems in the physical world. So Waymo is taking on computer vision in the realm of self-driving. And we'll be talking a little bit about computer vision and AI, and it's not too dissimilar from what they're doing there as well.

And with Google Health and AI, our mission is really to show what's possible and bring these benefits of AI to everyone. So we do conduct research that advances the state of the art in the field, and we're applying AI to new products, new domains. But one of our larger goals is to make sure that everyone can access that AI. And that's what I want to talk to you about today, and why I tried to reiterate that no experience is necessary. If some of the topics we talk about today catch you off guard, you can always go back, study a little bit, and come back to this. But the goal is to bring that to everyone, both in terms of resources and in terms of knowledge.

So in a few of the examples here, you see there's some work we're doing on-device with Coral. There's the brain mapping. There's colonoscopies and prediction of sleep patterns. So all of these things are in our wheelhouse.

OK, so before we get started, I know that a lot of you also don't have experience with artificial intelligence or with ML, machine learning, in general. So I wanted to give a very quick intro to neural networks, which is the type of machine learning and artificial intelligence we'll be talking about today. However, I want to give a big shout-out to Brian Yu and to point you all to his course, because I went through a little bit of the content so far, and it's fantastic.
I'm going to be borrowing a few of his slides from his lecture on neural networks. But it's a great way to bootstrap yourself in terms of learning what ML and what AI can do.

So let's just talk about neural networks for a minute. Here, we have an illustration of a neuron. These are the fundamental building block of your brain. These neurons are connected to each other, and they receive electrical signals from other neurons. So there are two important parts there, basically. One is that they communicate with each other; there are billions and billions in your brain. They process input signals, and then they have this notion of being activated. In the brain, that corresponds to an electrical signal, but we'll talk about it in the context of an artificial neural network in a second. So here you have two neurons that are communicating via the synapse, and that's some kind of electrical signal that's going between them. And together, the two of them, plus billions of others, may combine to do much more complex calculations. So that's an oversimplification, but it's a useful one.

So when we talk about artificial neural networks, we're talking about a mathematical model for learning that's inspired by these biological neural networks. You can think of it as just one large math equation. It takes inputs on one end and gives outputs on the other, and this is based on the structure and the parameters of the network. So these neurons, let me go forward for a second. This biological neuron, you can replace it with just this: a circle here, a single building block for this large neural network. And if we want to do more complex things, we can connect them together, and connect many, many, many of them together, right?

So let's go back for a second. The other aspect of this neural network that is important to note, and will be good for our sake, is that they can learn. We start them basically completely random; they have no knowledge. But through a process called backpropagation, and again I can point you to Brian Yu's course for the details,
we give it the outputs, and then we let it look back on all of its parameters and say, OK, how can I tweak these so that I predict those outputs a little bit better? And that is the magic that makes these neural networks work. These parameters, you'll hear them called weights, but they essentially are coefficients in a very large math equation. So if you were to write that out end to end, you would be able to put in your inputs, and then you would get your outputs, just like you do in an equation like y = mx + b, except it's many millions or billions of these neurons.

So what do we use this for? Brian also put together these nice slides about a classification problem. A lot of the problems that we face in machine learning, and in health for machine learning, are classification problems. That just means that we're trying to say something is of one class or another, in this case, red or blue. In health, it might be disease is present or disease is not present. So with this distribution of data that you see here, it's fairly obvious that we could design a classifier just by drawing a line between the red and the blue. In our neural network, then, it would be very simple. It would just be a line. And it's a nice visual representation of how we can separate that data. And so then you say anything below the line is blue, anything above the line is red, and we go from there.

What about a more complex set of data like this? Obviously, we as humans can very quickly draw a circle around the red in there. It's actually a little bit more complicated to write a math equation for this. One thing that I'll mention here, which you may hear in other lectures on this topic, is that these neural networks are non-linear. That's a huge advantage: they're able to learn not just linear relationships, because they have these so-called activation functions, which are non-linear. And those allow them to learn things like this, which are a little bit more complicated in their nature.
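To make the "one large math equation" idea concrete, here is a minimal sketch in NumPy of a tiny network learning to separate two classes of 2D points, with backpropagation written out by hand. The data, layer sizes, and learning rate are all made up for illustration; this is not any model from the talk.

```python
import numpy as np

# Toy data (made up for illustration): label each 2D point red (1) or
# blue (0) depending on which side of the line y = x it falls.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))   # 200 points, 2 inputs each
y = (X[:, 1] > X[:, 0]).astype(float)   # 1 = red, 0 = blue

# The parameters ("weights") start out random -- the network knows nothing.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)  # hidden layer of 8 neurons
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)  # output layer: one number

def forward(X):
    h = np.maximum(0, X @ W1 + b1)        # non-linear activation (ReLU)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))  # squash output to 0..1
    return h, p

# Learning: repeatedly nudge every weight so predictions match the labels
# a little better -- this loop is backpropagation written out by hand.
lr = 0.5
for step in range(500):
    h, p = forward(X)
    grad_out = (p - y[:, None]) / len(X)  # how wrong each prediction is
    grad_h = (grad_out @ W2.T) * (h > 0)  # push the error back a layer
    W2 -= lr * h.T @ grad_out
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * X.T @ grad_h
    b1 -= lr * grad_h.sum(axis=0)

_, p = forward(X)
print("accuracy:", ((p[:, 0] > 0.5) == (y == 1)).mean())
```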
So we'll put a pin in this for now, because I will actually show you much more complex data that can be reduced to something that looks like this. But very, very briefly, and very broadly speaking, our problem is no different from this. We're trying to separate the red dots out from the blue dots. You may have something more complex; maybe there are more than two colors; there's green, there's yellow, et cetera. But it's fundamentally the same problem.

I guess actually this would be a good time to pause for questions, if there are any. I should just mention that we have plenty of time, and I will be going fairly quickly. This will be a big survey through not just ML and AI but medical imaging as well. So please don't hesitate to stop me. Mostly, I want to show you what's possible, what we can do with this, and to get you excited about the field.

So let's now talk about AI development for medical imaging. As I mentioned earlier, the big focus of Google and Google Health AI is to reduce barriers to AI development. For us to have impact, we want more people to be involved. We see ourselves as an enabler. That means both, as I said, in terms of resources as well as knowledge and skills. We're very well aware that Google has resources for modeling that most other places don't. That comes in the form of accelerators for the actual modeling, as well as data. And we want to share the benefits of that with the world, and similarly the knowledge and the skill set. We're very lucky to have the best in the world that we can work with, but we want to make this a little bit more accessible to those who are just getting started.

So today, the use case, or the case study, that I'm going to look at first is chest x-rays. That's my main focus, but it's really just an example of what could be possible. And this could apply to many different modalities, like CT imaging, MRI, ultrasound, pathology slides, even just natural images from a camera for things like skin conditions.
So some of the principles, or most of the principles, are pretty applicable to everything. So let's see. Oh, I do see some questions.

DAVID MALAN: Andrew, I've been jotting down some questions as we go. I can weave them in whenever works.

ANDREW SELLERGREN: Yeah, I think it's a good time to pause anyway. Fred says, does this mean that in the future, radiologists will not be human? I don't know if I would go that far. I think, and this will be clear from some of how I describe our collaborations later, but I think the future is really a combination of the two. We get our best results when we work collaboratively with radiologists. They lend their knowledge to us, and then we lend extra tools to them. And we really want to understand the problem from their perspective. From there, we may design a tool that works well with their workflow. But we are not intending to replace it. We don't think that's the best path forward at the moment. But we do want to help save them time and help bridge the gap where radiologists are in short supply.

So let's see. Yeah, as far as resources, we'll definitely have some links that I can provide and we can send out that would get you started in this field of machine learning and in health AI.

With how much precision can medical AI identify disease, e.g., cancer cells? It's a good question. I think it's getting more and more precise every day, maybe even scaling exponentially. We have a genomics, or genetics, team, for example, that works on DeepVariant and DeepConsensus, and those are things that enable you to do virtually real-time sequencing of either the disease or the person. And that helps with a precision treatment or the availability of a treatment. Precision medicine on the scale of, say, pathology slides or radiology images, those are getting more and more real-time. Obviously, the images are getting larger, but the models are still able to run particularly fast.
So it's an exciting time to see how far we can push the field.

Instead of predictions, what type of AI would run for decision making in health? That's a really good question. I don't know. My field is not necessarily NLP, natural language processing, or conversational AI, but obviously that field is advancing rapidly as well. And I think the ability to add reasoning to some of these AI tools is super compelling, and I think it's going to get better. There are some data sets, like medical question-answering data sets, where you test a model's ability to reason. And again, I think that's still where humans shine in large part, but it's getting better every day.

Let's see. Are there models not based on biological brains? Yes. I mean, I would actually argue that, if you want to dig deep into it and go down a bit of a rabbit hole, the models don't really perfectly mimic biological brains. I think we learn more about how they complement each other, or how they mimic each other, every day. But it's a bit of an oversimplification. I would also argue that your brain is far, far, far more powerful than the neural networks that we're training. That's a pretty cool thing to learn when you maybe come into this field thinking the opposite, that oh, machines are much better. They may be better at certain tasks, but we are much better generalists. And there's no better example of that than my daughter; watching her learn is amazing, because she may not have even seen something before, but she already has a knowledge of how to interact with it. And that's a pretty cool thing.

Did Google perform research on EEG, EKG, EMG? I think we do have some research there. I'm not sure of the current published status of it, but yeah.

So let's see.

DAVID MALAN: Andrew, I can take notes on some of the questions and weave them back in, if some of these you'll touch on anyway.
ANDREW SELLERGREN: OK, yeah. Wow, there's a lot. Yeah, we will do our best. I guess we'll move forward, and maybe you can weave some more in as we go.

DAVID MALAN: Perfect.

ANDREW SELLERGREN: OK, so going back to AI for medical imaging. As I mentioned, we're going to use chest x-ray here as a case study, but a lot of it is going to apply to other modalities, all kinds of medical imaging.

So why chest x-ray? Well, chest x-rays are ubiquitous. They're highly accessible, highly available; there are about a billion taken around the world every year. And they're very inexpensive. They're also non-invasive, meaning they don't require surgery or anything like that. They have a fairly low radiation dose, I believe, as well. So it's a great modality for that.

It's also used for a very long tail of rare conditions. We'll look today at five or 10 that are fairly common, the standard ones that you may see on a lot of chest x-rays. But there's a huge long tail. There are a lot of rare conditions, which makes chest x-ray very useful, but it also makes these images very difficult to interpret. And that's where the radiologists shine on the human side of things. But that high-quality interpretation is difficult to do and to train for, doubly so for training a model to do it. So that's where this work that we'll talk about will shine.

There's also a short supply of radiologists. There are definitely areas of the world where there are far fewer radiologists available than there are images taken. And there's also variability between the experts and the sites. You may have different imaging equipment; you may have different interpretations of the same image. So these are all challenges.

And then, as I mentioned, Google's forte is in building these very, very large models. So they require very large curated data sets. For our purposes, for this model we're talking about, it was about a million chest x-rays, which even on the scale of machine learning is pretty small.
A lot of the natural language processing ML that you see today may have been trained on multi-billion-example or web-scale data sets. So whereas a million chest x-rays is a very large medical data set, it's a small data set by natural-image standards. But for anyone to have a million chest x-rays is pretty difficult.

And it also requires extensive fine-tuning. Some of the very specific models that we developed, like our tuberculosis model, require six, 12, 24 months of software engineering time. And that's a whole team of people working on it, trying different things, going back to the drawing board. We call this fine-tuning, and it's not something that everyone can replicate. So this is where, again, once we've done this in one use case, how can we help generalize it and let it benefit everyone on a bunch of other tasks?

So we can play a little game here, if people want to be interactive. How challenging is this task, and not just for machines, but for humans? I wanted to show a few examples of chest x-rays. You can Google around for things like these. But here's one. Can anyone spot the abnormality here? I see one person saying pneumonia. Anyone else have any guesses? Right chest smaller; left upper lobe. Actually, yeah, I would say left upper lobe is pretty close, so I'll give that credit. So there's a nodule here, which, as you can see, is very, very difficult to spot. A nodule is a small mass of tissue in the lung. In this case, it's less than three centimeters. And it can be benign or malignant. So it may or may not even be noted on a radiology report. It could be something that they decide to biopsy to figure out whether it's benign or malignant. But as you can see, it's not super easy to catch. And so training an AI classifier to do it is not trivial.

OK, how about this one? Can anyone spot the abnormality? Oh, wow. Salt has got it: pneumothorax. Yes, so on the right here, you have what looks in some ways more normal; it's a black space on the chest x-ray.
But this is actually a collapsed lung. Air has leaked into the space between the lung and the chest wall. And this would require intubation. It's a very serious issue.

I'll pause here just to note one thing about these chest x-rays. When we're talking here, we're talking about frontal chest x-rays. There are two different ways to take those: AP, anteroposterior, and PA, posteroanterior. This is whether the x-ray beam passes through the patient from front to back or from back to front. And there are some interesting implications to that: when patients are sicker, they tend to have one type, because they're lying down, and they may not be able to have an x-ray standing up. So this is actually something that confounds our ability to classify. The model may just learn the type, the orientation of the image, as opposed to the actual diagnosis.

Similarly here with the pneumothorax: this one is untreated so far, so there's no intubation, whereas a pneumothorax would normally require intubation and immediate treatment. So then, when you take the chest x-ray, anyone who has a pneumothorax is going to already have some lines and tubes, which makes it very easy for the AI to learn: as long as there are lines and tubes, then I know it's a pneumothorax. So that's a big problem.

I think I saw a hand raised, but--

DAVID MALAN: There is a question in the chat. Amelia is asking, Andrew, is this the same person, and what about their age group?

ANDREW SELLERGREN: No, this is not the same person as the nodule one, no, it's not. You can see there's a pretty big difference in their anatomy. I have the citation here below, so you should be able to look that up. I'm not sure about their age group.

OK, so this one I'm not even going to ask you to try, because it's very, very difficult. But it was my example of a very, very difficult case that may not be easy to catch. Down there in the bottom right of our screen, but on their left, is a rib fracture. This is the type of thing that can very easily be missed, and it's also very hard for AI to pick up on.
So it exemplifies some of the challenges we face here.

Here's a question I can address really quickly. Would you recommend any specific deep learning architecture for this task? Is a Vision Transformer a good choice? I'll try not to go too far down the rabbit hole here, because it's a very interesting time. The type of model that we've trained here is something called a convolutional neural network. So again, I'll point you to Brian's amazing course, which talks about convolution. You can think of convolutions as a filter for the image: the filter progresses through the image and, in some ways, sort of compresses it. And this has been the standard since about 2012 for vision, anyway, for a number of reasons. Convolutional neural networks have built in what they call translation invariance, so it doesn't matter where in the image something is happening; they can pick it up. And that's a huge advantage for vision tasks.

That being said, they've been the go-to for 10 or more years at this point. Only recently, in the last two years, have vision transformers emerged. The transformer is another type of architecture that has been used heavily in natural language processing, and it works in a little bit of a different way. It doesn't have some of the same inductive biases, as we call them, as the convolutional neural networks. But it also has advantages: using a method called attention, it can basically look at the whole image, and it correlates every block of the image with every other block of the image, which is a big advantage. However, vision transformers require a lot more data, in the case of vision, in order to bootstrap themselves and get used to the task.

So they've definitely proven very performant on some of the natural language, I'm sorry, natural image tasks, and they are starting to prove even better for medical image tasks. I've tried them on some of the same tasks that I'll show here, and I got slightly worse results than I did with my convolutional neural networks. That doesn't mean they are worse or that they always will be; it could just be a matter of not having tuned them properly. But that remains to be seen.
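As a rough sketch of what a convolutional classifier for one chest x-ray label might look like (in Keras; the layer counts, filter sizes, 1,024-pixel grayscale input, and single sigmoid output are assumptions for illustration, not the architecture the team actually used):

```python
import tensorflow as tf
from tensorflow.keras import layers

# A small convolutional network for one binary chest x-ray label
# (e.g. abnormal vs. normal). Sizes here are illustrative only;
# real models are far deeper, with hundreds of millions of weights.
model = tf.keras.Sequential([
    layers.Input(shape=(1024, 1024, 1)),      # one grayscale x-ray
    layers.Conv2D(16, 3, activation="relu"),  # a convolution: a learned filter
    layers.MaxPooling2D(4),                   # downsample, keep strong signals
    layers.Conv2D(32, 3, activation="relu"),  # filters slide over the image,
    layers.MaxPooling2D(4),                   # so findings are detected
    layers.Conv2D(64, 3, activation="relu"),  # anywhere (translation invariance)
    layers.GlobalAveragePooling2D(),          # collapse to a feature vector
    layers.Dense(1, activation="sigmoid"),    # one number between 0 and 1
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.summary()
```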
582 00:36:58,650 --> 00:37:06,130 But that remains to be seen. 583 00:37:06,130 --> 00:37:08,730 So is there a nodule in the right lobe here? 584 00:37:08,730 --> 00:37:11,070 To be honest, I don't know I don't think so. 585 00:37:11,070 --> 00:37:16,390 If you're talking about this, I think that's maybe an artifact of the image, 586 00:37:16,390 --> 00:37:17,430 but I could be wrong. 587 00:37:17,430 --> 00:37:22,110 I'm also not a radiologist I'm doing my best impression here. 588 00:37:22,110 --> 00:37:30,975 But so this last one, we'll see if we can spot the abnormality here, anyone? 589 00:37:30,975 --> 00:37:39,230 590 00:37:39,230 --> 00:37:47,150 Yeah, I'll go ahead and give that I hope I'm saying Myla or Mila, pneumonia. 591 00:37:47,150 --> 00:37:50,880 Yeah, so this one is COVID-19. 592 00:37:50,880 --> 00:37:55,670 So they refer to it technically as peripheral and lower lobe predominant 593 00:37:55,670 --> 00:37:57,380 rounded airspace opacities. 594 00:37:57,380 --> 00:38:03,560 But for the most part, that will amount to pneumonia 595 00:38:03,560 --> 00:38:07,940 in the bottom parts of these lungs. 596 00:38:07,940 --> 00:38:11,450 So this is an important motivator for some of the tasks 597 00:38:11,450 --> 00:38:15,140 that we're going to look at for our model here. 598 00:38:15,140 --> 00:38:20,690 And it's just a reminder of what chest X-rays can do. 599 00:38:20,690 --> 00:38:23,000 So as I mentioned earlier, our mission is 600 00:38:23,000 --> 00:38:27,380 really to enable others to train better custom chest x-ray models. 601 00:38:27,380 --> 00:38:32,870 We know that other outfits don't have the same data, the same setup, 602 00:38:32,870 --> 00:38:36,630 the same computational power. 603 00:38:36,630 --> 00:38:41,540 So our mission is to advance science and deliver impact. 604 00:38:41,540 --> 00:38:49,760 And we define impact as being the end result, actual clinical effects. 605 00:38:49,760 --> 00:38:53,990 In some-- in most cases, those are going to be taken care of 606 00:38:53,990 --> 00:38:58,490 or affected by others, not by Google directly. 607 00:38:58,490 --> 00:39:03,330 And that's what we're looking to do here with our ability to train these models. 608 00:39:03,330 --> 00:39:07,100 So this is a three-fold problem. 609 00:39:07,100 --> 00:39:10,670 We want to decrease the training time for these models. 610 00:39:10,670 --> 00:39:12,920 We want to improve their label efficiency. 611 00:39:12,920 --> 00:39:15,890 That means we just want to be able to use fewer images 612 00:39:15,890 --> 00:39:20,090 to do the same tasks so as opposed to having the hundreds of thousands 613 00:39:20,090 --> 00:39:27,560 or millions of images, can we do it with 1,000, 100, 10 images? 614 00:39:27,560 --> 00:39:31,010 Reducing model complexity, so the convolutional neural networks 615 00:39:31,010 --> 00:39:36,680 that we trained here are on the order of some of them as large as 500 million 616 00:39:36,680 --> 00:39:41,240 or even a billion parameters, a billion weights. 617 00:39:41,240 --> 00:39:43,880 And those are massive. 618 00:39:43,880 --> 00:39:48,320 We're also talking about huge high resolution images. 619 00:39:48,320 --> 00:39:55,400 Original chest x-rays are somewhere between 3,000 by 4,000 pixels. 620 00:39:55,400 --> 00:40:01,970 A lot of the literature will downsample them all the way to 224 by 224. 621 00:40:01,970 --> 00:40:09,230 With our resources, we're able to train on images that are size 1,024 by 1,024. 
So that's another big advantage.

So what is our method here, in order to try to effect this third-party impact that I mentioned? What we want to do is basically do as much as possible on Google's side before we hand off something resembling a model, or something resembling the training, to the third party, to the person who might actually be doing the end training. There's a fair amount of text on here, but let's break it down fairly simply.

So going back here to what I mentioned: this is a rough diagram of a neural network. We're probably looking at moving left to right through these neurons. But let's say you were to turn it on its side; it resembles something like a pyramid. The way that we're envisioning this, and a useful analogy here, is to think of it as a pyramid. On the bottom, we have our chest x-ray image, which we feed into the model as pixels, just as numerical values. And we slowly hone in on our classification. The actual output of the model is generally going to be just one number, probably between zero and one, that represents our prediction. Usually, it would be something like: if it's closer to one, then we are predicting that the disease or the condition is there, and if it's zero, it's not there.

So here, within our pyramid, these layers are artificial neurons. We have many, many layers of them, and again, millions, billions of them in there. Somewhere in there, toward the top, we have a layer that is not quite the end result, but it has a lot of what we call features. These are still just floating point numbers, and we can't exactly point to one thing that they represent, but they are useful for the diagnosis. They're not quite as interpretable as saying, oh, this is a left lung, and this is a right lung, but you can think of it as something resembling that. And we're going to call this the embedding. In our case, an embedding is really just a string of numbers.
It's maybe 1,000, 1,300, 2,000 numbers in a row. And these are our features for the chest x-ray. So another useful way of thinking about it is that this is just a very fancy image compression algorithm. We put in a chest x-ray image, which is 3,000 by 4,000 pixels, and what we got out of it was 1,000 numbers. So just like you might compress an audio file or a video file on your computer, this is sort of a compression of that, in a way where we're trying to retain as much of the useful information as possible.

So let's take that definition of embedding and go forward from here. OK, so what was our technique? It's pretty much as simple as it sounds; I don't want to complicate this and make it sound like it was too fancy. What we have is what we call a CXR pre-trained network, a chest x-ray pre-trained network. In a typical setup for machine learning, you often do what's called transfer learning, which means that you take a model that was trained on one task, and you try to train it on another. You fine-tune it on another task by exposing it to new data. And generally, that's an advantage that tends to get better results.

The truth of it for medical imaging is that it may not be all that useful. The state of the art in all kinds of machine learning and computer vision out there is usually established on a data set called ImageNet, which is, I think, 14 million natural images of 1,000 different classes, and everyone competes to get the best score on it. Then what we usually do in medical imaging is take a model trained on those images, expose it to medical images, and we get good results. But the question is, do we get better results than if we had just started from nothing? And I think the jury is still out on that, actually. Because it may speed up the time it takes for us to train, but I'm not sure that the end result is a better model. So what we're proposing here is basically adding in another step, where we train things on more specific data for your task.
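A minimal sketch of the transfer-learning recipe just described, assuming a generic Keras setup with an ImageNet-pretrained EfficientNetB0 and placeholder data (not the team's actual pipeline):

```python
import tensorflow as tf

# Transfer learning: start from a model whose weights were learned on
# ImageNet's ~14 million natural images, then fine-tune on new data.
base = tf.keras.applications.EfficientNetB0(
    weights="imagenet",         # parameters learned on natural images
    include_top=False,          # drop the 1,000-class ImageNet head
    input_shape=(224, 224, 3),
)

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # new task-specific head
])

# Expose the model to new data to fine-tune it (placeholder batch here).
x = tf.random.uniform((8, 224, 224, 3))                      # pretend images
y = tf.cast(tf.random.uniform((8, 1), maxval=2, dtype=tf.int32), tf.float32)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy")
model.fit(x, y, epochs=1, verbose=0)
```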
706 00:45:32,680 --> 00:45:36,040 So then when you take your model, it will already 707 00:45:36,040 --> 00:45:38,020 know what a chest x-ray is. 708 00:45:38,020 --> 00:45:41,810 It will have some way of reasoning about that. 709 00:45:41,810 --> 00:45:44,320 So then on the right side, we're going to take 710 00:45:44,320 --> 00:45:49,302 this pre-trained network, which we're using to generate these embeddings. 711 00:45:49,302 --> 00:45:51,260 And then we can do a bunch of different things. 712 00:45:51,260 --> 00:45:56,860 So the typical setup is on the bottom right. 713 00:45:56,860 --> 00:46:01,395 There's a task-specific network, and we can fine-tune the whole network. 714 00:46:01,395 --> 00:46:03,960 715 00:46:03,960 --> 00:46:06,690 So this is usually going to get the best performance, 716 00:46:06,690 --> 00:46:09,540 but there are a few concerns here in our case. 717 00:46:09,540 --> 00:46:14,520 One is that we trained on private data from a few different data sets. 718 00:46:14,520 --> 00:46:20,620 And we don't want to expose that data in any way to the outside world. 719 00:46:20,620 --> 00:46:26,320 So releasing the model may be a risk in that sense. 720 00:46:26,320 --> 00:46:28,390 So what can we do instead? 721 00:46:28,390 --> 00:46:34,560 Well, we are actually able to just expose the model as an API. 722 00:46:34,560 --> 00:46:42,240 So you can call this model with your images, and you get back embeddings. 723 00:46:42,240 --> 00:46:45,810 And then from there, in strategies 1 and 2, 724 00:46:45,810 --> 00:46:50,520 you can see you can actually train a very small model on that 725 00:46:50,520 --> 00:46:54,160 and get decent results in any case. 726 00:46:54,160 --> 00:46:59,940 So the way we were visualizing this is, again, hopefully very instructive. 727 00:46:59,940 --> 00:47:03,210 If this is the pyramid, we're sort of just chopping off 728 00:47:03,210 --> 00:47:05,220 the head of the pyramid. 729 00:47:05,220 --> 00:47:09,450 And then we're asking you to fill 730 00:47:09,450 --> 00:47:14,970 in the space, as opposed to having to build the whole pyramid yourself. 731 00:47:14,970 --> 00:47:18,510 So this is where this really cool animation comes from. 732 00:47:18,510 --> 00:47:21,900 On the left, you can see the traditional approach 733 00:47:21,900 --> 00:47:25,110 is to build a pyramid for each different task. 734 00:47:25,110 --> 00:47:29,340 So you have to start all the way from scratch or from ImageNet, 735 00:47:29,340 --> 00:47:31,500 and it's going to take time. 736 00:47:31,500 --> 00:47:35,880 On the right, we gave you a pyramid with the head chopped off. 737 00:47:35,880 --> 00:47:42,120 And you can just add your own different heads that represent different tasks. 738 00:47:42,120 --> 00:47:45,180 And what we found is that with this approach, you 739 00:47:45,180 --> 00:47:52,300 get really good results even without having started from the very beginning. 740 00:47:52,300 --> 00:48:00,420 So I'll very quickly talk about the more highly technical side of things, 741 00:48:00,420 --> 00:48:04,090 but this is the type of learning that we did for this. 742 00:48:04,090 --> 00:48:08,610 So when you start your machine learning education, 743 00:48:08,610 --> 00:48:10,980 you will probably learn a lot about cross-entropy 744 00:48:10,980 --> 00:48:18,000 loss, which is kind of the typical objective for classification 745 00:48:18,000 --> 00:48:19,140 problems. 746 00:48:19,140 --> 00:48:22,020 But there are a lot of other options out there.
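As a sketch of that small-model strategy: a logistic regression, which minimizes exactly that cross-entropy (log) loss, trained on embeddings that in the real setup would come back from the API. The arrays here are random stand-ins, and the embedding dimension is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Random stand-ins: one embedding per chest x-ray, plus binary labels.
X = np.random.randn(500, 1376)     # dimension assumed; the talk says ~1,000-2,000
y = np.random.randint(0, 2, 500)   # 1 = condition present

# The "very small model" on top of frozen embeddings, trained by
# minimizing cross-entropy loss.
clf = LogisticRegression(max_iter=1000).fit(X[:400], y[:400])
print(roc_auc_score(y[400:], clf.predict_proba(X[400:])[:, 1]))
```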
747 00:48:22,020 --> 00:48:24,390 One of them is contrastive learning. 748 00:48:24,390 --> 00:48:26,560 And this is just what it sounds like. 749 00:48:26,560 --> 00:48:31,840 So you're taking two images, and you're comparing them to each other. 750 00:48:31,840 --> 00:48:34,800 And then you're giving the model some information 751 00:48:34,800 --> 00:48:37,650 about what those two images represent. 752 00:48:37,650 --> 00:48:41,760 So in the images we have 753 00:48:41,760 --> 00:48:43,950 here, what we're doing is telling the model 754 00:48:43,950 --> 00:48:46,710 that these two images are both of a dog. 755 00:48:46,710 --> 00:48:52,110 And so we want to learn some representation, 756 00:48:52,110 --> 00:48:57,000 some embedding of this image, that brings those two things together. 757 00:48:57,000 --> 00:49:00,150 And then we want to push all the cats away. 758 00:49:00,150 --> 00:49:05,070 So this is the same for our medical imaging task. 759 00:49:05,070 --> 00:49:10,540 What we're going to do is say that these two images are abnormal. 760 00:49:10,540 --> 00:49:13,260 So that means that, as you saw in the previous slides, 761 00:49:13,260 --> 00:49:16,770 maybe they have a pneumothorax, maybe they have a nodule, 762 00:49:16,770 --> 00:49:19,200 maybe they have a fracture--they're abnormal. 763 00:49:19,200 --> 00:49:23,470 So put them together in our embedding space, 764 00:49:23,470 --> 00:49:25,990 and then push away any that are normal-- 765 00:49:25,990 --> 00:49:29,520 so any chest x-ray that doesn't have anything that requires 766 00:49:29,520 --> 00:49:32,640 follow-up or is of note. 767 00:49:32,640 --> 00:49:38,070 So the nice thing about this is that most of our radiology classification 768 00:49:38,070 --> 00:49:40,840 tasks can be reformulated in this way. 769 00:49:40,840 --> 00:49:46,050 So you can imagine that if you could identify a chest x-ray as normal, 770 00:49:46,050 --> 00:49:47,880 then we're kind of done with it. 771 00:49:47,880 --> 00:49:51,730 The decision tree is basically, you move on. 772 00:49:51,730 --> 00:49:56,880 But the abnormals are where we want to start separating things out. 773 00:49:56,880 --> 00:49:59,040 What's interesting about the loss--and this 774 00:49:59,040 --> 00:50:02,160 is a perfect example of why I love this field--is 775 00:50:02,160 --> 00:50:06,480 that even doing something like this, we 776 00:50:06,480 --> 00:50:09,870 get separation of the abnormal images. 777 00:50:09,870 --> 00:50:12,250 They get separated from each other to some extent. 778 00:50:12,250 --> 00:50:15,690 So the fractures get separated from the pneumothoraxes. 779 00:50:15,690 --> 00:50:18,150 And I'm not really sure why. 780 00:50:18,150 --> 00:50:21,240 It's something that we want to actively consider.
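Here is a rough sketch of a contrastive-style loss in that spirit--a classic pairwise margin loss, not necessarily the exact loss used in this work: pairs with the same label are pulled together, pairs with different labels are pushed apart.

```python
import tensorflow as tf

# Pairwise contrastive-style loss: same_label is 1.0 when both images share
# a class (e.g. both abnormal), 0.0 otherwise.
def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    d = tf.norm(emb_a - emb_b, axis=1)                 # distance between the pair
    pull = same_label * tf.square(d)                   # same class: shrink distance
    push = (1.0 - same_label) * tf.square(tf.maximum(margin - d, 0.0))
    return tf.reduce_mean(pull + push)                 # push apart up to the margin
```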
781 00:50:21,240 --> 00:50:23,260 But here's a visualization of that. 782 00:50:23,260 --> 00:50:26,850 So if you remember, I pointed earlier to this red and blue 783 00:50:26,850 --> 00:50:31,270 and how prescient that would be for our classification problem. 784 00:50:31,270 --> 00:50:35,010 So here, we're using a visualization technique 785 00:50:35,010 --> 00:50:40,440 called t-distributed stochastic neighbor embedding, or t-SNE. 786 00:50:40,440 --> 00:50:46,680 And this is just a very fancy name for a way of taking our long vector, 787 00:50:46,680 --> 00:50:51,030 our array of 1,000 floats, 788 00:50:51,030 --> 00:50:58,170 and compressing it, or projecting it down, onto two numbers, 789 00:50:58,170 --> 00:51:00,840 two axes, an x and a y, so that we can visualize it. 790 00:51:00,840 --> 00:51:02,760 So it's definitely an oversimplification, 791 00:51:02,760 --> 00:51:06,570 but it helps us as humans reason about it a little bit, right? 792 00:51:06,570 --> 00:51:09,900 So what we're seeing here is that on the left, 793 00:51:09,900 --> 00:51:14,820 if we take a model that was trained on ImageNet, and we do this technique, 794 00:51:14,820 --> 00:51:17,670 you can see that there's not really very good separation. 795 00:51:17,670 --> 00:51:22,260 It would be very hard for you as a human to draw a circle around just 796 00:51:22,260 --> 00:51:24,210 the blue dots that excludes 797 00:51:24,210 --> 00:51:27,210 the red dots. And just the same for a machine-- 798 00:51:27,210 --> 00:51:28,620 it would be very hard to do that. 799 00:51:28,620 --> 00:51:31,208 On the right, it's a little bit easier. 800 00:51:31,208 --> 00:51:33,000 It's not perfect, but you could 801 00:51:33,000 --> 00:51:35,970 imagine drawing a little circle around the blue, 802 00:51:35,970 --> 00:51:38,940 and that's a separation from the red. 803 00:51:38,940 --> 00:51:42,360 And this on the right is our chest x-ray network. 804 00:51:42,360 --> 00:51:46,320 So you can see that even without having done any work, 805 00:51:46,320 --> 00:51:47,850 we have a pretty good separation. 806 00:51:47,850 --> 00:51:52,080 We already have a pretty good way of classifying this airspace opacity 807 00:51:52,080 --> 00:51:53,400 class. 808 00:51:53,400 --> 00:51:57,870 So that's a pretty cool result. 809 00:51:57,870 --> 00:52:04,050 And I'll say here very quickly, just to talk 810 00:52:04,050 --> 00:52:09,000 a little bit about the metrics, what we're visualizing here, 811 00:52:09,000 --> 00:52:15,750 or what we're graphing here, is something called AUC, area under the curve-- 812 00:52:15,750 --> 00:52:17,430 actually, more on that in a minute. 813 00:52:17,430 --> 00:52:21,240 I will say that what we're showing here is 814 00:52:21,240 --> 00:52:25,170 that we're able to get basically the same performance as some 815 00:52:25,170 --> 00:52:32,040 of these pre-trained networks that are fine-tuned on the task at hand. 816 00:52:32,040 --> 00:52:38,110 Our model with a small layer trained on top is the red. 817 00:52:38,110 --> 00:52:44,580 So you can see it's best in show up to an order of magnitude 818 00:52:44,580 --> 00:52:46,680 of around 10,000 images. 819 00:52:46,680 --> 00:52:50,580 So only beyond that, if you have more than 10,000 images, 820 00:52:50,580 --> 00:52:54,348 do you start to get an advantage from training on all of them 821 00:52:54,348 --> 00:52:55,890 and training a large network on them. 822 00:52:55,890 --> 00:53:00,030 Otherwise, you may be better off doing our simple approach, which 823 00:53:00,030 --> 00:53:01,620 is much faster. 824 00:53:01,620 --> 00:53:06,450 And in the case of two really vital diagnoses 825 00:53:06,450 --> 00:53:11,610 here, tuberculosis and COVID-19, we get better results 826 00:53:11,610 --> 00:53:17,760 when we use our small models, and that's because we have so few images.
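As an aside on those t-SNE plots, here is roughly how such a projection is produced with scikit-learn, using random stand-in embeddings in place of the real ones:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

emb = np.random.randn(200, 1000)              # stand-in 1,000-float embeddings
labels = np.random.randint(0, 2, 200)         # 0 = normal (blue), 1 = abnormal (red)
xy = TSNE(n_components=2).fit_transform(emb)  # project down to an x and a y
plt.scatter(xy[:, 0], xy[:, 1], c=labels, cmap="coolwarm")
plt.show()
```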
827 00:53:17,760 --> 00:53:22,710 So I'll bring up this slide so we can talk about some of the metrics, 828 00:53:22,710 --> 00:53:23,340 as I mentioned. 829 00:53:23,340 --> 00:53:27,820 So here, AUC is on the left. 830 00:53:27,820 --> 00:53:29,770 And that's area under the curve. 831 00:53:29,770 --> 00:53:30,960 So what is that curve? 832 00:53:30,960 --> 00:53:33,000 Here is that curve. 833 00:53:33,000 --> 00:53:37,210 It's called the receiver operating characteristic curve. 834 00:53:37,210 --> 00:53:41,010 So it represents the trade-off between finding 835 00:53:41,010 --> 00:53:44,530 true positives and false positives. 836 00:53:44,530 --> 00:53:48,840 So once you have your number from a classification model 837 00:53:48,840 --> 00:53:51,780 that's between zero and one, you have to decide 838 00:53:51,780 --> 00:53:54,780 what is your so-called operating point. 839 00:53:54,780 --> 00:53:59,590 So that's the threshold at which you say, OK, above this is positive, 840 00:53:59,590 --> 00:54:02,020 below this is negative. 841 00:54:02,020 --> 00:54:08,260 And it's a whole can of worms to try to figure out your operating point. 842 00:54:08,260 --> 00:54:11,640 You might think, OK, let's just start with 0.5. Well, most of the time, 843 00:54:11,640 --> 00:54:14,520 the output isn't perfectly well distributed. 844 00:54:14,520 --> 00:54:17,130 So maybe it's 0.8 or 0.7. 845 00:54:17,130 --> 00:54:21,870 So you have a choice to make as to where to pick that operating point 846 00:54:21,870 --> 00:54:24,610 to get the best trade-offs. 847 00:54:24,610 --> 00:54:28,140 So in some cases, you may want to-- 848 00:54:28,140 --> 00:54:33,720 you may want to optimize for a higher 849 00:54:33,720 --> 00:54:38,280 true positive rate, or for a lower 850 00:54:38,280 --> 00:54:39,430 false positive rate. 851 00:54:39,430 --> 00:54:43,238 So this graph, this receiver operating characteristic curve, just 852 00:54:43,238 --> 00:54:45,030 represents the trade-off between these two. 853 00:54:45,030 --> 00:54:49,410 So everywhere along this line is a possible operating point 854 00:54:49,410 --> 00:54:51,060 that you can pick. 855 00:54:51,060 --> 00:54:53,730 And each one is going to have, again, different trade-offs. 856 00:54:53,730 --> 00:54:57,810 So when we talk about what scores we're giving our model, 857 00:54:57,810 --> 00:55:02,470 we actually just go ahead and add up all the area under this curve. 858 00:55:02,470 --> 00:55:07,110 So the better this curve is, the more up and to the left it is, right? 859 00:55:07,110 --> 00:55:13,320 So the AUC here is on the order of 0.9, 0.92, 0.95. 860 00:55:13,320 --> 00:55:18,180 You can see most of the graph is underneath the curve. 861 00:55:18,180 --> 00:55:21,280 And that's a good thing. 862 00:55:21,280 --> 00:55:27,030 And also here, we have graphed a few different radiologists. 863 00:55:27,030 --> 00:55:31,230 So from having them review the same images, 864 00:55:31,230 --> 00:55:36,150 we can show that our performance is at least as good as theirs, 865 00:55:36,150 --> 00:55:41,520 because they fall somewhere on the curve or below the curve, generally.
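A quick sketch of that curve and its area with scikit-learn, on toy scores:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                      # toy labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.65, 0.55])  # model outputs

# Every threshold along the ROC curve is a possible operating point.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))       # add up the area under that curve

# Picking an operating point is just choosing one threshold:
y_pred = (y_score >= 0.5).astype(int)
```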
866 00:55:41,520 --> 00:55:44,010 But what's fascinating about this is that here, we've 867 00:55:44,010 --> 00:55:46,320 trained on only 100 images. 868 00:55:46,320 --> 00:55:49,590 We trained a very simple model that took five minutes. 869 00:55:49,590 --> 00:55:55,480 And the train and the test sets are from different countries. 870 00:55:55,480 --> 00:56:01,770 So another common practice in machine learning is to train on certain images, 871 00:56:01,770 --> 00:56:05,790 and then you hold out some images so that you 872 00:56:05,790 --> 00:56:09,580 can be sure that you are testing on something fair. 873 00:56:09,580 --> 00:56:14,310 So in this case, our training images came from the US, 874 00:56:14,310 --> 00:56:16,660 and our test images came from China. 875 00:56:16,660 --> 00:56:18,790 So we can be sure that it does pretty well 876 00:56:18,790 --> 00:56:23,700 even if we're using different demographics, different imaging 877 00:56:23,700 --> 00:56:25,030 equipment. 878 00:56:25,030 --> 00:56:31,590 So that's a really good result. And then here, we have even more ROC curves. 879 00:56:31,590 --> 00:56:34,350 And here we trained a model on COVID-19 severity-- 880 00:56:34,350 --> 00:56:41,460 so our ability to predict whether a COVID-19 patient would need to be 881 00:56:41,460 --> 00:56:45,780 ventilated or go into the ICU. 882 00:56:45,780 --> 00:56:48,970 Again, the interesting thing here is we only have 500 images. 883 00:56:48,970 --> 00:56:54,060 So I'll pause here again to talk quickly about some of how 884 00:56:54,060 --> 00:56:55,980 this research came to be. 885 00:56:55,980 --> 00:57:01,560 In March of 2020, we were looking at developing an abnormal-versus-normal model-- 886 00:57:01,560 --> 00:57:04,620 so, as I mentioned, this ability to separate abnormal chest 887 00:57:04,620 --> 00:57:09,150 x-rays from normal chest x-rays. And then when the pandemic hit, 888 00:57:09,150 --> 00:57:11,580 we wondered, could we do something with this model 889 00:57:11,580 --> 00:57:19,170 to help with the scarcity of testing that was available at that time? 890 00:57:19,170 --> 00:57:21,450 We talked to our partners at Northwestern, 891 00:57:21,450 --> 00:57:24,830 and we found that they had about 500 images. 892 00:57:24,830 --> 00:57:27,550 And it was March of 2020, so it was very new. 893 00:57:27,550 --> 00:57:30,280 They didn't have a lot of images. 894 00:57:30,280 --> 00:57:32,190 So what can we do with that few images? 895 00:57:32,190 --> 00:57:36,690 Well, that's where we started to look at these very low-label, 896 00:57:36,690 --> 00:57:41,050 low-number-of-images case studies. 897 00:57:41,050 --> 00:57:44,880 So the result that we showed was that yes, you can actually predict 898 00:57:44,880 --> 00:57:46,590 some of this stuff with chest x-ray. 899 00:57:46,590 --> 00:57:53,850 But I also want to highlight that we did not put anything 900 00:57:53,850 --> 00:57:58,050 out there, because having spoken to our radiologists, our clinicians 901 00:57:58,050 --> 00:58:04,380 on staff, the determination was basically that this is not 902 00:58:04,380 --> 00:58:06,460 a part of a normal clinical workflow. 903 00:58:06,460 --> 00:58:11,070 So even though you could diagnose COVID-19 from a chest x-ray, 904 00:58:11,070 --> 00:58:13,500 it would be a strange thing to have people 905 00:58:13,500 --> 00:58:17,520 come in who had symptoms of COVID, then take a chest x-ray of them 906 00:58:17,520 --> 00:58:19,140 and then diagnose them. 907 00:58:19,140 --> 00:58:20,890 That's not typically what would happen. 908 00:58:20,890 --> 00:58:24,940 And so it wasn't super useful off the bat. 909 00:58:24,940 --> 00:58:28,440 However, it was a great way of demonstrating 910 00:58:28,440 --> 00:58:32,970 that we are able to train these models that can very quickly adapt.
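A minimal sketch of the kind of held-out, cross-population evaluation described above--train on one country's images, test on another's--with random stand-ins for both sets:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Train on one population, test on a completely held-out one
# (stand-ins for the US training set and the test set from China).
X_train, y_train = np.random.randn(100, 1376), np.random.randint(0, 2, 100)
X_test, y_test = np.random.randn(80, 1376), np.random.randint(0, 2, 80)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```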
911 00:58:32,970 --> 00:58:38,580 And that continues to be true even today, because the landscape of COVID-19 912 00:58:38,580 --> 00:58:41,650 is always changing, more so than anything we've seen before. 913 00:58:41,650 --> 00:58:45,540 So now we have patient populations that are 914 00:58:45,540 --> 00:58:52,560 changing in terms of vaccine rates, the demographics of who's 915 00:58:52,560 --> 00:58:54,840 being affected, the different variants. 916 00:58:54,840 --> 00:58:57,840 All of these things are going to challenge our model, 917 00:58:57,840 --> 00:59:02,280 but if we're able to quickly adapt with a small number of images like this, 918 00:59:02,280 --> 00:59:04,590 then we're well suited. 919 00:59:04,590 --> 00:59:07,440 So I'm almost to the good stuff. 920 00:59:07,440 --> 00:59:11,610 I will say one other thing just to set this up. 921 00:59:11,610 --> 00:59:14,700 So I mentioned earlier this idea 922 00:59:14,700 --> 00:59:19,110 of contrastive losses, where we're taking two images and bringing them together. 923 00:59:19,110 --> 00:59:22,140 Another cool thing that you can do is take 924 00:59:22,140 --> 00:59:28,260 not two images, but one image and one piece of text, and bring them together. 925 00:59:28,260 --> 00:59:32,140 And you tell the model, these are similar, even though they're different-- 926 00:59:32,140 --> 00:59:34,560 one of them's an image, one of them's text. 927 00:59:34,560 --> 00:59:37,600 You can bring them together in this embedding space. 928 00:59:37,600 --> 00:59:41,220 So this was developed originally for the CLIP paper, 929 00:59:41,220 --> 00:59:47,370 but it's since been applied to chest x-rays with ConVIRT and CheXzero. 930 00:59:47,370 --> 00:59:51,630 And we'll just hold that thought for now, because it will become important 931 00:59:51,630 --> 00:59:53,890 in a minute. 932 00:59:53,890 --> 00:59:56,680 So what can we do with these embeddings? 933 00:59:56,680 --> 00:59:59,870 We can do a few things, and I'm going to help you do all of these right 934 00:59:59,870 --> 01:00:00,370 now. 935 01:00:00,370 --> 01:00:02,070 So let's get to it. 936 01:00:02,070 --> 01:00:04,390 A zero-shot image classifier--what does that mean? 937 01:00:04,390 --> 01:00:11,130 Well, zero-shot means that the model has not seen anything, 938 01:00:11,130 --> 01:00:13,870 or has not been given a label of something, before. 939 01:00:13,870 --> 01:00:19,530 So you could imagine training something on fracture, 940 01:00:19,530 --> 01:00:24,450 and then you ask it, where is the pneumothorax? 941 01:00:24,450 --> 01:00:26,190 That would be an example of zero-shot. 942 01:00:26,190 --> 01:00:30,390 And it's a very difficult problem, obviously, because machine learning 943 01:00:30,390 --> 01:00:36,160 models benefit from having seen labeled images beforehand. 944 01:00:36,160 --> 01:00:38,730 So what if you don't have that? 945 01:00:38,730 --> 01:00:40,800 Well, this is what we'll look at here. 946 01:00:40,800 --> 01:00:42,660 Another case is text-image retrieval. 947 01:00:42,660 --> 01:00:48,640 So what if you wanted to look up images based on the text that you input? 948 01:00:48,640 --> 01:00:51,960 This is what I'm demonstrating here on the right, and what we'll build. 949 01:00:51,960 --> 01:00:56,280 What if you wanted to build a radiology report generator? 950 01:00:56,280 --> 01:00:58,110 We can do that, too.
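Before the demos, a tiny sketch of the shared embedding-space trick behind all of these: score an image against candidate text prompts by cosine similarity and take the closest. The vectors here are random stand-ins for what the two trained towers would produce.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

image_emb = np.random.randn(128)       # stand-in image-tower output
prompts = {                            # stand-in text-tower outputs
    "pleural effusion is present": np.random.randn(128),
    "no acute cardiopulmonary process": np.random.randn(128),
}
# Zero-shot classification: whichever prompt sits closest to the image wins.
best = max(prompts, key=lambda p: cosine(image_emb, prompts[p]))
# Text-image retrieval is the same trick transposed: score one text query
# against many image embeddings and keep the top matches.
```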
951 01:00:58,110 --> 01:01:01,260 And I want to pause here also to really thank 952 01:01:01,260 --> 01:01:08,160 the folks that put together these two large open source data sets. MIMIC-CXR 953 01:01:08,160 --> 01:01:09,250 comes out of MIT. 954 01:01:09,250 --> 01:01:15,270 CheXpert comes out of Stanford. And they're really, really useful 955 01:01:15,270 --> 01:01:18,460 for developing these kinds of models. 956 01:01:18,460 --> 01:01:23,700 So let me go ahead and switch to Colab so we 957 01:01:23,700 --> 01:01:27,240 can get some cool demos going here. 958 01:01:27,240 --> 01:01:30,630 I have taken more time than I expected. 959 01:01:30,630 --> 01:01:36,360 David, do you have any questions you want to feed in right now? 960 01:01:36,360 --> 01:01:38,280 DAVID MALAN: No, I think folks are probably 961 01:01:38,280 --> 01:01:39,990 about to really enjoy the hands-on part. 962 01:01:39,990 --> 01:01:41,100 So we should forge ahead. 963 01:01:41,100 --> 01:01:42,558 ANDREW SELLERGREN: OK, sounds good. 964 01:01:42,558 --> 01:01:45,420 So if you're not familiar, this is Colab. 965 01:01:45,420 --> 01:01:49,710 It's based on Jupyter, which is also available. 966 01:01:49,710 --> 01:01:55,620 And it's basically just a playground for you to run Python code locally, 967 01:01:55,620 --> 01:02:01,500 or you can have it connected to a runtime that's hosted in the cloud. 968 01:02:01,500 --> 01:02:05,013 So I will make some version of this code available. 969 01:02:05,013 --> 01:02:07,430 I need to clean it up because it's embarrassing right now. 970 01:02:07,430 --> 01:02:11,600 But it will serve its purposes now. 971 01:02:11,600 --> 01:02:17,540 So here, there are just some instructions for setting this up to run locally. 972 01:02:17,540 --> 01:02:21,217 And I've also prefabbed some of this. 973 01:02:21,217 --> 01:02:24,050 So here we're just going to go ahead and import a bunch of libraries 974 01:02:24,050 --> 01:02:29,330 that we need. A few of them are TensorFlow, scikit-learn, pandas-- 975 01:02:29,330 --> 01:02:33,230 a lot of them are very common data science libraries. 976 01:02:33,230 --> 01:02:36,350 CXR Foundation is the name of the tool that we released. 977 01:02:36,350 --> 01:02:40,010 It's a library on GitHub and on pip. 978 01:02:40,010 --> 01:02:44,538 pip is Python's package and dependency management system. 979 01:02:44,538 --> 01:02:45,830 So I've already installed this. 980 01:02:45,830 --> 01:02:48,360 It also has a bunch of dependencies, 981 01:02:48,360 --> 01:02:51,660 so it takes a little longer to install. 982 01:02:51,660 --> 01:02:56,510 So now, I'm connected to my local machine here. 983 01:02:56,510 --> 01:03:01,130 And I'm just going to go into this folder for MIMIC-CXR. 984 01:03:01,130 --> 01:03:03,950 This step unfortunately takes a little bit too long. 985 01:03:03,950 --> 01:03:06,800 But what I'm doing here is just reading in the radiology reports, 986 01:03:06,800 --> 01:03:12,060 because we're going to use those as inputs to our model. 987 01:03:12,060 --> 01:03:16,220 And here, we're also going to take the radiology reports, 988 01:03:16,220 --> 01:03:18,890 and we're going to clip out a section of them called 989 01:03:18,890 --> 01:03:20,310 the impression section. 990 01:03:20,310 --> 01:03:24,470 So that's usually some kind of a summary that the radiologist 991 01:03:24,470 --> 01:03:28,370 has given that is less detailed but more focused, 992 01:03:28,370 --> 01:03:31,980 and it will be good for our training.
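As a hypothetical sketch of that clipping step (the section header format is an assumption; real MIMIC-CXR reports vary):

```python
import re

report = """FINDINGS: Heart size is mildly enlarged. No pleural effusion.
IMPRESSION: Moderate cardiomegaly. No acute intrathoracic process."""

# Keep only the summary the radiologist wrote under IMPRESSION.
match = re.search(r"IMPRESSION:(.*)", report, re.DOTALL)
impression = match.group(1).strip() if match else report
print(impression)
```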
993 01:03:31,980 --> 01:03:37,380 So here, I've also, again, done some more pre-processing of this. 994 01:03:37,380 --> 01:03:40,960 But I'll show you what this looks like in a minute. 995 01:03:40,960 --> 01:03:43,830 OK, so let's examine our embeddings. 996 01:03:43,830 --> 01:03:48,980 As I mentioned, these are just arrays of numbers. 997 01:03:48,980 --> 01:03:51,930 So to prove that point, here's one of them. 998 01:03:51,930 --> 01:03:54,710 So these range in value. 999 01:03:54,710 --> 01:03:57,140 I don't know that they have a min and a max. 1000 01:03:57,140 --> 01:04:00,270 You can normalize them and so on, but they can be negative, 1001 01:04:00,270 --> 01:04:01,340 and they can be positive. 1002 01:04:01,340 --> 01:04:04,790 There are about 1,000 of them in our particular model. 1003 01:04:04,790 --> 01:04:10,110 And believe it or not, they contain all of the information that we need. 1004 01:04:10,110 --> 01:04:14,190 So here, I have my labels file. 1005 01:04:14,190 --> 01:04:17,420 So for every one of my images, I have an identifier. 1006 01:04:17,420 --> 01:04:23,330 I want to load up the radiology report that corresponds to it. 1007 01:04:23,330 --> 01:04:26,270 So I've, again, prefabbed this. 1008 01:04:26,270 --> 01:04:32,780 But really, a lot of this is working in pandas, which is a data science 1009 01:04:32,780 --> 01:04:35,242 tool that is fantastic. 1010 01:04:35,242 --> 01:04:36,950 And it gives you back these things called 1011 01:04:36,950 --> 01:04:39,530 data frames, which is just a table. 1012 01:04:39,530 --> 01:04:43,040 It has rows and columns like an Excel spreadsheet. 1013 01:04:43,040 --> 01:04:48,720 And it allows you to have things associated with each particular image. 1014 01:04:48,720 --> 01:04:51,920 So here, I have our DICOM ID. 1015 01:04:51,920 --> 01:04:55,640 DICOM is a medical image format. 1016 01:04:55,640 --> 01:04:59,690 And then I have a bunch of these labels representing 1017 01:04:59,690 --> 01:05:05,060 different conditions that are diagnosed on chest x-ray: atelectasis, 1018 01:05:05,060 --> 01:05:06,920 cardiomegaly. 1019 01:05:06,920 --> 01:05:08,750 Cardiomegaly is an enlarged heart. 1020 01:05:08,750 --> 01:05:10,880 A lot of these other ones--edema, consolidation, 1021 01:05:10,880 --> 01:05:15,140 atelectasis--pertain to lung conditions. 1022 01:05:15,140 --> 01:05:17,310 There's fracture. 1023 01:05:17,310 --> 01:05:22,560 And this is the PA versus AP, which, as I mentioned, 1024 01:05:22,560 --> 01:05:28,290 is the orientation of the person relative to the imaging equipment. 1025 01:05:28,290 --> 01:05:32,685 And finally over here--so I'll make this a little bit bigger. 1026 01:05:32,685 --> 01:05:35,810 Well, actually, it's probably easier to just look at the impression section. 1027 01:05:35,810 --> 01:05:38,960 So the impression section is, again, our boiled-down version 1028 01:05:38,960 --> 01:05:41,070 of the radiology report. 1029 01:05:41,070 --> 01:05:47,300 And you can see--actually, let me go ahead and make more. 1030 01:05:47,300 --> 01:05:49,895 I'm just going to ask for more examples here. 1031 01:05:49,895 --> 01:05:52,540
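A toy stand-in for that labels table, to show the shape of the data frame (column names are illustrative, not the exact ones in the notebook):

```python
import pandas as pd

df = pd.DataFrame({
    "dicom_id": ["img001", "img002"],        # one row per image
    "cardiomegaly": [1, 0],                  # 1 = condition present
    "fracture": [0, 0],
    "impression": ["Moderate cardiomegaly.",
                   "No acute intrathoracic process."],
})
print(df[df["cardiomegaly"] == 1])           # filter rows like a spreadsheet
```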
1032 01:05:52,540 --> 01:05:59,850 So you can see this is a good example of one of the problems 1033 01:05:59,850 --> 01:06:03,270 that we'll have with our data, which is that it's very templated. 1034 01:06:03,270 --> 01:06:08,220 So when a radiology report is normal, they'll 1035 01:06:08,220 --> 01:06:13,440 often say something like "normal study" or "no acute intrathoracic process," which 1036 01:06:13,440 --> 01:06:18,450 means nothing's going on in the chest, or "no acute cardiopulmonary process"-- 1037 01:06:18,450 --> 01:06:21,390 nothing's going on with the heart and the lungs. 1038 01:06:21,390 --> 01:06:28,170 The ones for abnormals are going to be a lot more involved. 1039 01:06:28,170 --> 01:06:42,575 And we're going to--let's see if I can--I have to do a little bit of--whoops. 1040 01:06:42,575 --> 01:06:52,430 1041 01:06:52,430 --> 01:06:56,880 OK, this is just to get a little bit wider view of each of them. 1042 01:06:56,880 --> 01:07:01,350 So you can see there's a lot of text here. 1043 01:07:01,350 --> 01:07:06,750 And we're going to try to get our model to understand most of this text. 1044 01:07:06,750 --> 01:07:16,247 So this is not too interesting--again, just setting up our data frame a little bit. 1045 01:07:16,247 --> 01:07:18,080 I'm going to skip this part, skip this part. 1046 01:07:18,080 --> 01:07:21,270 1047 01:07:21,270 --> 01:07:23,858 So the first thing that we'll do is what I 1048 01:07:23,858 --> 01:07:26,150 mentioned earlier, which is we can take our embeddings, 1049 01:07:26,150 --> 01:07:28,790 and we can train a very small model. 1050 01:07:28,790 --> 01:07:35,450 So you can imagine--actually, let me go ahead. Here's 1051 01:07:35,450 --> 01:07:41,030 our GitHub, which I will have links to. And 1052 01:07:41,030 --> 01:07:43,130 if you want to use this tool, you'll need 1053 01:07:43,130 --> 01:07:44,900 to just fill out a very quick form. 1054 01:07:44,900 --> 01:07:51,140 But it's just kind of getting to know who you are--nothing too heavyweight. 1055 01:07:51,140 --> 01:08:03,440 So if we look at our model here--OK, so the important part is actually here. 1056 01:08:03,440 --> 01:08:05,870 So we're putting together just two of these layers that 1057 01:08:05,870 --> 01:08:07,130 are called dense layers. 1058 01:08:07,130 --> 01:08:12,230 So that's this line of those circles that I mentioned, those neurons. 1059 01:08:12,230 --> 01:08:14,180 And there's just two of them. 1060 01:08:14,180 --> 01:08:15,690 So that's it. 1061 01:08:15,690 --> 01:08:17,779 That's how complicated our model is. 1062 01:08:17,779 --> 01:08:23,210 And with Keras, which is a library that works on top of TensorFlow, 1063 01:08:23,210 --> 01:08:27,260 it's as simple as just saying, give me one layer, give me another layer. 1064 01:08:27,260 --> 01:08:29,240 And then I put my model together. 1065 01:08:29,240 --> 01:08:32,340 The rest of this stuff--you don't 1066 01:08:32,340 --> 01:08:35,120 need to know too much about it right now. 1067 01:08:35,120 --> 01:08:37,370 It has to do with things like your learning rate. 1068 01:08:37,370 --> 01:08:41,270 But I just wanted to point that out to show you how simple the model is. 1069 01:08:41,270 --> 01:08:57,439 OK, so now, I'm going to start this training 1070 01:08:57,439 --> 01:09:03,260 so that it can run while I'm talking. 1071 01:09:03,260 --> 01:09:09,529 So what I've done is I've given it my labels, which are 1072 01:09:09,529 --> 01:09:12,109 for the condition called cardiomegaly. 1073 01:09:12,109 --> 01:09:16,050 And they're just one for positive, zero for negative.
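The trainable part really is about that simple. Here's a sketch of it in Keras, with the layer sizes taken from the model summary shown later, and a one-unit sigmoid head added as an assumption:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1376,)),            # embedding in (size assumed)
    tf.keras.layers.Dense(512, activation="relu"),   # the two dense layers
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # cardiomegaly: one vs. zero
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
```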
1074 01:09:16,050 --> 01:09:20,029 These are just all of our hyperparameters. 1075 01:09:20,029 --> 01:09:23,870 So batch size: how many images are we going to look at at once? 1076 01:09:23,870 --> 01:09:25,130 512. 1077 01:09:25,130 --> 01:09:26,779 How many epochs are we going to train? 1078 01:09:26,779 --> 01:09:31,439 20--we're going to go through the whole data set 20 times. 1079 01:09:31,439 --> 01:09:35,960 What's nice about this is, you'll see, the first epoch, 1080 01:09:35,960 --> 01:09:39,859 the first pass through all the data, is going to be pretty slow. 1081 01:09:39,859 --> 01:09:44,330 But then I'm going to cache the data set, and it will fly. 1082 01:09:44,330 --> 01:09:49,800 So as it's training here, you can notice the training loss is going down, 1083 01:09:49,800 --> 01:09:51,210 which is good. 1084 01:09:51,210 --> 01:09:54,920 The AUC is going up--it's getting up to 0.81. 1085 01:09:54,920 --> 01:09:57,230 This is all for the training set. 1086 01:09:57,230 --> 01:10:00,830 What we mostly care about is our validation set. 1087 01:10:00,830 --> 01:10:06,860 We want to know, how does this do on data that it hasn't seen yet? 1088 01:10:06,860 --> 01:10:09,590 So we'll leave that off for a minute. 1089 01:10:09,590 --> 01:10:14,393 I wanted to ask, David, are we able to go a little over time? 1090 01:10:14,393 --> 01:10:15,560 DAVID MALAN: I think we're OK. 1091 01:10:15,560 --> 01:10:18,030 Folks don't mind sticking around with us. 1092 01:10:18,030 --> 01:10:19,220 Happy to finish up. 1093 01:10:19,220 --> 01:10:20,512 ANDREW SELLERGREN: Yeah, sorry. 1094 01:10:20,512 --> 01:10:23,550 I underestimated how long this would go. 1095 01:10:23,550 --> 01:10:25,880 But I'm definitely able to stick around, 1096 01:10:25,880 --> 01:10:29,770 in case people have extra questions and so on. 1097 01:10:29,770 --> 01:10:31,520 DAVID MALAN: Absolutely. We can leave some 1098 01:10:31,520 --> 01:10:34,103 of the questions that were asked in the chat for the very end, 1099 01:10:34,103 --> 01:10:36,075 so folks can join or leave as needed. 1100 01:10:36,075 --> 01:10:39,880 ANDREW SELLERGREN: OK, sounds good. 1101 01:10:39,880 --> 01:10:42,800 OK, let's see. 1102 01:10:42,800 --> 01:10:49,400 The other things I wanted to call out here-- 1103 01:10:49,400 --> 01:10:57,910 two of the things that I 1104 01:10:57,910 --> 01:11:02,320 glossed over here were your learning rate and your optimizer. 1105 01:11:02,320 --> 01:11:05,720 So broadly speaking, 1106 01:11:05,720 --> 01:11:08,740 when you're doing machine learning, and it's doing 1107 01:11:08,740 --> 01:11:13,130 backpropagation, you're taking steps toward your end result. 1108 01:11:13,130 --> 01:11:16,480 And these just determine how big those steps are. 1109 01:11:16,480 --> 01:11:20,350 The learning rate in particular defines how big each step is. 1110 01:11:20,350 --> 01:11:22,750 You can do things like a learning rate function 1111 01:11:22,750 --> 01:11:25,610 where it will get smaller over time, 1112 01:11:25,610 --> 01:11:29,230 so that you're taking smaller steps as you get closer to your result.
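A sketch of those hyperparameters in code, with stand-in data: batch size 512, 20 epochs, a cached dataset so only the first pass is slow, and a learning rate that decays over time (the exact schedule here is an illustrative choice, not necessarily the notebook's):

```python
import numpy as np
import tensorflow as tf

emb = np.random.randn(5000, 1376).astype("float32")    # stand-in embeddings
lab = np.random.randint(0, 2, 5000).astype("float32")  # stand-in labels

# .cache() keeps the data in memory after epoch 1, so later epochs fly.
ds = (tf.data.Dataset.from_tensor_slices((emb, lab))
      .cache().shuffle(5000).batch(512))

# A decaying learning rate: smaller steps as training goes on.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
# model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["AUC"])
# model.fit(ds, epochs=20)  # 20 passes through the whole data set
```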
1113 01:11:29,230 --> 01:11:30,980 OK, here we go. 1114 01:11:30,980 --> 01:11:34,750 So a couple of interesting things here before we move on. 1115 01:11:34,750 --> 01:11:37,270 You can notice that over each of the epochs-- 1116 01:11:37,270 --> 01:11:39,580 this is our training AUC that you see-- 1117 01:11:39,580 --> 01:11:42,790 it's getting much, much better, getting almost close to one, 1118 01:11:42,790 --> 01:11:44,890 which is the best possible result. 1119 01:11:44,890 --> 01:11:49,810 Our training loss here is constantly decreasing. 1120 01:11:49,810 --> 01:11:54,460 But our validation loss starts off decreasing, 1121 01:11:54,460 --> 01:11:56,380 and then it basically plateaus. 1122 01:11:56,380 --> 01:11:59,110 It even starts to increase a little bit at the end. 1123 01:11:59,110 --> 01:12:04,000 And our AUC for the validation set stays about the same. 1124 01:12:04,000 --> 01:12:06,530 But it's still good--it's 0.86. 1125 01:12:06,530 --> 01:12:09,100 So this is an example of overfitting. 1126 01:12:09,100 --> 01:12:13,030 Your model--even one that's this small, 1127 01:12:13,030 --> 01:12:16,330 with the embeddings being as feature-rich as they are-- 1128 01:12:16,330 --> 01:12:19,720 learns too well in some ways. 1129 01:12:19,720 --> 01:12:25,480 It's picking up on things that don't translate well to other data sets. 1130 01:12:25,480 --> 01:12:26,860 So we call this overfitting. 1131 01:12:26,860 --> 01:12:29,320 And there are a bunch of different ways to handle it. 1132 01:12:29,320 --> 01:12:32,260 A lot of techniques, called regularization, 1133 01:12:32,260 --> 01:12:34,550 try to address that. 1134 01:12:34,550 --> 01:12:40,090 And that's a lot of what we do as machine learning specialists. 1135 01:12:40,090 --> 01:12:43,720 So here, we can just very quickly look at the model summary, 1136 01:12:43,720 --> 01:12:48,530 which will give you a quick breakdown of the whole network. 1137 01:12:48,530 --> 01:12:51,040 So again, it's very, very small. 1138 01:12:51,040 --> 01:12:57,843 It has two dense layers of 512 and 256 neurons. 1139 01:12:57,843 --> 01:13:00,260 And the total number of parameters is less than a million. 1140 01:13:00,260 --> 01:13:01,635 So that's a pretty small network. 1141 01:13:01,635 --> 01:13:04,370 1142 01:13:04,370 --> 01:13:07,430 So let's do some more interesting stuff. 1143 01:13:07,430 --> 01:13:11,920 So now, as I mentioned before, we often take one data set and train on it. 1144 01:13:11,920 --> 01:13:13,240 We validate on another. 1145 01:13:13,240 --> 01:13:15,980 We test on another one to see how it's doing. 1146 01:13:15,980 --> 01:13:20,740 So the MIMIC data set comes from MIT. 1147 01:13:20,740 --> 01:13:26,090 I can't remember which hospital systems, but it's mostly ICU patients, 1148 01:13:26,090 --> 01:13:27,640 so intensive care patients. 1149 01:13:27,640 --> 01:13:30,320 CheXpert is a data set that comes from Stanford, 1150 01:13:30,320 --> 01:13:35,150 so all the way across the country--different imaging equipment, 1151 01:13:35,150 --> 01:13:39,050 different distribution of patients. 1152 01:13:39,050 --> 01:13:42,670 So we're going to actually test our model on CheXpert 1153 01:13:42,670 --> 01:13:45,070 without having trained on it at all. 1154 01:13:45,070 --> 01:13:48,910 We want to see how well it does when it hasn't seen the data yet. 1155 01:13:48,910 --> 01:13:52,540 So I'm going to go ahead and load up our train 1156 01:13:52,540 --> 01:13:56,380 and our validation sets for CheXpert. 1157 01:13:56,380 --> 01:14:01,030 And this is another labels file that I've created. 1158 01:14:01,030 --> 01:14:07,240 You can see these numbers correspond to the various conditions, 1159 01:14:07,240 --> 01:14:08,500 cardiomegaly among them.
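On the overfitting point above, one of the most common regularization fixes is dropout between the dense layers. This is an illustrative tweak, not necessarily the notebook's recipe; model.summary() then prints the per-layer breakdown just described.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1376,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.5),    # randomly zero half the activations each step
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()                      # layer-by-layer breakdown and parameter counts
```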
1160 01:14:08,500 --> 01:14:10,958 1161 01:14:10,958 --> 01:14:13,000 So now, I'm going to gloss over this a little bit, 1162 01:14:13,000 --> 01:14:15,625 because I need to start the training, and then we can come back. 1163 01:14:15,625 --> 01:14:29,920 1164 01:14:29,920 --> 01:14:35,710 OK, this is even messier than I remember. 1165 01:14:35,710 --> 01:14:42,080 So how are we going to feed our radiology report into the model? 1166 01:14:42,080 --> 01:14:46,760 Well, we need to convert it to numbers, just like the images are numbers. 1167 01:14:46,760 --> 01:14:50,050 So that's all that this stuff is doing here. 1168 01:14:50,050 --> 01:14:52,750 Here, I'm just setting up a little bit of a way for us 1169 01:14:52,750 --> 01:14:58,820 to use CheXpert as the evaluation set. 1170 01:14:58,820 --> 01:15:03,290 So now, our model is slightly more complicated, 1171 01:15:03,290 --> 01:15:06,190 but not too much so. 1172 01:15:06,190 --> 01:15:11,860 So we have a couple of layers here that are going to get added onto our image. 1173 01:15:11,860 --> 01:15:13,750 And then a couple of different layers 1174 01:15:13,750 --> 01:15:15,730 are going to get added onto our text. 1175 01:15:15,730 --> 01:15:19,570 And then we're going to take the outputs of those two and compare them. 1176 01:15:19,570 --> 01:15:22,850 And that's going to be our task. 1177 01:15:22,850 --> 01:15:31,870 So here, in order to get numbers from our radiology reports, 1178 01:15:31,870 --> 01:15:34,510 we're going to pass them through this preprocessor. 1179 01:15:34,510 --> 01:15:38,860 We're going to tokenize them, in the language of natural language 1180 01:15:38,860 --> 01:15:40,250 processing. 1181 01:15:40,250 --> 01:15:46,520 Basically, that's just a dictionary of, OK, this word is the number one, 1182 01:15:46,520 --> 01:15:48,680 this word is the number two, and so on. 1183 01:15:48,680 --> 01:15:50,230 In fact, they're actually word pieces. 1184 01:15:50,230 --> 01:15:55,750 So they're syllables, or parts of a word. 1185 01:15:55,750 --> 01:16:01,810 And this is all a bunch of fancy stuff that just says, OK, do that assignment. 1186 01:16:01,810 --> 01:16:04,010 Now, I mentioned the CLIP model. 1187 01:16:04,010 --> 01:16:06,700 This is where the magic is happening. 1188 01:16:06,700 --> 01:16:10,240 Let's gloss over that part. 1189 01:16:10,240 --> 01:16:13,870 All this is doing is passing our image through one side of it, 1190 01:16:13,870 --> 01:16:15,170 our text through another. 1191 01:16:15,170 --> 01:16:22,210 And again, this loss function here is a cosine similarity. 1192 01:16:22,210 --> 01:16:26,740 So it's going to say how close these are together. 1193 01:16:26,740 --> 01:16:31,430 The farther apart they are, the more we want to bring them together. 1194 01:16:31,430 --> 01:16:34,570 And that's what our loss function is going to be. 1195 01:16:34,570 --> 01:16:38,450 This is just another learning rate function. 1196 01:16:38,450 --> 01:16:42,280 So here's our data set in TensorFlow. 1197 01:16:42,280 --> 01:16:48,430 You can basically define your data set iteratively, 1198 01:16:48,430 --> 01:16:51,310 where it reads in these files 1199 01:16:51,310 --> 01:16:54,340 and then processes them as you go. 1200 01:16:54,340 --> 01:16:56,290 And that will be a little bit faster than 1201 01:16:56,290 --> 01:16:59,060 if you tried to load it all into memory. 1202 01:16:59,060 --> 01:17:02,170 So let's look here.
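A minimal sketch of a cosine-similarity loss in that spirit (simplified; the full CLIP loss is a contrastive softmax over a whole batch of pairs):

```python
import tensorflow as tf

# The more the image and text embeddings point in the same direction,
# the smaller the loss.
def cosine_similarity_loss(img_emb, txt_emb):
    img = tf.math.l2_normalize(img_emb, axis=1)
    txt = tf.math.l2_normalize(txt_emb, axis=1)
    return 1.0 - tf.reduce_mean(tf.reduce_sum(img * txt, axis=1))
```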
1203 01:17:02,170 --> 01:17:09,400 I'm now initializing our model, our CLIP-loss model. 1204 01:17:09,400 --> 01:17:15,820 I'm giving it a cosine decay, which is a type of learning rate schedule. 1205 01:17:15,820 --> 01:17:19,780 I'm giving it the AdamW optimizer. 1206 01:17:19,780 --> 01:17:24,140 And then I'm just calling model.fit on it. 1207 01:17:24,140 --> 01:17:29,868 So each iteration is actually going to go for about 500 steps. 1208 01:17:29,868 --> 01:17:30,910 Then it's going to pause. 1209 01:17:30,910 --> 01:17:33,010 You can see it's paused right now. 1210 01:17:33,010 --> 01:17:39,460 And it's going to evaluate on a validation set of CheXpert. 1211 01:17:39,460 --> 01:17:44,380 But what's interesting here is that--and maybe I 1212 01:17:44,380 --> 01:17:47,560 buried the lede a little bit here--for CheXpert, we 1213 01:17:47,560 --> 01:17:51,850 don't have the radiology reports. 1214 01:17:51,850 --> 01:17:54,700 We have the radiology reports for MIMIC, 1215 01:17:54,700 --> 01:17:56,630 and we're using those to train the model. 1216 01:17:56,630 --> 01:18:02,320 So how do we get a result from CheXpert at the time of evaluating our model? 1217 01:18:02,320 --> 01:18:04,130 Well, I glossed over this before. 1218 01:18:04,130 --> 01:18:06,430 But let's go back here. 1219 01:18:06,430 --> 01:18:11,440 The way that we're defining our labels now for CheXpert 1220 01:18:11,440 --> 01:18:18,040 is this so-called zero-shot, or kind of a soft prompt, right? 1221 01:18:18,040 --> 01:18:23,120 So we get to pick whatever we want to define our labels. 1222 01:18:23,120 --> 01:18:27,430 So if you want to search for a right pleural effusion 1223 01:18:27,430 --> 01:18:29,860 or a left pleural effusion, you can do that, instead 1224 01:18:29,860 --> 01:18:32,860 of just a label of effusion or not. 1225 01:18:32,860 --> 01:18:37,690 You can specify moderate cardiomegaly, severe cardiomegaly. 1226 01:18:37,690 --> 01:18:41,720 And we're going to do a little bit of math to turn this into a prediction. 1227 01:18:41,720 --> 01:18:48,250 So over here on the right, this set of prompts 1228 01:18:48,250 --> 01:18:50,330 is going to say, OK, this is normal. 1229 01:18:50,330 --> 01:18:52,330 So as I mentioned earlier, this sort of template 1230 01:18:52,330 --> 01:18:57,550 of saying "no acute cardiopulmonary process" means nothing 1231 01:18:57,550 --> 01:18:59,570 is going on there. 1232 01:18:59,570 --> 01:19:02,450 And there's no pleural effusion. 1233 01:19:02,450 --> 01:19:07,730 So we can get our prediction from this. 1234 01:19:07,730 --> 01:19:12,830 Let's see how our training is going here. 1235 01:19:12,830 --> 01:19:16,950 So we got our first result. I'm actually going to just halt it 1236 01:19:16,950 --> 01:19:20,890 there, in the interest of time. 1237 01:19:20,890 --> 01:19:24,030 So you can see we're getting really good AUC 1238 01:19:24,030 --> 01:19:26,310 scores even though the model has not seen 1239 01:19:26,310 --> 01:19:31,650 this data set or these exact phrases. 1240 01:19:31,650 --> 01:19:37,080 We're sort of asking a question of the model that hasn't been asked before.
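Backing up to the training setup for a moment, a sketch of that optimizer configuration (the schedule length and learning rate are placeholders; AdamW lives in tf.keras.optimizers in recent TensorFlow versions and in add-on packages in older ones):

```python
import tensorflow as tf

# Cosine decay: the learning rate shrinks along a cosine curve.
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-4, decay_steps=10_000)
optimizer = tf.keras.optimizers.AdamW(learning_rate=schedule)
# model.compile(optimizer=optimizer, loss=cosine_similarity_loss)
# model.fit(train_ds, validation_data=chexpert_ds)  # hypothetical data sets
```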
1241 01:19:37,080 --> 01:19:42,310 And one cool thing that we can now do with this: 1242 01:19:42,310 --> 01:19:46,800 what if we wanted to write a plain-text chest x-ray search engine? 1243 01:19:46,800 --> 01:19:50,790 So as I mentioned earlier, we now have the ability 1244 01:19:50,790 --> 01:19:58,950 to query our CheXpert data set, where we don't have the radiology reports. 1245 01:19:58,950 --> 01:20:03,910 What we do is we feed in our query, which is in natural language. 1246 01:20:03,910 --> 01:20:07,740 We turn it into numbers, and then we ask the model, 1247 01:20:07,740 --> 01:20:12,190 how similar is this to the image? 1248 01:20:12,190 --> 01:20:18,450 And when we do that, we can then rank them and say, here are the top five. 1249 01:20:18,450 --> 01:20:24,300 So I have a query here of "large bilateral pleural effusions." 1250 01:20:24,300 --> 01:20:27,795 And this is going to give me results that have some of those things. 1251 01:20:27,795 --> 01:20:30,330 1252 01:20:30,330 --> 01:20:44,680 And if we wanted to look for, let's see, a normal chest x-ray, 1253 01:20:44,680 --> 01:20:45,775 we can do that as well. 1254 01:20:45,775 --> 01:20:52,960 1255 01:20:52,960 --> 01:20:56,380 So as you can see, these are a lot more normal. 1256 01:20:56,380 --> 01:21:00,300 So we just built a search engine for our chest x-rays. 1257 01:21:00,300 --> 01:21:03,060 You could imagine this would be useful if you were a radiologist 1258 01:21:03,060 --> 01:21:05,730 and you wanted to look up an example of a case, 1259 01:21:05,730 --> 01:21:09,430 but didn't have radiology reports for it. 1260 01:21:09,430 --> 01:21:12,100 So I am going to have to hurry through this last one. 1261 01:21:12,100 --> 01:21:15,810 So let's just get to it. 1262 01:21:15,810 --> 01:21:22,740 For my next trick, we're going to train a radiology report generator. 1263 01:21:22,740 --> 01:21:25,830 So this is not going to be perfect, but it's 1264 01:21:25,830 --> 01:21:28,680 just a pretty cool demonstration of what we have, 1265 01:21:28,680 --> 01:21:31,780 what the possibilities are here. 1266 01:21:31,780 --> 01:21:34,800 So the setup is actually quite similar to our CLIP model, 1267 01:21:34,800 --> 01:21:42,240 where we have a text model over here on one side, in addition 1268 01:21:42,240 --> 01:21:45,120 to our vision side of things. 1269 01:21:45,120 --> 01:21:50,470 In this case, it's going to be what we call an encoder-decoder. 1270 01:21:50,470 --> 01:21:52,110 The encoder is actually not used. 1271 01:21:52,110 --> 01:21:57,760 The decoder is a BERT model, which is a type of transformer, 1272 01:21:57,760 --> 01:22:03,420 which is used a lot in natural language processing, as I mentioned. 1273 01:22:03,420 --> 01:22:08,760 So we're basically going to give it our image embedding. 1274 01:22:08,760 --> 01:22:12,810 We feed that into the text model, and then we 1275 01:22:12,810 --> 01:22:19,290 say, give us some text for this--which is a crazy thing 1276 01:22:19,290 --> 01:22:25,810 to ask it to do, to be honest. 1277 01:22:25,810 --> 01:22:28,920 So you can see it's going to take a couple of minutes to train. 1278 01:22:28,920 --> 01:22:33,060 But thankfully, I did prefab the results, I hope. 1279 01:22:33,060 --> 01:22:35,750 1280 01:22:35,750 --> 01:22:38,360 One interesting thing to note here: I had 1281 01:22:38,360 --> 01:22:40,320 to reduce the batch size considerably. 1282 01:22:40,320 --> 01:22:42,770 It no longer even fits in my GPU, which is 1283 01:22:42,770 --> 01:22:46,340 why it's also slower--some of the things that you just sort of wrestle 1284 01:22:46,340 --> 01:22:50,810 with as an ML practitioner. 1285 01:22:50,810 --> 01:23:07,010 And let's just say this is my final trained model, I hope.
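As a heavily simplified illustration of that embedding-to-text idea, here's a toy conditioned decoder, with an LSTM standing in for the BERT decoder and a made-up vocabulary; the real model is vastly larger and trained on actual report text.

```python
import tensorflow as tf

VOCAB, MAXLEN = 1000, 32   # made-up vocabulary size and report length

image_emb = tf.keras.layers.Input(shape=(1376,))           # embedding in (size assumed)
tokens = tf.keras.layers.Input(shape=(MAXLEN,), dtype="int32")  # report tokens so far

# Condition a small text decoder on the image embedding via its initial state.
state = tf.keras.layers.Dense(256, activation="tanh")(image_emb)
x = tf.keras.layers.Embedding(VOCAB, 256)(tokens)
x = tf.keras.layers.LSTM(256, return_sequences=True)(
    x, initial_state=[state, state])
next_word = tf.keras.layers.Dense(VOCAB, activation="softmax")(x)

# Trained to predict each next word piece of the report, given the image.
model = tf.keras.Model([image_emb, tokens], next_word)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```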
1286 01:23:07,010 --> 01:23:11,510 So now, I'm going to run through our validation data set. 1287 01:23:11,510 --> 01:23:14,840 And I'm going to ask it to give me the radiology reports. 1288 01:23:14,840 --> 01:23:19,860 And I'm going to compare that to what the actual ones are. 1289 01:23:19,860 --> 01:23:23,670 So to be honest, it's surprising even to me how well this works. 1290 01:23:23,670 --> 01:23:26,330 So here's a normal one: no acute intrathoracic process, 1291 01:23:26,330 --> 01:23:28,040 no evidence of pneumonia. 1292 01:23:28,040 --> 01:23:30,650 1293 01:23:30,650 --> 01:23:32,760 Here's an example where it's not working, 1294 01:23:32,760 --> 01:23:34,910 which is always good to start looking at. 1295 01:23:34,910 --> 01:23:38,840 So a Dobhoff tube is a type of tube--I 1296 01:23:38,840 --> 01:23:41,400 think it's for feeding, a feeding tube. 1297 01:23:41,400 --> 01:23:44,810 And you can see here, instead of "Dobhoff," it says "do" something. 1298 01:23:44,810 --> 01:23:47,450 So it doesn't quite know what to do with that. 1299 01:23:47,450 --> 01:23:51,680 Same here: possible cavities--well, it says "possible cat." 1300 01:23:51,680 --> 01:23:55,550 But it's getting there. 1301 01:23:55,550 --> 01:24:02,270 And it's pretty crazy that it can do this with just our embeddings. 1302 01:24:02,270 --> 01:24:06,590 And it gives you a sense of what you could do with yours. 1303 01:24:06,590 --> 01:24:15,920 So quickly, before I wrap up: to read more about this 1304 01:24:15,920 --> 01:24:20,270 and get more involved, we did publish our paper in Radiology. 1305 01:24:20,270 --> 01:24:23,480 It's called Simplified Transfer Learning for Chest Radiography 1306 01:24:23,480 --> 01:24:25,880 Models Using Less Data. 1307 01:24:25,880 --> 01:24:29,810 There's the free API, which I'll have links to, 1308 01:24:29,810 --> 01:24:32,980 and we'll have a Colab where you can use some of these free data sets. 1309 01:24:32,980 --> 01:24:36,500 MIMIC and CheXpert you can also work with--for MIMIC, 1310 01:24:36,500 --> 01:24:38,270 you'll need to go through a training. 1311 01:24:38,270 --> 01:24:42,140 For CheXpert, I think you need to sign some agreements. 1312 01:24:42,140 --> 01:24:44,720 But they're also available to you. 1313 01:24:44,720 --> 01:24:47,150 Our goal really is to accelerate research. 1314 01:24:47,150 --> 01:24:49,490 We're always looking for collaborations. If you 1315 01:24:49,490 --> 01:24:52,640 know people, or if you are people, that want to collaborate and build something 1316 01:24:52,640 --> 01:24:55,580 cool with this, please reach out. 1317 01:24:55,580 --> 01:24:56,390 What's next? 1318 01:24:56,390 --> 01:24:59,330 We want to do this for more modalities. 1319 01:24:59,330 --> 01:25:04,280 We want to do it for potentially ultrasound and some others 1320 01:25:04,280 --> 01:25:10,560 that I think would benefit a lot from this. I also have some links here. 1321 01:25:10,560 --> 01:25:13,370 There's a software engineering internship. 1322 01:25:13,370 --> 01:25:16,370 Applications are open now through December 1 1323 01:25:16,370 --> 01:25:19,610 for undergrad, graduate, and PhD students. 1324 01:25:19,610 --> 01:25:21,750 There are some Connect with Google resources. 1325 01:25:21,750 --> 01:25:24,720 So this one references students. 1326 01:25:24,720 --> 01:25:28,308 And I think it's focused on the US, but there also is a link-- 1327 01:25:28,308 --> 01:25:29,600 and maybe I can break them out.
1328 01:25:29,600 --> 01:25:33,330 There are links in here for around the world, 1329 01:25:33,330 --> 01:25:35,960 as well as industry--so a few others that are maybe 1330 01:25:35,960 --> 01:25:40,010 more specific to whatever group you identify with-- 1331 01:25:40,010 --> 01:25:44,720 and some other resources: Google Careers, OnAir. 1332 01:25:44,720 --> 01:25:47,630 And yeah, again, all of these will be linked for you. 1333 01:25:47,630 --> 01:25:51,560 So I want to pause there. 1334 01:25:51,560 --> 01:25:53,390 Sorry I went even longer than I expected, 1335 01:25:53,390 --> 01:25:58,400 but I do want to stick around for questions. 1336 01:25:58,400 --> 01:26:02,670 I'm able to stay around for 20, 25 minutes, 1337 01:26:02,670 --> 01:26:06,020 depending on how long people want to stay. 1338 01:26:06,020 --> 01:26:08,400 But again, thank you so much. 1339 01:26:08,400 --> 01:26:11,960 I really hope this has inspired you, gotten you excited about it, 1340 01:26:11,960 --> 01:26:15,350 and also given you the confidence that this is something you can do. 1341 01:26:15,350 --> 01:26:18,380 And you shouldn't be overwhelmed by it or feel 1342 01:26:18,380 --> 01:26:22,220 like it's beyond your reach--you absolutely can do it. 1343 01:26:22,220 --> 01:26:26,612 And CS50 will give you the tools to do it. 1344 01:26:26,612 --> 01:26:29,080 DAVID MALAN: Thank you, Andrew, so much for joining us-- 1345 01:26:29,080 --> 01:26:32,410 and maybe a virtual round of applause with your favorite emoji reaction 1346 01:26:32,410 --> 01:26:35,740 or physical hands. So good to reunite. 1347 01:26:35,740 --> 01:26:38,320 If I could ask just one question on behalf of the group 1348 01:26:38,320 --> 01:26:40,300 and then turn things over to you and Carter, 1349 01:26:40,300 --> 01:26:43,330 so that I can get to another meeting myself, if you don't mind. 1350 01:26:43,330 --> 01:26:46,270 But we'd love for folks to stick around for more one-on-one Q&A 1351 01:26:46,270 --> 01:26:47,117 if you would like. 1352 01:26:47,117 --> 01:26:50,200 I think one question for the group that was recurring in the chat was just 1353 01:26:50,200 --> 01:26:52,703 how folks can learn more, not just from these resources 1354 01:26:52,703 --> 01:26:53,870 that you have on the screen. 1355 01:26:53,870 --> 01:26:57,220 But if they want to learn more about AI, like you did on your own, 1356 01:26:57,220 --> 01:27:00,580 if they want to learn more about AI in medicine, are there courses? 1357 01:27:00,580 --> 01:27:01,900 Are there universities? 1358 01:27:01,900 --> 01:27:05,268 Are there other learning resources you would recommend specifically? 1359 01:27:05,268 --> 01:27:07,060 ANDREW SELLERGREN: Yeah, as I said before, 1360 01:27:07,060 --> 01:27:09,250 I would highly recommend 1361 01:27:09,250 --> 01:27:15,490 CS50's Introduction to Artificial Intelligence with Python. 1362 01:27:15,490 --> 01:27:17,590 The Coursera course is good, too. 1363 01:27:17,590 --> 01:27:19,420 I would be remiss not to mention that-- 1364 01:27:19,420 --> 01:27:23,650 Machine Learning was the original Coursera course. 1365 01:27:23,650 --> 01:27:24,955 O'Reilly has great books. 1366 01:27:24,955 --> 01:27:27,080 When I'm trying to learn something new, 1367 01:27:27,080 --> 01:27:29,050 I always go for O'Reilly first. 1368 01:27:29,050 --> 01:27:33,670 Those are the ones with the funny images of animals on the front. 1369 01:27:33,670 --> 01:27:38,230 Those are usually good. And honestly, YouTube.
1370 01:27:38,230 --> 01:27:41,320 There's just so much content these days of people walking you 1371 01:27:41,320 --> 01:27:45,500 through things, getting you bootstrapped, and those things. 1372 01:27:45,500 --> 01:27:49,330 But maybe I can come up with a list of resources, too, 1373 01:27:49,330 --> 01:27:53,080 and we can post that with the rest of these slides and so on, David. 1374 01:27:53,080 --> 01:27:54,865 That would probably be good. 1375 01:27:54,865 --> 01:27:55,870 DAVID MALAN: Absolutely. 1376 01:27:55,870 --> 01:27:57,578 I've been sharing the slides in the chat, 1377 01:27:57,578 --> 01:28:01,810 and we will post everything at cs50.ly/zoom as well for folks 1378 01:28:01,810 --> 01:28:03,660 afterward. 1379 01:28:03,660 --> 01:28:07,000