SPEAKER 1: This is a seminar on two of my favorite buzz terms-- machine learning, computer vision. You throw those out there in like a party conversation and besides people saying nerd, they'll also think that it's pretty badass. If they're computer scientists, they might question how much you actually know. So machine learning and computer vision, I've kind of put up a cutesy little slide on what they are. I usually will refer to them by their full names. But if I ever say ML, I mean machine learning. If I ever say CV, I mean computer vision. Pretty easy. And these are kind of two of the myths that people have about machine learning. They're usually on one of two extremes. Either it's impossible and only for the super technical geeky wizards that can sit at their computers for hours on end. Or it's not that interesting because we're better at doing what they do. And if you fall anywhere between those two extremes, that's cool too. But basically machine learning and computer vision, the context for why we would need them is we want to be able to solve problems that people solve all day, every day, but programmatically. So I want to be able to take a computer program, have it look at something and say, that's a cat, that's a dog, that's a person, that's a couch, and not accidentally look at a person and say, oh, that's a car. Where's its license plate? And a real world example that you could imagine existing is traffic recognition. If you're just trying to figure out what something is, computer vision is super important. Because I want to be able to tell, is the thing that just ran my red light a brown bear or a car? And that's important. And so that's really what we're after in general here. But we're going to approach a very specific problem just to give us some context. And as far as context for me and my background in these two things: I had never coded before a year ago, a year ago being 2016, in case you're watching this in the future. And basically I wanted to, for my CS50 final project, do something really cool. I wanted to do it on my own. And I wanted it to be something that was accessible. And so a lot of people were like, oh, machine learning and computer vision break like two of those three criteria. Like, you can't do that on your own and it's not accessible. Those things are impossible to do. Why would you ever approach that? And yes, that is maybe true from the theoretical side. I didn't sit down and teach myself all of the math behind machine learning. I didn't sit down and teach myself how to do contouring using computer vision. I sat down and played around with some YouTube videos. And that's basically the point of the seminar, is to teach or show or prove that these two concepts, while they are buzzwords and they are super cool and there are whole fields around each of them, they're accessible to you, the CS50 student that just started CS, that had never seen a line of code or what it meant before. That's who this is for. And so if you are some sort of CS guru that does know all sorts of things about machine learning and computer vision, I'm a little confused why you're watching, but also really appreciative. And hopefully you'll find it at least entertaining and maybe a little bit informative.
And so basically, my story was I came in, did a final project using these two packages or pieces of software, Keras, which is the machine learning part, and OpenCV, which is the computer vision part, and built an algorithm or a piece of software that would allow me to do a very specific task. I wanted to convert images of sheet music into machine readable versions of music. Software that doesn't exist right now. And I found out why. Because it's really freaking hard. But it was a cool process anyway. And I thought that it was a worthwhile endeavor. And I would encourage everyone to try it if they want to. So just kind of a real world example, we have four people, and this will be to kind of ease us into the idea of pattern recognition. That's what we're after with machine learning anyway. And so I have four stock images. They came with the template of the PowerPoint slide. I just changed their descriptions. And basically I'm going to ask you to think in your head about what patterns you can find between these four people. And as a human being, you can find-- I'm hesitant to say hundreds. I don't know if there's even really enough data for hundreds, but maybe. Right? A very large number of patterns. So now I'm going to restrict that in the sense that machines have a limited amount of data that they can pick up. So do we. But we pick up a lot more kind of instantaneously than a machine really is going to, especially a machine that we're working with on the smaller software side. So you get these three categories. And that's basically how many eyes do you have, are you human, and do you have a ponytail. I was short on time. And so, from these three categories, if I were to point at someone and ask, based on these categories, what category do they belong in, you should be able to do that. And human beings, we're pretty good at that. So if I say, is this a person, you'd say with pretty much 100% confidence, yes that is. Same here. Same with that one. Same with number four. It's hard to say, is this a girl. Girl isn't really one of the categories. And looking at the data, well, maybe you have some extraneous data that says people with ponytails are probably girls, which is a little sexist. But, you know, we're going to kind of ignore that assumption and just say ponytails are probably girls. Well, you get this one with, we'll say, like 95% confidence. But all the rest of these are like, well, no ponytail. So I'm 100% sure they're not girls. Slight problem. That one is. And so this is a very contrived but kind of interesting example where if I limit the amount of data that you're able to look at, you are now restricted in what patterns you can find. And that's a very intuitive thing. That's something that should be almost an instantaneous revelation, a shower thought if you will. But that has some severe manifestations when we're trying to apply machine learning and computer vision. So one of the assumptions that people make is that we can accumulate as much data as we need. However, I'm not Google. You're also not Google. You're not Amazon or Microsoft. We're not one of these big companies that have access to exabytes of data. And so a lot of people then are like, well, then machine learning is not for me. When I work at Google I will do it then. And they move on. But you don't need exabytes or even gigabytes necessarily of data to get machine learning to work.
If I were to give you a simple pattern, we'll talk about the logic gate AND. It takes two inputs, either 0 or 1. And it returns one output, either 0 or 1. And if I say AND of 1 and 1, you return 1. If I say AND of any other input, you give me 0. It's a very small amount of data. In fact, I can represent the inputs as single bits. And I can represent AND in, we'll say, less than 10 bits. Now you have learned an entire pattern, a complete pattern, if you will. And it took you less than even a kilobyte. So patterns don't require lots of data. Complicated patterns maybe require more data, as you'll see. And it's kind of intuitive. But we can get past this problem without having to just collect more data. We don't need to sit down for hours and hours labeling things and manually going, all right, that's a cat. That goes here. That's a dog. It goes here. That's a triangle. Why is a triangle in here? You don't need to actually sit there doing that. There are all these other techniques that exist. And I think that they're not as well publicized. And if you had known about these sorts of things, maybe you wouldn't have turned away from machine learning in the first place. So one of the first ones that I have listed there is called data augmentation. Another one of those things that sounds like a buzzword. You throw it out there and you're just like, I augmented some data. And then you just kind of move on and hope that nobody actually asks what that means. But all it really is is taking the data that you have and creating new data from it, making sure that patterns are preserved but changing what the data looks like. And so what I mean by that is if you take a picture of someone's face and you stretch it out, to a point, you can still recognize them. But to a machine, each pixel that you stretched it out by is more data. Because I've now taken this picture and I've stretched it out. That's the same person, but these are two different images. If I were to compare the two images, they are not, by a bit by bit comparison, the same. And that's important. So that's one of the techniques, is stretching an image. In the example code or distribution code that you'll get access to later, you'll see that there's an entire configuration file dedicated to how you augment your data. What if you rotated something? Is a triangle still a triangle if I turn it sideways? Yeah. And same thing with a face. If I have your face and I turn it upside down, well, people will turn themselves upside down to try to look at your face. But the same thing is applicable to machines and how they learn. If I can take a piece of data, even a small amount of data, I can amplify it according to all of these different ways of shifting it. What if color doesn't matter? What if I was using emojis and they could be in any color? You could still recognize the emoji. It's still a smiley face emoji, whether it's yellow or black or blue or pink. But in the particular case that I've just given, there's a slight problem. Some emojis do use color to convey meaning. The angry emoji is red. If you had an angry emoji that was like a bright pink, it might look a little different. We might get a different message. So it's important that when you're augmenting your data, you keep in mind which patterns can change and which ones can't, and what information you are actually after. And that comes into the next point of clever data gathering, which is basically what you're doing when you're augmenting your data.
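To make that concrete: in Keras, which the distribution code uses, this kind of augmentation usually comes down to a few lines with ImageDataGenerator. This is a minimal sketch, not the seminar's actual config file, so treat the specific numbers as placeholders, and x_train and y_train as stand-ins for whatever image and label arrays you've loaded:

    from keras.preprocessing.image import ImageDataGenerator

    # Each option is a random, pattern-preserving transformation:
    # a rotated triangle is still a triangle, a stretched face is
    # still the same face, but the pixels (the data) are new.
    augmenter = ImageDataGenerator(
        rotation_range=30,       # rotate up to 30 degrees either way
        width_shift_range=0.1,   # slide the image horizontally by up to 10%
        height_shift_range=0.1,  # and vertically
        zoom_range=0.2,          # stretch or shrink, like the face example
        horizontal_flip=True)    # mirror it; fine for faces, bad for letters

    # flow() then yields an endless stream of randomly transformed
    # copies of a small dataset during training:
    # batches = augmenter.flow(x_train, y_train, batch_size=16)

Whether a transform like horizontal_flip is safe is exactly the judgment call from the emoji example: you have to know which patterns carry the meaning.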
You're just not going outside to get more. So if I was collecting data-- say, just picking up images off the internet, which is often what I'll do if I'm trying to build a machine learning model-- I have to make sure that I collect maybe not more data but the right kinds. So if I gathered 30 pictures of the same cat and then I said, all right, machine, tell me if this is a cat or a dog. Not very good data gathering. I've just picked up the same thing. And even a human would be like, well, if I only had that information, all I could tell you is whether or not it's that cat. But that's basically the idea here. If you were teaching a toddler something, or if you were teaching a small kid, or even if you were teaching a full on college student a complicated concept, you have to give them enough of the pattern that they can get it right every time, that they can extrapolate from the pattern that they're given. So if I were to give you a number sequence-- 1, 1, 2-- then some people might go, oh, that's the Fibonacci sequence. Well, no. It's 1, 1, 2, 1, 1, 2. And so that might be an example, in a very contrived way, of a pattern where I didn't give you enough information. And this sort of thing, where you're using an image, that's a lot of patterns. It could be eye color. It could be hair color. What if they're not people? It could be shape. It could be the angle at which things intersect. There's a lot of information there. And we pick it up almost instantly. You look at a single picture and you're like, I could name 400 patterns from here. Maybe you're not thinking, oh, I know exactly 400 patterns, but you could start enumerating: there's this attribute, there's this attribute, and this one. So you have to make sure that your machine has enough data, and the right type of data, that it can actually pick stuff up. And a good benchmark for that is if you kind of narrow your mind-- and this is one of the few cases where I'll say, just be narrow minded-- and only look at something in the context of what you're given, if you can still figure it out, there's probably a way for the machine to do it too. And if you can't, the machine probably can't do it either. And so automated data gathering is one of the next solutions to this "I don't have enough data" problem. Because basically the original, brute force solution is to take a bunch of pictures and label them. And we're talking specifically about image classification here. There are other kinds of machine learning, but I'm gearing this more towards image classification because it is a little bit easier to understand and intuit. So if I was manually labeling images-- and I've done this before. It's awful. If you can avoid it, don't do it. But you can sit there and you can say, well, this is an A, this is a B, this is a C. But you need enough data to make sure that your patterns are complete. So you might be doing that for seven, eight hours. It's horrible. Find some good music. It'll make it a little bit easier to do. But you can automate that process. If I could generate-- let's say we're doing some sort of letter recognition-- all 26 letters, or 52 if you do lowercase and uppercase, in 100 different fonts, then all my data gathering is pretty easy. Click a button, it's done. So if there's a way for you to automate your data collection, do it. And now that you've automated it, you might as well generate as much data as you want.
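As a sketch of what that letter example could look like, assuming you have Pillow installed, a data/ folder already created, and a couple of .ttf font files on disk (the font file names here are made up):

    from PIL import Image, ImageDraw, ImageFont
    import string

    # Hypothetical font files; point these at real .ttf paths on your system.
    fonts = ["arial.ttf", "times.ttf"]

    for font_path in fonts:
        font = ImageFont.truetype(font_path, size=48)
        for letter in string.ascii_uppercase:
            # Draw one black letter on a blank 64x64 grayscale canvas.
            img = Image.new("L", (64, 64), color=255)
            ImageDraw.Draw(img).text((8, 4), letter, font=font, fill=0)
            # The label is baked into the file name: no manual labeling.
            img.save("data/{}_{}.png".format(letter, font_path.split(".")[0]))

Two fonts times 26 letters is 52 labeled images from one run; grow the fonts list and you grow the dataset with it.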
And you might find that there are some time restrictions, because as your machine learns on more and more data, it might take more and more time. So now you have a balancing act that you have to perform. Do I want to generate more data and get a better machine learning model? Or do I want to generate less data and have it be done faster? And is there a point where giving it more data doesn't make it any better but it does make it slower? Because at that point you should stop. If it's not getting any better, then maybe you need to change your model. And that's the last point there, is we sometimes need beefier models and sometimes more clever models. And those are somewhat interchangeable and sometimes not. Just because you have a bigger, heftier model-- that's what I kind of mean by beefier-- doesn't mean it's better at learning things. People that are just bigger don't inherently learn things faster. But if I have a model that's a little bit more clever about how it learns something, it can pick up on a pattern faster. It's probably going to do a little bit better depending on the circumstance. So one of my favorite myths about machine learning, which I'm also blocking part of on the slide, is that it takes a long time. If I want to take a model and get it to work, I have to train it for hours and hours and hours. And that's not true. That's people. People take a long time. You train people for hours and hours. One of the benefits to machine learning is that you don't have to train models that long. That's with the caveat that if you're training on an enormous data set, or you're training a particularly complicated model, it might take a long time. But given that we're doing some sort of CS50 final project, this is not a problem for you. This is not something that pushes this project out of your grasp. It's one of those things that is actually just kind of a myth, that machine learning takes too long. And a kind of parallel myth is that computer vision perfectly captures all data in an image. Right? And maybe the way that these are parallel is not immediately apparent. But the same idea is present here. This concept that computer vision is a perfect representation of whatever data it sees pushes it outside of our grasp. Because if it is a perfect representation and we can't learn something from that perfect representation, then it's not doable. Might as well throw up our hands and give up. But that's not true. Computer vision is a little subjective. I can choose how my machine sees. I can choose how well it picks up on patterns, how well it distinguishes between the foreground and the background. All of those things come into play when you're trying to pick up data. And so we use a very simple example of computer vision, more just to give you a taste in the distribution code of how to interface with OpenCV. But it does exist. And it is something that I think is particularly important for image classifiers. But when we're choosing software to do all of these things, to do machine learning, to do computer vision, even just to program in general, it becomes very important to see the trade offs between different pieces of software. So in this project and in general, I go to Keras and OpenCV. However, underneath the hood, Keras uses TensorFlow. Or at least, I have it use TensorFlow. You could also have it use Theano. I don't use Theano. It's a little bit mathier. It was a little bit above my intellectual level.
But TensorFlow I thought was pretty accessible, and I like the project and just kind of in general. I swear I don't work for them. They're just really cool. And so these two things actually have the same benefits, at least from my point of view. I was a college student that was just learning all of these things, just learning computer science in general. And so I was like, well, you know, what projects exist? I asked my TF. And he was like, oh, go look up OpenCV, see what that does. And I asked him, what can I do to make a machine learn something? I want to do AI. And he was like, oh, AI sounds a little scary. But machine learning is also scary. Well, pick one. And I was like, OK, we'll do machine learning. And you'll notice they're not super different. Both are open source-- they're open to the public, and I didn't have to pay for them, which matters to a broke college student-- and both provide a high level interface. Providing a high level interface is something that I've kind of done here as well: the product, the distribution code that I'll have at the end, is a high level interface on a high level interface. It makes it accessible in that you just have to say, build model. And it takes care of everything underneath the hood. It just builds the model, whatever that means. And if you want to go look underneath the hood, which I advise that you do if you're building this as a project, you can then see what's going on. But if you don't and you just want it to work, that's what this does. I don't have to sit down and say, oh, crap, I carried a 1 wrong in my math here, so my machine now says everything is a square. That would be kind of annoying. And maybe tracing between those two things is very difficult, especially if you're using an algorithm that does something that you don't fully understand. You're just sitting there like, that's a lot of math. I think that's a sigma. And there's another letter here. And I don't know why. And then you go look it up on Wikipedia, which I've done, and there are just names and there are no actual numbers in the math anymore. It's all symbols. And it becomes very difficult to read. And from there it becomes inaccessible. And then maybe you give up. Or you just get frustrated and you go eat a piece of cake. That's what I did. And so it's very frustrating. But the main reason that I chose these two pieces of software was that they were usable. I could figure out how to use them. Somewhat ironically, figuring out how to get them downloaded and working was very difficult. I spent around 20 hours doing that. And admittedly, that was because I didn't really know how to read documentation at the time. I also didn't know how to read through code on like a GitHub page or anything. But I know that I'm not the only one. In fact, I have about 728 students that are in roughly the same place right now in that class called CS50. And so I think that having something gathered into one place with an easy way of installing things is a much easier introduction. So that's what this distribution code is also. In case you're looking for the distribution code and you don't want to listen to the rest of my talk, it's towards the end of the lecture. It'll be there after I get there. But if you do, hang around. So basically, I took all of the packages-- there are a lot of them that needed to be installed to get OpenCV to work. That one's the really annoying one. Keras is fine. You just do pip3 install keras and it works fine. OpenCV is awful. Nothing against OpenCV.
I very much appreciate that the project exists. I'm super psyched that I get to use it. It's just a pain in the ass to install. Or at least it was when I was young and naive. And it was very hard for me to sit there reading through documentation and not knowing what they meant by certain terms. What does it mean to use a virtual environment to install things? Why is that necessary? If I don't do that, does it break my download? Why doesn't my download work? There are all of these terms and things that get thrown around because they're taken for granted. I know that that is very scary. And at the very least, it's incredibly frustrating. And so what I ended up doing was I said, OK, I'm going to just try all of the solutions and whichever one works, works. So my computer at one point had like 40 different versions of OpenCV on it. Every programmer, I think, has had that sort of experience where they've just downloaded hundreds of things. I built from source at one point. And I was like, cool. I don't know what this means, but I did it. And that worked. That actually was the one that I ended up using for my final project. I would not recommend doing that unless you know what you're doing. I screwed it up horribly. And I didn't even realize that I was just missing half of OpenCV. I didn't need it, apparently. But like, bad, bad deal. So I've finally gotten to the point where we have some code. If you want them, there are the bit.ly links. These are the actual slides in case you want them. They have these links on them, so you can go to the slides and then click the links. The GitHub link is my personal GitHub. I didn't realize we were supposed to use our actual names when we created GitHub accounts for school. My name is not powerhouse of the cell. It's actually Nick. But I got to keep it. My TF was fine with it. So we kept it. And this is the bit.ly version, slightly shorter. So I'll leave those up there while I talk a little bit. And eventually I will pull up some code and we'll get to coding things, or at least giving demonstrations of what the code does. I've found that when you're actually coding things up in front of people, you make about 400 more typos per second. It's really just not a good deal. So I don't particularly like coding in front of other people. But what you'll find on that GitHub is basically a lot of very cheesy read-me files. I included as many emojis as I thought were necessary. I don't generally use emojis, and that's the only repo of mine that has them. But GitHub has a nice interface for including emojis. And the reason for that is the problem that I wanted to solve with this machine learning and computer vision, for this seminar, was classifying emojis. And that's a very broad problem. And I didn't solve all of it. I actually didn't even really solve it. But I started us on that path. And so I said, OK, well, I want to do something with emojis because that's kind of hip, kind of cool, also kind of dorky. And that kind of fits me-- the latter one, not the first two. And so what I ended up doing was I said, you know what, we're not going to classify the hundreds of emojis that exist. We're going to just take like 15 happy looking ones, 15 kind of neutral-ish ones-- I think I actually got 13 of those-- and then 15 kind of angry or negative looking ones. We're going to call those three groups classes. We're going to say, there's positive, there's neutral, and there's negative.
And I want the machine to be able to tell me: is an arbitrary emoji that I'm looking at positive, neutral, or negative? And that seems kind of trivial. Human beings do that all the time. And we're very subjective about it too. We're kind of like, ooh, that person's looking at me just like angrily. He's got just like an aggressive face on him. He's just chilling there sipping his tea. You messed up her shoe. Something like that can be very difficult for us to perceive. And even in that example, it's totally subjective. What I just said is basically up to whoever is viewing it. And that is where this becomes a difficult problem: how do you provide enough data to get this to work? So I actually did a couple of disservices to you, the user of this code. One, the machine that you are provided with-- well, it does work and will train and learn, it just doesn't do it very well. By the end it's basically randomly guessing. It'll say, it's about, I don't know, 33% chance of this, 33% chance of that, and 33% chance of that-- which, you'll notice, with three categories summing to 100%, is about 33% each. So the machine doesn't do a very good job of figuring it out. And sometimes-- which I found when I was testing which one I was going to demo-- the machine that I provided you actually just gets it completely wrong, but it's like super sure of itself. I gave it a very happy looking emoji and it was like, I'm 100% sure that is negative. Negative looking emoji right there. And so that's kind of one of the funny things. I think that a lot of times when you're doing machine learning, you feel like you're training a toddler-- a particularly annoying toddler that is not really a danger to itself or anything around it, but it particularly hates you. In fact, it wants to make sure that you never get whatever assignment you're trying to do done. That's how I've kind of learned machine learning works, especially when I was working on this being like, oh, crap, I have a seminar to teach. This was not working. It just refused to do what I wanted it to. And that was very frustrating. But it's something that shares a lot of parallels with this toddler analogy. If you were to take a toddler and, every time that they didn't do what you wanted them to do, just be like, all right, well, getting a new one, and go get a new toddler, that'd be weird in a number of ways. It's also kind of weird with this too. It's a little less extreme. You can trade out machines, no problem. But you'll find that the machine that I have handed you actually does have a couple of things that can be modified within it to make it a lot better. And it doesn't mean that you just copy and paste the same layers over and over again and make your machine much longer, but rather that you can make it a little bit more clever. So basically, when we were training the machines-- or when I was training. I say we. I mean I. When I was lonely in my room training the machine models, I was saying, all right, I've got to get this to work somehow. I don't know what I'm going to do. And if you'll recall, I said I had about 15 images each of positive and neutral and sad. So I had 45 images total, roughly. That's a very small amount of data. So I actually used some techniques in the code-- and I'll try and point them out when I go over there-- that allow me to augment my data. We did that first step. We augmented our data. What I didn't do was add very good data.
My data collection-- partially a product of my laziness, but retrospectively we'll call it a teaching moment-- is that the data was collected kind of arbitrarily, without much thought as to what patterns were being picked up. I kind of ignored my own second rule, if you will. And yes, that is mostly because I was just being lazy. I just picked a bunch of data, threw it in there, and hoped it worked. And it doesn't work that well. That strategy won't help you very much. I didn't think about it that hard. But you can get there by thinking about it just minimally. If you have a smiley face emoji, for example, since that's what we're talking about in this case, it's not too hard to find enough different smiley faces to cover the general case. It's mostly that smiley little half circle on the bottom of the face. And so covering as much data as you can while still keeping it small is not that difficult. You just have to be a little bit smarter about it than I was. And I believe the data is included in that GitHub page. So you'll see my crappily collected data there as well. I really hope it's not a copyright problem. We'll find out if someone shows up to arrest me. So we're going to actually switch over to looking at the actual GitHub page. And this is where the cheesiness comes in. I called it machine feeling. I was feeling a little dorky that day. I'm feeling a little dorky every day. And then I included as many emojis as I could because I was like, oh, crap, you can include emojis in Markdown. That's cool. So I threw those in there. And you can read this on your own if you want to. Maybe you don't. I don't blame you if you don't. There is this requirements.txt right here. And that allows you to basically just immediately install all the requirements for this entire project-- one pip3 install -r requirements.txt. That's it. No 20 hours of searching through Google or anything. Just that. And then we have our source folder. And originally I was going to provide some skeleton code to complement the actual code. I decided against that because I ran out of time. And I also thought it was a little mean. So instead we have fully working sample code. And I've separated things out a little bit just to make it easier to comprehend what's going on. And so right up here, up at the top, you basically have the computer vision folder. There's only one thing in it. It is just the thing that basically provides computer vision properties to our code. And then you have the data. So we'll take a short look in here. It's not very large. There is testing data and there is training data. It's pretty well segmented out. But you can see that it's basically just a bunch of .pngs. This one happens to be pretty sad. And they're also all cropped so that they have the same height and width. They're all 200 by 200 pixels. You don't have to do that, although I would recommend it as just one of the things that you can normalize across. Because what if the size of the data is different? There are techniques for dealing with that. You can just shrink it to the right aspect ratio. You can do a variety of things. But if you can, you want to keep your data pretty consistent across things that don't matter. So whether this is this size or this size, it's still a sad face. It's still negative. The classification doesn't really depend on what size it is. So I wanted to keep all of my data the same size, so that I don't end up in the situation where the machine is like, oh, images that are 201 pixels, those are sad.
Images that are 400 pixels, they're happy. That'd be really unfortunate, because that's not even close to the actual pattern that we're going after. And you can understand that even in complicated examples, you'll want to be aware of which patterns matter, which ones don't, and which ones you're actually introducing into your models. And that might sound kind of complicated. But the example that I just gave there, not too difficult. Making sure that the size of the image doesn't actually play into what the machine learns-- that was very intuitive. And most of these are like that. They're pretty intuitive. This sort of project, even though it sounds kind of complicated, isn't too bad. It's not particularly complex if you make it analogous to human beings or toddlers. You can think of it as like your young niece, nephew, daughter, if you have a child, brother, sibling, smaller people, little human beings, and how you teach them. And if you can teach a child that, you can probably teach the machine, with some caveats. So in the rest of this folder, we have the ML folder. It was right around here. It's just that, just the machine learning portion. There's a file in there that does all the machine learning parts. I built a class for us to have a very high level model. But it also gives you a low enough level that you can tinker with the model itself, depending on what you want. And we have a config file which has a bunch of variables inside of it. We might take a look inside in a little bit. And then we have our actual run file which allows us to execute the entire piece of software. And then I have a test.png which is just a test image that I was using earlier. I left it in; it's kind of cute. So that's all of the code on the GitHub. You're welcome to clone it, download it, make pull requests. Preferably don't sell it for a profit. If you do, that's really cool. I'm just proud that it worked. Any of those things is awesome. But we're actually going to show just a little bit of code over here. So I'm already within the directory of the actual code. So if you were to have git cloned this, you'd have ended up somewhere around here. So this is the actual root directory. I know this is kind of a boring terminal screen, but it'll get more interesting shortly. And so here you can see there's just a bunch of files that tell us what the license is, MIT, what the read-me says, and there's some caching here, source files and requirements. We're going to go into src because that's source. And we're actually going to go specifically into the sample directory. And we're back where we started. So if I wanted to run this, I can basically say ./run.py and-- oh, yep. Like I said, typing in front of people, you make so many more mistakes. And this will bring up our help screen, which is meant to be as non-obscure as possible. However, I am no expert coder, so it might be a little obscure. It's intended to be pretty easy to use though. So there's -o for an output file. And this is all just kind of software stuff. Not particularly interesting. If you're interested afterward, please do go ahead and let me know. And I'll be happy to talk with you about it. But if you wanted to just run this, then we can say, OK, well, I want my output file to be seminar. And I want it to go through, we're going to say, one round of training, unless you guys want to sit here for the next 45 minutes. And I don't want it to load another model, one that already exists.
I want it to just do its own thing. And that's all I really need to do. From the command line, that will train it. And I say that knowing this'll be the one time that it breaks, which would be absolutely fantastic. But it tells you that it's going to use the TensorFlow backend. It found all of the data that I had handed it. And now it begins training. And the reason I bring this up is because you're not quite sure what each of these things means. And what's kind of funny is you can customize each of these metrics anyway. So basically, if you're looking at this screen, you see this kind of cool little animation, if you will. But this is really just telling you how many steps through the training round it's gotten. Epoch is usually going to be the actual training round that it's on. And so within a training round, your machine is basically saying, all right, I'm given some amount of data that you specify. And I've got to figure out what the hell this means. I've got to classify it. And what it does is it sits there and it says, hm, that looks like a cat. That's a bird. That's a dog. Or in this case, that's positive, that's negative, that's neutral. And it throws out those answers. And it encodes them somehow-- 0, 1, 2, totally reasonable. And what it does is it says, OK, here are my answers to all of the data that I've been handed. And then it looks at the actual answers. And it says, oh, crap, I missed this one, this one, and this one. So I've got to do some magic. I'm going to re-weight some of my numbers. I'm going to do some hardcore math stuff. And then I'm going to try again. And that's the new training round. Now you'll notice that towards the right side of each row, there are these val_loss and val_acc numbers. And they correspond to loss and acc over here. Loss being, well, loss, which is a metric used in the actual algorithm, the math underneath. And acc being accuracy, the accuracy of the model given that it is categorically trying to tell what sort of image it is, with multiple categories. And you can specify all this within the file that builds this model. But for now we're going to just take it as is. You don't need to use that accuracy metric, for example. The val versions of each of those are the validation versions. They're the ones that say, all right, here's one that you've never seen before. How do you do on that? And for those, it doesn't readjust its weights. It just evaluates. It just checks that you're not overfitting. So that was the first term that was thrown at me when I was starting to learn this: what does it mean to overfit your data? And it's kind of intuitive. You're fitting too much. And if you think of this training process as fitting, then you're just training it too much. And you can think of this like with a toddler: if you give it too much of a limited pattern, maybe you tell it, the machine, that everything is kind of so and so. You give it all the data pieces that it gets. And it just memorizes the data, but not the actual patterns. People do this all the time. You give them a bunch of chemistry facts, they memorize those facts. If you ask them to extrapolate from those facts, they have no idea. That's a very common problem, especially in public school systems, for example. Small political jab. But that is something that happens with machines too.
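For reference, those loss, acc, val_loss, and val_acc columns fall out of how the model is compiled and fit. The distribution code wraps this inside its model class, but a minimal Keras version looks roughly like this, where model is the Keras model being built and x_train, y_train, x_val, and y_val are stand-ins for your own arrays:

    # Three classes (positive/neutral/negative), so categorical cross-entropy.
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

    # validation_data is held out: the model is scored on it each epoch
    # but never re-weights itself based on it.
    model.fit(x_train, y_train,
              epochs=5,
              batch_size=16,
              validation_data=(x_val, y_val))

Each epoch, Keras prints loss and acc for the training data and val_loss and val_acc for the held-out data-- the columns in that progress bar. Training numbers improving while the val numbers stall or get worse is the classic sign of overfitting.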
If they just end up memorizing their data, yes, they get it right, at least on this loss and accuracy metric, but they won't get it right on the validation accuracy, because that is stuff they've never seen before. They couldn't have memorized it. It's basically the same idea behind test taking. I give you some data. I expect you to learn the patterns, not the actual data. And then I give you a test that has data you've never seen before but has the same patterns; you should be able to figure it out. And so you can see-- unfortunately it got scrolled over to the next round, but we're going to say that these 0.3s are roughly about the same. You'll notice it's like 0.32, 0.35, 0.38, 0.38. And you'll notice it's roughly guessing. The machine is basically saying, hey, if I say that this one is neutral, this one is negative, this one's positive-- neutral, negative, positive-- I get it right about a third of the time, which is not so good. And it's because the model that I handed you is, well, not particularly intelligent. Also the data is not very well collected either. And you'll notice that even the training accuracy-- it didn't really get the chance to memorize everything, thank god-- is still pretty low. It's roughly guessing here too. So that's not too good. And it asked me, do I want to save the model. And you'll notice, to one of my points about the myths of machine learning, that didn't take that long. We were talking here for a couple of minutes and it's done. It's now trained. So now we have the computer vision part of this, which is somewhat annoying because I did hack it together. But that's OK. So we have our live feed of the screen as it's going right now. And this allows us to take pictures of things. So there is actual screenshot software that you could use that is easier than this, but it was an easy way for me to introduce the ideas of computer vision specifically. So what I can do is I can say, all right, let's pull up an emoji. Because I want to take a picture of that emoji and I want my machine to tell me what that emoji is. Is it positive, negative, or neutral? And I can type smiley emoji into Google. And the reason I do this live is to prove to you that I didn't just hardcode it into the machine. I'm willing to bet it will not get it correct. But if it does, kudos. So we have this emoji. And you'll see it pops up in our feed right over here. It actually pops up, I think, an infinite number of times if you were to look close enough. But the reason I have it pop up in the feed is that I can drag over the actual feed and have it select that picture. And so it takes that picture. Oh, I'm so psyched that I got that one right. That's lit. So you'll notice that in the actual terminal output, it gave me probabilities that the thing was positive or negative or neutral. I'm really just psyched that it got it right with really high probability. If you're above like 70% probability and you have a good enough number of labels, you should be pretty psyched. The fact that this got it with 94% likelihood-- it was probably just guessing. It's like the toddler, this small child, that's like, it's that one. And they happen to be correct. And they're like, yes, I'm so smart. This machine, it's really not that great. But to its credit, and to the seminar's credit, we have a dumb machine that I've handed very little data and that I've trained for a total of like four minutes in front of all of us. And it got it right. It was able to figure stuff out.
It figured out a pattern. And it said that it also could have been a little bit negative. And you'll notice that there are some attributes that are shared among smiley faces and frowny faces. They both have eyes, and emojis in particular are pretty standard. They have that same rough shape. They're the same color. And they have roughly the same width of smile or frown, even though it's in different orientations. So the machine didn't do a terrible job. And that's kind of nuts. And hopefully that proves, at least in a small way, that it is accessible and it is easy to do. You might have to sit there and tinker with code. But if you're not sitting there tinkering with code, are you really coding? If you're not sitting there debugging things, why are you here? Debugging is a good amount of coding. And this, just like any other piece of code, can be done and debugged even by people that just started programming. You can do this at this level in a couple of hours, maybe a couple of days. You might have to research a little bit and say, oh, crap, what does it mean to overfit? What did that guy say, the crazy dude that talked about cats and triangles for a while? Well, that's OK. That's how this works. But it's not any more difficult than if I said, oh, go use an API and retrieve some information via a PUT request for me. Just as complicated sounding. But it's all the same idea. You have to just sit down and learn it for a little bit. And in this case, you have a pretty decent example. That was-- I'm going to stress that that was kind of luck that that worked. I'm very proud of it. But still kind of ridiculous. So let's say that we wanted to improve on a model that already exists. There are smarter people than me that have written lots and lots of machine learning algorithms-- more of them than not, I would argue. So I actually included one of those pre-trained models because I figured it'd be kind of cool to demo. And so in the code that you have, you actually have the ability to pull up a pre-trained model. It's called Inception V3. I think it's pretty badass that they called it that. A lot of the other ones are like VGG16 and stuff like that. But this one is called Inception V3. I like the sound of that name. And so you can run this program with that flag, the pre-trained flag. It still pulls up the TensorFlow backend because it is a TensorFlow model-- TensorFlow being the underlying machine learning software of Keras, or at least the way that I configured it. It still loads the data, even though it doesn't have to in this case, because we're going to look at a different piece of data. I don't really want to save the model. It's a little big. But it's going to bring up that same feed so that I can take a picture of my screen. And basically, what we're going to do is pull up a picture of a cat, particularly this cat. I really like this cat. It's kind of cute. So this is an Egyptian cat. And what I want to do is I'm going to take my mouse, I'm going to click, drag it over. I'm going to take a picture of that cat. And what I can do is then say, all right, let's take a look at what my machine said it was. And if you read carefully, this one returns five labels. There are actually 1,000 labels it has access to. It's not just these five. I just picked the top five. And you'll notice that while the bottom one is a Windows screen, which is not wrong, that isn't the most likely one, not even close.
Because these are percentages, not fractions. The closest one by far was at 94%, roughly the same confidence that my other model had. And it's an Egyptian cat. And so that's one of the powerful parts of machine learning: this model, that was even faster than the previous one. And it got an arbitrary picture of a cat that I picked off the internet correct with 94% confidence. That's nuts. I just took a random picture, picked it, and it works. And that's really the point that I want to stress here: in a couple of minutes-- admittedly, I've had the advantage of prepping this for a little while-- you can sit here and build an algorithm that identifies things pretty accurately. And so if you wanted to build facial recognition software, it's this. It's the same idea. You just change the data. And you change your model a little bit. Make it a little smarter. Make it better suited specifically for faces. But that's it. That's really the only big difference here. This idea, these buzzwords, machine learning, computer vision, they're just as accessible to you and me as beginners in computer science as they are to someone who has done a bunch of years of computer science and is maybe a computer wizard. Maybe they can do cooler stuff with it. They can put all sorts of APIs and other acronyms and scary sounding words behind it. It's the same thing underneath. It's all just working as a machine should work, deterministically and hopefully the way you want it. So we're going to look a little bit at the code, because I think that that is a worthwhile endeavor. This is also like the worst possible way you can check what directory you're in, but I'm talking at the same time, so I feel like it's justified. I use Visual Studio Code. And I really hope that I don't have anything ridiculous open. We're going to just expand it a little bit, make it easier to read. So we have code. And this is usually the part where people are like, all right, now I'm out. We got there, we're done. And if you're not, awesome. I would have thought that the math would have scared you away. And since I've shown that there's no math, I'm hoping that you're still here. So we're sitting here looking at a pretty random file, but this is actually the ML model file. So this is a file that codes in all of the attributes of the actual model. This is the class that has the save method of the model. It has the part that builds it or predicts on data. It has all of the things that you could need to get what we just showed you in the example. So what I want to draw our attention to is right around here. Looking at this part. This is pretty much the bulk of the model that you just saw. That's it. If you don't count the empty lines, it's just five lines of code, plus my mouse highlighting everything. So it's pretty simple. It's pretty straightforward. Now, a lot of these terms are maybe a little bit more confusing. Max pooling with dropout, and then you flatten it like a pancake, and then you do a dense of something, god knows what. You activate that, but there's a pool size here, there's a random number there. I think it's magic. I don't know why. And then it gets very complicated very quickly. But again, like in CS50 and like in any problem, really just break it into smaller and smaller pieces. Let's start with maybe the easiest piece of code here-- flatten. It has no arguments. So all we had to do was add flatten.
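For orientation, the five-line block being described is roughly the following. This is a sketch reconstructed from the talk's description (the 2 by 2 pool, the 0.5 dropout, the 16 neurons, the three classes, the 200 by 200 images), so the exact layers, sizes, and order in the repo may differ:

    from keras.models import Sequential
    from keras.layers import MaxPooling2D, Dropout, Flatten, Dense

    model = Sequential()
    # input_shape matches the 200x200 RGB images in the data folder.
    model.add(MaxPooling2D(pool_size=(2, 2), input_shape=(200, 200, 3)))  # condense each 2x2 patch to its max
    model.add(Dropout(0.5))                     # randomly drop half the values each pass
    model.add(Flatten())                        # squash the 2D image data into one long vector
    model.add(Dense(16, activation='relu'))     # 16 fully connected neurons
    model.add(Dense(3, activation='softmax'))   # one score per class: positive/neutral/negative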
And maybe even easier is: why are we adding things to the model? How does this model work? You can think of it as a stack of layers. You take the input and, depending on how your stack is oriented-- not the data structure stack, like a literal physical stack-- you're either dropping inputs in or you're pushing them up, but either way it's going through the stack. And then that first layer takes in that input and says, all right, we're going to do some magic with that, and drops it into the next layer. And then that one does the same thing. Until you get to the last one, which is located here. And that last one says, I know what it is. It's a triangle. And it throws that answer out to you. And that's all this really does. It's just a bunch of math that takes in data points and does stuff with them. So that's why we're adding them. And the order in which we add them changes the order of the stack. And that's not too bad. But then we have these weird words, like max pooling, dropout, flatten, and dense. And those aren't as difficult to understand as you may think either. We're going to start with flatten because it takes no arguments. But it will be pretty easy to move on from there. So, adding a flattening layer. This might seem a little ridiculous. It might seem unnecessary even. But if you're looking at a picture, that picture captured all sorts of data points; maybe it was x long and x wide, and some amount thick. We really only have to worry about the width and the height. And every other piece of information can probably somehow be encoded without having it be stretched out like this. Let's say that that thickness is color: r, g, and b. So even if we have our image laid out in this rectangle, there are three layers of depth to it. The first layer is how much red is in that pixel, how much green, and then how much blue. But what if we don't really care, or we can encode that data some other way? Then we can flatten the picture, so to speak, and hand you a two dimensional thing instead of a three dimensional one. And if you do the same thing again, taking the two dimensional thing and collapsing it into a line, you've flattened again. And so this concept is really not that difficult. It's actually something that we do anyway. If you wanted to analyze an image and you didn't really care about the color, for example, you could flatten it, make it black and white. You've now flattened an image. And so this, although it might be a little bit strange or weirdly worded, does something that we're actually not unfamiliar with. The next easiest one is probably dropout. And this plays a role in something that we've already seen. This plays a role in overfitting. So we've talked about this term before. We've taught a toddler a bunch of facts. And that toddler knows those facts. It knows what a brachiosaurus is. That's it. And so now what we want to do in our model is make sure that our model isn't doing that. It's not just going, the answer is a, the next answer is b, the next one is c, and so on. We want our model to pick up on patterns and say, well, according to how those patterns work, that should be this. That's a much better model. And so in this case, what we do is we introduce dropout. And you can think of that as every once in a while we just randomly kick out some data. It's with 50% probability, supposedly. Well, actually, the fraction there is to tell it how much data to kick out, not the probability. My mistake.
This is actually a fraction of the data that's going to be dropped in a given section, in this layer. And so what that layer says is, all right, we're going to just every once in a while not pick a piece of data. And then we're going to move on and do something else. And in that way, we don't give it the same data set every time. We give it a little bit less. We say, all right, here's the data you get to pick from. We're actually only going to hand you this much. Here you go. And then the next time it comes around, it might be a different subset. Maybe I'll hand you this subset instead of the previous one. And in that way, we can avoid overfitting to a degree. Now, 50% of the data being dropped out every time is pretty high. And I've introduced that here to combat the overfitting that will occur if we are training something hundreds of times in a couple of minutes. But you could lower that and see what happens. It'll probably get very, very good at the training data. But it'll be pretty bad at the actual validation or testing data. So maybe not ideal. But you could also increase this so much that it never gets good at the training data, and, well, the testing data will follow suit. And that's not ideal either. So there is some give and take here. You do have to mess around with it a little bit. And you can add more or fewer of these layers as you see fit. You'll notice there is some dimensionality that needs to work out. For example, if you got rid of the flatten layer, Keras will just be like, I don't understand what's going on. And it'll kind of freak out on you. But other than that, you can play around with these more or less as you see fit. The other major one, before we go into max pooling 2D, is dense. And dense has some activation. Imagine dense as being some distribution of weights, as they're called-- numbers that tell the computer what the value of the decisions it makes is. So if I tell you that touching a stove has a value of 100, and touching the ground has a value of like 40, and touching your own skin has a value of like 0, then you can pretty easily tell where my value system is going there-- which one is more dangerous. And that sort of value system is at play here, but the activation tells it how we want to pre-weight things, to a degree. And ReLU happens to just be a common one that people use with images. There are a bunch of them. You're free to look them up. Keras comes built in with a ton of them. There's softmax. There's tanh. There's all sorts of other ones. And they all mean varying things. That can get very technical. Sometimes you can just play around with them and see which ones work better. You can just swap them out every once in a while and try it. Which one works? Which one doesn't? And you'll often find that ReLU works particularly well with images just because of the math underneath. And if you're interested in that math, feel free to talk to me afterward. But if you're not, we're going to just leave that as an activation that tells it the value of its decisions. And that's the starting value of its decisions. Later, it re-weights itself. It says, oh, yeah, no, that was a bad decision. We're changing that. And then dense is the actual thing being added to this layer. That is the name of this layer. And dense in and of itself is really just saying, hey, we're going to have a bunch of nodes, or neurons.
We're going to have 16 of them specifically. And we're going to have them all be able to communicate with each other. And so what that says is, if I make a decision, I'm going to tell everyone around me, that's the decision I made and it was bad. Don't do that one. That was a terrible decision. It's like if you get a little bit too drunk one night and you just go around to everyone the next day like, hey, guys, bad plan. Don't do that. Do not make that decision. Very easy way of dealing with that. And in that layer you have a bunch of neurons all talking to each other. And some people's immediate solution is, well, I could just add more neurons. Sometimes that works. Sometimes not. And you'll notice that it makes your computer a lot slower. So there is always a trade off. And then we have our max pooling 2D, which is actually pretty intuitively named if you know what's going on underneath. But if you don't, it's just like, what the fudge. So what ends up going on here is I gave it a pool size. I said 2 by 2. Imagine that in your image you have 2 by 2 sections of pixels, basically a square that has four pixels in it, sliding across the image. Then really what I'm doing here is pooling them together and taking the max. That's it. And I'm taking that max one and saying that that is probably the feature that determines things. And in images, that can sometimes be the case. Particularly for this kind of image, it works pretty well. There's also min pooling. You take the minimum. That's the one that matters. There are cases where that might be particularly relevant. What if you're looking at the negatives of images? Maybe that applies here. Maybe it doesn't. And so that's something to keep in mind. And there's, I believe, also average pooling. It might be called mean pooling in Keras. But it does the thing that you would think: it takes the pool, averages it, and then does that as it goes. And the pool size can change if you think that's appropriate. 2 by 2 is pretty fitting here, because we aren't really saying, yeah, this whole image-- if you just take the biggest point in it, that determines whether it's happy or not. That's not very accurate. We wouldn't be able to get far with that. So this helps us condense our data a little bit. We take the information we're looking at and throw out some of the fluff. And you do this a couple of times. And then at the end we spit out our output. So that is your very topical overview into machine learning, and hopefully an introduction to the idea that it is accessible as a final project for CS50, specifically. But even in the real world, outside of CS50 and outside of classes, if you wanted to tinker around with this, that is totally within your capabilities. And I mean 'you' not as someone who has done a year of CS and is now teaching a seminar, but as someone who started where you all started, or worse. I started with no experience of this, and this was where I went. This was the direction I chose. And it's totally accessible. You can do that. That is entirely within your grasp. And so if you're at all interested in it, I would recommend pursuing it. You'll find that it is difficult. There are points where it is frustrating. But that is the case with anything that you are going to do in CS. There are points where it will be difficult, where it will be frustrating. So I would encourage you to not give up, but rather think that that is basically the right path.
You're going down the street. Just keep going. And if you're going to do this on any project, you might as well do it on one you're interested in. That's a piece of advice specifically directed at the final project for CS50. Don't waste your three weeks. Build something cool. If it's hard and it takes a lot of time and it's very annoying to debug and there are things that don't work up until the last possible minute, you're probably doing it right. A lot of the best work in CS happens at the last possible minute, that moment where you're like, I got it. And then you're good. That sense of relief is why a lot of us are still in CS: we like the feeling of being satisfied with what we produced. So if you ever think that CS is not for you because it's too difficult, or because everyone seems to get it but you, that is 100% not the case. Machine learning is hard. Computer vision is hard. Computer science is hard. Learning is difficult. All of this, we can do. So I would recommend always pursuing it. What questions do you guys have about machine learning and computer vision? I figure in my last seven or so minutes I'll open it up to any questions. Sure.
SPEAKER 2: I was wondering if you could talk maybe about the min pooling and max pooling, and when to use a 2 by 2. Like, what are some circumstances where you'd use like 100 by 100 or [INAUDIBLE]
SPEAKER 1: So the question is, why min pooling versus max pooling versus mean pooling, and what does it mean to have a different pool size? Why is that relevant? I'm not really in a position to pull up any other image, so maybe I'll keep the cat one up here; it's cute. We'll talk about this image in particular. I think it's brown or red, I can't really tell, but it's an Egyptian cat, and they have the beautiful wide eyes and big ears. They're awesome. This image has a set number of pixels. Even though I'm displaying it at some other resolution, the image itself is, let's say, I don't know, 400 by 200. Not quite right, but close enough. So if it's 400 by 200, then in a given, we'll say, 20 by 20 box, we can only get so much data. Let's say that 20 by 20 box is just the tip of the ear. Well, if I take just the max of that 20 by 20 section, then the entire tip of this ear becomes one point. I have one data point that is the tip of the ear. The same thing happens as we iterate through the entire image, so the image gets significantly condensed. If it was 400 by 200, you can think of it as now being reduced by a factor of 20 in each dimension, down to 20 by 10, which might be appropriate. Maybe all you really care about is the general shape. Is it a cat or is it a doorknob? That's a pretty easy classifier to build. All I really have to care about is, is it not a circle? Not a doorknob? Good. But maybe your class of doorknobs is different; it can get more complicated from there. In this case, though, to preserve detail, you'd probably want a pretty small pool. We're just trying to condense our image a little bit, to get rid of some of the fluff, some of the noise. There's some fur here, but it doesn't really matter what that fur actually does, unless you're building a very particular classifier, in which case you're probably not looking at pictures of whole animals.
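As a quick sanity check on those numbers, here is the arithmetic behind how pool size condenses an image; the 400 by 200 figure is just the rough size guessed for the cat image above.

```python
# Non-overlapping pooling shrinks each dimension by the pool factor
# (assuming the image divides evenly). 400x200 is the rough size
# guessed for the cat image in the discussion above.
def pooled_shape(height, width, pool):
    return (height // pool, width // pool)

for pool in (2, 20):
    print(f"{pool}x{pool} pool on 400x200:", pooled_shape(400, 200, pool))
# 2x2 pool on 400x200: (200, 100)
# 20x20 pool on 400x200: (20, 10)
```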
So if we're looking at, is it a cat versus a dog, well, does it really matter if there's a speck of fur here, or some extra noise captured by the camera that took the picture? Not really. So we can just ignore that and average over it, or use max pooling over it, and say, you know what, we're just going to pool the biggest details together. And while that can be appropriate in this case, what if this picture were 4 million pixels by 2 million pixels? Now you might want to scale your pool size up a lot. We don't need all of that extra information, especially if it's the same picture. We could say, you know what, we're going to reduce that by a factor of, like, a million. And now you have a 4 by 2, which might be a little too much: you've basically got four pixels down and two across, and hopefully it's still a cat. But you can play around with that. And that might be a case in which you'd change whether you're doing a min or a max, or even just how you're analyzing the image. Is it appropriate to take just one image and do this? Or is only one image in your data set extra large while all the rest are, like, 150 by 150? Then you might want to change that.
SPEAKER 3: So like [INAUDIBLE] like if you had one image that was, say, like 4 million pixels long, it would probably make more sense then to preprocess that data before you go in, [INAUDIBLE] size to like a certain value [INAUDIBLE].
SPEAKER 1: Yes. Actually, there is a little bit of that in the sample code: preprocessing of an image. It's another fun little word that people will throw out there, like, oh yeah, preprocess your images before you use them, and then they turn around and just ignore that they dropped a whole thing on you. Preprocessing is really just what was just mentioned: you want to take your images and normalize them a little bit. You don't want this outlier in your data set that's 4 million by 2 million when the rest are, like, 100 across. You want to take those and resize them, scale them down using appropriate methods. Whatever those methods are might change depending on the data you're looking at, or depending on how you want to do it, but being able to normalize across your images is going to be some sort of preprocessing. And it's called preprocessing because, if you think of the processing part as throwing the data into your machine learning algorithm, you do this beforehand; hence, preprocessing. That's where the terminology comes from. It comes up a lot with images in particular, because images are taken by cameras, which are used by people, and people are pretty stochastic. I might take the same picture 400 times and it might look different every time. That's kind of a problem with how people take pictures, especially for real world scenarios where you're applying this to pictures of living animals or people's faces or things like that. You'll probably want to find a way to preprocess your images so that they're roughly the right size and, give or take, contain roughly the thing you're looking for. Maybe one picture of someone's face is zoomed all the way out, and every other person's picture is zoomed in super close so you just have their face. Harvard IT uses that to identify you: they preprocess all of the images they take of you so that it's just your face, and then they can identify you.
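Here is a rough sketch of what that kind of preprocessing might look like with OpenCV, the computer vision library used in the project. The file names and the 150 by 150 target size are assumptions for illustration, not values from the actual sample code.

```python
# A sketch of image preprocessing with OpenCV: normalize sizes and pixel
# values before training. Target size and file names are assumptions.
import cv2

def preprocess(path, size=(150, 150)):
    # Load in grayscale so camera and lighting quirks matter less.
    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Force every image to the same dimensions, so one 4-million-pixel
    # outlier doesn't sit in a data set of 150x150 images.
    image = cv2.resize(image, size)
    # Scale pixel values from 0-255 down to 0-1, a common normalization
    # before handing data to a machine learning algorithm.
    return image / 255.0

images = [preprocess(p) for p in ["cat1.jpg", "cat2.jpg"]]  # hypothetical files
```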
And that's something that comes up a lot in machine learning. It's part of the project. And I think I'm just about out of time, but I'll be hanging around afterward for any questions. As far as the livestream goes, thank you for watching. I'll be on campus, kind of doing my own thing. But I really appreciate you hanging in all the way through to the weird cat picture at the very end. So thank you very much. Thanks for showing up, you guys.