[Seminar] [A Programmer's Introduction to APIs] [Billy Janitsch] [Harvard University] [Tommy MacWilliam] [This is CS50.] [CS50.TV] Hi, everyone, I'm Billy, and today I'm going to be talking about APIs, or application programming interfaces, specifically in the context of CS50 final projects and that sort of thing. In general what is an API? In very broad terms, it's sort of a middle man that allows 2 pieces of software to communicate with each other. That's a kind of very broad definition and isn't that relevant for what we're looking at. What we really want is some sort of useful middle ground to communicate with some sort of database somewhere. Here's a chart, and basically the idea is that we are an application, and we want to get data from a database, but we don't want to query the database directly. Instead we want to go through this sort of middle man, the API. The idea behind that is numbers 2 and 3 on the chart are both going to be very complicated and messy. In other words, when the API is querying the database, it's probably going to be using SQL tables and all of that sort of stuff, and we've learned a bit about it in CS50, but overall, you've noticed that it's a bit of a pain. It gets very, very complicated and messy, especially when you're making complex queries and that sort of thing. What we really want is some sort of useful and simple way to get that data, and that's the idea behind numbers 1 and 4 on the chart. In other words, we want a really simple way to tell the API what to get for us and a really simple way to get that data back. There is one main way that that data is usually sent and received, which is JSON, or JavaScript Object Notation. That can vary a little bit as far as how you send the request to the API. In other words, if you want some certain amount of data, how you tell the API to get that data can vary a little bit. Usually it involves making some sort of network request. 
In other words, accessing some sort of URL that's going to tell the API exactly what you want, but the data that comes back, in other words, number 4, is almost always in JSON. What is JSON exactly? As I said, JavaScript Object Notation. It's basically the universal standard for transmitting and receiving data. The idea is that you have these 3 categories of things. You have arrays, hashmaps, and primitives. Arrays and hashmaps you've looked at a little bit in CS50, but you've sort of gotten a very strict sense of what they are. In other words, with arrays you know that they're type bound, so you only have one sort of type that goes throughout the entire array. JSON is a lot more lenient with that sort of thing. Basically the idea is you construct this object, which can be composed of any of these 3 things and can be composed of multiple ones of them, and they can be nested. Here's an example of JSON: these curly brackets here represent your hashmap, and a hashmap is basically a mapping from some sort of key to some sort of value. You'll see here that we have the properties key, and that's mapping onto an array, which is this whole thing. We see another element of the hashmap, which is this key isAwesome, which maps to a primitive value of true, in other words, a boolean. Primitives can be strings. They can be integers. They can be bools, anything like that. And you see the contents of this array that properties points to has 2 strings in it, self-similar and wonderful. Those are 2 properties of JSON, and we see that JSON is awesome. To look at that a little more closely I'm going to construct a more complex example of JSON here. Let's start with an array, for example, just an empty array. But that's sort of boring, so we're going to fill it up a bit, and as I said, arrays in JSON aren't type bound, so we could also have a string here, which is hi, and that's another element of that array.
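As a rough sketch, the slide's first example (the properties key and the isAwesome key) could be written out as JSON text and read back with JavaScript's built-in JSON.parse. The key names follow the talk; the variable names are only illustrative.

```javascript
// The slide's example written out as JSON text: a hashmap with two keys.
const slideJson = '{"properties": ["self-similar", "wonderful"], "isAwesome": true}';

// JSON.parse turns the text into ordinary JavaScript values.
const slideObject = JSON.parse(slideJson);

// The "properties" key maps to an array of two strings.
console.log(slideObject.properties[0]);

// The "isAwesome" key maps to a primitive, in this case a boolean.
console.log(typeof slideObject.isAwesome);
```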
And likewise, we could add a hashmapping here, which is going to have a few mappings. It's going to have a mapping from name to the string Billy. We have a mapping from name to Billy, and we have a mapping of favorite color to blue. That's basically a good example of JSON. It kind of gets into—whoops, need a comma there—all of the different parts of it. Again, it's not type bound at all, so you can have any kind of types inside anything you want, and the idea is it's self-similar. In other words, this right here is a JSON object, as is this whole thing, as is just this, so you can have a primitive be an object, an array be an object or a hashmap be an object. As you can kind of see, JSON is really, really useful in that it's so versatile. You can have any possible data that you can conceive stored in JSON. That makes it a really nice language to use with APIs because it pretty much means that no matter what data that you want there's going to be some way to get it back in JSON. A few properties that make JSON particularly good for this sort of thing. As you can see, compared to a lot of things that you've been working with in CS50 it's comparatively very easy to read and also very easy to write. You can indent it out if you want, like I was doing in that example, which gives you a nice, pretty version that you can see really well. But moreover, it's also easy to read and write for a computer. In other words, it's easy to parse and easy to encode, which means that it's pretty fast as far as reading the data is concerned, and JSON can be generated really quickly. It's also very easy to access different parts of JSON and that sort of thing. That's nice, and furthermore, the fact that it's self-similar, in other words, the fact that you can have JSON within JSON within JSON is really nice for storing data. Another part that is generally really useful in working with APIs is jQuery. 
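The mixed array built up in the talk might look something like the sketch below. The string hi and the name and favorite color mappings come from the talk; the leading number is an assumption, since the transcript doesn't say what the first element was. The sketch also shows encoding with JSON.stringify and the self-similar idea, where a nested piece is valid JSON by itself.

```javascript
// An array mixing types: a number (assumed), a string, and a hashmap.
const mixedArray = [
  5,
  "hi",
  { "name": "Billy", "favorite color": "blue" }
];

// JSON.stringify encodes the value as text; the third argument indents it.
const prettyText = JSON.stringify(mixedArray, null, 2);
console.log(prettyText);

// Parsing the text gives back an equivalent structure.
const roundTripped = JSON.parse(prettyText);

// Self-similarity: the nested hashmap is a complete JSON value on its own.
const innerText = JSON.stringify(roundTripped[2]);
console.log(innerText);
```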
You've learned a little bit of JavaScript, which is a nice way to manipulate HTML and CSS within a website. But it can kind of be a pain to code in plain JavaScript, largely because JavaScript is a really verbose language. You have to learn a lot of syntax, and just to do very simple things it takes a lot of code, so jQuery is a library for JavaScript. In other words, it's a JavaScript file that you can load and then use jQuery functions to do certain things. And jQuery basically makes your life a whole lot easier. It simplifies what would take hundreds of lines in JavaScript down to a few lines in jQuery. It's particularly useful if you're using APIs because generally how you'll be accessing APIs is by making AJAX requests, and I believe David has mentioned in lecture that AJAX requests are generally when you're making a network request to some sort of server and getting back some sort of data and updating a page instantaneously. Whereas in plain JavaScript that would take crazy numbers of lines to validate all of the headers and do all of that sort of stuff, jQuery has a really simple function called AJAX, and all you have to do in AJAX is give the parameters that you want to give the API, the location of the API and any additional sort of options that you want to configure. It's really, really nice and very useful for this kind of thing. That's all we need to start getting our hands dirty in APIs. I'm going to bring up a few examples and explore their different properties and why they're useful for different kinds of things. The first thing I'll actually show you is something that I'm working on at my research lab, which is an Ngram Viewer, and basically the idea of an Ngram Viewer is you can search for some kind of word or phrase and see how often it's appeared in a certain set of text over time. This example here is this data set of babies that were born in New York between 1920 and 2000. 
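The jQuery AJAX call described above could be sketched roughly like this. The url, data, dataType, and success settings are standard options of jQuery's ajax function; the wrapper name requestJson, the jq parameter, and the example URL are hypothetical, and jQuery is passed in as an argument so the sketch doesn't assume it is loaded globally.

```javascript
// `jq` stands in for the jQuery object (usually written `$`).
function requestJson(jq, apiUrl, params, onData) {
  return jq.ajax({
    url: apiUrl,        // the location of the API
    data: params,       // the parameters to give the API
    dataType: "json",   // ask jQuery to parse the response as JSON
    success: onData     // called with the parsed data when the request succeeds
  });
}

// On a page with jQuery loaded, a call might look like this
// (the URL is a placeholder, not a real API):
// requestJson($, "https://api.example.com/names", { name: "Jennifer" },
//             function (data) { console.log(data); });
```

Passing jQuery in as a parameter is just a convenience for trying the function out with a stand-in object; in a normal page you would more likely call $.ajax directly.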
We can search, for example, for the name Jennifer, and we see that pre-1960s it really wasn't used all that much, and then as we get into later years it's becoming used more and more. We can also do comparisons, so if we compare Jennifer to, for example, Thomas, we can see Thomas has been pretty prevalent throughout history, whereas Jennifer is a more recent name. We can do that kind of thing. How does this application work? Basically, it works via an API. In other words, we have certain parameters here. We have the parameters of what we're actually searching for, which are these names, and then we have a few other properties, like the Y axis and the X axis. You can see we have a few different options as far as the time resolution to use and that sort of thing. We have these options as far as what data we actually want from the database, and we want to get that data back in some useful way. Ordinarily, if we were querying the database directly it would sort of be a pain to do because presumably this data about baby names lives in some database somewhere, and it would be really complicated to have to query it manually and decide exactly what data to return. In other words, we only care about Jennifer and Thomas in this case, and we only care about them on a certain axis and all of that sort of stuff. How do we get around this? To dig into this API a little more I'll show you another example of this platform which uses a slightly different data set. This data set, instead of being baby names, is actually just the entire print publication database of Open Library, which is a giant source of texts published throughout the last 100 or so years. The idea is we have this repository of millions and millions of texts, which we can now search for different words and phrases in. Here's an example that differs a little from the previous example I showed you, which is we have these 3 search queries, war, war, and the French word for war, which is guerre.
And we're searching within 3 different sections of the total database. In other words, in this first query we're only searching in the USA, in the second one only in the UK, and the third only from works published in France. We see some interesting patterns emerge. For example, we see right around here which— oops, I messed up the axis a little bit, but you can see right in this range here around the Civil War there's a big spike in the American edition but not such a big spike in the other two, and that's obviously because the American Civil War was happening at that point. We can see some cool stuff there, but what we really care about is how we got this data. I'll take you behind the scenes in this app in a little bit. A neat trick is if you're working with the site and kind of want to know what's going on behind the scenes, you can open up the developer tools. I'm going to be using Chrome's developer tools, and to get to those you can do control, shift, J, and that takes you to the JavaScript console. There are a few tabs here. They can all be pretty useful under different circumstances, but I care about the network tab right now, and I actually have to refresh to get that working. Oh, sorry. It likes to give a random example. Okay, we'll use this example instead then. The idea is there's this API here, and you can see exactly what the API is returning. This is what the application is getting back from the API having sent that request. Let me zoom in a little bit, and we can basically see it's just a series of key value pairs in JSON. In other words, we have this hashmap here that's mapping values. In other words, it's mapping years to values. In 1765 whatever word we initially searched for is used 90 times out of 1 million, so we're getting back this result. It's not exactly JSON since we have this little result header here, but notice that this whole object here is just a great big JSON blob. 
We have an array here which contains this whole element, and you can see that whole element ends there, and then we have another big element that goes all the way down to the end, and that ends here. We have a really big array with 2 objects in it, and each of those objects is a hashmap. You can see within each of those hashmaps we have a mapping of this index value to 0 and this value's value to another hashmap, which again is mapping X axis values to Y axis values. You can see JSON gets a little bit complicated, but overall, it's actually very useful, and it's very easy to access compared to other different forms of notation. As far as what we're actually sending data to the API to get, I'm going to go into the back end a little bit here. This is the big JavaScript file that's handling all of the interactions of the web app, and so we don't care about most of this, but we do care about some of it. For example, we care about this buildQuery function, and the idea of this function is basically it's looking around the page, figuring out what the user wants to query, in other words, checking those boxes where they've input their search terms, checking the different Y and X axis values that they've chosen and all of that sort of thing, and it's going to spit out this query value, which I can then send off to the API. This looks complicated, and it is pretty complicated but what I'm going to do—in fact, I'm already doing this, which is great— is that I'm going to get the console to print out exactly that query value that it's sending off to the API. That's actually right here. Sorry, it outputs a lot of things. But this is what we care about, this object right here. This is the query object. In other words, this is exactly what the web application is sending to the API, and so let's look inside a little bit, and we see we have a few values here. 
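To make the response shape described above a bit more concrete, here is a minimal sketch of walking through that kind of data: an array of two elements, each a hashmap with an index and a nested mapping of X axis values to Y axis values. The key names index and values are assumptions about the real response, and apart from the 1765 value of 90 mentioned in the talk, the numbers are made-up placeholders.

```javascript
// Placeholder data in the shape described in the talk.
const sampleResponse = [
  { index: 0, values: { "1765": 90, "1766": 85 } },
  { index: 1, values: { "1765": 12, "1766": 14 } }
];

// Turn one series' year-to-value mapping into sorted [year, value] points,
// which is the form a charting step would typically want.
function toPoints(series) {
  return Object.keys(series.values)
    .map(function (year) { return [Number(year), series.values[year]]; })
    .sort(function (a, b) { return a[0] - b[0]; });
}

const firstSeriesPoints = toPoints(sampleResponse[0]);
console.log(firstSeriesPoints);
```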
We see we have this count type, which is occurrences per million words, which is exactly what we've chosen in the Y axis over here. That's where that's coming from. We have a database value, which means that there's some certain database that this data is living in, and we want to access that data specifically as opposed to the baby names data, for example. Then we have this groups value, which is saying that we want to search by year as opposed to any other X axis value. Then we have a method, which some APIs will do multiple things. In other words, this API can also return other kinds of data, but in this case, we want that mapping of X axis values to Y axis values. That's what that is telling it to do there, and we have this search limits array, which contains 2 values. The first one is what we see here, which is all of the values contained within that first little box at the top. In other words, we want to look for the word battle, and we want to filter it by English texts within American literature. We have this country, which is USA. We have a language, which is English, so we have all of these different parts that are all telling the API exactly what we want. We don't know what the data that we get back is yet, but we know that the data is going to take a certain form. This example is sort of on the complicated side, and you wouldn't necessarily be using an API this complex, but this is to show you the range and power of what APIs can do. In other words, using a relatively simple query system we basically have an input box with a few other selectors in different places. Let me zoom back out here. We have an input box with a few different metadata selections, and we have Y axis and X axis selections. We don't actually have that many fields, and we can see very easily we're able to query some sort of API and get data back and then put it into this chart, which is then going to display it in a useful way. 
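A query object along the lines of the one just described might look like the sketch below. The values for the count type, grouping, word, country, and language come from the talk; the exact key names are assumptions, and the database and method names are placeholders because the transcript doesn't give them.

```javascript
// A hypothetical reconstruction of the query object sent to the API.
const ngramQuery = {
  countType: "occurrences per million words",  // the Y axis choice
  database: "example-database",                // placeholder name
  groups: ["year"],                            // the X axis: group by year
  method: "example-method",                    // placeholder name
  searchLimits: [
    { word: "battle", country: "USA", language: "English" }
    // The talk's query has a second entry here that isn't described in detail.
  ]
};

// The object is encoded as JSON text before being sent off to the API.
const ngramQueryText = JSON.stringify(ngramQuery);
console.log(ngramQueryText);
```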
To look at another example that might be a bit more familiar to you guys we're going to turn to Facebook. Facebook's API is called the Facebook Graph, and basically what that means is Facebook sees itself as this massive database of lots of different parts that all have certain relationships to each other. In other words, I'm a user on Facebook, so I have a profile, and I also have certain friends, and each of them has a profile, and each of my friends has a wall, which has different comments on it, and each of those comments has likes and all of that sort of thing. There's lots of different parts to Facebook. It's a hugely complex API, and there's tons you can do with it, but it's actually pretty simple to use. I'm going to start out by going to graph.facebook.com/billyjanitsch, which is my unique account name, and your account name will either be some kind of word if you've chosen it, or it might just be a string of numbers. What we get back is pretty basic information. We see that I have a first name, which is Billy, a last name, which is Janitsch. There's a unique Facebook ID which I have. You can see that I'm male and that I have my language setting to British English. In other words, we're seeing very basic information here. It's not too much, but it does give us an idea of what's there. We can do the same thing to David Malan, for example. I think his name is dmalan. We see David Malan has a unique ID. He has a name, first name, middle name, last name. We also see that he's male and has his language set to US English. In other words, we're seeing pretty basic information here. Now, what happens if we try to check out something else? Let's say I'm interested in what David Malan has liked on Facebook. I can do /likes. Now we've run into a problem. We've got some sort of error that says an access token is required to request this resource. 
But if you think about it, that actually makes sense because it would be weird if you could access every single part of Facebook's database just from some sort of simple API, right? In other words, presumably your information can't be accessed by anyone who wants it. This error is precisely what that means. Some APIs require certain permissions in order to access their data. And even more advanced APIs, like the Facebook one, will require certain permissions to do certain things. I can see this basic information about David Malan. I can see that he's male and that he lives in the US, but I can't really see anything past that. To get around this for now, Facebook has this nice tool which is the graph API explorer, and the idea of that is you can sort of make up permissions for yourself based on your own account and then view things that specifically your account can view. For example, if I do graph.facebook.com/billyjanitsch/likes— whoops, I guess I have to revalidate my token here. Okay. If I do that again, great, now I see that I get this object back which says that I like pool noodles, which are in the category Games and Toys. I like walruses, which are in the category Animal. These are my actual Facebook likes. They're kind of embarrassing. But we can see this data is all returned in JSON. It's pretty readable. In other words, we have this mapping of data to some sort of an array, and each element of this array is a hashmap which maps the name of a like and the category of a like. Each like has a unique ID. There are all sorts of different things of data that we can get, and if you're interested in using the Facebook API for a CS50 final project or for anything like that it's actually quite doable. 
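Here is a small sketch of reading a likes response shaped the way the talk describes: a data key mapping to an array, where each element has a name, a category, and an ID. The two likes are the ones mentioned in the talk; the ID values are placeholders, and the exact capitalization of the names is an assumption.

```javascript
// Sample response text in the shape described in the talk.
const likesText = JSON.stringify({
  data: [
    { name: "Pool noodles", category: "Games and Toys", id: "placeholder-1" },
    { name: "Walruses", category: "Animal", id: "placeholder-2" }
  ]
});

// Parse the JSON text and produce one readable line per like.
function summarizeLikes(responseText) {
  const parsed = JSON.parse(responseText);
  return parsed.data.map(function (like) {
    return like.name + " (" + like.category + ")";
  });
}

const likeSummaries = summarizeLikes(likesText);
console.log(likeSummaries.join("\n"));
```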
Basically how you get around the authentication thing is Facebook uses a system called OAuth, or Open Authorization, and I don't want to get into it now because the type of authentication tends to vary a lot between different APIs, so I could spend a long time going over each one, but they're actually pretty self-explanatory. If you Google Facebook API, the documentation is very readable. There's a whole spec. For example, this is the documentation for the Facebook API, and you can see I'm on the User page, so I can learn all about the different kinds of things that are available to get as far as data and also the different permissions that I need in order to access them. As we saw, we don't need permissions to access the name or the gender, but beyond that we do need permissions for most things. This page, or rather, this website will also tell you how to get a token to be able to authenticate yourself. Most authentication systems use some sort of token where you get this unique value, which is a really long and random string, and that way they can associate the request that you're making with you. In other words, they know that you're not doing anything suspicious with their data. They know exactly what you're getting. They also know that you have permission to view that information. If you've made a Facebook app and your app has certain users, and those users have allowed that app to access certain parts of their profile, then whatever API key or token that that app is using will be able to access the data for those users. This might sound complicated, but it's not too bad, and if you want to use Facebook I would highly recommend that you consider playing around with their API. It's very cool, and you can do a lot of different things with it. 
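One common way a token gets associated with a request is as a query parameter on the URL. The sketch below builds such a URL for the graph.facebook.com address used in the talk. The access_token parameter name is the one the Graph API uses; the function name and the token value are placeholders, and the current Facebook documentation should be checked for the URL format it expects today.

```javascript
// Build a Graph API URL for one account and one connection (such as "likes"),
// attaching the access token as a query parameter.
function buildGraphUrl(accountName, connection, accessToken) {
  const graphUrl = new URL(
    "https://graph.facebook.com/" +
    encodeURIComponent(accountName) + "/" +
    encodeURIComponent(connection)
  );
  graphUrl.searchParams.set("access_token", accessToken);
  return graphUrl.toString();
}

// PLACEHOLDER_TOKEN stands in for the long random string a real token would be.
const exampleGraphUrl = buildGraphUrl("billyjanitsch", "likes", "PLACEHOLDER_TOKEN");
console.log(exampleGraphUrl);
```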
If the user grants you these permissions you can even go back to the API and say I want to actually post to this user's wall, or I want to have them post a photo, and that's why on your news feed you'll sometimes get those annoying things saying your friend has watched this video on some sort of weird site or something like that. That's because that app has been granted access to post on that person's wall. The idea overall, the Facebook API is pretty complicated but also really useful. Definitely worth checking out if you're still looking for a final project. Another suite of APIs that I'm going to go over is CS50 APIs. Let me zoom in here. CS50 has actually put together a whole series of APIs that you can use for a final project or just for anything that you're making. And they're mostly Harvard related, and they vary from the HUDS menu, for example, to this Harvard Events API, which will let you access a list of different events that are going on at Harvard and that sort of thing. And so we can click on any one of these and get a spec for it, which you'll be able to find for any API, and the idea is it lets you know, A, specifically what to request from the API and how to request it. In other words, if I want all events that are happening tomorrow then I've got to obviously give it that date that I want in a certain format, and B, it will tell me exactly what it's going to give back to me. It will say I'm going to return you this JSON object, or like you can see, there are different formats. You can also return the data as a CSV, for example. But you know exactly how that data is going to look when you get it back so you can expect to do certain things with it. We can scroll down and see, for example, if we want to query the API to get a calendar, then we can use this particular URL and give it certain parameters which are going to be the data that we want exactly. 
And likewise, if we want the data back in a certain format, then we can ask it to output the data in a CSV, and that's just another parameter that we're passing to the API. Lots of cool things to do there. I would definitely recommend checking out the CS50 APIs. I'm going to look at this Harvard Food API in particular for a little bit. One thing I've actually designed is this Harvard Noms website, which uses the CS50 Food API to retrieve the HUDS menu for the day. And for extension school people, HUDS is the dining service at Harvard. What you get is this page which contains all of the meals for the day, so we see lunch. We have a few different categories. We have the bean and whole grain station. We have the brown rice station. We can see for brunch we have these few food items. If we click on them, then we get the nutrition information. You see this is the nutrition information for grapefruit, in case you were wondering. And so again, we're going to peer into the back end here a little bit and see what exactly this is doing to get this data. And it turns out to not actually be very complex at all. This file looks a little messy, but keep in mind that this is handling the entire website, and if I scroll down we see this change date function. Now, just to be clear, this is written in CoffeeScript, which is a language that you probably haven't seen before. But it's pretty readable, so I'll walk through it as though it were pseudocode. Change date is a function that's going to take in this date value, and it's also going to take in a first, which we don't care about as much. But the important thing is that it has this date, and that date is the day that we want to request all of the food items for. And then you see we have a little bit of syntax here, which is basically parsing that date into a readable format. In other words, the API requires the date in a certain format. You can't just say November 16th, 2012 AD. It won't know what to do with that. 
It wants the date in a specific format. All we're doing here is giving it exactly that format, which is a year value and then a hyphen, a month value, another hyphen and the date value. And we also say we want the data to be output in JSON. Now we're making this AJAX request, and as I mentioned earlier, jQuery has this super useful AJAX function which all you need to do is specify a few parameters down here, and it will give you back exactly what you want. We're telling it that the URL we want it to go to is this CS50 Food API, which we got from the spec. We say that we want the data in JSON and that we're going to give it this data which we've defined up here. This is the day we want the food items for. And then all we have to do is define some sort of success function, which is basically what happens when the API returns that data. In other words, we've packaged up all of the parameters that we want, which in this case is the day that we want it and the fact that we want it in JSON, and we sent it off to the API, so now the API is saying, okay, here is your data, I got it back for you. We have the success function, which means given that the API successfully returns some data, what do we do with it? And it turns out that all we do is call this update menu function with whatever the API has returned, so we can search for that and see that all we're doing is using a bunch of new syntax here to update the HTML and insert this new data. What this allows is we have these arrows on either side, and we can click, and now we're looking at the data for the next day and again for the next day, and each time it's updating that date value and querying the API, getting back some data and putting it into the site. Again, you can see, super, super useful. This app took me a few hours to hack together, and I have a bit more experience, obviously, but your CS50 final project can look something very much like this. APIs are super powerful for the amount of effort that they take. 
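The date handling described above could be sketched in plain JavaScript roughly as follows (the original is in CoffeeScript and isn't reproduced in the transcript). The year, hyphen, month, hyphen, day format comes from the talk; the zero padding, the parameter names date and output, and all of the function names are assumptions.

```javascript
// Format a Date as year-month-day, for example 2012-11-16.
function formatMenuDate(date) {
  const year = date.getFullYear();
  const month = String(date.getMonth() + 1).padStart(2, "0"); // months count from 0
  const day = String(date.getDate()).padStart(2, "0");
  return year + "-" + month + "-" + day;
}

// The parameters to package up for one day's menu (names are assumed).
function menuParams(date) {
  return { date: formatMenuDate(date), output: "json" };
}

// What the arrows on either side do: move to the next or previous day.
function shiftDay(date, delta) {
  const shifted = new Date(date.getTime());
  shifted.setDate(shifted.getDate() + delta);
  return shifted;
}

const startDay = new Date(2012, 10, 16); // November 16, 2012
console.log(menuParams(startDay));
console.log(menuParams(shiftDay(startDay, 1)));
```

Each click on an arrow would then amount to calling shiftDay, building fresh parameters, and sending a new request with them.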
The last thing I'm going to go over is a few more APIs broadly. I won't get as far into them as far as what they do specifically, but I'll give you an idea of what's out there. 2 really useful ones, if you're interested in data analysis or visualization or anything like that, are Freebase and Wikipedia. Wikipedia—presumably you all know—is a free online encyclopedia, and it actually has an API, so if you want to, for example, get all of the texts and the articles for octopus you can very easily do that. Just say hey, Wikipedia API, I'd like the data returned as this, and I'd like it in this format, and the article I'd like is octopus, and very quickly it will give you back that information. That can be really useful if you want to make some sort of site that's a better viewer for Wikipedia or something like that. Freebase is sort of similar, although it's a little bit harder as far as API. Freebase is like Wikipedia in that it's an online encyclopedia which contains lots and lots of different data about all sorts of different topics, but it's stored in a relational database, which is slightly different from Wikipedia. Wikipedia has its articles and articles linked to other articles, but for the most part, if you want the data for octopus, you go to the octopus article, get that data, and you have a bunch of text about octopuses, so that's great. Freebase works in a slightly more complicated manner in that everything is related to one another. In other words, if we're searching for octopus then it has a bunch of categories associated with it. For example, it's an animal, it lives underwater, it has a certain body temperature. I don't know. And all of these categories are links to other places where you can go to see things with that same category. In other words, the octopus data set would contain a link to the data set for all animals, and that would let me move around in the database really quickly. 
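To illustrate the idea of moving around through category links, here is a tiny in-memory sketch. It is not the Freebase API and doesn't use its query format; the data set and function names are made up purely to show how shared categories let you hop from one topic to related ones.

```javascript
// A tiny made-up data set: each topic lists the categories it links to.
const topicCategories = {
  octopus: ["animal", "lives underwater"],
  dolphin: ["animal", "lives underwater"],
  walrus: ["animal"]
};

// Follow a category link: find every topic that shares the category.
function topicsInCategory(category) {
  return Object.keys(topicCategories).filter(function (topic) {
    return topicCategories[topic].includes(category);
  }).sort();
}

// Topics related to a given topic through at least one shared category.
function relatedTopics(topic) {
  const related = new Set();
  topicCategories[topic].forEach(function (category) {
    topicsInCategory(category).forEach(function (other) {
      if (other !== topic) related.add(other);
    });
  });
  return Array.from(related).sort();
}

console.log(relatedTopics("octopus"));
```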
This can be very useful if you're doing something like comparisons. In other words, given a certain thing, you want to see what else it's related to and see what else it's not related to. That sort of thing. It can be useful in a number of ways. If you're looking for more of a challenge and to be able to do some more complex things I would consider taking a look at the Freebase API. But largely, Wikipedia is a very simple place to go as far as getting information. Another place that I'll look at is Last.fm, and I'm actually going to go to the site in case some people aren't familiar, but Last.fm is basically a music tastes and recommendations website. You can make an account. You can start uploading music from your music player to the website, and basically it will start giving you music recommendations based on what you listen to. For example, if you go to your profile page—this is mine— you can see you have a list of recently listened to tracks. You can see overall favorite artists, all of that sort of thing, and again, there's a big API behind Last.fm, and you can use it to do lots and lots of really cool things. For example, I'll go to a friend's page who has this Last.fm Tools website. This is actually another platform that's built on the Last.fm API, and it does a number of pretty interesting things. If I log in with my user name, for example, I can ask it to generate a tag cloud, for example, and what that's going to do is give me back an image of all the different genres and that sort of thing that I like to listen to. How is it doing this? Very basically it's saying to the Last.fm API here's this user. I'd like to know the genre of every song that they've ever listened to, and you can do that by making a pretty simple AJAX call to the Last.fm API. You'll get back a big list, and then obviously some other stuff is being done to turn it into a word cloud, but you can see overall it's very easy to access and very easy to use. 
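The "other stuff" that turns a big list into a word cloud could start with something as simple as counting how often each genre shows up. This is only a sketch with a made-up list of tags; it doesn't call the Last.fm API, and the function names are hypothetical.

```javascript
// Made-up listening history: one genre tag per played track.
const playedTrackTags = ["indie", "jazz", "indie", "electronic", "indie", "jazz"];

// Count how many times each tag appears.
function countTags(tags) {
  const counts = {};
  tags.forEach(function (tag) {
    counts[tag] = (counts[tag] || 0) + 1;
  });
  return counts;
}

// Order tags from most to least frequent (ties broken alphabetically);
// a word cloud would draw the earlier tags larger.
function rankTags(counts) {
  return Object.keys(counts).sort(function (a, b) {
    return counts[b] - counts[a] || a.localeCompare(b);
  });
}

const tagCounts = countTags(playedTrackTags);
console.log(rankTags(tagCounts));
```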
Really nice for a number of things. I think that's about all I'll say overall. One last thing I'll mention about APIs in general is that you'll sometimes run into something called rate limiting, and the idea of rate limiting is you don't want to abuse APIs. In other words, it's really nice that a lot of these websites have APIs that you can go to and use for free. However, if you're making millions or billions of requests per day, for example, if you're stuck in an infinite loop that's infinitely querying some sort of API and getting back a huge amount of data, obviously that's not good, so what a lot of APIs do is have this rate limiting feature that says you can only make 1,000 requests per day per IP address or something like that. And if you're doing a lot of testing and that sort of thing, you'll sometimes run into that, and suddenly it will shut you off and say no, I'm not giving you any more data. What you want to do is play by the rules. You want to make sure that you read the API spec carefully. If it has certain rules attached to it, like you can only make X queries per day or you can only access a part of the database a certain number of times or something like that you want to make sure you stick to that. As long as you play within those rules you'll probably have a really nice time using APIs. Your overall takeaway is APIs are really, really useful. There's an API for almost any big web service out there. Pretty much any part of the Google Tools Suite, Google Maps, Google Earth, GMail, Google Calendar, all of those things have APIs. You can use them to both get data from the server and send data to the server. In other words, if you wanted to make a calendar app that can update someone's Google Calendar, there's an API for that. If you want to make something that's going to tell you where the location of a certain address is you can use the Google Maps API for that. APIs are fantastically useful, and they're everywhere. 
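One way to play by the rate limiting rules mentioned above is to keep a simple budget on your own side and check it before every request. The sketch below is a minimal fixed-window counter; the function names are hypothetical, the 1,000 per day figure is just the example from the talk, and a real API's spec should be consulted for its actual limits.

```javascript
// Create a budget that allows at most `limit` requests per time window.
function createRequestBudget(limit, windowMs) {
  let windowStart = null;
  let used = 0;
  return {
    // Returns true and records the request if one may be sent at time `now`
    // (in milliseconds); returns false if the budget for this window is spent.
    tryAcquire: function (now) {
      if (windowStart === null || now - windowStart >= windowMs) {
        windowStart = now; // start a fresh window
        used = 0;
      }
      if (used >= limit) return false;
      used += 1;
      return true;
    }
  };
}

const dayMs = 24 * 60 * 60 * 1000;
const dailyBudget = createRequestBudget(1000, dayMs);
if (dailyBudget.tryAcquire(Date.now())) {
  // It is safe to send one API request here.
}
```

A check like this is also a cheap guard against the infinite loop scenario, since the loop stops getting permission to send requests once the budget runs out.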
If you're interested in some sort of idea, there's probably a related API that you can use to get a lot of data very quickly and very simply. If you're still looking for a project or if you just want to play around with something in general, APIs are definitely worth doing. Thanks, and I'm happy to answer any questions that you guys may have. Okay, thanks a lot. [CS50.TV]