[MUSIC PLAYING] DAVID CHOUINARD: I'm David Chouinard, and this is D3. Welcome. We're going to learn about D3 today. D3 is a JavaScript framework for building high-quality interactive visualizations for the web. Things like what you're seeing behind me, we're going to learn to make those things, or at least the basics of it. But it's going to be cool. Let's get started making pretty pictures. We've got more demos of prospects available. Let's do it.
Act I, DOM manipulation-- we're going to start right away making cool things. First of all, on the left, we have code. On the right, we have the result of our code. Let's go through it. Let's make a circle. How does that sound? svg.append circle-- we just made a circle. You don't believe me, right? It's not there. So what we did right here is, SVG is scalable vector graphics. This is the way we tell the browser to draw vector graphics. What we just did right now is add a circle to the browser. The problem is that the circle requires a few basic attributes before we can actually see it. We need to tell it its x position, its y position, its radius. We didn't tell it any of that, so we're not seeing it right now. But let's tell it stuff. So first of all, we've got to give our circle a name. So let's call it circle. Our circle has a name now. And let's give it a few attributes. How about cx, which is center x, so the x position of the center. Let's say, 200 for 200 pixels. Let's give it a cy, a center y, of 200 pixels as well. And an r, a radius, of about 40 pixels. Now let's see. I cannot spell. There you go. We have a circle at position 200 pixels, 200 pixels, radius of 40 pixels. Kind of cool, right? We have a circle. Yeah. So no need to follow along. All these examples, all of the code I'm doing today will be provided online at the end in the form of interactive examples with checkpoints at every act, and so on. Let's do more stuff. This black circle is really ugly. I'm sorry for those error messages right there.
There we go. Let's give it a color. How's that? I like steel blue. Well, our circle changed color. That's great. Let's make it semi-transparent too. So these are attributes we're defining on the circle. The first thing we did is put a circle on the page. And then we're defining a bunch of attributes. Some of these are required, like cx, cy, and r. And others are optional. There are a lot more attributes. For example, we could have a stroke as well, a stroke of red. But let's remove that. We're back to a blue circle. So let's make more circles. How's that? Let's make another circle. This is exciting, right? So I just copy-pasted what we had already. Let's call it circle2. And let's do the exact same thing and give it attributes, giving it an x position of 300. Yay, we have two circles now. And of course, we could update these values. I could put it at 400, and now it moves. And since it's annoying, let's remove it, so circle2.remove. It's gone now. So what we're doing is very similar to what you might do in jQuery, for example. We're just manipulating the DOM, as it's called. You might have heard that word before. We're creating stuff, setting attributes on stuff, removing stuff. Now, here's where it gets interesting. So later in the code, we could still refer to the original circle here. So let's reset its cx attribute. Let's say, set its x position to 400. And I'm going to transition that, so it's obvious. There we go. So we added a circle. We set some attributes. We added another circle, removed it. And then we're modifying the original circle. But here's where it gets a lot more interesting. Not only can we set attributes as just values, as in, hey, circle, go to position 200. We can also set them as functions. So instead of giving 400 here, we can make some calculation on the fly for what we want that attribute to be. So this is how you'd express that.
We say, instead of 400, let me give you a function instead. And here, inside this function, we can make any crazy calculation. We could take the time and look at some other thing and dynamically decide for the circle what value we want. How about we just give it a random x position? So that's that. So what that says is, for the x position, run this function. And what we're doing is calculating something, a random number times the width, and returning that. So every time we run that, we get a circle that goes to a random place. It's kind of cool. I feel like I could look at this for a little while. We're starting to get to something interesting here. Let's make this data driven now. There's no data here. Let's change that.
Act II, Data Driven Documents-- So let's return to here. And let's just get rid of circle2, because we're just adding and removing it. So we don't really need it. We need to be a lot more clever here. Let's say, we have some data of some sort. One moment-- let's say, we had data of this form. We had an array, just a bunch of numbers. We have seven numbers here, whatever these represent-- the amount in people's bank accounts, how much they weigh, god knows what. These are numbers, and we want to use our circles to represent those numbers somehow. We want to tie our circles to those numbers. So what do we do? Let's say, we want a circle for every number. We could do the old thing we were doing-- circle append and circle2 and circle3. But this gets out of hand, and there's a lot of repeated logic. So let's get more clever with that. Instead of using the var circle svg.append that we were just using, we're going to use this little block here. I don't want to go in-depth into what all these parts do. It's kind of an advanced topic, and I wish I could. But the key thing to recognize-- and you'll see this very often in D3 code-- is that this block of text basically creates as many circles as there are data elements in this array right here.
So this creates as many circles as there are elements. It's going to create seven circles for us. And it does a really, really key thing. So let's run that. Let's remove our other circle. Let's just comment this part out and run this again. There we go. So our circle here is a lot darker, because we have seven circles, one on top of the other. We just created seven circles, one for each of these data elements. But there's a key thing that happened with this snippet right here. It's that data was bound. So every single one of those data elements, 10, 45, 105, was bound to a particular circle. So this not only created a bunch of circles but tied those two things together. And in the future, because we created those circles with this D3 function, if I give you a circle, you can give me the data associated with it. So we can ask D3. Hey, D3, I have this circle. What's the data that the circle has? And D3 would tell us 10 or 45 or 105. These things are bound. That's a very, very fundamental concept. Let's look at that. So the way we'd ask D3-- the syntax is not important here, but just trust me on it. This is how we ask D3. Hey, D3, give me the first circle that you can find. And then we could ask D3, what's the data on that, like this, 10. So we just asked D3, find me the first circle you can find. What's its data? 10, that is indeed our first data element. We could ask it, hey, D3, find us our third circle. 105. Why is this really important? So right here, I mentioned that we could use functions. And I mentioned that was a very powerful thing. So not only can our functions do some computation, for example, return a random number, they can also do things based on the data. This is what data driven documents means. That's what D3 stands for. So this x position-- instead of just saying, all the circles, get x position 200, we could give it a function. And here, we can make some calculation.
And d here stands in for the data. So basically, D3 will create these seven circles. And then for every circle, it's going to go, hey, circle1, what's your x position? Previously, we were always answering 200. But now, every time D3 asks us what's your x position, it has that circle, so it has the data. It's going to give us the data and say, what do you want the x position to be, based on that data? Let's just return the actual data. So if we run this, this gives us data driven documents. The positions of these circles are a function of the data. So for the first circle, D3 puts a circle. And then D3 asks us, what do you want the x position to be? And we just say, whatever the data is. Make the x position 10. Then it asks, what do you want the x position to be for the second circle? And we answer, 45. And we, of course, can make some computation here. I find that those circles are kind of squished up. So multiply it by 3, multiply the data by 3. Our circles just got spread out. Each value was tripled. The first circle is really on the edge, so let's maybe offset it. Let's say, by 20. Here you go. This is a data visualization. It's a very basic one, but this gives us some insight into our data. It tells us that, for example, we have a little cluster of elements. And we have a big outlier here. This gives us some information about the distribution. If we were, for example, to change the data to 150 here and refresh, our visualization has changed. This document is data driven. So of course, for all these attributes here, we can use a function, not just numbers, and not just for the x and y positions. So we can use a function for the color. So we'll do the same. We'll give it a function. And let's say, we could have conditionals in our function. This function can be hundreds of lines long. It can do very, very complicated things. So let's put an if statement here.
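
The data binding and the function-valued position attribute described above can be modeled in a few lines of plain JavaScript. This is only a rough sketch of the idea, not D3's implementation: `bindCircles` and `setAttr` are hypothetical helper names, the circles are plain objects rather than SVG elements, and the array holds just the three values named in the talk (the full seven-element array is not shown in the transcript). D3 itself does keep the bound datum on the element in a `__data__` property, which the sketch imitates.

```javascript
// Plain-JavaScript model of the ideas in this act (NOT the D3 API).
// Only the three values named in the talk are used here.
const data = [10, 45, 105];

// Hypothetical helper: create one circle record per datum and keep the
// datum on the record, the way D3 keeps it on the element as __data__.
function bindCircles(values) {
  return values.map((d) => ({ tag: "circle", __data__: d, attrs: {} }));
}

// Hypothetical helper: an attribute is either a constant or a function
// that is called once per circle with that circle's datum and index.
function setAttr(circles, name, value) {
  circles.forEach((circle, i) => {
    circle.attrs[name] =
      typeof value === "function" ? value(circle.__data__, i) : value;
  });
  return circles;
}

const circles = bindCircles(data);
setAttr(circles, "cy", 200);                 // same value for every circle
setAttr(circles, "r", 40);
setAttr(circles, "cx", (d) => d * 3 + 20);   // position driven by the data

console.log(circles.map((c) => c.attrs.cx)); // [ 50, 155, 335 ]
console.log(circles[2].__data__);            // 105, the third datum
```

The constant and the function go through the same `setAttr` call, which is the point being made in the talk: any attribute that accepts a value can accept a function of the datum instead.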
Let's say, if our data is less than 50, that's some threshold that we're interested in for some reason. Let's make it green. Otherwise, let's make it red. How's that? Nice. So our data visualization is starting to convey more interesting information on many channels. So now we know a bit about the distribution. And we know that there's some sort of cutoff at 50 that we're interested in. We know that there are two data points below that threshold and most of them above. So as a final step, this data here, it's very rare to see it inline like that. So let's just move it out to a variable because that's cleaner, like this. And then we use that variable here. It's the exact same thing. It's just a bit cleaner.
Next up, Act III, Scales-- So one problem right here is, if we change our data, this 200 value-- if we change it to 400 or something and refresh, then this value just went offscreen. So our logic right here of how we do the times 3 and plus 20, to spread it out and then offset it a bit, is really clunky. What do those numbers mean? They're just hard coded there. And they're very much tied to the data. We want a data driven document. We want a very flexible document that, given data, adapts to it and represents it. What we basically need is, we have this range of numbers, 10, 45, 105. And we want to map that out onto the width, the full width here. So we have the range of numbers going from 0 to 100. And we have this canvas that goes from 20 to 700, in this case. We kind of want to map that on. We want to scale that up and then offset it a little bit. It turns out that D3 has this. It's called a scale. So let's use it. The way that works-- I'm going to type this up and then explain it. This is a scale. What it will do is, it will map out values from 1 to 200 onto 20 to 600. We can check that. We can see that here. So if I feed it 1-- one moment. Give me one second. I must have mistyped it. There you go. I'm sorry about that.
So what a scale will do is, it will take a value and then convert that, expand that out, so it fills the full range you're asking for. So in this case, if we give it 1, it's going to map that onto 20. And if we give it 200, it's going to map that onto 600. And if we give it 100, it's going to be somewhere in between 20 and 600. And of course, now this is what we need to remove those hard coded things we have right there. So what we want to do is take the data that we're given, that individual data element, and pass it to scale first. So scale will scale it up. Well-- Oh, we have a little error here. We're missing data. There you go. And that expands it out. That gives us the same result we had before, but without those hard coded constants. And if the size of our canvas changes, for example, if we want to have this over 400 pixels, it squishes in. We can expand it, or we can change this left margin to something less or more than 20. These hard coded numbers now make sense to us. And we could do a lot more interesting things as well. So instead of having a linear scale, we might want a log scale. And that will give us a log scale. So now our scale, instead of just expanding out that range, is doing more sophisticated things. Instead of having this range hard coded, and instead of having that 600, we might want to just use the width, so from 20 to the width minus 40, 2 times the margin on the other side. And this makes a lot more sense to somebody who might look at the code. Interestingly, the scales get very, very sophisticated as well. They do a lot of interesting things. So scales don't necessarily have to operate just with numbers. Let's make a color scale. So our domain is 1 to 200. That's the input. But we might want to map from green to red, for example. And now, if we pass it 1, we're going to get green. If we give it 200, we'll get red.
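
To make the numeric mapping concrete, here is a hand-rolled sketch of what a linear scale and a log scale compute. It is plain JavaScript, not D3's scale implementation: `linearScale` and `logScale` are hypothetical helpers, and the `margin` and `width` names are assumptions (the 700 comes from the earlier remark about the canvas). The domain 1 to 200 and the range 20 to 600 are the numbers used in the talk.

```javascript
// Hypothetical helper (not D3): map the domain [d0, d1] linearly onto [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (x) => r0 + ((x - d0) / (d1 - d0)) * (r1 - r0);
}

// Hypothetical helper (not D3): same idea, but interpolate on log(x),
// so equal ratios in the input become equal distances in the output.
function logScale([d0, d1], [r0, r1]) {
  const inner = linearScale([Math.log(d0), Math.log(d1)], [r0, r1]);
  return (x) => inner(Math.log(x));
}

const scale = linearScale([1, 200], [20, 600]);
console.log(scale(1));    // 20
console.log(scale(200));  // 600
console.log(scale(100));  // somewhere in between

// Range expressed through the canvas width instead of a bare 600.
const margin = 20;
const width = 700;
const fitted = linearScale([1, 200], [margin, width - 2 * margin]);
console.log(fitted(200)); // 660

const log = logScale([1, 200], [20, 600]);
console.log(log(1));      // 20
```

Writing the range as `[margin, width - 2 * margin]` is what makes the numbers readable to someone else: a change to the canvas size then only touches `width`.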
And if we pass it something in between, it's going to be some mix of that, somewhere on the gradient between green and red. And instead of having this kind of clunky logic we have here with the conditional right there, we could have a linear scale between those. So we'd use the scale we just created, which we called color. And we'd give it d, which is our data element. And there we go. We have a color scale. So this is the mapping. The far left is completely green. The far right is completely red. And everything in between is a function of d. We have an interesting visualization here. But our data was kind of boring. Let's see what we could do if we had more interesting data.
Act IV, Working With Data-- the first thing we'll want to do to make our visualization more interesting is to move the data somewhere else. It's very clunky to have the data hard coded here. And generally, we'll be asking somebody else for the data. We'll be maybe asking the government, the Census Bureau, what's your data and then plotting that, or asking some third-party entity for some data and then building a visualization on that. So the first thing we want to do is move that to somewhere else. So I'm going to create a file here called data.json. JSON is the data format. You don't have to know much about that. And we're going to copy the little data we have there, paste it in there verbatim, go back to our visualization code here, and use this function right here. You don't have to know the details. But what this will do is, it will find that file, fetch it, and return it to us. So what this does is, it goes and gets the data.json file. And then all the code that's indented inside-- essentially, all the code we have there-- will run only when we get the data back. And then it's going to run that code with the data we have.
Great, we have a visualization that queries some data from somewhere else, which is usually how visualizations work. But I want to go back to the data. So fundamentally, D3 consumes data that's a list of things. D3 expects the data to just be a list of things, an array of things. It doesn't matter what those things are, so long as it's an array of them. So here, for example, we could of course have floating point values. We could have negatives. D3 doesn't care, so long as it's a list of things. More interestingly, we could also have a list of strings, like that. So these are Crimson headlines I picked up a few days ago. And maybe we can find some interesting things about these headlines. So again, this is a list of things. D3 doesn't care. These happen to be strings. We've changed our data. Let's return to our visualization. Now, our visualization expects the input to be numbers. So we're going to have to make a few changes. So for example, first of all, maybe we want to position these circles by the length of the headline, the number of characters in the headline. So what we'll do is-- every time our function is called with a string, we'll find its length and then pass that to scale. The color, I'll return that to steel blue. And there we go. We have a visualization of Crimson headlines. Our scale is a bit off. Let's assume that the longest headline is 100 characters long, so spread that out a bit. And we have a visualization. So it seems that most headlines are pretty close together, in terms of character length. But one there really stands out. We could build some tools to explore that more. But when I was working on this, I was curious whether, in this data set, headlines with a colon in them would be longer. I assumed they would. So let's find out. Let's use the color channel like we did before, to encode whether there's a colon or not.
So we'll use a conditional again. You don't have to know the details of this, but this is how we check a string for a particular character in JavaScript. Again, not relevant. But if we don't find a colon, we'll return green. And if we do, we'll return red. So again, headlines that have a colon will be red. This is what this means-- nice. So it seems that my hypothesis is debunked. There are only two. We only have six data points, and only two had colons. But they seem a bit more on the lower end, in fact. Headlines with colons seem to generally be shorter, at least in our data set-- interesting. Let's return that to steel blue and then see what we can make with even more interesting data. So again, I mentioned that data in D3 is a list of things. We've seen numbers of many types. We've seen strings. But the things can also be objects. They can be complicated things that include a lot of things. To say that more clearly, in most cases, every data point is more complicated than just one value. If you imagine a database about students, there might be a student name, a student ID, and a lot of things associated with a particular record, not just a string or a number. So let's look at that. This is one such data set. This is a data set about earthquakes. So everything here in our list or array of things contains many things itself. So every data point has a magnitude and a coordinate. And coordinates themselves contain two things. So every data point is now a lot more complicated and contains much more interesting information. Let's see what we could build out of that. Returning back to here, again, using the histogram circle visualization we've built, let's see if we can build a visualization of the magnitude distribution in our data set. So here, it's the same concept. But now, d contains more things. d contains many data elements. So we get d back. D3 gives us d. And we respond by finding the magnitude of d and then passing that to scale.
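
The accessor functions from this act can be sketched in plain JavaScript as follows. The headlines and the earthquake record below are invented placeholders (the real data is not shown in the transcript), and the field names `magnitude` and `coordinates` are assumptions based on how the data set is described. `indexOf` returning -1 for a missing character is standard JavaScript and is one common way to do the colon check; the transcript does not show which method was actually typed.

```javascript
// Placeholder headlines, invented for illustration only.
const headlines = [
  "Council Approves New Budget",
  "Profile: A Day in the Dining Hall",
];

// Position accessor: the datum is a string, so use its character count.
const headlineLength = (d) => d.length;

// Color accessor: indexOf returns -1 when the character is absent.
const headlineColor = (d) => (d.indexOf(":") === -1 ? "green" : "red");

console.log(headlines.map(headlineLength)); // [ 27, 33 ]
console.log(headlines.map(headlineColor));  // [ 'green', 'red' ]

// Placeholder earthquake record; field names are assumed, values invented.
const quake = { magnitude: 4.2, coordinates: [-122.3, 37.8] };

// With object data, the accessor picks out one field before scaling it.
const magnitudeOf = (d) => d.magnitude;
console.log(magnitudeOf(quake)); // 4.2
```

In each case the function receives one element of the array, whatever its type, and returns the single number or color that the attribute needs.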
And then we need to change our scale, of course. So magnitudes simply don't go much higher than 10. Actually, there's never been a magnitude 10 earthquake. But that's our upper end. Let's refresh. Nice, we have a visualization. It's interesting to note that there are two data points that are almost exactly on top of each other, in terms of magnitude. You see this by the opacity we're using. We have geographic data now. We have latitudes and longitudes. Maybe we could do something a lot more interesting with that. Let's find some more interesting way to visualize this more complicated data we have access to.
Act V, Mapping-- fundamentally, we want to put these on a map. I mean, this is where this is going. We want to encode information about the position of these earthquake readings, as well as their magnitude, because we have that now. We understand how to consume more complicated data. The first thing we'll do is create a map, a background map. I'm going to go through this very quickly. This is tricky code. It's another one of those recipes you don't really have to understand fully to use. This code right here creates a map. We're not going to go into detail. But superficially, what it does is, it queries this us.json file, which is a data file like the one we had before. It's more complex, of course. But in this case, every data point is a state and has a list of latitudes and longitudes that define the polygon that forms that state. So what D3 will do is similar to what we did before. It will request that and bind that to an element. And there's a function that will map that element out, based on the latitudes and longitudes. You can read more on that. And I recommend it. There are links at the end, with the code posted. And the code is commented. In there are links for further reading on this. I recommend you look it up. But what we care about is this projection function. I want to go through that.
First of all, let me show you that, yes, we have a map. Maps are cool. So let's look at this projection function. Projection is very much like a scale, scales again. So what this projection function does is, we could pass it longitudes and latitudes-- in this case, these values here are the lat-longs of the building we're sitting in right now. And projection will convert that into x and y pixel values. So what projection is doing is very similar to our scale. It's taking our latitudes and longitudes, which represent the whole globe, and shrinking and sizing that down to the square that we've given it. In this case, we're passing these values. And it's giving us, well, that on your screen means 640 pixels across. This whole screen is 700 pixels wide, so that puts us about here, and 154 pixels down, which I would estimate is pretty much here. So taking those lat-longs, which represent something on the whole globe, and squishing and moving that around to give us x and y pixel values, this is the first thing that's done in this mapping code. And then the rest of the code consumes the data and maps those lat-longs onto something on your screen. But we're going to use this projection function, because it turns out we have lat-longs as well. Looking back at our data, we have latitude and longitude coordinates for every observation. So let's use projection. So looking at our x position-- we have a latitude and a longitude. But we want pixel values. And it turns out, we have exactly what we want-- projection. Very much like we were using scale right here, we're now going to use projection and pass it coordinates. So we're getting d, which is an individual data element, an individual earthquake reading. The first thing we do is get the coordinates. All right, we have the coordinates. The second thing we do is pass that on to projection.
Projection converts those coordinates into pixel values, x and y. And then the last thing we want to do is just get the x, which in this case is the first one. It's the first of the two things that are returned by projection. We'll do the same for y. But instead, we'll return the second element, the y. Get ready to refresh. Ooh, extra character here-- nice, we have a data driven document that's consuming this JSON file of objects, making a map, and changing the attributes in relation to the data to project it on a map. This is really interesting. This is cool. Let's take it up a notch. I mean, we have two pieces of information with every data point. I mean, three. We have the coordinates, which is an x and y. And we have the magnitude. We need to encode magnitude somehow. We have a lot of channels. We can use color. We can use radius. We could use opacity. We could use many things to encode. Any of these attributes and many more that are not listed there, because they're optional, we could use to encode this data, the stroke and all these things I've mentioned. Let's do radius. I think radius is the most intuitive. So again, we'll replace that hard-coded 40 and make some calculations. We'll use our favorite scale again. And we'll pass d. But not d, because we want the magnitude of d. d is just the data point. We'll pass the magnitude to scale. Let's try that again. Ooh, it doesn't work. Why does it not work? So remember what scale does. Let's look at scale again. Scale maps from 1 to 10 onto 20 to 600, more or less. 600 is huge. This is why we're getting this. So we want to change our scale to something more reasonable. Let's say, we want 0 to 60. 60 is big, but magnitude 10 earthquakes are incredibly rare. In fact, they've never happened. So what this will do is, it'll take our magnitude that goes from 1 to 10 and map it onto 0 to 60. Let's refresh. Nice, we have a visualization. This is great. This is actual data.
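
As a rough sketch of the steps just described, the code below uses a toy equirectangular projection as a stand-in for the map's projection function. The talk does not say which projection the map code uses, so `toyProjection`, the 700 by 400 canvas, and the sample record are all assumptions for illustration. What carries over is the shape of the calculation: D3 projections take a `[longitude, latitude]` pair and return `[x, y]` in pixels, so index 0 feeds `cx` and index 1 feeds `cy`, and a 1-to-10 domain mapped onto 0 to 60 gives the radius.

```javascript
// Assumed canvas size; only the 700 width is mentioned in the talk.
const width = 700;
const height = 400;

// Toy stand-in for a map projection (NOT the one used in the talk):
// spreads longitude -180..180 across the width and latitude 90..-90
// down the height, and returns [x, y] in pixels.
function toyProjection([longitude, latitude]) {
  const x = ((longitude + 180) / 360) * width;
  const y = ((90 - latitude) / 180) * height;
  return [x, y];
}

// Radius scale from the talk's numbers: magnitudes 1..10 onto 0..60 pixels.
const radiusScale = (magnitude) => ((magnitude - 1) / (10 - 1)) * 60;

// Invented record; field names and [longitude, latitude] order are assumed.
const quake = { magnitude: 5.5, coordinates: [0, 0] };

// The three accessors, mirroring the cx, cy, and r attribute functions.
const cx = (d) => toyProjection(d.coordinates)[0];
const cy = (d) => toyProjection(d.coordinates)[1];
const r = (d) => radiusScale(d.magnitude);

console.log(cx(quake), cy(quake), r(quake)); // 350 200 30
```

Reusing the old 20-to-600 position range for the radius is what produced the oversized circles; giving the radius its own, much smaller range is the fix.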
You'll notice, in my little toy example, the biggest earthquake is right on top of us. But that's it. We have a data driven visualization that consumes the data and gives us really interesting information. Yeah, let's add some interactivity to it. I mentioned that was the strong suit of D3. So here, for every element, we're describing a bunch of attributes. But we can also describe what we want to happen on interaction. For example, we could describe what happens when we mouse over. And that'll take a function, very similar to the attributes we had before, where we do something to the element when we hover over it. So the first thing we need to do is select that element, to find it basically, in the browser. And then we could set an attribute on it. So what I'm doing here is, when we hover over something, we'll get that element and then set its opacity to 1, to completely opaque. Let's see what that looks like. It appears we have an extra semicolon here. So if we hover over here, it gets full. But now, of course, it stays full, because we have to describe what happens when we remove our cursor. So let's do exactly that on mouseout, as opposed to mouseover. And we'll reset it to what we had before-- 0.5. And now, every time we hover, we get a full circle. It helps us see what we're selecting, essentially. And now let's make this really great. Let's connect this to real data. So let's ask the USGS about their data. So the US Geological Survey has data about earthquakes. They have a public API that can be consumed in JSON format. So let's do that. So this is a bit of code that connects to the USGS API. And there's a bit of processing on it. This is not relevant, but it simplifies it to a simple data format like the one we had before. So I get rid of our call to our fake data.json file. And instead, I'm calling the USGS essentially. Let's refresh, nice. This is actual, real-life data from this week for earthquakes.
This is really interesting. This is not surprising for us, but there are a lot of earthquakes on the West Coast, in California. But I thought it was very interesting that there were so many earthquakes in Alaska, and apparently, here in the Midwest. I mean, interesting, and we're good. That's the conclusion. But fundamentally, this is what D3 helps us do. It helps us take data, bind it to elements in the DOM, and have those elements change as a function of the data, with all the many attributes of the elements serving as channels to convey information. D3 is an incredibly powerful library and amazingly well run. This is some powerful stuff. Data visualization is an incredibly powerful tool for conveying deep insights to people, insights that get to their core and help them understand, in a profound and intuitive way, how data works and how data changes our lives.