[MUSIC PLAYING] SPEAKER: Well, hello one and all, and welcome to our short on patterns. We will look at how we can use regular expressions to create some pattern we expect in some data we might have. Now, we've seen how to do this with emails, but one other kind of data we might be able to validate using a pattern is something called a hexadecimal color code. So it turns out that colors, like this shade over here, have a certain assigned hexadecimal color code, a way of representing this color, but in a computer's memory. So this particular color has this particular code, the hash symbol 0076BA. This corresponds to this particular shade of blue. It turns out there's a pattern to this pattern you see over here. In fact, they always will begin with this hash symbol, and they'll be followed by 6 characters which range from 0 to 9 or A to F, upper or lower case. And it turns out there's also a bit of a structure to them too. The very first two characters after, let's say, the hash symbol, well, that defines how much red is in this color on a scale of 00 to FF, the highest. There's also this second set of two characters-- in this case, 76-- which corresponds to the amount of green in the color. And then the final two-- in this case, BA-- correspond to the amount of blue that's in this particular color here. Again, each of these two sets of characters ranging from 00, the lowest, meaning no red, no blue, no green, or the highest being FF, meaning all the red, all the green, all the blue, for instance. Now, let's look at a few other examples here. So here, as I said, this is entirely red. The reddest red you can get. It is hash symbol FF0000, entirely red with no other colors involved. This here is green-- 00FF00. And this, let's say, is blue. The bluest blue you can get, at least on the web, using these particular color codes. 0000FF. Now we can combine these colors too, and get both black and white. 
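To make the red, green, and blue pairs concrete, here is a minimal sketch that splits a code into its three two-character pairs and reads each one as a base-16 number. The function name `hex_to_rgb` is hypothetical and not part of the program shown in this short, and the sketch assumes the code it receives is already well formed.

```python
def hex_to_rgb(code):
    """Split a '#RRGGBB' string into (red, green, blue) integers from 0 to 255.

    Assumes `code` is already a well-formed hexadecimal color code.
    """
    red = int(code[1:3], 16)    # first pair after the hash symbol
    green = int(code[3:5], 16)  # second pair
    blue = int(code[5:7], 16)   # final pair
    return red, green, blue


print(hex_to_rgb("#0076BA"))  # (0, 118, 186)
print(hex_to_rgb("#FF0000"))  # (255, 0, 0)
```

Each pair runs from 00 (decimal 0) to FF (decimal 255), which is why the reddest red comes out as 255 with no green or blue.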
So black, let's say-- or actually white, being all the colors combined, is FFFFFF. And black, being the absence of color, well, that will be 000000. So this was our brief intro to hexadecimal color codes. Let's go and see how we can create a pattern to validate codes like these. So I have over here a program called code.py, and the goal, again, is to validate hexadecimal color codes I might enter into this program. So up top I've imported this module called re, which stands for regular expressions, allowing me to use regular expressions in my code here. I then have a function called main, which asks the user for some input. It asks them to enter a hexadecimal color code, and I store what the user typed in this variable named code. But let's see how we could try to validate the input the user gives us using some kind of pattern. Well, the first thing to do might be to define the pattern we're looking for in this user's given code to say this is a valid hexadecimal color code. So I could maybe make myself here a variable called pattern. And because this pattern is a regular expression, I'll want to prefix the string with this r character here, which means I'm going to create a raw string. Typical escape characters, like backslash n, for instance, won't be interpreted as the newline character, but will literally be interpreted as a backslash and then n. So this helps us here with regular expressions and the special syntax that those expressions have. I'll leave this pattern blank for now. Let's continue on. Let's say I want to search this particular code the user has given me for this pattern. Well, thankfully the re module comes with a function called search, and I can access it using re.search. Now the first argument to search is the pattern I might expect to find in the text that I am given, let's say, from the user here. So I'll type pattern as the first argument to search. I'll then type, as the second argument, the string I want to search for this pattern within. 
So I'll type code here just like this. And it turns out that re.search returns me something called a match object, which I'll store conveniently in a variable called match. Now, this only happens if search actually finds the pattern I'm looking for in the given input. So let's go ahead and maybe check this out. I'll say if match-- that is, if I actually find a match, well, I'll print something like this. I'll print valid. And just to be sure here, I might also try to have this match object show me what exactly it matched in this given input to search through. So I could get access to that saying maybe matched with, and then I can use match.group. This function here that will show me exactly what search found as a match given this pattern. So we'll see that in action in a little bit. Now, otherwise, if we didn't find a match, I might print something a little more simple just like this-- invalid. So this is our program, and a lot of it hinges on setting up this pattern to behave as we might expect. Well, this pattern, as we said before, will be a regular expression. And one thing we can maybe start with is the very easiest part of these hex color codes that they all begin with this hash symbol. In fact, if we go back to some of these slides here, we'll see that these will be our rules for what to look for. It should begin with this hash symbol and be composed of six characters after the hash symbol-- 0 through 9 and A through F, upper or lower case. So let's begin with just this first part here. It should begin with this hash symbol. So if I go back now to my code, I could type simply this hash symbol. And this is my very simple pattern. re.search will look through the entirety of the code the user has entered, and if it finds the hash, this hash symbol anywhere in that code, it will return to me a match object and say, yes, I did find a match for this particular pattern inside the user's input. So let's try it. I'll run Python of code.py. 
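The program at this point might look roughly like the sketch below. The exact wording of the printed messages is an assumption, and the helper name `validate` is hypothetical. The program described here reads input inside main, while this sketch loops over a couple of sample strings so it can run without a prompt.

```python
import re


def validate(code, pattern=r"#"):
    """Search `code` for `pattern` and describe the result."""
    # re.search returns a match object if the pattern is found, else None.
    match = re.search(pattern, code)
    if match:
        # match.group() is the exact text that the pattern matched.
        return f"Valid. Matched with {match.group()}"
    return "Invalid"


# Sample strings stand in for what a user might type at the prompt.
for sample in ["#AAAAAA", "no hash here"]:
    print(sample, "->", validate(sample))
```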
And I'll type in-- let's do a valid one. Let's say hashtag AAAAAA. Don't quite know what color that is, but probably some shade of gray given we have all the various red, green and blues aligning in the same level here. I'll hit Enter, and I'll see valid and also matched with this hash symbol. So it seems like the reason we said this code was valid is that re.search found that hash symbol inside of this text here. But of course, I can do something like this. I could say Python code.py-- hashtag-- let's do GG. Let's do II and KK. This is not a valid hexadecimal color code, but according to our pattern, it is, because it sees in this case that hash symbol. So we need to improve this, and we can do so by using other features of regular expressions. One of them is going to be called a character set. And a character set begins with these square brackets here, one opening and one closing. Now, within these square brackets, I could include all the characters that I could possibly match after, let's say, this hash symbol. So if we go back to our slides again, we said that it should begin with a hash symbol, which we already have in our pattern. But then we expect six characters in the range of 0 through 9 and A through F, upper or lower case. So let's begin, let's say, with these actual letters here. I'll go abcdef to say we could match any of these individually, either lowercase a, lowercase b, c, d, e, f. We also want to take in the capital ABCDEF, and then anything between 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. So this is our character set. After we find, let's say, the hash symbol, we should expect to find any character in this set. So let's try this out. I'll go ahead and run Python of code.py. And recall we gave-- I think, it was hashtag GGIIKK-- just random letters I made up. If I hit Enter now, we'll see invalid. Now, why do you think that is? Well, re.search is looking for a hashtag, which it finds, of course. 
But then immediately after the hashtag, we want to find a character in this character set-- lowercase a through f, capital A through F, or 0 through 9. And it seems like we don't find that here. We have #G, which is not in this character set. So let's go ahead and try this one. I'll try Python of code.py, and let's go ahead and do the valid color from before-- #AAAAAA. Hit Enter, and we'll see that this is valid, but perhaps not again for the right reason. I see matched with #A, which is an improvement. But I think I could still do this. I could still do #A, and then a GGGGGG, which is not a valid color code. And this, I think, will still be valid. So our problem here seems to be that we're expecting the right range of characters for the first character we see after the hash, but not for all of the 6 after, let's say, we have this hash symbol here. So how could we fix that? Well, we can use what's called a quantifier, and I can access a quantifier using these curly braces here. So this ensures whatever number I put in here, like 6-- this means that for this particular character set, whatever character or character set precedes this quantifier, I should expect that number of them, in this case, exactly. So I'll expect six of these particular characters inside of this set here. Let's try this out. I'm going to Python of code.py. And I think I'll demonstrate now that #AGGGGG, this is not going to work for us. I'll hit Enter, and I'll see that is invalid. Let's try this one now. Python of code.py. I'll do #AAAAAA, and that seems to be valid now for the right reason. We matched with the entire thing. All of these in this case fall into this range of valid characters. But I think there's still one more thing to go and improve here. Let's try this. I'll do Python of code.py. Let's get a little bit tricky. I'll do #AAAAAA, and then something like-- why don't we just do a 0 at the end? Now this is not a valid color code. This is more than 7 characters. 
I'll hit Enter here, and it's still valid. Now, this might be confusing, because I did say earlier that this quantifier here ensures that we get exactly 6 of the characters in this character set. And well, it seems like we've done that. I have here in this code hashtag-- hash symbol here, and then AAAAAA. That's certainly six of them. And so I seem to have found a match inside of this longer string. And it's a little more clear. I could even do this. Python of code.py. The code is #AAAAAA. Well, this is valid even though I typed in "the code is," which I don't want to include in a valid hexadecimal color code. So one thing I can use here is called an anchor. If I want to say that my pattern should start at the beginning of the string, I can do so by adding a caret here. So this will say that when I search for this pattern, this hash symbol must be the very first character that I see in the input text. So this would no longer be valid under this caret symbol here, this anchor. Let me try it again. I'll do Python of code.py. I'll say the code is #AAAAAA. Hit Enter. That now is invalid because our very first character in this input is not the hash symbol. Now similarly, we can do-- have an anchor at the end, which looks like dollar sign. And that means that the last character-- the last characters we expect should be these six in this range here. So this will exclude things like the hash #AAAAAA0, because that is more than, in this case, six. The last character we're matching is not within the six we expect here. So I'll run Python of code.py, and I'll type in hash symbol #AAAAAA0, and that, of course, is invalid as well. So I think this pattern works pretty well for us, but there is one way to improve how we write it. I could, instead of typing all this out, use a range. I can say, conveniently, a-f to insinuate or to imply I want a, b, c, d, e, and f as valid characters. I can do the same thing for capital A to capital F just like this. Capital A, dash, capital F. 
That will give me capital A, B, C, D, E, F. And again, same thing for 0 through 9. I could do 0-9 here, giving me 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, all inclusive. So this, I believe, should do the same thing for us. I'll type Python of code.py. Type in #AAAAAA. And this seems to be a valid color code. So we've seen here how to use these things called regular expressions and the patterns that actually implement them to validate things that have a pattern to them. These are useful beyond just color codes. Anything that has a pattern to it, you can probably try to validate using these things called patterns and regular expressions. This then was our short on patterns, and we'll see you next time.
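As a recap, the sketch below runs each stage of the pattern from this walkthrough against the inputs that were tried along the way. The names `STAGES`, `SAMPLES`, and `matched_text` are hypothetical and chosen only for illustration. The patterns themselves follow the steps described above.

```python
import re

# Each stage of the pattern, in the order it was built up.
STAGES = {
    "hash only": r"#",
    "one character from the set": r"#[abcdefABCDEF0123456789]",
    "exactly six from the set": r"#[abcdefABCDEF0123456789]{6}",
    "anchors and ranges": r"^#[a-fA-F0-9]{6}$",
}

# Inputs tried during the walkthrough.
SAMPLES = ["#AAAAAA", "#GGIIKK", "#AGGGGG", "#AAAAAA0", "The code is #AAAAAA"]


def matched_text(pattern, code):
    """Return the text re.search matched in `code`, or None if no match."""
    match = re.search(pattern, code)
    return match.group() if match else None


for name, pattern in STAGES.items():
    print(f"{name}: {pattern}")
    for code in SAMPLES:
        print(f"  {code!r} -> {matched_text(pattern, code)!r}")
```

Only the last stage rejects every invalid sample. One caveat: in Python, the dollar sign also matches just before a newline at the very end of the string, so re.fullmatch may be worth considering if the text could end in a newline.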