1 00:00:00,000 --> 00:00:03,423 [MUSIC PLAYING] 2 00:00:03,423 --> 00:00:05,870 3 00:00:05,870 --> 00:00:09,900 SPEAKER: Well, hello one and all, and welcome to our short on patterns. 4 00:00:09,900 --> 00:00:14,240 We will look at how we can use regular expressions to create some pattern we 5 00:00:14,240 --> 00:00:17,150 expect in some data we might have. 6 00:00:17,150 --> 00:00:19,040 Now, we've seen how to do this with emails, 7 00:00:19,040 --> 00:00:22,880 but one other kind of data we might be able to validate using a pattern 8 00:00:22,880 --> 00:00:25,500 is something called a hexadecimal color code. 9 00:00:25,500 --> 00:00:28,490 So it turns out that colors, like this shade over here, 10 00:00:28,490 --> 00:00:31,580 have a certain assigned hexadecimal color 11 00:00:31,580 --> 00:00:35,490 code, a way of representing this color, but in a computer's memory. 12 00:00:35,490 --> 00:00:43,490 So this particular color has this particular code, the hash symbol 0076BA. 13 00:00:43,490 --> 00:00:47,150 This corresponds to this particular shade of blue. 14 00:00:47,150 --> 00:00:51,030 It turns out there's a pattern to this pattern you see over here. 15 00:00:51,030 --> 00:00:54,140 In fact, they always will begin with this hash symbol, 16 00:00:54,140 --> 00:00:56,960 and they'll be followed by 6 characters which 17 00:00:56,960 --> 00:01:02,820 range from 0 to 9 or A to F, upper or lower case. 18 00:01:02,820 --> 00:01:05,560 And it turns out there's also a bit of a structure to them too. 19 00:01:05,560 --> 00:01:09,690 The very first two characters after, let's say, the hash symbol, well, 20 00:01:09,690 --> 00:01:14,790 that defines how much red is in this color on a scale of 00 to FF, 21 00:01:14,790 --> 00:01:15,730 the highest. 22 00:01:15,730 --> 00:01:20,400 There's also this second set of two characters-- in this case, 76-- 23 00:01:20,400 --> 00:01:23,700 which corresponds to the amount of green in the color. 
24 00:01:23,700 --> 00:01:26,340 And then the final two-- in this case, BA-- 25 00:01:26,340 --> 00:01:29,880 correspond to the amount of blue that's in this particular color here. 26 00:01:29,880 --> 00:01:34,470 Again, each of these sets of two characters ranges from 00, the lowest, 27 00:01:34,470 --> 00:01:39,690 meaning no red, no blue, no green, to the highest, FF, 28 00:01:39,690 --> 00:01:43,735 meaning all the red, all the green, all the blue, for instance. 29 00:01:43,735 --> 00:01:45,610 Now, let's look at a few other examples here. 30 00:01:45,610 --> 00:01:48,550 So here, as I said, this is entirely red. 31 00:01:48,550 --> 00:01:49,720 The reddest red you can get. 32 00:01:49,720 --> 00:01:56,170 It is hash symbol FF0000, entirely red with no other colors involved. 33 00:01:56,170 --> 00:01:58,230 This here is green-- 34 00:01:58,230 --> 00:02:00,940 00FF00. 35 00:02:00,940 --> 00:02:02,800 And this, let's say, is blue. 36 00:02:02,800 --> 00:02:05,340 The bluest blue you can get, at least on the web, 37 00:02:05,340 --> 00:02:07,020 using these particular color codes. 38 00:02:07,020 --> 00:02:10,020 0000FF. 39 00:02:10,020 --> 00:02:13,900 Now we can combine these colors too, and get both black and white. 40 00:02:13,900 --> 00:02:21,870 So black, let's say-- or actually white being all the colors combined is FFFFFF. 41 00:02:21,870 --> 00:02:27,460 And black being the absence of color, well, that will be 000000. 42 00:02:27,460 --> 00:02:30,340 So this was our brief intro to hexadecimal color codes. 43 00:02:30,340 --> 00:02:35,080 Let's go and see how we can create a pattern to validate codes like these. 44 00:02:35,080 --> 00:02:38,670 So I have over here a program called code.py, 45 00:02:38,670 --> 00:02:41,860 and the goal, again, is to validate hexadecimal color codes 46 00:02:41,860 --> 00:02:44,350 I might enter into this program.
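To make the red, green, and blue structure described above concrete, here is a minimal sketch that splits a code into its three components. The function name split_hex_color is hypothetical and not part of the speaker's code.py, and the sketch assumes the code is already well formed.

```python
def split_hex_color(code):
    """Split a code like '#0076BA' into (red, green, blue) integers from 0 to 255."""
    # Assumption: code is already valid, a '#' followed by exactly six hex digits.
    red = int(code[1:3], 16)    # first pair after the hash
    green = int(code[3:5], 16)  # second pair
    blue = int(code[5:7], 16)   # final pair
    return red, green, blue


print(split_hex_color("#0076BA"))  # (0, 118, 186)
print(split_hex_color("#FF0000"))  # (255, 0, 0)
```

Each pair is read as a base-16 number, so 00 becomes 0 and FF becomes 255.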
47 00:02:44,350 --> 00:02:47,260 So up top I've imported this module called 48 00:02:47,260 --> 00:02:50,820 re, which stands for regular expressions-- it allows me to use 49 00:02:50,820 --> 00:02:53,140 regular expressions in my code here. 50 00:02:53,140 --> 00:02:57,820 I then have a function called main, which asks the user for some input. 51 00:02:57,820 --> 00:03:00,690 It asks them to enter a hexadecimal color code, 52 00:03:00,690 --> 00:03:06,190 and I store the result, what the user typed, in this variable named code. 53 00:03:06,190 --> 00:03:09,480 But let's see how we could try to validate the input the user gives 54 00:03:09,480 --> 00:03:12,600 us using some kind of pattern. 55 00:03:12,600 --> 00:03:15,300 Well, the first thing to do might be to define 56 00:03:15,300 --> 00:03:20,730 the pattern we're looking for in this user's given code to say 57 00:03:20,730 --> 00:03:23,410 this is a valid hexadecimal color code. 58 00:03:23,410 --> 00:03:28,690 So I could maybe make myself here a variable called pattern. 59 00:03:28,690 --> 00:03:31,590 And because this pattern is a regular expression, 60 00:03:31,590 --> 00:03:35,040 I'll want to prefix it with this r character here, which 61 00:03:35,040 --> 00:03:38,230 means I'm going to create a raw string. 62 00:03:38,230 --> 00:03:42,210 Typical escape characters, like backslash n, for instance, 63 00:03:42,210 --> 00:03:46,980 won't be interpreted as backslash n, the newline character, 64 00:03:46,980 --> 00:03:49,920 but will literally be interpreted as a backslash and then n. 65 00:03:49,920 --> 00:03:53,430 So this helps us here with regular expressions and the special syntax 66 00:03:53,430 --> 00:03:55,210 that those expressions have. 67 00:03:55,210 --> 00:03:56,760 I'll leave this blank here. 68 00:03:56,760 --> 00:03:58,440 Let's continue on. 69 00:03:58,440 --> 00:04:03,270 Let's say I want to search this particular code the user has given me 70 00:04:03,270 --> 00:04:04,840 for this pattern.
71 00:04:04,840 --> 00:04:08,850 Well, thankfully the re module comes with a function called search, 72 00:04:08,850 --> 00:04:12,670 and I can access it using re.search. 73 00:04:12,670 --> 00:04:15,600 Now the first argument to search is the pattern 74 00:04:15,600 --> 00:04:20,690 I might expect to find in the text that I am given, let's say, from the user 75 00:04:20,690 --> 00:04:21,190 here. 76 00:04:21,190 --> 00:04:24,220 So I'll type pattern as the first argument to search. 77 00:04:24,220 --> 00:04:29,400 I'll then type, in this case, the string I want to search for this pattern 78 00:04:29,400 --> 00:04:30,100 within. 79 00:04:30,100 --> 00:04:32,740 So I'll type code here just like this. 80 00:04:32,740 --> 00:04:36,000 And it turns out that re.search returns me something 81 00:04:36,000 --> 00:04:41,640 called a match object, which I'll store conveniently in a variable called match. 82 00:04:41,640 --> 00:04:45,660 Now, this only happens if search actually 83 00:04:45,660 --> 00:04:49,548 finds the pattern I'm looking for in the given input. 84 00:04:49,548 --> 00:04:51,340 So let's go ahead and maybe check this out. 85 00:04:51,340 --> 00:04:52,650 I'll say if match-- 86 00:04:52,650 --> 00:04:56,200 that is, if I actually find a match, well, I'll print something like this. 87 00:04:56,200 --> 00:04:58,030 I'll print valid. 88 00:04:58,030 --> 00:05:02,520 And just to be sure here, I might also try 89 00:05:02,520 --> 00:05:09,620 to have this match object show me what exactly it matched in this given input 90 00:05:09,620 --> 00:05:10,590 to search through. 91 00:05:10,590 --> 00:05:14,090 So I could get access to that saying maybe matched with, 92 00:05:14,090 --> 00:05:18,080 and then I can use match.group. 93 00:05:18,080 --> 00:05:23,000 This function here that will show me exactly what search 94 00:05:23,000 --> 00:05:25,590 found as a match given this pattern. 95 00:05:25,590 --> 00:05:27,590 So we'll see that in action in a little bit. 
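The behavior of re.search and match.group described above can be sketched as follows. The pattern here is only a placeholder chosen for illustration, since the speaker has left the pattern blank at this point, and the sample strings are assumptions too.

```python
import re

# Placeholder pattern, assumed only for this illustration.
pattern = r"#"

# re.search returns a match object when the pattern is found somewhere in the string...
match = re.search(pattern, "#0076BA")
if match:
    print("Valid. Matched with", match.group())  # match.group() is the text that matched

# ...and None when it is not, which is why "if match" works as a check.
no_match = re.search(pattern, "0076BA")
print(no_match)  # None
```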
96 00:05:27,590 --> 00:05:30,230 Now, otherwise, if we didn't find a match, 97 00:05:30,230 --> 00:05:33,150 I might print something a little more simple just like this-- 98 00:05:33,150 --> 00:05:36,060 invalid. 99 00:05:36,060 --> 00:05:41,150 So this is our program, and a lot of it hinges on setting up this pattern 100 00:05:41,150 --> 00:05:43,370 to behave as we might expect. 101 00:05:43,370 --> 00:05:48,360 Well, this pattern, as we said before, will be a regular expression. 102 00:05:48,360 --> 00:05:51,620 And one thing we can maybe start with is the very easiest part 103 00:05:51,620 --> 00:05:56,143 of these hex color codes that they all begin with this hash symbol. 104 00:05:56,143 --> 00:05:58,310 In fact, if we go back to some of these slides here, 105 00:05:58,310 --> 00:06:01,410 we'll see that these will be our rules for what to look for. 106 00:06:01,410 --> 00:06:06,210 It should begin with this hash symbol and be composed of six characters 107 00:06:06,210 --> 00:06:07,290 after the hash symbol-- 108 00:06:07,290 --> 00:06:10,513 0 through 9 and A through F, upper or lower case. 109 00:06:10,513 --> 00:06:12,430 So let's begin with just this first part here. 110 00:06:12,430 --> 00:06:14,680 It should begin with this hash symbol. 111 00:06:14,680 --> 00:06:19,480 So if I go back now to my code, I could type simply this hash symbol. 112 00:06:19,480 --> 00:06:22,530 And this is my very simple pattern. 113 00:06:22,530 --> 00:06:28,150 re.search will look through the entirety of the code the user has entered, 114 00:06:28,150 --> 00:06:33,240 and if it finds the hash, this hash symbol anywhere in that code, 115 00:06:33,240 --> 00:06:36,120 it will return to me a match object and say, 116 00:06:36,120 --> 00:06:40,430 yes, I did find a match for this particular pattern inside the user's 117 00:06:40,430 --> 00:06:40,930 input. 118 00:06:40,930 --> 00:06:41,770 So let's try it. 119 00:06:41,770 --> 00:06:44,250 I'll run Python of code.py. 
120 00:06:44,250 --> 00:06:46,450 And I'll type in-- let's do a valid one. 121 00:06:46,450 --> 00:06:48,000 Let's say hashtag AAAAAA. 122 00:06:48,000 --> 00:06:51,120 123 00:06:51,120 --> 00:06:54,660 Don't quite know what color that is, but probably some shade of gray given 124 00:06:54,660 --> 00:06:58,990 we have all the various red, green and blues aligning in the same level here. 125 00:06:58,990 --> 00:07:04,150 I'll hit Enter, and I'll see valid and also matched with this hash symbol. 126 00:07:04,150 --> 00:07:08,370 So it seems like the reason we said this code was valid 127 00:07:08,370 --> 00:07:14,310 is that re.search found that hash symbol inside of this text here. 128 00:07:14,310 --> 00:07:16,840 But of course, I can do something like this. 129 00:07:16,840 --> 00:07:18,810 I could say Python code.py-- 130 00:07:18,810 --> 00:07:22,830 hashtag-- let's do GG. 131 00:07:22,830 --> 00:07:26,550 Let's do II and KK. 132 00:07:26,550 --> 00:07:32,010 This is not a valid hexadecimal color code, but according to our pattern, 133 00:07:32,010 --> 00:07:35,050 it is, because it sees in this case that hash symbol. 134 00:07:35,050 --> 00:07:38,190 So we need to improve this, and we can do so 135 00:07:38,190 --> 00:07:41,470 by using other features of regular expressions. 136 00:07:41,470 --> 00:07:44,740 One of them is going to be called a character set. 137 00:07:44,740 --> 00:07:49,590 And a character set begins with these square brackets here, one opening 138 00:07:49,590 --> 00:07:51,070 and one closing. 139 00:07:51,070 --> 00:07:56,070 Now, within these square brackets, I could include all the characters 140 00:07:56,070 --> 00:08:00,255 that I could possibly match after, let's say, this hash symbol. 141 00:08:00,255 --> 00:08:02,130 So if we go back to our slides again, we said 142 00:08:02,130 --> 00:08:05,680 that it should begin with a hash symbol, which we already have in our pattern. 
143 00:08:05,680 --> 00:08:09,690 But then we expect six characters in the range of 0 144 00:08:09,690 --> 00:08:12,880 through 9 and A through F, upper or lower case. 145 00:08:12,880 --> 00:08:16,830 So let's begin, let's say, with these actual letters here. 146 00:08:16,830 --> 00:08:22,530 I'll go abcdef to say we could match any of these individually, 147 00:08:22,530 --> 00:08:26,520 either lowercase a, lowercase b, c, d, e, f. 148 00:08:26,520 --> 00:08:30,150 We also want to take in the capital ABCDEF, 149 00:08:30,150 --> 00:08:35,740 and then anything between 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. 150 00:08:35,740 --> 00:08:38,070 So this is our character set. 151 00:08:38,070 --> 00:08:41,190 After we find, let's say, the hash symbol, 152 00:08:41,190 --> 00:08:45,550 we should expect to find any character in this set. 153 00:08:45,550 --> 00:08:47,200 So let's try this out. 154 00:08:47,200 --> 00:08:49,920 I'll go ahead and run Python of code.py. 155 00:08:49,920 --> 00:08:51,120 And recall we gave-- 156 00:08:51,120 --> 00:08:55,030 I think, it was hashtag GGIIKK-- 157 00:08:55,030 --> 00:08:56,290 just random letters I made up. 158 00:08:56,290 --> 00:08:59,620 If I hit Enter now, we'll see invalid. 159 00:08:59,620 --> 00:09:00,880 Now, why do you think that is? 160 00:09:00,880 --> 00:09:06,090 Well, re.search is looking for a hashtag, which it finds, of course. 161 00:09:06,090 --> 00:09:08,640 But then immediately after the hashtag, we 162 00:09:08,640 --> 00:09:12,120 want to find a character in this character set-- 163 00:09:12,120 --> 00:09:17,230 lowercase a through f, capital A through F, or 0 through 9. 164 00:09:17,230 --> 00:09:19,510 And it seems like we don't find that here. 165 00:09:19,510 --> 00:09:23,730 We have #G, which is not in this character set. 166 00:09:23,730 --> 00:09:25,960 So let's go ahead and try this one.
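The character set described above can be sketched like this. The loop and the printed wording are assumptions made for illustration, not the speaker's exact program.

```python
import re

# A hash symbol followed by ONE character from the set, as typed out in the transcript.
pattern = r"#[abcdefABCDEF0123456789]"

for code in ["#GGIIKK", "#AAAAAA"]:
    match = re.search(pattern, code)
    if match:
        print(code, "-> Valid. Matched with", match.group())
    else:
        print(code, "-> Invalid")
```

Note that the set on its own stands for a single character, so for #AAAAAA the match covers only the hash and the first A.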
167 00:09:25,960 --> 00:09:30,060 I'll try Python of code.py, and let's go ahead and do the valid color from 168 00:09:30,060 --> 00:09:30,617 before-- 169 00:09:30,617 --> 00:09:31,117 #AAAAAA. 170 00:09:31,117 --> 00:09:33,690 171 00:09:33,690 --> 00:09:38,970 Hit Enter, and we'll see that this is valid, but perhaps not again 172 00:09:38,970 --> 00:09:40,000 for the right reason. 173 00:09:40,000 --> 00:09:44,313 I see matched with #A, which is an improvement. 174 00:09:44,313 --> 00:09:45,730 But I think I could still do this. 175 00:09:45,730 --> 00:09:53,020 I could still do #A, and then a GGGGGG, which is not a valid color code. 176 00:09:53,020 --> 00:09:55,770 And this, I think, will still be valid. 177 00:09:55,770 --> 00:09:58,500 So our problem here seems to be that we're 178 00:09:58,500 --> 00:10:02,460 expecting the right range of characters for the first character we 179 00:10:02,460 --> 00:10:05,770 see after the hash, but not for all of the 6 180 00:10:05,770 --> 00:10:08,560 after, let's say, we have this hash symbol here. 181 00:10:08,560 --> 00:10:09,950 So how could we fix that? 182 00:10:09,950 --> 00:10:12,790 Well, we can use what's called a quantifier, 183 00:10:12,790 --> 00:10:17,110 and I can access a quantifier using these curly braces here. 184 00:10:17,110 --> 00:10:21,100 So this ensures whatever number I put in here, like 6-- 185 00:10:21,100 --> 00:10:24,670 this means that for this particular character set, whatever 186 00:10:24,670 --> 00:10:27,520 character or character set precedes this quantifier, 187 00:10:27,520 --> 00:10:31,330 I should expect that number of them, in this case, exactly. 188 00:10:31,330 --> 00:10:37,120 So I'll expect six of these particular characters inside of this set here. 189 00:10:37,120 --> 00:10:38,320 Let's try this out. 190 00:10:38,320 --> 00:10:40,390 I'm going to Python of code.py.
191 00:10:40,390 --> 00:10:45,310 And I think I'll demonstrate now that #AGGGGG, this 192 00:10:45,310 --> 00:10:46,610 is not going to work for us. 193 00:10:46,610 --> 00:10:49,550 I'll hit Enter, and I'll see that is invalid. 194 00:10:49,550 --> 00:10:50,660 Let's try this one now. 195 00:10:50,660 --> 00:10:52,550 Python of code.py. 196 00:10:52,550 --> 00:11:00,410 I'll do #AAAAAA, and that seems to be valid now for the right reason. 197 00:11:00,410 --> 00:11:03,380 We matched with the entire thing. 198 00:11:03,380 --> 00:11:09,050 All of these in this case fall into this range of valid characters. 199 00:11:09,050 --> 00:11:12,600 But I think there's still one more thing to go and improve here. 200 00:11:12,600 --> 00:11:13,290 Let's try this. 201 00:11:13,290 --> 00:11:15,010 I'll do Python of code.py. 202 00:11:15,010 --> 00:11:16,260 Let's get a little bit tricky. 203 00:11:16,260 --> 00:11:21,830 I'll do #AAAAAA, and then something like-- 204 00:11:21,830 --> 00:11:24,830 why don't we just do a 0 at the end? 205 00:11:24,830 --> 00:11:26,760 Now this is not a valid color code. 206 00:11:26,760 --> 00:11:29,100 This is more than 7 characters. 207 00:11:29,100 --> 00:11:33,080 I'll hit Enter here, and it's still valid. 208 00:11:33,080 --> 00:11:35,150 Now, this might be confusing, because I did 209 00:11:35,150 --> 00:11:38,510 say earlier that this quantifier here ensures 210 00:11:38,510 --> 00:11:43,910 that we get exactly 6 of the characters in this character set. 211 00:11:43,910 --> 00:11:46,730 And well, it seems like we've done that. 212 00:11:46,730 --> 00:11:50,330 I have here in this code hashtag-- 213 00:11:50,330 --> 00:11:54,720 hash symbol here, and then AAAAAA. 214 00:11:54,720 --> 00:11:56,280 That's certainly six of them. 215 00:11:56,280 --> 00:12:01,812 And so I seem to have found a match inside of this longer string. 216 00:12:01,812 --> 00:12:03,020 And it's a little more clear.
217 00:12:03,020 --> 00:12:03,895 I could even do this. 218 00:12:03,895 --> 00:12:05,700 Python of code.py. 219 00:12:05,700 --> 00:12:08,180 The code is #AAAAAA. 220 00:12:08,180 --> 00:12:11,580 221 00:12:11,580 --> 00:12:15,560 Well, this is valid even though I typed in "the code is," 222 00:12:15,560 --> 00:12:20,120 which I don't want to include in a valid hexadecimal color code. 223 00:12:20,120 --> 00:12:24,570 So one thing I can use here is called an anchor. 224 00:12:24,570 --> 00:12:30,470 If I want to say that my pattern should start at the beginning of the string, 225 00:12:30,470 --> 00:12:33,150 I can do so by adding a caret here. 226 00:12:33,150 --> 00:12:36,470 So this will say that when I search for this pattern, 227 00:12:36,470 --> 00:12:42,480 this hash symbol must be the very first character that I see in the input text. 228 00:12:42,480 --> 00:12:46,537 So this would no longer be valid under this caret symbol here, this anchor. 229 00:12:46,537 --> 00:12:47,370 Let me try it again. 230 00:12:47,370 --> 00:12:48,840 I'll do Python of code.py. 231 00:12:48,840 --> 00:12:54,350 I'll say the code is #AAAAAA. 232 00:12:54,350 --> 00:12:55,190 Hit Enter. 233 00:12:55,190 --> 00:12:59,870 That now is invalid because our very first character in this input 234 00:12:59,870 --> 00:13:02,660 is not the hash symbol. 235 00:13:02,660 --> 00:13:06,290 Now similarly, we can do-- have an anchor at the end, which 236 00:13:06,290 --> 00:13:08,090 looks like dollar sign. 237 00:13:08,090 --> 00:13:12,800 And that means that the last character-- the last characters we expect 238 00:13:12,800 --> 00:13:16,200 should be these six in this range here. 239 00:13:16,200 --> 00:13:22,070 So this will exclude things like #AAAAAA0, 240 00:13:22,070 --> 00:13:25,130 because that is more than, in this case, six. 241 00:13:25,130 --> 00:13:31,980 The last character we're matching is not within the six we expect here.
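The effect of the quantifier and the two anchors can be sketched by comparing the pattern with and without the caret and dollar sign. The variable names unanchored and anchored are assumptions for illustration.

```python
import re

# Exactly six characters from the set after the hash, found anywhere in the string.
unanchored = r"#[abcdefABCDEF0123456789]{6}"
# Same pattern, but it must start at the beginning and finish at the end of the string.
anchored = r"^#[abcdefABCDEF0123456789]{6}$"

for code in ["#AAAAAA", "#AAAAAA0", "The code is #AAAAAA"]:
    found_anywhere = bool(re.search(unanchored, code))
    found_exactly = bool(re.search(anchored, code))
    print(code, "| unanchored:", found_anywhere, "| anchored:", found_exactly)
```

Without the anchors all three inputs count as valid, because six valid characters after a hash appear somewhere inside each one. With the anchors only #AAAAAA passes. One caveat: in Python the dollar sign also matches just before a single trailing newline, so re.fullmatch may be worth considering if that matters.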
242 00:13:31,980 --> 00:13:40,020 So I'll run Python of code.py, and I'll type in #AAAAAA0, and that, 243 00:13:40,020 --> 00:13:42,000 of course, is invalid as well. 244 00:13:42,000 --> 00:13:45,990 So I think this pattern works pretty well for us, 245 00:13:45,990 --> 00:13:49,140 but there is one way to improve how we write it. 246 00:13:49,140 --> 00:13:52,820 I could, instead of typing all this out, use a range. 247 00:13:52,820 --> 00:14:00,170 I can say, conveniently, a-f to insinuate or to imply I want a, b, c, d, 248 00:14:00,170 --> 00:14:02,280 e, and f as valid characters. 249 00:14:02,280 --> 00:14:06,320 I can do the same thing for capital A to capital F just like this. 250 00:14:06,320 --> 00:14:11,960 Capital A, dash, capital F. That will give me capital A, B, C, D, E, F. 251 00:14:11,960 --> 00:14:14,610 And again, same thing for 0 through 9. 252 00:14:14,610 --> 00:14:18,860 I could do 0-9 here, giving me 0, 1, 2, 3, 4, 5, 6, 253 00:14:18,860 --> 00:14:21,630 7, 8, and 9, all inclusive. 254 00:14:21,630 --> 00:14:24,060 So this, I believe, should do the same thing for us. 255 00:14:24,060 --> 00:14:25,910 I'll type Python of code.py. 256 00:14:25,910 --> 00:14:26,690 Type in #AAAAAA. 257 00:14:26,690 --> 00:14:29,250 258 00:14:29,250 --> 00:14:32,310 And this seems to be a valid color code. 259 00:14:32,310 --> 00:14:34,970 So we've seen here how to use these things 260 00:14:34,970 --> 00:14:37,370 called regular expressions and the patterns that 261 00:14:37,370 --> 00:14:41,640 actually implement them to validate things that have a pattern to them. 262 00:14:41,640 --> 00:14:43,740 These are useful beyond just these. 263 00:14:43,740 --> 00:14:45,740 Anything that has a pattern to it, you can probably 264 00:14:45,740 --> 00:14:50,000 try to validate using these things called patterns and regular expressions. 265 00:14:50,000 --> 00:14:54,610 This then was our short on patterns, and we'll see you next time.
266 00:14:54,610 --> 00:14:56,000
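Putting the pieces from this short together, the finished validator might look something like the sketch below. The speaker's code.py reads the code with input inside main, which is not shown in full here, so the check function, the PATTERN name, and the list of sample inputs are assumptions made to keep the example self-contained.

```python
import re

# Start of string, a hash, exactly six characters in the ranges a-f, A-F, or 0-9, end of string.
PATTERN = r"^#[a-fA-F0-9]{6}$"


def check(code):
    """Return the message the program would print for a given code (hypothetical helper)."""
    match = re.search(PATTERN, code)
    if match:
        return f"Valid. Matched with {match.group()}"
    return "Invalid"


for code in ["#AAAAAA", "#0076ba", "#GGIIKK", "#AAAAAA0", "The code is #AAAAAA"]:
    print(code, "->", check(code))
```

Only the first two inputs come back as valid. The ranges a-f, A-F, and 0-9 accept the same characters as the longer set typed out earlier.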