1 00:00:00,000 --> 00:00:00,550 2 00:00:00,550 --> 00:00:04,810 SPEAKER: The last function that you'll implement in helpers.py is substrings. 3 00:00:04,810 --> 00:00:07,270 And the role of this function is going to be to do this. 4 00:00:07,270 --> 00:00:11,170 You'll take in, not just string inputs a and b as you had before, 5 00:00:11,170 --> 00:00:15,670 but also a variable n, which will be an integer representing the substring 6 00:00:15,670 --> 00:00:17,790 length that you want to compare. 7 00:00:17,790 --> 00:00:19,510 What do you do with that number n? 8 00:00:19,510 --> 00:00:24,490 Well, you'll split each string into all possible substrings of length n 9 00:00:24,490 --> 00:00:25,640 characters. 10 00:00:25,640 --> 00:00:28,600 So if you have a string that's 5 characters long 11 00:00:28,600 --> 00:00:31,124 and you're looking for all substrings of length 3, 12 00:00:31,124 --> 00:00:33,040 you might have a bunch of different substrings 13 00:00:33,040 --> 00:00:37,000 that might overlap with each other that are all of length 3, for example. 14 00:00:37,000 --> 00:00:39,880 And we'll take a look at an example of that shortly. 15 00:00:39,880 --> 00:00:43,750 After that you'll want to calculate a list of all of the substrings 16 00:00:43,750 --> 00:00:46,304 that appear in both files as usual. 17 00:00:46,304 --> 00:00:48,220 And then just like in our prior two functions, 18 00:00:48,220 --> 00:00:51,460 you'll return a list that contains all of the substrings that 19 00:00:51,460 --> 00:00:54,650 appear in both of the files. 20 00:00:54,650 --> 00:00:56,800 Let's take a look at an actual example of this. 21 00:00:56,800 --> 00:01:01,390 Let's say that n were 3 and therefore we want to find all substrings of length 3 22 00:01:01,390 --> 00:01:04,750 characters and we had input "Hello".
23 00:01:04,750 --> 00:01:07,270 What would it look like if we tried to take the string "Hello" 24 00:01:07,270 --> 00:01:11,226 and turn it into all possible substrings of length 3? 25 00:01:11,226 --> 00:01:12,850 Well, it would look something like this. 26 00:01:12,850 --> 00:01:18,820 "Hel" is a substring of length 3, as is "ell", as is "llo". 27 00:01:18,820 --> 00:01:20,500 These are all substrings of length 3. 28 00:01:20,500 --> 00:01:21,730 And they do overlap. 29 00:01:21,730 --> 00:01:23,560 But these are all the possible ways we can 30 00:01:23,560 --> 00:01:28,270 represent three contiguous characters out of the string "Hello". 31 00:01:28,270 --> 00:01:31,090 If on the other hand n had been 2, now we 32 00:01:31,090 --> 00:01:33,717 want to look for all substrings of length 2, 33 00:01:33,717 --> 00:01:35,050 of which there are more. 34 00:01:35,050 --> 00:01:40,270 We can do "He" or "el" or "ll" or "lo" and those 35 00:01:40,270 --> 00:01:43,600 are all possible valid substrings of length 2. 36 00:01:43,600 --> 00:01:46,280 So what are you going to have to do here? 37 00:01:46,280 --> 00:01:50,500 The first thing you want to do is extract all the substrings 38 00:01:50,500 --> 00:01:53,740 from each of the strings a and b. 39 00:01:53,740 --> 00:01:57,360 And remember that in Python you can use syntax like this-- 40 00:01:57,360 --> 00:02:02,710 s and then in square brackets i colon j to get the substring of the string 41 00:02:02,710 --> 00:02:08,979 s from index i in the string, up to but not including, index j in the string. 42 00:02:08,979 --> 00:02:11,680 So that might be helpful for extracting substrings.
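To make the slicing idea concrete, here is a minimal sketch of how s[i:j] can be used to pull out every substring of a given length. The function name substrings_of_length is hypothetical and is not something the walkthrough names; it is only one possible way to organize this step.

```python
def substrings_of_length(s, n):
    """Return every length-n substring of s, left to right, overlaps included.

    Hypothetical helper for illustration; the name is an assumption.
    """
    if n <= 0:
        # A non-positive length has no meaningful substrings.
        return []
    # s[i:i + n] takes characters from index i up to, but not including, i + n.
    # The last valid start index is len(s) - n, so the range stops at len(s) - n + 1.
    return [s[i:i + n] for i in range(len(s) - n + 1)]


print(substrings_of_length("Hello", 3))  # ['Hel', 'ell', 'llo']
print(substrings_of_length("Hello", 2))  # ['He', 'el', 'll', 'lo']
```

If n is larger than the length of the string, the range is empty and the sketch returns an empty list.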
43 00:02:11,680 --> 00:02:15,160 And you may want to write some sort of helper function that can take a string 44 00:02:15,160 --> 00:02:19,150 and get you all of the substrings of a particular length, which may be helpful 45 00:02:19,150 --> 00:02:23,050 if you think about extracting substrings from both of your original string 46 00:02:23,050 --> 00:02:24,677 inputs. 47 00:02:24,677 --> 00:02:27,760 As in the prior two functions, make sure that at the end of your substrings 48 00:02:27,760 --> 00:02:32,860 function, you return a list of all of the matches, the substrings of length 49 00:02:32,860 --> 00:02:36,670 n, that are present both in string a and string b, 50 00:02:36,670 --> 00:02:39,820 making sure as before to avoid duplicate substrings that 51 00:02:39,820 --> 00:02:41,920 are present in that final list. 52 00:02:41,920 --> 00:02:44,049 After that, you'll be done with writing substrings 53 00:02:44,049 --> 00:02:46,090 and as a result, done with three of the functions 54 00:02:46,090 --> 00:02:49,830 that you'll have to implement inside of helpers.py. 55 00:02:49,830 --> 00:02:51,408
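Putting the steps together, one possible shape for the finished function is sketched below. The name substrings and the parameters a, b, and n come from the walkthrough; the inner helper pieces is a hypothetical name, and using sets to remove duplicates is an assumption about one reasonable approach, not necessarily the intended solution.

```python
def substrings(a, b, n):
    """Return a list of the length-n substrings found in both a and b, without duplicates."""

    def pieces(s):
        # Hypothetical inner helper: the set of all length-n substrings of s.
        if n <= 0:
            return set()
        return {s[i:i + n] for i in range(len(s) - n + 1)}

    # Set intersection keeps only substrings present in both inputs,
    # and a set cannot hold the same substring twice.
    return list(pieces(a) & pieces(b))


# Sets are unordered, so sort before printing to get a stable display.
print(sorted(substrings("Hello", "Yellow", 3)))  # ['ell', 'llo']
```

Because the result is built from a set, the order of the returned list is not guaranteed; sort it if a particular order matters.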