1
00:00:00,000 --> 00:00:00,500

2
00:00:00,500 --> 00:00:02,750
BRIAN YU: Let's start by taking a look at lines.

3
00:00:02,750 --> 00:00:05,660
In this part of the problem, here's what you'll have to do.

4
00:00:05,660 --> 00:00:08,180
First, the lines function is a function defined

5
00:00:08,180 --> 00:00:14,010
in helpers.py, which takes as input a and b, each of which is a string.

6
00:00:14,010 --> 00:00:16,760
The first thing you'll have to do is take each of those strings

7
00:00:16,760 --> 00:00:19,490
and split each one up into individual lines.

8
00:00:19,490 --> 00:00:25,070
Then you'll need to figure out which lines appear in both a and b.

9
00:00:25,070 --> 00:00:27,920
And then finally, you'll want to return a list that

10
00:00:27,920 --> 00:00:30,950
contains all of the lines that are present in both string

11
00:00:30,950 --> 00:00:34,040
a and string b.

12
00:00:34,040 --> 00:00:36,860
So let's take a look at what that would actually look like.

13
00:00:36,860 --> 00:00:41,510
A string might look something like this: line 1 backslash n, line 2 backslash

14
00:00:41,510 --> 00:00:45,710
n, line 3, where backslash n stands for a newline.

15
00:00:45,710 --> 00:00:48,140
If we want to take a string like this and split it up

16
00:00:48,140 --> 00:00:50,150
into lines, what that means is that we want

17
00:00:50,150 --> 00:00:55,580
to convert this string into separate lines: line 1, line 2, and line

18
00:00:55,580 --> 00:00:56,390
3.

19
00:00:56,390 --> 00:00:59,300
That way, we can compare file a and file

20
00:00:59,300 --> 00:01:03,860
b to see which lines are present in both of those files.

21
00:01:03,860 --> 00:01:07,070
Note that Python strings support methods, which

22
00:01:07,070 --> 00:01:09,440
may help you in the process of taking a string

23
00:01:09,440 --> 00:01:12,740
and extracting all of the lines that appear in that string.
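One of the string methods that can take a string apart line by line is str.splitlines(). The sketch below is a minimal illustration, assuming the input uses ordinary "\n" line endings; the helper name split_into_lines is hypothetical and not part of the problem's helpers.py. It also shows why splitlines() can be preferable to split("\n") when a string ends with a newline.

```python
# A minimal sketch: split a string into its individual lines.
# split_into_lines is a hypothetical helper used only for illustration.
# str.splitlines() breaks on "\n" (and other line boundaries such as "\r\n")
# and, unlike str.split("\n"), does not produce an empty final element
# when the string ends with a newline.

def split_into_lines(text):
    """Return a list of the lines in text, without line-ending characters."""
    return text.splitlines()


sample = "line 1\nline 2\nline 3"
print(split_into_lines(sample))         # ['line 1', 'line 2', 'line 3']

# Compare the two approaches on a string with a trailing newline:
print("line 1\nline 2\n".split("\n"))   # ['line 1', 'line 2', '']
print("line 1\nline 2\n".splitlines())  # ['line 1', 'line 2']
```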
24
00:01:12,740 --> 00:01:16,100
You can go to this URL, part of Python's documentation,

25
00:01:16,100 --> 00:01:19,520
to look at the methods that you can use with Python strings.

26
00:01:19,520 --> 00:01:22,340
Some of them may help you figure out how to take a string

27
00:01:22,340 --> 00:01:25,310
and split it up into its individual lines.

28
00:01:25,310 --> 00:01:29,810
After you've split both a and b into their lines,

29
00:01:29,810 --> 00:01:31,640
the next step is to figure out

30
00:01:31,640 --> 00:01:37,010
which lines appear both in the lines of a and in the lines of b.

31
00:01:37,010 --> 00:01:39,990
So how do we find the lines that are in common?

32
00:01:39,990 --> 00:01:42,740
Well, you'll probably want to consider some sort of data structure

33
00:01:42,740 --> 00:01:45,590
to keep track of which lines are present in both.

34
00:01:45,590 --> 00:01:48,920
You might consider a list, which stores elements in order,

35
00:01:48,920 --> 00:01:52,400
or you might consider a set, which stores elements in no particular order

36
00:01:52,400 --> 00:01:54,350
and simply holds a collection of them, with no duplicates.

37
00:01:54,350 --> 00:01:57,170
Or you might consider some other data structure altogether.

38
00:01:57,170 --> 00:01:58,431
It's up to you.

39
00:01:58,431 --> 00:02:01,430
But you might want to take a look at this part of Python's documentation,

40
00:02:01,430 --> 00:02:04,460
which explains some of the common Python data structures that

41
00:02:04,460 --> 00:02:07,820
might prove helpful to you as you think about which algorithm

42
00:02:07,820 --> 00:02:09,830
and which data structure to use as you

43
00:02:09,830 --> 00:02:14,900
work out which lines are present in both a and b.

44
00:02:14,900 --> 00:02:18,410
The key thing to make sure of is that you avoid duplicate lines.
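To make the list-versus-set choice concrete, here is a minimal sketch that compares the two options on inputs that have already been split into lines. The function names common_with_lists and common_with_sets are hypothetical. The list version keeps repeated lines (and checks membership by scanning the whole list each time), while set intersection keeps each shared line exactly once.

```python
# A minimal sketch comparing two ways to find the lines two lists share.
# Assumption: the inputs have already been split into lists of lines.
# Both function names are hypothetical, for illustration only.

def common_with_lists(lines_a, lines_b):
    """List-based approach: keeps the order of lines_a, but repeats duplicates."""
    # "line in lines_b" scans lines_b each time, so this is O(len(a) * len(b)).
    return [line for line in lines_a if line in lines_b]


def common_with_sets(lines_a, lines_b):
    """Set-based approach: each shared line appears exactly once."""
    # The & operator computes the intersection of two sets.
    return set(lines_a) & set(lines_b)


a_lines = ["x", "y", "y", "z"]
b_lines = ["y", "y", "y", "z", "w"]

print(common_with_lists(a_lines, b_lines))          # ['y', 'y', 'z']
print(sorted(common_with_sets(a_lines, b_lines)))   # ['y', 'z']
```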
45
00:02:18,410 --> 00:02:23,150
If, for example, the same line appears twice in file a and three times in file

46
00:02:23,150 --> 00:02:26,997
b, it should appear only once in your resulting list,

47
00:02:26,997 --> 00:02:29,330
because the list that you return at the end of the lines

48
00:02:29,330 --> 00:02:31,640
function should be a list of all of the lines

49
00:02:31,640 --> 00:02:33,230
that the two files have in common,

50
00:02:33,230 --> 00:02:37,790
and each of those lines in common should appear at most once.

51
00:02:37,790 --> 00:02:40,130
So be sure to avoid duplicates there, and think

52
00:02:40,130 --> 00:02:42,260
about how you might make sure your algorithm is

53
00:02:42,260 --> 00:02:45,070
careful to avoid that situation.

54
00:02:45,070 --> 00:02:46,021
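Putting the steps together, one possible shape for lines(a, b) is sketched below. It is a minimal sketch under stated assumptions, not the official solution: it splits with str.splitlines(), uses set intersection so that a line repeated in a or b still appears only once, and returns a list whose order is arbitrary (the transcript only asks for a list, so wrap the result in sorted() if you want a stable order).

```python
# A minimal sketch of lines(a, b); the actual helpers.py may differ.

def lines(a, b):
    """Return the lines that appear in both a and b, each listed once."""
    # Split each string into lines, then store them in sets,
    # which automatically discards repeated lines.
    lines_a = set(a.splitlines())
    lines_b = set(b.splitlines())
    # Set intersection keeps only lines present in both, each exactly once.
    # The order of the returned list is arbitrary.
    return list(lines_a & lines_b)


a = "hello\nworld\nhello\n"
b = "hello\nhello\nhello\nthere\n"
print(lines(a, b))  # ['hello']
```

Here "hello" appears twice in a and three times in b, yet it shows up only once in the result, which is exactly the duplicate-avoidance behavior described above.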